
Image generation is typically done with a class of models called diffusion models. Generation starts from an image of pure Gaussian noise, which the model gradually de-noises, step by step, into a sharp, attractive image that matches the description provided (the prompt). In simpler terms, the diffusion model transforms a sample from a simple Gaussian distribution into a sample from the complex distribution of real images.

During this process, a U-Net model de-noises the image, and a text encoder turns the prompt into embeddings that guide the de-noising. Popular text-image embedding models used for this include CLIP and ALIGN.
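The forward (noising) direction of this process has a simple closed form, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * noise, where a_t is the running product of (1 - beta) up to step t; the model is trained to run this in reverse. The sketch below is a minimal pure-Python illustration, assuming the linear noise schedule from the original DDPM paper (1,000 steps, beta rising from 1e-4 to 0.02). Real pipelines such as Stable Diffusion operate on latent tensors and use their own schedules, so treat it only as a picture of how little of the original signal survives by the last step, which is why generation can start from pure noise.

```python
import math
import random

# Assumption: linear beta schedule from the DDPM paper
# (T = 1000 steps, beta rising from 1e-4 to 0.02).
T = 1000
BETAS = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# ALPHA_BARS[t] is the running product of (1 - beta) up to step t:
# how much of the original signal's variance survives at that step.
ALPHA_BARS = []
_product = 1.0
for _beta in BETAS:
    _product *= 1.0 - _beta
    ALPHA_BARS.append(_product)


def add_noise(x0, t, rng):
    """Sample a noisy version of x0 at step t in closed form.

    x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise, with noise ~ N(0, 1).
    """
    a = ALPHA_BARS[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
image = [0.8, -0.5, 0.1, 0.9]  # a tiny "image", pixel values scaled to [-1, 1]
for t in (0, 250, 500, T - 1):
    signal = math.sqrt(ALPHA_BARS[t])  # weight left on the original pixels
    noisy = add_noise(image, t, rng)
    print(f"t={t:4d} signal weight={signal:.4f}", [round(v, 2) for v in noisy])
```

The signal weight falls from almost 1 at the first step to nearly 0 at the last, so the final step is, for practical purposes, pure Gaussian noise.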

This process of image generation can be used to:

Generate new images

Edit only specific parts of an image (Inpainting)

To control the image generation process, these models expose several useful hyperparameters. They give us flexibility over how the image is created, which parts of the original image are preserved, and what information can be lost. Let us look at each of them in more detail.

Inference Steps:

This important hyperparameter sets the number of steps the model takes to de-noise the image into the final result. Increasing the number of inference steps lengthens the de-noising process, which usually yields a sharper image that follows the prompt more closely, at the cost of more time and compute. Beyond a certain point, however, extra steps bring diminishing returns, and very high values can make the image look over-processed and less natural.
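Diffusion models are usually trained with a fixed set of noise levels (1,000 for Stable Diffusion), and the inference-steps setting chooses how many of them the sampler actually visits. The sketch below picks evenly spaced timesteps, similar to the "leading" spacing used by DDIM-style schedulers; the function name is hypothetical, and real schedulers may add an offset or space the steps differently.

```python
def inference_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Pick evenly spaced timesteps, ordered from noisiest to cleanest.

    Assumption: simple 'leading'-style spacing with no offset.
    """
    if not 1 <= num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in [1, num_train_timesteps]")
    step = num_train_timesteps // num_inference_steps
    return [i * step for i in range(num_inference_steps)][::-1]


for n in (10, 25, 50):
    ts = inference_timesteps(n)
    # Each timestep costs one U-Net call, so run time grows linearly with n.
    print(n, "steps:", ts[:4], "...", ts[-1])
```

Fewer steps means larger jumps between noise levels, which is why very low step counts tend to give blurrier or less coherent images.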

Guidance Scale:

This hyperparameter defines how closely the image generation process should follow the prompt. The higher the guidance scale, the more faithfully the image follows the prompt; very high values, however, tend to lower image quality, producing oversaturated, artifact-prone images with less variety.
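Many text-to-image pipelines, Stable Diffusion among them, implement this with classifier-free guidance: at every step the U-Net predicts the noise twice, once with the prompt and once without it, and the guidance scale extrapolates from the unconditional prediction toward the conditional one. A minimal sketch, with plain lists standing in for noise tensors and made-up numbers for illustration:

```python
def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine the U-Net's two noise predictions element-wise.

    noise_uncond: prediction made with an empty (or negative) prompt.
    noise_cond:   prediction made with the user's prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


uncond = [0.10, -0.20, 0.05]
cond = [0.30, -0.10, 0.00]
for w in (1.0, 7.5, 20.0):
    print(w, [round(v, 3) for v in classifier_free_guidance(uncond, cond, w)])
```

With a scale of 1 the prompted prediction is used as-is; larger values push further along the prompt direction, which is why very large scales overshoot. In Hugging Face diffusers' Stable Diffusion pipelines the default `guidance_scale` is 7.5, and a negative prompt is used in place of the empty prompt in the unconditional branch, so guidance also pushes the image away from it.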

Strength:

Strength determines how much noise is added to the original image before a new image is generated from it. Why does this matter?

To replace an object in an image, we add noise to that region of the image. The amount of noise shapes the final image: too much noise destroys the original information, so the region is regenerated almost from scratch and can end up completely different, while too little noise leaves the new image very close to the original.
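The trade-off can be sketched by noising only the masked pixels, with strength choosing how far along the noise schedule to go. This is a simplified illustration under stated assumptions: it reuses the linear DDPM schedule, the linear mapping from strength to a timestep is chosen for illustration, and the function name is hypothetical. Real inpainting pipelines typically work in latent space, and many use a model that receives the mask as an extra input.

```python
import math
import random

# Assumption: linear DDPM schedule with 1,000 training steps, standing in for
# whatever schedule a real pipeline uses.
T = 1000
ALPHA_BARS = []
_ab = 1.0
for _t in range(T):
    _ab *= 1.0 - (1e-4 + (0.02 - 1e-4) * _t / (T - 1))
    ALPHA_BARS.append(_ab)


def noise_masked_region(pixels, mask, strength, rng):
    """Noise only the masked pixels: strength 0 keeps them, 1 is ~pure noise."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    if strength == 0.0:
        return list(pixels)
    t = round(strength * (T - 1))  # higher strength -> later, noisier timestep
    keep, noise = math.sqrt(ALPHA_BARS[t]), math.sqrt(1.0 - ALPHA_BARS[t])
    return [keep * x + noise * rng.gauss(0.0, 1.0) if m else x
            for x, m in zip(pixels, mask)]


pixels = [0.9, 0.9, 0.9, -0.4, -0.4, -0.4]
mask = [False, False, True, True, False, False]  # the region to replace
for s in (0.3, 0.75, 1.0):
    out = noise_masked_region(pixels, mask, s, random.Random(0))
    print(s, [round(v, 2) for v in out])
```

Pixels outside the mask are untouched; inside it, low strength keeps most of the original value and strength 1.0 leaves almost nothing of it. In diffusers' image-to-image and inpainting pipelines, strength also shortens the run: roughly strength × `num_inference_steps` de-noising steps are executed.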

Negative Prompt:

This hyperparameter is quite intuitive: it describes what the generated image should not contain. Example: negative prompt "cartoonish image", to make the result more realistic and less cartoon-like.

Image segmentation

To replace only a specific part of an image, we first need to segment the image and create a mask for the region that will be replaced by a new object. This feature is exciting and not difficult to use; we just need to understand a few details and the models that perform this task. Let us jump into it.

Image segmentation is the process of assigning a label to each pixel so that pixels with the same label share common characteristics. Image segmentation helps us with:

Identifying objects and their boundaries in an image

Object detection

3D reconstruction
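The mask an inpainting model needs is simply one label pulled out of such a per-pixel labelling. A toy sketch with a made-up 4×6 label map (the labels and helper names are hypothetical):

```python
from collections import Counter

# A tiny 4x6 segmentation map: every pixel carries a class label.
# Made-up labels: 0 = ground, 1 = sky, 2 = car.
LABELS = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 2, 2, 1, 1],
    [0, 2, 2, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
]


def binary_mask(label_map, label):
    """Return a same-sized 0/1 mask selecting every pixel with `label`."""
    return [[1 if v == label else 0 for v in row] for row in label_map]


def pixel_counts(label_map):
    """Count how many pixels belong to each label."""
    return Counter(v for row in label_map for v in row)


car_mask = binary_mask(LABELS, 2)
for row in car_mask:
    print(row)
print(pixel_counts(LABELS))
```

The resulting `car_mask` marks exactly the pixels that an inpainting step would noise and regenerate.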

Image segmentation models:

SAM

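SAM, or the Segment Anything Model, is a promptable segmentation model from Meta AI that is widely used to segment objects in images. It takes an image together with a prompt, such as points, a bounding box, or a rough mask. An image encoder turns the image into an embedding, a prompt encoder embeds the prompt, and a lightweight mask decoder combines the two to output segmentation masks for the prompted object.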
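SAM's box prompt is given in XYXY pixel format (left, top, right, bottom). A small helper that derives such a box from a rough mask, for example a quick scribble over the object, is sketched below; the helper is hypothetical and not part of SAM. With Meta's `segment_anything` package, such a box would be passed as a length-4 array to `SamPredictor.predict(box=...)` after `set_image` has computed the image embedding once; that call is named for orientation only and is not exercised here.

```python
def mask_to_xyxy_box(mask):
    """Return (x_min, y_min, x_max, y_max) as inclusive pixel indices, or None.

    x indexes columns and y indexes rows, matching the XYXY box convention.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None  # empty mask: nothing to prompt with
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs), max(ys))


rough = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(mask_to_xyxy_box(rough))
```

Because SAM computes the image embedding once and only re-runs the lightweight decoder per prompt, trying several boxes or points on the same image is cheap.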

FastSAM

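FastSAM is a CNN-based alternative to the Segment Anything Model, trained on only 2% of SAM's SA-1B training data. Its authors report performance comparable to SAM while running about 50 times faster, measured against SAM's segment-everything mode, which prompts the model with a 32×32 grid of points.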

The advantage of this model is its two-stage design: it first segments every object instance in the image, then selects the masks that match the user's prompt, which can be a point, a bounding box, or a text description.
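The second, prompt-guided stage can be pictured as a selection over the masks from the first stage. For a box prompt, one natural rule, and roughly what FastSAM does, is to keep the candidate mask with the highest intersection-over-union (IoU) with the box. The sketch below assumes the candidate masks already exist; the helper names and toy masks are hypothetical, and text prompts (which FastSAM scores with CLIP) are not covered.

```python
def mask_box_iou(mask, box):
    """IoU between a 0/1 mask and the filled rectangle of an XYXY box."""
    x0, y0, x1, y1 = box
    inter = area_mask = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                area_mask += 1
                if x0 <= x <= x1 and y0 <= y <= y1:
                    inter += 1
    area_box = (x1 - x0 + 1) * (y1 - y0 + 1)  # inclusive pixel box
    union = area_mask + area_box - inter
    return inter / union if union else 0.0


def select_by_box(candidate_masks, box):
    """Pick the candidate mask that best overlaps the user's box prompt."""
    return max(candidate_masks, key=lambda m: mask_box_iou(m, box))


# Two made-up candidates from a "segment everything" pass on a 4x6 image.
car = [[0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0]]
road = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
box = (1, 1, 3, 2)  # user's box around the car

print(round(mask_box_iou(car, box), 3), round(mask_box_iou(road, box), 3))
print(select_by_box([road, car], box) is car)
```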

Thus, by combining the image segmentation models and techniques above with an image generation model, we can build solutions that generate new images, edit existing images, replace the background of an image, and more.
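Whichever segmentation model produces the mask, a common last step when editing is to paste the generated pixels back into the original so that everything outside the mask stays untouched. A minimal sketch with nested lists standing in for single-channel images; the soft (feathered) mask values between 0 and 1 are an optional assumption for smoother edges, not something the article prescribes.

```python
def composite(original, generated, mask):
    """Blend per pixel: mask 1 takes the generated pixel, 0 keeps the original,
    and values in between (a feathered edge) mix the two linearly."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


original = [[0.2, 0.2, 0.2, 0.2]]
generated = [[0.9, 0.9, 0.9, 0.9]]
mask = [[0.0, 0.5, 1.0, 0.0]]  # hard edge on the right, soft edge on the left
print(composite(original, generated, mask))
```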

Sources of Article

https://arxiv.org/abs/2304.02643
