Image generation is achieved using a special class of models called diffusion models. The process starts from an image of pure Gaussian noise, which the model gradually de-noises into a sharp, attractive visual that matches the description provided. In simpler terms, the diffusion model transforms a sample from a basic Gaussian distribution into a sample from a complex distribution akin to actual images.
During this process, a U-Net model de-noises the image, and a text encoder generates prompt embeddings that guide the de-noising. Popular embedding techniques used in image models include CLIP and ALIGN.
This process of image generation can be used to:
Generate new images
Edit only specific parts of an image (Inpainting)
To control the process of image generation, these models provide us with some useful hyperparameters. These hyperparameters give us flexibility over how the image is created, which components of the original image are kept, and what information can be lost. Let us dive into these model hyperparameters.
Inference Steps:
This is an important hyperparameter that defines the number of steps the model takes to de-noise the image into the final result. If we increase the number of inference steps, the de-noising process runs longer, which generally produces a sharper image that matches the prompt more closely, at the cost of more time and compute. However, setting the number of inference steps too high is not helpful either: the gains diminish, and the image can start to look overly processed and far from real.
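To make the idea of inference steps concrete, here is a minimal sketch of how a sampler might pick which noise levels to visit. It assumes the model was trained with 1000 noise levels (a common setup, but an assumption here) and that the steps are evenly spaced; the function name inference_timesteps is made up for illustration.

```python
def inference_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Return evenly spaced timesteps, ordered from noisiest to cleanest.

    Assumption: the model was trained with `num_train_timesteps` noise levels.
    """
    if not 1 <= num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be between 1 and num_train_timesteps")
    stride = num_train_timesteps // num_inference_steps
    # Build the schedule in ascending order, then reverse it because
    # de-noising starts at the noisiest timestep.
    return [i * stride for i in range(num_inference_steps)][::-1]


print(inference_timesteps(4))        # [750, 500, 250, 0]
print(len(inference_timesteps(50)))  # 50
```

More inference steps means a smaller gap between consecutive timesteps, so each de-noising step has less work to do, but the U-Net has to run more times.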
Guidance Scale:
This hyperparameter defines how closely the image generation process should follow the prompt. The higher the guidance scale, the higher the prompt fidelity, but very high values tend to lower the quality and variety of the generated image.
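One common way the guidance scale is applied is classifier-free guidance, where the model predicts the noise twice, once with the prompt and once without, and the two predictions are blended. The article does not say which mechanism a given model uses, so treat the sketch below as an assumption; the function name guided_noise and the tiny lists standing in for noise tensors are illustrative only.

```python
def guided_noise(noise_uncond, noise_cond, guidance_scale):
    """Blend unconditional and prompt-conditioned noise predictions.

    A scale of 1.0 returns the conditioned prediction unchanged; larger
    values push the result further in the direction of the prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


uncond = [0.0, 1.0]  # prediction without the prompt
cond = [1.0, 3.0]    # prediction with the prompt

print(guided_noise(uncond, cond, 1.0))  # [1.0, 3.0]
print(guided_noise(uncond, cond, 7.5))  # [7.5, 16.0]
```

With a large scale the blended prediction moves far outside the range of either input, which gives one intuition for why very high values can hurt image quality.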
Strength:
Strength determines how much noise is added to the original image before a new image is generated from it. Now, why is this important?
To replace an object in an image, we need to add noise in that area of the image. The amount of noise affects the final result: too much noise destroys the original information, so the section is completely changed in the final generation, while too little noise leaves the new image very close to the original.
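The relationship between strength and the number of de-noising steps can be sketched as follows. This mirrors how some image-to-image pipelines are understood to work, where strength decides how far into the noise schedule the original image is pushed; the exact rule differs between libraries, so the function steps_for_strength is an illustrative assumption rather than any library's API.

```python
def steps_for_strength(num_inference_steps, strength):
    """Return (first_step, steps_to_run) for a given strength in [0, 1].

    strength = 1.0 -> start from (almost) pure noise and run every step.
    strength = 0.0 -> add no noise and run no steps (image unchanged).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    first_step = num_inference_steps - steps_to_run
    return first_step, steps_to_run


print(steps_for_strength(50, 1.0))  # (0, 50)
print(steps_for_strength(50, 0.5))  # (25, 25)
print(steps_for_strength(50, 0.0))  # (50, 0)
```

Under this assumption, a lower strength both keeps more of the original image and shortens the run, because the early, noisiest steps are skipped.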
Negative Prompt:
This hyperparameter is quite intuitive. It describes what the model should steer away from while generating the image. Example: Negative prompt: Cartoonish image. (To make an image more realistic and less cartoonish)
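To see how the four hyperparameters fit together, here is a hedged sketch using the Hugging Face diffusers library, which is one possible toolkit and not something the article prescribes. The helper names inpaint_settings and run_inpainting are made up, the default values (30, 7.5, 0.8) are placeholders to tune rather than recommendations, and model_id must be an inpainting checkpoint you choose yourself.

```python
def inpaint_settings(prompt, negative_prompt, steps=30, guidance_scale=7.5, strength=0.8):
    """Collect the hyperparameters discussed above into keyword arguments."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
        "strength": strength,
    }


def run_inpainting(model_id, image, mask_image, settings):
    """Run an inpainting pipeline; white mask pixels mark the area to repaint."""
    # Imported lazily so inpaint_settings works without diffusers installed.
    from diffusers import AutoPipelineForInpainting

    pipe = AutoPipelineForInpainting.from_pretrained(model_id)
    return pipe(image=image, mask_image=mask_image, **settings).images[0]


settings = inpaint_settings("a red sofa", "cartoonish image")
print(settings["num_inference_steps"], settings["guidance_scale"])  # 30 7.5
```

Calling run_inpainting downloads the model weights and benefits greatly from a GPU, so it is left uncalled here.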
Image segmentation
Now, to replace only a specific section of an image, we need to understand how to segment the image and create masks for the regions that will be replaced by a new object. This feature is quite exciting and not difficult either; we just need to understand a few details and the models that help us perform this task. So, let us jump into it.
Image segmentation is the process of assigning a label to each pixel such that pixels with the same label share common characteristics. Image segmentation helps us with:
Identifying local objects and boundaries in an image
Object detection
3D reconstruction
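The definition above, one label per pixel, can be illustrated with a toy example. Here a tiny nested list plays the role of a segmentation output, and the two helper functions (names chosen for illustration) turn one label into a binary mask and then into a bounding box.

```python
def mask_for_label(label_map, label):
    """Return a binary mask: 1 where the pixel carries `label`, else 0."""
    return [[1 if px == label else 0 for px in row] for row in label_map]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the masked pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# 0 = background, 1 and 2 = two different objects.
label_map = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]

mask = mask_for_label(label_map, 1)
print(mask)                # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(bounding_box(mask))  # (1, 1, 2, 2)
```

A mask like this is exactly what an inpainting model needs to know which pixels it is allowed to change.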
Image segmentation models:
SAM
SAM, or Segment Anything Model, is a model commonly used to segment objects in images. It takes an image together with prompts, such as points or bounding boxes, and encodes them. These encodings are then passed to the mask decoder, which returns the masks of the objects the prompts point to.
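As a rough sketch, prompting SAM with a single click could look like the code below, based on the predictor interface of the open-source segment_anything package. The helper names best_mask_index and segment_at_point are made up for illustration, and checkpoint_path is assumed to point to a SAM checkpoint you have downloaded separately.

```python
def best_mask_index(scores):
    """Return the index of the highest-scoring candidate mask."""
    return max(range(len(scores)), key=lambda i: scores[i])


def segment_at_point(image_rgb, checkpoint_path, x, y, model_type="vit_h"):
    """Return the best mask for the object under pixel (x, y)."""
    # Imported lazily so best_mask_index works without these packages.
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # H x W x 3 uint8 array in RGB order
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),  # 1 marks a foreground point
        multimask_output=True,       # return several candidate masks
    )
    return masks[best_mask_index(scores)]


print(best_mask_index([0.2, 0.9, 0.5]))  # 1
```

SAM returns several candidate masks for an ambiguous click (for example a shirt versus the whole person), which is why the sketch picks the one with the highest score.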
FastSAM
FastSAM is a CNN-based Segment Anything Model, trained using only 2% of the data used to train the original SAM. Its authors report performance comparable to SAM (with SAM run using a 32x32 grid of point prompts) at about 50 times the speed.
The advantage of this model is that it first segments every object in the image and then selects the masks that match the prompts provided by the user.
Thus, by combining the image segmentation models above with an image generation model, we can build solutions that generate new images, edit an existing image, replace the background of an image, etc.
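The way a mask ties segmentation and generation together can be sketched with a simple compositing step: keep the original pixel wherever the mask is 0 and take the generated pixel wherever it is 1. Real pipelines usually do this blending inside the model and on full images, so the function composite and the tiny nested lists below are purely illustrative.

```python
def composite(original, generated, mask):
    """Take generated pixels where mask is 1 and original pixels elsewhere."""
    return [
        [g if m else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


original = [[1, 1], [1, 1]]
generated = [[9, 9], [9, 9]]
mask = [[0, 1], [0, 0]]  # only the top-right pixel may change

print(composite(original, generated, mask))  # [[1, 9], [1, 1]]
```

Inverting the mask (swapping 0 and 1) turns object replacement into background replacement, since everything except the segmented object is then regenerated.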
https://arxiv.org/abs/2304.02643