Here are the year's most intriguing research publications: a curated collection of recent accomplishments in artificial intelligence and data science, organized chronologically, with a link to a more in-depth article and source code where available.

DALL·E

Researchers: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

Historically, text-to-image generation has focused on finding better modelling assumptions for training on a fixed dataset. These assumptions may include sophisticated architectures, auxiliary losses, or additional information supplied during training, such as object part labels or segmentation masks.

The researchers offer a straightforward approach to this task based on a transformer that autoregressively models text and image tokens as a single stream of data. When sufficient data and scale are available, their technique outperforms earlier domain-specific models in a zero-shot evaluation.
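
To make the single-stream idea concrete, here is a minimal sketch of a decoder-only transformer that concatenates text tokens and image tokens into one sequence and predicts the next token under a causal mask. The module choices, vocabulary sizes, and sequence lengths are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: text and image tokens concatenated into one sequence and
# modelled autoregressively by a single transformer (assumed sizes).
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192   # assumed vocabulary sizes
TEXT_LEN, IMAGE_LEN = 64, 256           # assumed lengths (e.g. a 16x16 image grid)
D_MODEL = 512

class TextToImageTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        # Text and image tokens share one sequence, so image token ids are
        # offset into a joint vocabulary.
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, D_MODEL)
        self.pos = nn.Embedding(TEXT_LEN + IMAGE_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=6)
        self.to_logits = nn.Linear(D_MODEL, TEXT_VOCAB + IMAGE_VOCAB)

    def forward(self, text_ids, image_ids):
        # Concatenate: [text tokens | image tokens] -> one data stream.
        tokens = torch.cat([text_ids, image_ids + TEXT_VOCAB], dim=1)
        n = tokens.size(1)
        x = self.embed(tokens) + self.pos(torch.arange(n, device=tokens.device))
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.full((n, n), float("-inf"), device=tokens.device).triu(1)
        h = self.blocks(x, mask=mask)
        return self.to_logits(h)  # next-token logits over the joint vocabulary

# Usage: predict the next token at every position of the joint sequence.
model = TextToImageTransformer()
text = torch.randint(0, TEXT_VOCAB, (2, TEXT_LEN))
image = torch.randint(0, IMAGE_VOCAB, (2, IMAGE_LEN))
logits = model(text, image)  # shape: (2, TEXT_LEN + IMAGE_LEN, joint vocab size)
```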

In short, OpenAI successfully trained a network that generates images from text captions. It is comparable to GPT-3 and Image GPT and yields impressive results.

Article: DALL·E: zero-shot text-to-image generation from OpenAI

Code: Discrete VAE used for DALL·E

VOGUE

Researchers: Kathleen M Lewis, Srivatsan Varadharajan, Ira Kemelmacher-Shlizerman

Given an image of a target person and an image of another person wearing a garment, the researchers automatically generate the target person dressed in that garment. At its heart is a pose-conditioned StyleGAN2 latent space interpolation that seamlessly combines the areas of interest from each image: the body shape, hair colour, and skin tone from the target person, and the garment's folds, material properties, and shape from the garment image. By automatically optimizing an interpolation coefficient per layer of the latent space, the researchers can merge the garment and target person seamlessly yet in a true-to-source manner, as sketched below.
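
The sketch below illustrates the per-layer interpolation idea: one coefficient per generator layer blends the person's latent code with the garment's latent code, and the coefficients are optimized against losses that preserve the person's identity and the garment's appearance. The generator and loss functions here are hypothetical stand-ins (the paper uses StyleGAN2 and its own loss terms), so treat this as a schematic, not the authors' implementation.

```python
# Schematic of pose-conditioned per-layer latent interpolation:
# optimize one blending coefficient per generator layer. All names and
# sizes are assumptions for illustration.
import torch

NUM_LAYERS, LATENT_DIM = 16, 512  # assumed StyleGAN2-like W+ space

def interpolate_per_layer(w_person, w_garment, coeffs):
    # One coefficient per layer; sigmoid keeps each blend weight in [0, 1].
    alpha = torch.sigmoid(coeffs).view(NUM_LAYERS, 1)
    return alpha * w_garment + (1.0 - alpha) * w_person

def optimize_coeffs(generator, w_person, w_garment,
                    identity_loss, garment_loss, steps=200):
    coeffs = torch.zeros(NUM_LAYERS, requires_grad=True)
    opt = torch.optim.Adam([coeffs], lr=0.05)
    for _ in range(steps):
        w_mix = interpolate_per_layer(w_person, w_garment, coeffs)
        image = generator(w_mix)
        # Keep body shape / skin / hair from the person while preserving
        # the garment's shape and texture from the garment image.
        loss = identity_loss(image) + garment_loss(image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(coeffs.detach())

# Tiny smoke test with dummy stand-ins (real use would plug in StyleGAN2
# and perceptual / segmentation-based losses).
gen = lambda w: w.mean()
dummy_loss = lambda img: img ** 2
w_p = torch.randn(NUM_LAYERS, LATENT_DIM)
w_g = torch.randn(NUM_LAYERS, LATENT_DIM)
alphas = optimize_coeffs(gen, w_p, w_g, dummy_loss, dummy_loss)
```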

Their method enables clothing to conform to the body's shape while retaining design and material characteristics. Experiments indicate photorealistic results at a high resolution (512×512).

Google created an online fitting room built on a modified StyleGAN2 architecture, in which you can try on whatever pants or shirts you choose using just a photograph of yourself.

Article: VOGUE: Try-on by StyleGAN interpolation optimization

Taming Transformers

Researchers: Patrick Esser, Robin Rombach, Björn Ommer

Transformers, which learn long-range interactions on sequential data, continue to achieve state-of-the-art results across a variety of tasks. However, unlike CNNs, they contain no inductive bias favouring local interactions. This makes them expressive but computationally infeasible for long sequences, such as high-resolution images. The researchers illustrate how combining the efficacy of CNNs' inductive bias with the expressivity of transformers lets them model and synthesize high-resolution images.

The authors demonstrate how to employ CNNs to learn a context-rich vocabulary of image constituents and then use transformers to efficiently model their composition within high-resolution images. Their method is readily applicable to conditional synthesis tasks, in which both non-spatial information, such as object classes, and spatial information, such as segmentation maps, can control the resulting image. Finally, the researchers present the first results on semantically guided synthesis of megapixel images with transformers.
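
A rough sketch of the two-stage recipe appears below: a convolutional encoder compresses an image into a grid of discrete codebook entries (the "vocabulary of image constituents"), and a causal transformer then models the sequence of codebook indices. The network sizes, quantizer, and transformer configuration here are illustrative assumptions rather than the paper's actual VQGAN setup.

```python
# Two-stage sketch: (1) CNN encoder + vector quantization into a discrete
# codebook, (2) autoregressive transformer over the codebook indices.
# Sizes and module choices are assumptions for illustration.
import torch
import torch.nn as nn

CODEBOOK_SIZE, CODE_DIM = 1024, 256

class VectorQuantizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.codebook = nn.Embedding(CODEBOOK_SIZE, CODE_DIM)

    def forward(self, z):
        # z: (batch, CODE_DIM, H, W) feature map from the CNN encoder.
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)       # (B*H*W, CODE_DIM)
        dists = torch.cdist(flat, self.codebook.weight)   # distance to each code
        indices = dists.argmin(dim=1)                      # nearest codebook entry
        quantized = self.codebook(indices).view(b, h, w, c).permute(0, 3, 1, 2)
        return quantized, indices.view(b, h * w)

# Stage 1 (assumed): convolutional encoder compresses the image to a code grid.
encoder = nn.Sequential(
    nn.Conv2d(3, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, CODE_DIM, 4, stride=2, padding=1),
)
quantizer = VectorQuantizer()

# Stage 2 (assumed): a causal transformer predicts the next code index, so
# new images can be sampled code by code and decoded back by a CNN decoder.
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=CODE_DIM, nhead=8, batch_first=True),
    num_layers=4,
)
token_embed = nn.Embedding(CODEBOOK_SIZE, CODE_DIM)
head = nn.Linear(CODE_DIM, CODEBOOK_SIZE)

image = torch.randn(1, 3, 64, 64)
z = encoder(image)                       # (1, CODE_DIM, 16, 16)
_, codes = quantizer(z)                  # (1, 256) codebook indices
x = token_embed(codes)
n = codes.size(1)
mask = torch.full((n, n), float("-inf")).triu(1)   # causal attention mask
logits = head(transformer(x, mask=mask))           # next-code logits per position
```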

They coupled the efficiency of GANs and convolutional approaches with the expressivity of transformers to create a highly effective and time-efficient method for semantically guided, high-quality image synthesis.

Article: Taming transformers for high-resolution image synthesis

Code: Taming Transformers
