OpenAI's GPT-3 was a milestone: it demonstrated that a deep-learning model trained on huge amounts of text could understand and carry out tasks expressed in language. DALL·E, whose name blends Pixar's WALL·E and the surrealist artist Salvador Dalí, is a newer variant of GPT-3 that combines textual understanding with the ability to generate images. This smaller, 12-billion-parameter version of GPT-3 is trained to generate images from text descriptions using a dataset of text-image pairs.
"We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images," write the researchers on the official blog of OpenAI.
DALL·E has come up with interesting interpretations of the captions that researchers fed it to test its ability to work with novel concepts. Captions such as “an avocado armchair” and “an illustration of a baby daikon radish in a tutu walking a dog” resulted in DALL·E creating plausible images.
DALL·E treats text and images as a single stream of data. This lets it generate an image from a caption, and also regenerate part of an existing image so that the result stays in line with the caption. In addition, DALL·E can follow specifications in the caption, such as rendering a subject from various angles, zoomed in, as an x-ray, in 2D or 3D, or with optical distortions.
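The single-stream idea can be illustrated with a toy sketch. This is not OpenAI's code: the vocabulary sizes, the function names `build_stream` and `next_token_pairs`, and the id-offset scheme are all assumptions made for illustration. It only shows how caption tokens and image tokens might share one sequence, so that a model trained to predict the next token sees the caption as context for every image token.

```python
from typing import List, Tuple

# Hypothetical toy vocabulary sizes, chosen only for illustration.
TEXT_VOCAB_SIZE = 16
IMAGE_VOCAB_SIZE = 8


def build_stream(text_tokens: List[int], image_tokens: List[int]) -> List[int]:
    """Join caption tokens and image tokens into one sequence.

    Image token ids are shifted past the text vocabulary so that both
    kinds of token live in a single shared id space.
    """
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_tokens):
        raise ValueError("text token out of range")
    if any(not 0 <= t < IMAGE_VOCAB_SIZE for t in image_tokens):
        raise ValueError("image token out of range")
    return list(text_tokens) + [TEXT_VOCAB_SIZE + t for t in image_tokens]


def next_token_pairs(stream: List[int]) -> List[Tuple[List[int], int]]:
    """Return (context, target) pairs for next-token prediction."""
    return [(stream[:i], stream[i]) for i in range(1, len(stream))]


if __name__ == "__main__":
    stream = build_stream([3, 5], [0, 7])
    for context, target in next_token_pairs(stream):
        print(context, "->", target)
```

Because the caption always comes first in the stream, every image token is predicted with the full caption in its context. That is one way to picture why fixing some image tokens in advance and predicting the rest could yield an edit that still matches the caption.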
However, DALL·E has limitations. When a caption mentions too many objects, the network struggles to keep track of what to create. Rephrasing the same caption in different ways also produces different results. There is also evidence that DALL·E sometimes reproduces images it has seen online instead of creating original ones.