OpenAI's DALL-E 2 generates all kinds of images from text input, faster and better

Amazing and slightly scary stuff

By Rob Thubron April 7, 2022, 7:41

In brief: Imagine being able to describe a picture to an AI and have it turned into a photorealistic image. That's one of the claims being made for an updated version of a program we first saw last year, and the results do look exciting.

DALL-E 2 comes from OpenAI, the San Francisco-based research lab behind GPT-2 and GPT-3, language models that can write fake news, and OpenAI Five, the system that beat top human opponents at Dota 2.

DALL-E 2, whose name is a portmanteau of artist Salvador Dalí and Disney robot WALL-E, is the second iteration of the neural network we first saw in January last year, and it offers higher resolution and lower latency than the original version. The images it generates are now a much better 1024 x 1024 pixels, 16 times the pixel count of the original's 256 x 256.

A 1990s Saturday morning cartoon as digital art in a steampunk style, apparently

Thanks to an approach OpenAI calls unCLIP, which builds on its CLIP image recognition system, DALL-E 2 can turn user text into vivid images, even ones surreal enough to rival Dalí himself. Ask for a koala playing basketball or a monkey paying taxes, for example, and the AI will create frighteningly realistic images of these descriptions.

The latest system has switched to a process called diffusion, which starts with a pattern of random dots and gradually alters that pattern towards an image as it recognizes specific aspects of that image.
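
To make the idea of diffusion a little more concrete, here is a toy sketch in Python. It is not OpenAI's code and it is nothing like the real model: the function names (toy_denoise, distance) are made up for illustration, the "image" is just a short list of numbers, and the stand-in denoiser cheats by looking at the target directly, whereas a real diffusion model uses a trained neural network guided by the text prompt. It only shows the general shape of the process: start from random noise and remove a little of it at every step.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from random noise and nudge it toward `target` step by step.

    ASSUMPTION: `target` stands in for what a trained network would infer
    from the text prompt. A real diffusion model never sees the final image.
    """
    rng = random.Random(seed)
    # Step 0: a "pattern of random dots", i.e. pure Gaussian noise.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    history = [image]
    for step in range(steps):
        remaining = steps - step
        # Hypothetical denoiser: an estimate of the noise still in the image.
        predicted_noise = [x - t for x, t in zip(image, target)]
        # Remove a fraction of that noise; the final step removes the rest.
        image = [x - n / remaining for x, n in zip(image, predicted_noise)]
        history.append(image)
    return history


def distance(a, b):
    """Euclidean distance between two equally sized lists of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


if __name__ == "__main__":
    # A four-"pixel" picture, purely for illustration.
    target = [0.0, 0.5, 1.0, 0.5]
    history = toy_denoise(target, steps=10, seed=1)
    for i, img in enumerate(history):
        print(i, round(distance(img, target), 4))
```

Running it prints the distance between the current pattern and the target after each step, which shrinks until the noise is gone.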

Variations of teddy bears in an ukiyo-e style and a quaint flower shop

DALL-E 2 can do more than create new pictures from text. It's also able to alter sections of images; you can, for example, highlight someone's head and tell it to add a funny hat. There's even an option to create variations of a single image, each with different styles, content, or angles.

"This is another example of what I think is going to be a new computer interface trend: you say what you want in natural language or with contextual clues, and the computer does it," said Sam Altman, CEO of OpenAI. "We can imagine an 'AI office worker' that takes requests in natural language like a human does."

These types of image generation AIs do come with an inherent risk of being misused. OpenAI has some safeguards in place, including blocking the generation of faces based on a name and not allowing the uploading or generation of objectionable material; it's family-friendly stuff only. Some of the prohibited subjects include hate, harassment, violence, self-harm, explicit/shocking imagery, illegal activities, deceptions such as fake news, political actors or situations, medical or disease-related imagery, and general spam.

Users must also disclose that an AI generated the images, and there will be a watermark indicating this fact on each one.

The Verge writes that researchers can sign up to preview the system online. It's not being released directly to the public, though OpenAI hopes to make it available for use in third-party apps at some point in the future.
