What just happened? With other firms releasing text-to-video generative AIs, it should come as little surprise that OpenAI, the company that started the generative artificial intelligence revolution, has joined the club. Called Sora, the tool can generate movie-like 60-second 1080p clips from text prompts that in many cases look quite realistic.

Sora can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background, writes OpenAI. The tool is also able to create multiple shots within a single generated video.

As one would expect, the cherry-picked examples posted on OpenAI's announcement page look pretty convincing. There's a woman walking down a Tokyo street, historical footage of the California gold rush, a Dalmatian moving between window ledges, and others.


Sora can also take existing video clips and extend them or fill in missing frames, and it can generate videos in different styles, such as black and white or animated.


It's impressive stuff, but on close inspection you might notice the telltale signs that the clips were AI-generated, like the way the dog's paws move or the unnatural appearance and movements of background characters.


OpenAI admits that the current Sora model struggles with some elements, including accurately simulating physics and understanding specific instances of cause and effect – for example, it may fail to show a bite mark in food after someone takes a bite. It may also confuse spatial details and struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.

Safety is always a big concern with these sorts of technologies. OpenAI says it is working with experts in fields like misinformation, hateful content, and bias to test the Sora model. The company is also building tools to help identify misleading content, including a detection classifier that can tell when a video was generated by Sora. OpenAI said it plans to include C2PA metadata in the future if it deploys the model in a product.


There are some copyright/ethical questions about what data was used to train Sora, as is always the case with these technologies. OpenAI isn't very forthcoming with this information, beyond noting that it used around 10,000 hours of high-quality video.

Sora is currently in the research preview stage and being tested by select users; it's not yet available to the public due to the potential for misuse.

"We'll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology," OpenAI writes. "Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That's why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time."

We've seen text-to-video generators in the past, including Runway and Google's Lumiere. It'll be interesting to see how a competitor from ChatGPT/DALL-E creator OpenAI fares against these tools.

While Sora isn't yet generally available, OpenAI boss Sam Altman asked people on X to suggest ideas to be turned into videos using the tool. Some of the results have been included in this article.