Stable Diffusion 3.0 update (almost) perfects typography in generated images

zohaibahd

Why it matters: AI image generation is leaving the uncanny valley behind. Stability AI is rapidly advancing, making fake visuals truly indistinguishable from reality with its latest project. However, as rivals like Dall-E and Midjourney also enhance their capabilities, it's evident that this isn't just about achieving the clearest text; it's about leading the next wave of AI innovation.

Stability AI is tantalizing AI art enthusiasts with an early preview of its next-generation text-to-image model, Stable Diffusion 3.0. The startup has opened a waitlist for early access to the upgraded AI system, which promises crisper images, improved multi-subject handling, and significantly enhanced text rendering.

Typography has long been an Achilles' heel for AI image generation models like Stable Diffusion, even as their output has become nearly indistinguishable from reality in other respects. However, Stability AI asserts that the new 3.0 edition will offer a substantial improvement in rendering legible text and ensuring accurate spelling within generated visuals.

One example highlighted in the press release particularly caught our eye: an image of a city bus that looks virtually impossible to distinguish from an actual photograph, complete with impeccable text rendering on the road sign and the vehicle's side. While there are still minor imperfections (the license plate appears distorted), the overall quality represents a quantum leap from the model's predecessors.

That may not be surprising given that, under the hood, Stable Diffusion 3.0 is a major architectural overhaul. It employs a new "diffusion transformer" approach, similar to OpenAI's recent Sora model, which is a stark departure from the original Stable Diffusion architecture, according to Stability AI CEO Emad Mostaque, who spoke with VentureBeat.

Stable Diffusion 3.0 also integrates other cutting-edge techniques like "flow matching" – a novel method for training AI systems to better model complex data distributions. The researchers behind flow matching claim it enables faster training, more efficient sampling, and improved overall performance compared to traditional diffusion methods.
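To make the idea concrete, here is a minimal one-dimensional sketch of a flow matching training objective. It assumes the straight-line (linear interpolation) variant, which is one common formulation; the article does not say which variant Stable Diffusion 3.0 uses. The function names `interpolate`, `target_velocity`, and `flow_matching_loss` are illustrative and not taken from any Stability AI code.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Velocity of that straight-line path; it does not depend on t."""
    return x1 - x0

def flow_matching_loss(model, data, n_samples=200, seed=0):
    """Monte Carlo estimate of E[(model(x_t, t) - (x1 - x0))^2].

    `model` is any callable (x_t, t) -> predicted velocity. In a real
    system it would be a neural network; here it is a plain function.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x1 = rng.choice(data)       # a sample from the toy "dataset"
        x0 = rng.gauss(0.0, 1.0)    # a sample from the Gaussian noise prior
        t = rng.random()            # time drawn uniformly from [0, 1)
        x_t = interpolate(x0, x1, t)
        err = model(x_t, t) - target_velocity(x0, x1)
        total += err * err
    return total / n_samples

# Toy usage: the dataset is the single point 3.0, so the ideal velocity
# field is known in closed form and should drive the loss to about zero.
data = [3.0]
ideal = lambda x, t: (3.0 - x) / (1.0 - t)
untrained = lambda x, t: 0.0
print(flow_matching_loss(ideal, data), flow_matching_loss(untrained, data))
```

Training amounts to adjusting the model so this loss shrinks. Sampling then starts from noise and follows the learned velocity from t=0 to t=1.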

The revamped model suite will span a range of 800 million to 8 billion parameters when it eventually sees a full release. But before that public launch, Stability AI is putting the model through its paces with a closed preview to gather feedback and strengthen safety guardrails. The startup has implemented numerous safeguards for this preview release, with more in development through collaboration with researchers, experts, and, of course, its own community.

Stability AI's ambitions don't stop here, though. Mostaque has hinted that the new Stable Diffusion model will underpin the company's forthcoming work in 3D modeling, video synthesis, and other novel AI visual capabilities.

Interested parties can sign up for the waitlist.


 
This is one of those rare times where I wish the government would be ahead of the curve and have a law requiring some type of watermarking or other technique to verify that an image is real or AI. The speed at which these technologies are advancing is astonishing.

I say this because I am continually baffled how easily humans are faked out with this stuff. As in the general population seems to have zero critical thinking skills. They see something on social media and instantly it must be true, even though we are continually made aware of AI's ability to fake things. The upcoming US presidential election is guaranteed to be full of fake AI images and propaganda.

 
All the windows on the side of the bus are different; one side has two windshield wiper arms, the other has one. And what is the asymmetric contraption on top of the bumper? And who would place the text to overlap with the wheel?

Photorealistic maybe, realistic not really.
 
Yeah, social media was like the primer, and AI-generated content will be the fissile material. We're primed for a societal nuke the likes of which we've never seen.

I've long advocated that the post WW2 "golden years" all the old timers look back to were underpinned strongly by the fact that everyone got their information from only a handful of sources, and those sources all aligned on the facts. CBS/NBC/ABC, didn't matter who you watched, you got the same set of facts and you were free to reach wildly different conclusions.

The proliferation of alternative/cable news outlets in the late '80s and '90s started the ball rolling on different segments of the population not just arriving at different conclusions, but literally operating off a different slate of facts from other parts of the population. Social media democratized this, and AI will basically rip any semblance of a shared reality to pieces.

From the '40s through the 2020s we saw a world getting smaller and more interconnected. I think from the 2020s through ??? we will see the world get big, isolated, and hyper-localized again as all faith in our information sources slowly evaporates.
 
Have you looked at why that's the case? Search up the Fairness Doctrine of the FCC.

Established in 1949
Abolished in 1987

Pretty much explains all of it.
 
IMHO, at the moment, such images can only be used in fast moving material where the viewer won't notice.
 
The general human is ****ing stupid.. the general computer is not.
 
The sad thing is that many people don't want to confirm or learn if something is fake; they just want a "fact" that aligns with their own personal views. It's really sad.
 
When you can just take a screenshot... watermarking will do nothing but catch a few less tech-savvy dum-dums. I seriously have no idea what can be done to prevent fakes.

The only thing I can think of is for the generator to insert a secret pixel pattern that covers the entire image, undetectable by humans, that can be read akin to a QR code by specialised software/hardware. (And even this can just be destroyed by lowering the quality of the original AI image, which happens all the time online.)
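For anyone curious what that could look like, here is a toy sketch of a keyed, whole-image pixel pattern on a grayscale image (a list of rows of 0-255 values). It is only an illustration of the idea, not how any real generator or watermarking product works, and the names `make_pattern`, `embed`, `detect`, and `quantize` are made up for this example.

```python
import random

def make_pattern(key, height, width):
    """Secret pattern of +1/-1 values, reproducible from the key."""
    rng = random.Random(key)
    return [[rng.choice((-1, 1)) for _ in range(width)] for _ in range(height)]

def embed(image, key, strength=4):
    """Nudge every pixel up or down by `strength`, following the pattern."""
    h, w = len(image), len(image[0])
    pattern = make_pattern(key, h, w)
    return [[min(255, max(0, image[y][x] + strength * pattern[y][x]))
             for x in range(w)] for y in range(h)]

def detect(image, key):
    """Correlate the image with the key's pattern.

    The score is near `strength` if the mark is present and near zero
    if it is absent or the key is wrong.
    """
    h, w = len(image), len(image[0])
    pattern = make_pattern(key, h, w)
    n = h * w
    mean = sum(sum(row) for row in image) / n
    return sum(pattern[y][x] * (image[y][x] - mean)
               for y in range(h) for x in range(w)) / n

def quantize(image, step):
    """Crude stand-in for quality loss: snap pixels to multiples of `step`."""
    return [[min(255, round(p / step) * step) for p in row] for row in image]

# Toy usage on a flat mid-gray 64x64 image.
flat = [[128] * 64 for _ in range(64)]
marked = embed(flat, key=42)
print(detect(marked, key=42))                 # roughly the embedding strength
print(detect(quantize(marked, 32), key=42))   # the mark does not survive
```

On this flat test image the coarse quantization snaps every pixel back to the same gray level, which is exactly the weakness mentioned above: degrade the image enough and the pattern is gone.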
 