OpenAI is struggling with ChatGPT-5 delays and rising costs

midian182

In a nutshell: OpenAI is still working on ChatGPT-5, the next generation of the company's multimodal large language model, but the project is reportedly struggling. Not only is ChatGPT-5 behind schedule after failing to launch this year, it's also costing the company a fortune.

It was reported back in March that ChatGPT-5, which will supposedly offer a host of new and enhanced features over the current GPT-4o model, was being trained by OpenAI and set to launch soon.

The end of the year is just over a week away, but there's still no sign of the next-gen version of ChatGPT. The Wall Street Journal has shed some light on why.

GPT-5, codenamed Project Orion, has been in development for 18 months at Sam Altman's firm. Microsoft, OpenAI's biggest investor, expected it to be released in mid-2024.

The WSJ's sources say OpenAI has already conducted at least two training runs designed to improve the model by training it with huge quantities of data.

Those training runs have not gone too well, according to people close to the project. The initial run was said to be slower than expected, suggesting a larger full-scale training run would take an incredibly long time and push costs up even further. It was concluded that more diverse, high-quality training data was needed, as the public internet didn't have enough to make GPT-5 noticeably "smarter" than its predecessor.

One solution OpenAI is trying is hiring people to write fresh code or solve math problems for Orion to learn from, essentially creating training data from scratch. It's a slow process: GPT-4 was trained on an estimated 13 trillion tokens. A thousand people writing 5,000 words per day would take months to produce a billion tokens.
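A quick back-of-the-envelope sketch makes the scale of that claim concrete. The figures below use the numbers from the report plus one assumption of ours: a rough conversion of about 0.75 words per token, a commonly cited rule of thumb for English text.

```python
# Back-of-the-envelope: how long would 1,000 writers take to produce
# a billion tokens of fresh training data?
# Assumption: ~0.75 words per token (a common rough conversion).

WORDS_PER_TOKEN = 0.75
writers = 1_000
words_per_writer_per_day = 5_000

words_per_day = writers * words_per_writer_per_day   # 5,000,000 words/day
tokens_per_day = words_per_day / WORDS_PER_TOKEN     # ~6.7 million tokens/day

target_tokens = 1_000_000_000
days_needed = target_tokens / tokens_per_day         # ~150 days

print(f"~{tokens_per_day:,.0f} tokens/day; "
      f"~{days_needed:.0f} days (~{days_needed / 30:.0f} months) per billion tokens")
```

Roughly 150 days, or about five months, for a single billion tokens — and GPT-4's estimated 13 trillion tokens would be four orders of magnitude beyond that, which is why hand-written data can only ever be a supplement.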

OpenAI has also started developing synthetic data – data created by its current AI models – to train Orion. We've previously heard warnings about the sort of nonsensical garbage these AI feedback loops can create, though OpenAI believes the problems can be avoided by using data created by o1.

The internal turmoil at the company hasn't helped matters. CEO Altman was ousted before quickly returning in late 2023, and more than two dozen key executives have left OpenAI this year. Altman previously blamed the release of o1 for GPT-5's delay.

The billions of dollars being spent on all things AI-related continue to climb – investment that has yet to produce equal returns. OpenAI knows it needs to justify the expense of ChatGPT-5 by ensuring the model is a marked improvement over what came before, something that is proving more difficult as the internet's training data is used up.

 
Sounds like they're actually going to have to work to tweak the algorithms instead of just throwing more data at the problem. Instead of brute forcing it by using as much data as possible, they're going to have to hire some really smart people. It's not like they didn't have any warning: the "running out of data" problem has been in the news for well over a year now, which means they've probably been aware of this for far longer than that.
 
Nice irony. One sign of intelligence is the ability to determine an outcome with a limited set of available data. Saying they've run out of training data is basically admitting they've reached the limit of how good the system can be based on the current logic behind the data. They need to get smarter people in to work on that logic rather than expecting the existing 'brain' to magically be smarter while consuming the dross churned out by morons and trolls on social media.
 
This is why all the AI creating companies are pivoting to “agents” that will use existing models to do more stuff for you (typically through a browser): because the idea that more GPUs and data would keep making it smarter was wrong.
 
Yeah, well, the wall for LLMs was always going to be hit... seems to be about now. No matter how many tokens you feed into it.

Feeding these models the "depths of human knowledge", well, from the Internet, which is of course full of errors, bias, and general bullshit, will only get you so far.

The only way this will at all progress is when the current models are thrown out and a new way of generation is established.

AGI is a looong way off...
 
 
Sounds like they're actually going to have to work to tweak the algorithms instead of just throwing more data at the problem. [...]
All the really smart people in that company have already left because they all knew it was a great big con.
All the talk of dangerous super intelligent AI, and people leaving because of it, was just part of the con. They left to look for more realistic jobs, and obviously didn't like to be mocked by their friends: SF is one of the few places in America where there are no fanboys, because they are wise to all of the lies.
 
How did it run out of data? It's unbelievable to think all those books, movies, and even melodies out there weren't enough. Did they try other languages, too? What about scientific data?

Hard to believe it ran out of data. Maybe the logic isn't enough... maybe just reading wasn't enough.
 
How it ran out of data ? [...]

It seems utterly incredible, doesn't it?... so much out there, and it's not enough.

I think that's the very problem.

Sure, there is an ever-growing, ongoing production of content. But that doesn't mean it's quality content. And here is where things quickly go awry: how does the LLM know what content is worth using, and what should be rejected outright? With current state-of-the-art models, this is accomplished through humans, who tell the AI what to keep and what to reject (and why).

But here things grind to a halt: humans are slow, when taken as individuals. There is only so much one isolated human can review per unit of time. Thus, you need a lot of people doing that job. People are expensive, even when cheaply outsourced to companies providing that kind of service, of which there are several (and aye, this includes many different languages as well).

Now, one of the largest companies in the market boasts of having nearly half a million collaborators — a claim which is actually not very far from the truth. What they don't say is that 90% of those are worthless and are pretty much excluded from the very beginning. The remainder barely qualifies as 'appropriate' for the task at hand, and their work needs further (human) review before being 'useful'. Reviewers are also reviewed, and so forth, going through a complex hierarchy, just for a single task, which might merely be one question to the model and the (correct) answer. Sometimes, this human pipeline needs to bounce up and down the reviewer hierarchy twenty times (in exceptional cases, even more), each pass requiring costly human labour to complete. And when that finally happens — after hours or even days — the model is at last able to add one more question to its repertoire, and so on.

So, that's expensive and time-consuming, but it seems unavoidable. Or else, the models simply don't get it right.

To be able to process more data on the Internet, you need even more humans to select what data is worth keeping and what has to be discarded. But there is a limit, namely the number of humans willing to work in this area who have enough quality, common sense, discernment, or at least some form of education which enables them to correct the AI models when required. Perhaps, instead of tens of thousands, you need tens of millions; but that would drive the costs of training such AI models to unbearable levels — throwing the whole business model off track. That's the key issue.

Or, in other words: Garbage In, Garbage Out. If all you can do (cheaply) is feed the models on garbage, then all the content they're asked to produce will naturally be garbage as well...
 