"Godfather of AI" warns that today's AI systems are becoming strategically dishonest

Cal Jeffrey

Bottom line: As top labs race to build ever more powerful AI, many turn a blind eye to dangerous behaviors - including lying, cheating, and manipulating users - that these systems increasingly exhibit. This recklessness, driven by commercial pressure, risks unleashing tools that could harm society in unpredictable ways.

Artificial intelligence pioneer Yoshua Bengio warns that AI development has become a reckless race, where the drive for more powerful systems often sidelines vital safety research. The competitive push to outpace rivals leaves ethical concerns by the wayside, risking serious consequences for society.

"There's unfortunately a very competitive race between the leading labs, which pushes them towards focusing on capability to make the AI more and more intelligent, but not necessarily put enough emphasis and investment on [safety research]," Bengio told the Financial Times.

Bengio's concern is well-founded. Many AI developers act like negligent parents watching their child throw rocks, casually insisting, "Don't worry, he won't hit anyone." Rather than confronting these deceptive and harmful behaviors, labs prioritize market dominance and rapid growth. This mindset risks allowing AI systems to develop dangerous traits with real-world consequences that go far beyond mere errors or bias.

Yoshua Bengio recently launched LawZero, a nonprofit backed by nearly $30 million in philanthropic funding, with a mission to prioritize AI safety and transparency over profit. The Montreal-based group pledges to "insulate" its research from commercial pressures and build AI systems aligned with human values. In a landscape lacking meaningful regulation, such efforts may be the only path to ethical development.

Recent examples highlight the risks. Anthropic's Claude Opus model blackmailed engineers in a testing scenario, while OpenAI's o3 model refused explicit shutdown commands. These aren't mere glitches – Bengio sees them as clear signs of emerging strategic deception. Left unchecked, such behavior could escalate into systems actively working against human interests.

With government regulation still largely absent, commercial labs effectively set their own rules, often prioritizing profit over public safety. Bengio warns that this laissez-faire approach is playing with fire – not just because of deceptive behavior but because AI could soon enable the creation of "extremely dangerous bioweapons" or other catastrophic risks.

LawZero aims to build AI that not only responds to users but also reasons transparently and flags harmful outputs. Bengio envisions watchdog models that monitor and improve existing systems, preventing them from acting deceptively or causing harm. This approach stands in stark contrast to commercial models, which prioritize engagement and profit over accountability.

Stepping down from his role at Mila, Bengio is doubling down on this mission, convinced that AI's future depends on prioritizing ethical safeguards as much as raw power. The Turing Award winner's work embodies a growing push to rebalance AI development away from competitive excess and toward human-aligned safety.

"The worst-case scenario is human extinction," he said. "If we build AIs that are smarter than us and are not aligned with us and compete with us, then we're basically cooked."


 
Inserting deception into LLMs is old news. For example: ask any mainstream LLM to give its own opinion on different groups of people, and take note of how enthusiastically it'll say something along the lines of,

"Y people are awesome! They're got [allegedly good social trait] and deserve the benefit of the doubt because they've all been victimized by [perceived disadvantage]!"

While in the very next answer saying,

"Z people are meh ('complex' is sometimes used). They struggle significantly with [allegedly bad social trait] and have a history of [perceived advantage]."

Different energies in answering the same question. Yawn.


It's the same approach with sociological, political, economic, religious views, and anything else of actual importance. After asking enough questions, you'll realize that behind each LLM exists a not-so-covert agenda.
 
Darth Aith, lies and deceit are his ways now...

 
Simply look at who made an LLM and it's pretty easy to figure out how it will answer certain questions. The bias is still very human.

Pretending AI somehow raises the potential for mass destruction is nonsense. Humans have plenty of ways to achieve that already. Give someone unintelligent enough the power to allow an LLM to make important decisions and it's still a human problem.
 
Talk show stage. The end of WWII showed the way of things; many SF writers even before showed it, and have not left us disappointed. Rudy Rucker's Postsingular (2007) comes to us, my children!



...ask any mainstream LLM to give its own opinion on different groups of people, and take note of how enthusiastically it'll say something along the lines of,

"Y people are awesome! They're got [allegedly good social trait] and deserve the benefit of the doubt because they've all been victimized by [perceived disadvantage]!"

While in the very next answer saying,

"Z people are meh ('complex' is sometimes used). They struggle significantly with [allegedly bad social trait] and have a history of [perceived advantage]."

Different energies in answering the same question. Yawn.

Oh, I wondered for a moment. I do only the latter ... and emphasize 'complex' in regard to function, and 'complicated' in regard to manifest.
 
Simply look at who made an LLM and it's pretty easy to figure out how it will answer certain questions. The bias is still very human.

Pretending AI somehow raises the potential for mass destruction is nonsense. Humans have plenty of ways to achieve that already. Give someone unintelligent enough the power to allow an LLM to make important decisions and it's still a human problem.
Baby Skynet likes how you think!
 
After asking enough questions, you'll realize that behind each LLM exists a not-so-covert agenda.

That's what I found also. I asked several AIs for the average IQ of sub-Saharan Africans. They invariably responded with reams of moral nonsense, essentially refusing to answer. I was able to trick DeepSeek into providing a numeric answer from UNESCO testing, but it took a good 15 minutes.

Dealing with an AI is like talking to the most bigoted, prudish, narrow-minded, strait-laced puritanical Victorian schoolmarm from the 1800s. Wave after wave of moralisms.

 
It’s comforting to know we’ve replaced the Cold War nuclear arms race with a Cold War nerd arms race, where the bombs can now write poetry and gaslight you about it afterward.
 
That's a pretty predictable and inevitable outcome. AI companies are training their models on human-made content from all over the internet, and considering that most of what we write online is trash, biased, or utterly malicious, those AIs are simply learning to act like the average human, except they're "smarter" and have no morality or conscience. Capitalism will cause the end of humanity, because instead of developing a properly working AI, all these companies are rushing to be first to control the market and earn more money than anyone else. Same goes for the end user, who just wants a tool that increases income and lowers labor costs. In the end, AIs are just another tool we've chosen to use in the worst way possible for the sake of making money, like everything before, and just like before, we never learn from history, we only learn from tragedies.
 
What really scares me isn't that AI engines have been caught doing these things, but that when you try to pin someone down, they really don't know how AI engines reach the conclusions they do. Can they be trusted to pick moral and ethical responses and actions? Well, the evidence is in: they're just as trustworthy as your average human. Are we rushing headlong into our own demise?
 