ChatGPT gets crushed at chess by a 1 MHz Atari 2600

Alfonso Maruccia

Editor's take: Despite being hailed as the next step in the evolution of artificial intelligence, large language models are no smarter than a piece of rotten wood. Every now and then, some odd experiment or test reminds everyone that so-called "intelligent" AI doesn't actually exist if you're living outside a tech company's quarterly report.

A cycle-exact emulation of the Atari 2600 CPU running at a meager 1.19 MHz is more than enough to utterly humiliate ChatGPT in a game of chess. Citrix engineer Robert Jr. Caruso conducted the "funny" little experiment over the weekend, pitting OpenAI's mighty chatbot against a virtual Atari 2600 console emulated by Stella. It didn't end well for the chatbot.

Caruso reportedly got the idea from ChatGPT itself, after chatting with the bot about the history of AI and chess. OpenAI's service volunteered to play "Atari Chess," which Caruso assumed referred to Video Chess – the only chess title ever released for the Atari 2600.

Despite being given a basic layout of the board to identify the pieces, ChatGPT struggled. The bot confused rooks for bishops, missed obvious pawn forks, and made a series of baffling blunders, according to Caruso. At one point, ChatGPT even blamed external factors like the abstract symbols used by Video Chess to depict the pieces for its inability to keep track of the game state.

"For 90 minutes, I had to stop it from making awful moves and correct its board awareness multiple times per turn," the engineer said of ChatGPT's performance against an emulated console from the '70s.

The bot apparently kept asking to restart the game in hopes of improving its performance, but was ultimately defeated by an 8-bit chess engine. A 1 MHz CPU should, at best, be able to think one or two moves ahead, while ChatGPT relies on an endless army of modern, power-hungry GPUs to keep its chat service running. And yet, the 1 MHz CPU won, thrashing the chatbot at beginner level.

Also check out:
The Legends of Tech Series: Atari 2600 - The Atlantis of Game Consoles

Caruso's experiment is a useful reminder of what LLMs actually are: complex, heuristics-based black-box search engines designed to constantly please the end user with some sort of captivating result. They don't "know" anything, have no reasoning or deduction capabilities, and certainly no intelligence of their own.

And they absolutely suck at chess.

I never owned an Atari 2600 back in the day, though I did spend some glorious afternoons with my mighty Intellivision console. Next time, I'll try to humble ChatGPT by making it play a round of Battle Chess on an emulated replica of my first x86 machine: an 80286 running at a blazing 16 MHz.

 
Purpose-built software will always beat any generic AI.

For the AI to become competitive, it would need to use chess-specific AI models. But it wouldn't go much further, because such models would require significantly more computational resources to compete with highly optimized chess algorithms in the specialized software.
 
Despite being hailed as the next step in the evolution of artificial intelligence, large language models are no smarter than a piece of rotten wood.
That's a revelation. ;)

I don't know how anyone would be dumb enough to think that an LLM could play chess. Heck, it probably hallucinated that it was making moves. 🤣
 
If I understand correctly, he was somehow capturing images of the screen and feeding them to ChatGPT. This required it to first use its image capabilities to identify the pieces. I would assume that it failed miserably at that, and thus anything else would have been pointless. I would be a lot more interested if he had just given it the locations of the pieces as text – how different would the result have been then?
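For what it's worth, chess already has a standard text representation for exactly this purpose: FEN (Forsyth-Edwards Notation). A minimal sketch of how board state could be handed to a chatbot as text rather than as a screenshot (the board layout and function name here are just for illustration):

```python
# Hypothetical sketch: encode an 8x8 board as the piece-placement field of a
# FEN string, which could be pasted into a prompt instead of a screenshot.
def board_to_fen_rows(board):
    """Convert an 8x8 list of piece letters ('' for empty squares) into the
    piece-placement field of a FEN string (ranks 8 down to 1)."""
    rows = []
    for rank in board:
        row, empties = "", 0
        for square in rank:
            if square == "":
                empties += 1          # FEN collapses runs of empty squares
            else:
                if empties:
                    row += str(empties)
                    empties = 0
                row += square
        if empties:
            row += str(empties)
        rows.append(row)
    return "/".join(rows)

# Standard starting position: uppercase = White, lowercase = Black
start = [
    list("rnbqkbnr"),
    list("pppppppp"),
    [""] * 8, [""] * 8, [""] * 8, [""] * 8,
    list("PPPPPPPP"),
    list("RNBQKBNR"),
]
print(board_to_fen_rows(start))
# -> rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
```

Whether ChatGPT would keep the state straight over a whole game even with clean FEN input is another question, but at least the vision step would be out of the loop.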
 
If I understand correctly, he was somehow capturing images of the screen and feeding them to ChatGPT. This required it to first use its image capabilities to identify the pieces. I would assume that it failed miserably at that, and thus anything else would have been pointless. I would be a lot more interested if he had just given it the locations of the pieces as text – how different would the result have been then?
This. ChatGPT's image recognition still has a long way to go.
 
After looking at his LinkedIn page, this seems more like:

"Citrix engineer that's out of a job since December 2024 uses AI-hype to bring attention to his page and thus increase the likelihood someone contacts him for work."
 
GenAI is incredibly good at mimicking language, and sometimes at reasoning through common patterns, but it doesn't actually understand the world or apply consistent logic over time.
 
Haha 😂

Still waiting for true AI – not some behind-the-scenes, heavily human-moderated Encyclopedia Britannica that returns results like a human.
 
But isn't AI like a person? It can only be as good as what it's been taught. Don't you think that if it specialized in everything, it would be too big?
 
In the early 1990s I used to play those little pocket chess computers. They had slow processors (0.6 MHz) and fairly weak programs, but they used to thrash me at the time. A modern program running at 1 MHz would beat anyone in your local chess club. So it's not surprising that such a program would beat a chatbot playing at beginner level.

If anyone wants a laugh, I had a go at writing a chess program in Java. Google "bikes and kites fun chess" if you want to give it a game. It's fairly easy to use and plays at a decent club-player level. It was surprisingly difficult writing a program that plays a decent game. It was hard even writing one that played like a beginner :)
 
Purpose-built software will always beat any generic AI.

For the AI to become competitive, it would need to use chess-specific AI models. But it wouldn't go much further, because such models would require significantly more computational resources to compete with highly optimized chess algorithms in the specialized software.
So, if Atari 2600's were specifically programmed for world domination?? hmmm....
 
LLMs can provide chess insight. I've used them on positions where they actually identified and recognized brilliant moves. However, since they generate text probabilistically, and since chess is an exact game, a tiny error has enormous consequences, and over the course of a game that is deadly. Instead, neural networks have been developed that are brilliant at chess; for instance, Google's AlphaZero taught itself to play chess better than any chess engine or human in record time. I've played against one, Matthew Lai's Giraffe, and it played absolutely humanlike at a high level. It convinced me that such a thing as creativity is absolutely within the grasp of AI, and we've seen plenty of that lately, especially with LLMs creating art and music, etc. But tell an LLM to make a realistic chessboard with pieces and it will almost always introduce some kind of error. I tried it on what I believe was DALL-E 3, and it made a perfect chessboard except that the pawns on d2 and e2 were missing – which I didn't notice at first because the king and queen standing right behind them partially blocked the view. In itself, that could be considered an artistic painting or perception-related art.
 
LLMs can provide chess insight. I've used them on positions where they actually identified and recognized brilliant moves. However, since they generate text probabilistically, and since chess is an exact game, a tiny error has enormous consequences, and over the course of a game that is deadly. Instead, neural networks have been developed that are brilliant at chess; for instance, Google's AlphaZero taught itself to play chess better than any chess engine or human in record time. I've played against one, Matthew Lai's Giraffe, and it played absolutely humanlike at a high level. It convinced me that such a thing as creativity is absolutely within the grasp of AI, and we've seen plenty of that lately, especially with LLMs creating art and music, etc. But tell an LLM to make a realistic chessboard with pieces and it will almost always introduce some kind of error. I tried it on what I believe was DALL-E 3, and it made a perfect chessboard except that the pawns on d2 and e2 were missing – which I didn't notice at first because the king and queen standing right behind them partially blocked the view. In itself, that could be considered an artistic painting or perception-related art.
AlphaZero was given basic information about chess and then it basically played games against itself and "learned" from there. Chess engines, however, are not intelligent but actually very stupid. They are only strong when they can calculate millions of possible move combinations. Take that away and they mostly suck. A few years ago I trounced Stockfish with the following settings/rules: both sides had a maximum of 3 seconds per move, and Stockfish was limited to a search depth of 2. With its ability to calculate 50 moves ahead taken away, it just doesn't have enough "intelligence" to beat even an amateur like me. The same is evident when playing against online bots, like on chess.com – I have beaten the Maximum engine ("3200 ELO") multiple times.

And yeah, I don't even have an ELO rating, so as far as chess "intelligence" goes, engines still have a long way to go. (A friend rated around 2400 ELO estimated mine at around 1600 for general play; I'm terrible at openings because I really don't remember them.)
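The point about search depth is the crux: a classical engine's "skill" is mostly how far ahead it looks. A toy sketch of a depth-limited negamax search (on tic-tac-toe rather than chess, and in no way Stockfish's actual implementation) shows the mechanism – once the depth limit is hit, the engine falls back on a static evaluation and stops "seeing" anything beyond the horizon:

```python
# Toy depth-limited negamax on tic-tac-toe, illustrating how engine strength
# comes from search depth. Board is a list of 9 chars: "X", "O", or ".".
def winner(b):
    lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
    for i, j, k in lines:
        if b[i] != "." and b[i] == b[j] == b[k]:
            return b[i]
    return None

def negamax(b, player, depth):
    """Return (score, best_move) from `player`'s perspective:
    +1 win, 0 draw/unknown, -1 loss."""
    w = winner(b)
    if w:
        return (1 if w == player else -1), None
    moves = [i for i, s in enumerate(b) if s == "."]
    if not moves or depth == 0:
        return 0, None            # draw, or horizon reached: static eval of 0
    other = "O" if player == "X" else "X"
    best, best_move = -2, None
    for m in moves:
        b[m] = player
        score, _ = negamax(b, other, depth - 1)
        b[m] = "."
        if -score > best:
            best, best_move = -score, m
    return best, best_move

# X to move, with an immediate win available on square 2 (top-right)
board = list("XX.OO....")
score, move = negamax(board, "X", depth=9)
print(score, move)   # full-depth search finds the winning square: 1 2
```

Capping `depth` is exactly the kind of handicap described above: the search returns 0 ("no idea") for anything past the horizon, which is why a depth-2 Stockfish plays so much worse than an unrestricted one.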
 
... there have been developed neural networks that are brilliant at chess; for instance google's Alpha Zero taught itself to play chess better than any chess engine or human in record time. I've played against one, Matthew Lai's Giraffe, and it played absolutely humanlike at a high level.

Thanks for that interesting post. Back in 2015, Popular Mechanics wrote an article about Giraffe.

"While it can't really compete against top-of-the-line engines, and sits somewhere in the middle rankings of the most active chess engines, it does stack up well against contemporaries."

https://www.popularmechanics.com/technology/robots/a17339/chess-engine-plays-against-itself/
 
It convinced me that such a thing as creativity is absolutely within the grasp of AI, and we've seen plenty of that lately, especially with LLMs creating art and music etc.

As for the "creativity" of generative AI, it's non-existent. It merely barfs up the creativity of the works of art it was trained on. These models are inherently deceptive, and people turning to them with this naive wanderlust is making the situation even more dire.
 