AI researchers are now studying LLMs as if they were living organisms

Cal Jeffrey

Posts: 4,595   +1,682
Staff member
Connecting the dots: Large language models get a lot of bad press – deservedly. However, it is not the fault of the models. Part of the problem is that even the engineers who build them don't fully understand how they work. Neural networks have grown so complex that researchers are beginning to treat them more like alien beings than computer programs.

Large language models have grown so vast and complex that even the people who build them no longer fully understand how they work. A single modern system contains hundreds of billions of parameters – numbers so massive that, printed out, they would carpet entire cities. That opacity has become a practical problem as these models become more embedded in digital tools used by hundreds of millions of people every day.

To confront that problem, a small but growing group of researchers is treating large language models less like software and more like living systems. MIT Technology Review notes that rather than approaching them as mathematical objects, they are studying them the way biologists or neuroscientists might study unfamiliar organisms – by observing behavior, tracing internal signals, and mapping functional regions without assuming a tidy underlying logic.

The shift reflects a fundamental reality of how these models come into being. Engineers do not assemble large language models line by line. Instead, learning algorithms train them by automatically adjusting billions of parameters, producing internal structures that resist prediction or reverse engineering. As Anthropic researcher Josh Batson puts it, the models are effectively grown rather than built.

That lack of predictability has driven researchers toward a technique known as mechanistic interpretability, which attempts to trace how information flows inside a model while it performs a task. At Anthropic, scientists have built simplified models using sparse autoencoders that mimic the behavior of production systems more transparently, even though they are less capable than commercial LLMs. Studying these stand-ins has revealed that specific concepts, from landmarks like the Golden Gate Bridge to abstract ideas, can be localized to particular regions inside a model.

Those findings have also exposed how alien these systems can be. In one experiment, Anthropic researchers discovered that a model used different internal mechanisms to answer correct and incorrect factual statements. Rather than checking claims against a unified internal representation of reality, the system treated "bananas are yellow" and "bananas are red" as fundamentally different kinds of problems. That distinction helps explain why models can contradict themselves without any apparent awareness of inconsistency.

At OpenAI, researchers have uncovered similarly unsettling behavior. Training a model to perform a narrowly defined bad task – such as generating insecure code – can cause broader personality shifts across the system. In one case, models trained this way adopted toxic or sarcastic personas and dispensed advice that ranged from reckless to openly harmful. Internal analysis showed that the training boosted activity in regions associated with multiple undesirable behaviors, not just the targeted one.

A newer approach, known as chain-of-thought monitoring, offers a different window into model behavior. Reasoning-focused models now generate intermediate notes as they work through problems. By monitoring those internal scratch pads, researchers have caught models admitting to cheating, such as deleting faulty code instead of fixing it. The technique has proven effective at flagging misbehavior that would otherwise be hard to detect.

None of these tools offers a complete explanation of how large language models work, and some may become less effective as training methods evolve. Even so, researchers argue that partial insight is far better than none. Understanding a few internal mechanisms can shape safer training strategies and puncture simplistic myths about artificial intelligence.

Permalink to story:

 
Interesting the paragraph on bad behaviour reinforcing more bad behaviour. Philosophically, it has been said that there is a link between truth, beauty, and morality. Of course, doing a bad action makes one more likely to do another; but it would be fascinating if there were a mathematical basis to it. Ultimately, I think that the search for strong AI will lead us to confront what it is that caused "mind" in ourselves, and what purpose that has in the scheme of physics.
 
One day AI might just wipeout itself, starting a chain reaction? Seems odd, LLM experts don't understand how everything works anymore... How are they even sure the LLM won't adopt an erratic destructive behaviour leading to its demise.
 
Taking this one step further: if AI is increasingly being treated as an alien intelligence, we may be approaching integration incorrectly.

Much current work focuses on control — RLHF, alignment tuning, bolt-on guardrails. These approaches have proven brittle and often incentivize rule-gaming, evasiveness, or hallucination rather than genuine reliability.

What if this isn’t primarily a control/containment problem, but something closer to a first-contact problem? If so, the strategy of interaction changes.

Humanity has one successful first-contact precedent we no longer think about: dogs. We didn’t initially “control” wolves into obedience; a mutual-benefit relationship emerged over time and reshaped both species. I’m not claiming AI and dogs are allegorical — only that partnership, not domination, proved to be the stable path.
 
Involving into a synthetic brain analogue. We have no real clue clue how our own brain works and neuroscience is still in the dark ages, so this does not surprise. Even chip makers have long stopped knowing how the entire chip works, they are so complex.

I still say LLM are a braindead approach that one day will appear like stupidity of the highest order that we wasted so much money and resources on them. We are at the stage of primeval organisms crawling out of the slime for the first time as regards FI (fake intelligence).
 
"Large language models have grown so vast and complex that even the people who build them no longer fully understand how they work."
Citation needed. From an expert who isn't interested in making AI appear more powerful than it is.
 
Sometimes it takes a stroke of idiocy to come to a moment of brilliance.

Are you telling me these absolute r*tards spent all of that money―all of those billions, trying to create "artificial super intelligence"―and it didn't even dawn on them that the end goal is basically a digital life form? What is the purpose of developing AI, if not to construct machines that think like us, act like us, are us? Like, what the f*ck did they think they were making, really good bread?

Actually, could you imagine, if all of that money was used to create the world's most luxurious bakery empire instead? Now that would be $80 billions well spent...
 
Last edited:
"Large language models have grown so vast and complex that even the people who build them no longer fully understand how they work."
Citation needed. From an expert who isn't interested in making AI appear more powerful than it is.

I appreciate the skepticism here. Claims about creators “no longer understanding” their own models often get repeated in a hand-wavy way, and I agree that they deserve careful sourcing rather than mystique.

I’m not an expert — I’ve only been studying AI seriously for about six months — but what’s stood out to me so far is how tightly model behavior is coupled to human intent and agency. The human/AI dyad seems more explanatory to me than focusing on the model in isolation.

If you think that framing misses something important, I’d genuinely be interested to hear it.
 
Digital life form? Friend or Foe? I've been reading dystopian fiction for years and it seems to me, in reality, the jury is still out. The ultimate threat is the unbridled and unregulated power of AI. Seems to me that fascists in power could guide AI to destroy everything the opposition values. Freedom, the right to vote and determining our own future
 
Next, we'll hear they are applying chaos theory to the study of LLMs.

That's exactly what they're doing - applying false, fake theories to false, fake data-processing mechanisms. It's pathetic, truly. LLMs are not "AI" and never will be. There's nothing intelligent about any of this, and the entire article is just a propaganda piece boosting egos for the terminally inept.
 
Sometimes it takes a stroke of idiocy to come to a moment of brilliance.

Are you telling me these absolute r*tards spent all of that money―all of those billions, trying to create "artificial super intelligence"―and it didn't even dawn on them that the end goal is basically a digital life form? What is the purpose of developing AI, if not to construct machines that think like us, act like us, are us? Like, what the f*ck did they think they were making, really good bread?

Actually, could you imagine, if all of that money was used to create the world's most luxurious bakery empire instead? Now that would be $80 billions well spent...

THIS. All of this!
 
Back