Anthropic unveils new powerful AI that finds software flaws, but says it's too dangerous to release

Claude Mythos Preview has already identified thousands of high-severity bugs across major systems and software

By Skye Jacobs April 8, 2026, 11:40 23 comments Add TechSpot

Anthropic unveils new powerful AI that finds software flaws, but says it's too dangerous to release

Serving tech enthusiasts for over 25 years.
TechSpot means tech analysis and advice you can trust.

In brief: Anthropic is framing its latest cybersecurity effort as a race against time, using an unreleased frontier model to help some of the world's largest tech and finance companies identify software flaws before attackers equipped with similar AI can exploit them.

In Project Glasswing, announced Tuesday, the company is giving a select group of major tech and financial firms access to Claude Mythos Preview, a frontier model that has already uncovered thousands of previously unknown software vulnerabilities. Anthropic says the model is too dangerous to release to the general public.

"We do not plan to make Claude Mythos Preview generally available due to its cybersecurity capabilities," Newton Cheng, Frontier Red Team Cyber Lead at Anthropic, told VentureBeat. "However, given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout – for economies, public safety, and national security – could be severe."

Those partners – including Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks – will use the model to hunt for vulnerabilities across critical infrastructure and share their findings. More than 40 additional organizations that build or maintain key software will gain access for targeted scans.

Anthropic's pitch is straightforward: AI has reached a point where it can outperform all but the very best human specialists at finding and exploiting security bugs in code, and the only responsible approach is to give defenders an early lead. The company says Mythos Preview has already identified thousands of high-severity zero-day vulnerabilities across every major operating system, web browser, and other critical software.

Anthropic's own examples show that Mythos can outperform conventional tools. According to the company, the model autonomously discovered a 27-year-old remote-crash vulnerability in OpenBSD – a system long regarded as one of the most security-hardened operating systems and commonly used to run firewalls and other critical infrastructure.

It also discovered a 16-year-old bug in FFmpeg, the widely deployed video encoding and decoding library, in a line of code that automated tests had exercised five million times without detecting the issue. Separately, it chained Linux kernel vulnerabilities to escalate from ordinary user access to full system control.

The flaws have been disclosed to maintainers and patched; for other issues still being fixed, Anthropic says it is publishing cryptographic hashes now and will share technical details after patches are released.

Anthropic reports that the model scores 83.1% on the CyberGym vulnerability benchmark, compared with 66.6% for Claude Opus 4.6, its next-best model. On coding tasks, Mythos achieves 93.9% on SWE-bench Verified and 77.8% on SWE-bench Pro, versus 80.8% and 53.4%, respectively, for Opus 4.6.

The harder problem is managing thousands of bug reports once an AI system generates them. Cheng said Anthropic has built a triage pipeline to prevent overwhelming open-source maintainers, many of whom volunteer their time. The company also throttles the rate of reports: "We do not submit large volumes of findings to a single project without first reaching out in an effort to agree on a pace the maintainer can sustain."

When it has access to source code, Anthropic aims to attach a model-generated candidate patch to each report, clearly labeled to indicate whether it was written or reviewed by a model, and offers to work with maintainers on producing a production-ready fix.

The company says it follows coordinated vulnerability disclosure practices, typically waiting 45 days after a patch is available before publishing full technical details. Anthropic reserves the right to shorten that window if details are already public or if early disclosure would materially help defenders, or to extend it if patch deployment is unusually complex or widespread.

Money and compute resources are another part of the story. Anthropic is committing up to $100 million in usage credits for Mythos Preview across Project Glasswing, plus $4 million in direct donations to open-source security groups. During the research preview, those credits will cover most usage; afterward, participants will pay $25 per million input tokens and $125 per million output tokens, with access through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

– sui (@birdabo) April 8, 2026

The company has described Mythos as a large, compute-intensive model that is expensive to serve. Cheng said Anthropic plans to implement new safeguards first on an upcoming Claude Opus model that "does not pose the same level of risk as Mythos Preview." Security professionals whose legitimate work may be affected by those safeguards can apply to a Cyber Verification Program.

The Glasswing announcement comes amid scrutiny of Anthropic's own security practices. A misconfigured content management system left a draft Mythos blog post and roughly 3,000 other internal assets publicly searchable, and a separate npm packaging error briefly exposed what appeared to be Claude Code's complete original source to anyone running npm install.

"Security is central to how we build and ship," Cheng said. "These two incidents, a blog CMS misconfiguration and an npm packaging error, were human errors in publishing tooling, not breaches of our security architecture. We've made changes to prevent these from happening again, and we'll continue to improve our processes."

– Nina Schick (@NinaDSchick) April 7, 2026

Anthropic stresses that neither incident involved its model weights, training infrastructure, or API systems. However, for a company asking governments and Fortune 500 firms to trust a model capable of autonomously chaining Linux kernel exploits, even operational missteps carry significant reputational risk.

All of this is happening on a tight timeline. "Frontier AI capabilities are likely to advance substantially over just the next few months," Cheng said. "Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely."

Anthropic says it will publish its findings from the initiative within 90 days and has suggested the possibility of an independent third-party body as a potential long-term home for large-scale AI-driven cybersecurity efforts.

See more TechSpot in Google Add us as a preferred source and our reporting shows up first when you search.

Add TechSpot

// Related Stories

Featured on TechSpot