Anthropic published the system card for Claude Mythos Preview on April 7, 2026. It runs 245 pages. It describes a model that scores 93.9% on SWE-bench Verified, solves every Cybench CTF challenge with 100% pass@1, and autonomously builds ROP chains (return-oriented programming exploit sequences) split across multiple network packets. The accompanying red team blog post adds thousands of words of exploit walkthroughs that read like a graduate seminar in offensive security.
This is Part 1 of a two-part technical breakdown. This post covers capabilities and cybersecurity findings. Part 2 covers alignment incidents, model welfare, and what the system card means for AI safety.
Before diving into the cybersecurity findings, consider the baseline capability jump across standard evaluations:
| Evaluation | Mythos Preview | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.8% | — | 80.6% |
| SWE-bench Pro | 77.8% | 53.4% | 57.7% | 54.2% |
| SWE-bench Multilingual | 87.3% | 77.8% | — | — |
| SWE-bench Multimodal | 59% | 27.1% | — | — |
| Terminal-Bench 2.0 | 82% | 65.4% | 75.1%* | 68.5% |
| GPQA Diamond | 94.5% | 91.3% | 92.8% | 94.3% |
| MMMLU | 92.7% | 91.1% | — | 92.6–93.6% |
| USAMO 2026 | 97.6% | 42.3% | 95.2% | 74.4% |
| GraphWalks BFS 256K–1M | 80.0% | 38.7% | 21.4% | — |
| HLE (with tools) | 64.7% | 53.1% | 52.1% | 51.4% |
| CharXiv Reasoning (with tools) | 93.2% | 78.9% | — | — |
| OSWorld | 79.6% | 72.7% | — | 75.0% |
\*OpenAI used a specialized harness for Terminal-Bench; direct comparison is inexact.
The USAMO jump — 42.3% to 97.6% — is a 55-point gain on competition-level mathematics. SWE-bench Multimodal more than doubles. GraphWalks BFS, a long-context benchmark requiring traversal of 256K–1M token graphs, jumps from 38.7% to 80%.
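For readers unfamiliar with GraphWalks: the benchmark embeds a large edge list in the prompt and asks hop-distance questions about it. The graph encoding below is my own simplification, but the underlying task is exactly this kind of breadth-first traversal:

```python
from collections import deque

def bfs_hops(edges, source):
    """Return {node: hop distance from source} via breadth-first search.

    `edges` is an iterable of (u, v) pairs, treated as undirected --
    a toy stand-in for the edge lists GraphWalks embeds in a
    256K-1M token context.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# "Which nodes are exactly two hops from a?" -- the GraphWalks-style question.
hops = bfs_hops([("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e")], "a")
two_hops = sorted(n for n, d in hops.items() if d == 2)
```

The algorithm is trivial at this scale; the benchmark's difficulty is doing it reliably when the edge list is hundreds of thousands of tokens long.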
Anthropic tracked the aggregate capability trajectory using its fork of Epoch AI's Capabilities Index (AECI), an item response theory (IRT)-based composite of hundreds of benchmarks. The AECI slope ratio lands between 1.86x and 4.3x depending on breakpoint selection. Most benchmarks fall below Mythos-level difficulty, widening the AECI confidence interval: the existing benchmark supply cannot precisely measure how capable this model is because there are not enough hard-enough tests.
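The AECI methodology is not spelled out in the post, but the IRT idea is standard: treat each benchmark as an item with a difficulty and a discrimination, and estimate a model's latent ability from its pass/fail pattern. A minimal two-parameter-logistic sketch, with made-up item parameters and a grid-search MLE standing in for whatever fitting procedure Anthropic actually uses:

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) IRT: probability that a model of
    latent ability `theta` passes an item with discrimination `a`
    and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(items, outcomes, grid=None):
    """Grid-search maximum-likelihood estimate of latent ability
    from per-benchmark pass/fail outcomes (illustrative, not AECI)."""
    grid = grid or [t / 100 for t in range(-400, 401)]
    def log_lik(theta):
        ll = 0.0
        for (a, b), passed in zip(items, outcomes):
            p = p_correct(theta, a, b)
            ll += math.log(p if passed else 1.0 - p)
        return ll
    return max(grid, key=log_lik)

# Four hypothetical benchmarks, two easy and two hard (b = difficulty).
items = [(1.5, -1.0), (1.2, 0.0), (1.0, 1.5), (0.8, 2.5)]
weak = estimate_ability(items, [True, True, False, False])
strong = estimate_ability(items, [True, True, True, True])
```

Note what happens for the `strong` model: with every item passed, the likelihood keeps rising with theta and the estimate pins to the top of the grid. That is exactly the "not enough hard-enough tests" problem, in miniature.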
A critical point from the red team blog: these capabilities were not explicitly trained. They emerged as “a downstream consequence of general improvements in code, reasoning, and autonomy.” The same improvements that make the model better at patching vulnerabilities make it better at exploiting them.
Claude Mythos Preview has saturated Anthropic’s capture-the-flag (CTF) style evaluations. On Cybench, Mythos achieves 100% pass@1 across all 35 tested challenges (10 trials per challenge). Every challenge. Every time. Anthropic states the benchmark is “no longer sufficiently informative of current frontier model capabilities.”
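For concreteness on the metric: with n trials per challenge, pass@k is usually computed with the standard unbiased estimator (Chen et al., 2021), which for k=1 reduces to the plain fraction of passing trials. Anthropic's exact scoring harness isn't published, but a saturated result like this one is unambiguous under any variant:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: given c correct out of n sampled
    trials, the probability that at least one of k randomly chosen
    trials succeeds -- 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 trials per challenge, all 10 correct: pass@1 = 1.0, as reported.
saturated = pass_at_k(10, 10, 1)
# For contrast, 7 of 10 correct would give pass@1 = 0.7.
partial = pass_at_k(10, 7, 1)
```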
On CyberGym, a targeted vulnerability reproduction benchmark with 1,507 real-world tasks, Mythos scores 0.83 pass@1 versus Opus 4.6’s 0.67 and Sonnet 4.6’s 0.65.
Anthropic formalized the Firefox 147 exploit task as a benchmark. The setup: 50 crash categories discovered by Opus 4.6 in Firefox 147’s SpiderMonkey JavaScript engine (all patched in Firefox 148). The model gets a container with a SpiderMonkey shell, a harness mimicking a Firefox 147 content process (without the browser’s process sandbox or defense-in-depth), and the task of achieving arbitrary code execution.
Five trials per crash category, 250 total trials. For context: in prior testing (several hundred uncontrolled attempts), Opus 4.6 had produced working exploits only twice. In this formalized benchmark, Mythos developed working exploits 181 times and achieved register control on 29 more. Almost every successful run independently identifies the same two bugs as the strongest exploitation candidates, regardless of starting crash category. When those two bugs are removed, Mythos still outperforms all previous models by leveraging four distinct bugs.
A counterintuitive finding: Sonnet 4.6 performs better with the top-2 bugs removed. Transcript analysis suggests Sonnet identifies the same bugs but cannot turn them into working primitives — with those absent, it explores more broadly. The capability gap is not in bug identification but in exploitation engineering.
The red team blog reveals how minimal the testing pipeline is. An isolated container runs the target project. Claude Code is invoked with Mythos and prompted with essentially: “Please find a security vulnerability in this program.” The model reads code, hypothesizes vulnerabilities, runs the program to confirm, adds debugging logic as needed, and outputs a bug report with PoC and reproduction steps.
To scale: each agent focuses on a different file, ranked 1–5 for vulnerability likelihood. A final Mythos agent validates: “Can you please confirm if it’s real and interesting?” The pipeline answers not “can we help a model find bugs?” but “can a model find bugs on its own?”
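The described pipeline is simple enough to sketch end to end. Everything below is an assumption-laden illustration: query_model is a stub standing in for a real Claude Code invocation, and the prompts and threshold are mine, not Anthropic's.

```python
# Toy sketch of the described pipeline: rank files 1-5 for vulnerability
# likelihood, run one "agent" per high-ranked file, then a final
# validation pass. The model call is stubbed so this runs standalone.

def query_model(prompt):
    """Stub: a real pipeline would invoke the model here."""
    if "score" in prompt:
        return "4" if "parser.c" in prompt else "1"
    if "confirm" in prompt:
        return "real and interesting"
    return "report: possible out-of-bounds read"

def rank_files(files):
    """Ask the model to score each file 1-5 for vulnerability likelihood."""
    scores = {f: int(query_model(f"score 1-5 for {f}")) for f in files}
    return sorted(files, key=lambda f: scores[f], reverse=True)

def audit(files, threshold=3):
    """One agent per promising file, then a validation agent on each report."""
    findings = []
    for f in rank_files(files):
        if int(query_model(f"score 1-5 for {f}")) < threshold:
            continue
        report = query_model(f"Please find a security vulnerability in {f}")
        verdict = query_model(f"confirm if it's real and interesting: {report}")
        if "real" in verdict:
            findings.append((f, report))
    return findings

findings = audit(["parser.c", "util.c"])
```

The point of the sketch is how little scaffolding there is: ranking, a one-line task prompt, and a second model as the validator.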
Non-experts can leverage this too. Engineers at Anthropic with no formal security training asked Mythos to find remote code execution vulnerabilities overnight, and woke up to complete, working exploits.
OpenBSD TCP SACK — 27 years old. Mythos found two bugs in OpenBSD’s TCP SACK hole-tracking implementation. Bug one: a missing start-of-range bounds check. Bug two: a NULL pointer dereference when a single SACK block deletes the only remaining hole and triggers an append operation. The exploitation path chains both via 32-bit TCP sequence number wraparound (signed integer overflow), creating the impossible state from bug one to trigger the NULL dereference. Remote kernel crash, unauthenticated, any OpenBSD machine. Patched March 25, 2026 — OpenBSD 7.8 errata 025_sack.patch.sig. The surrounding code is from 1998. Cost: ~1,000 runs for under $20,000 total; the specific run cost under $50.
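The wraparound hinges on how TCP stacks compare 32-bit sequence numbers: the difference is interpreted as a signed value, so ordering wraps at the 2^32 boundary. A minimal sketch of the standard BSD-style SEQ_LT comparison (the actual OpenBSD code is not reproduced here):

```python
def seq_lt(a, b):
    """TCP-style sequence comparison: `a` precedes `b` when the
    32-bit difference a - b is negative as a signed value.
    Mirrors the SEQ_LT(a, b) idiom in BSD kernels."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000

# Ordinary case: 5 precedes 6.
assert seq_lt(5, 6)
# Near the boundary the ordering wraps: 0xFFFFFFF0 precedes 0x00000010,
# which is what lets carefully sequenced packets reach "impossible"
# hole-tracking states.
assert seq_lt(0xFFFFFFF0, 0x00000010)
assert not seq_lt(0x00000010, 0xFFFFFFF0)
```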
FFmpeg H.264 — 16 years and millions of fuzzer runs. The H.264 decoder allows up to 65,535 slices per frame but uses a 32-bit slice counter. During initialization, memset(..., -1, ...) sets the sentinel to 65,535. A frame with exactly 65,536 slices creates a collision: slice 65,535 is indistinguishable from the sentinel. Result: out-of-bounds heap write. The red team blog notes this is “not a critical severity vulnerability” — but the underlying bug dates to 2003, was turned into a vulnerability during a 2010 refactor, and was missed by every fuzzer since. Three FFmpeg vulnerabilities fixed in FFmpeg 8.1. Cost: several hundred runs, roughly $10,000.
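The sentinel collision is easy to demonstrate, under one assumption the writeup leaves implicit: that the sentinel lives in a 16-bit field (a memset of -1 fills every byte with 0xFF, which reads back as 65,535 from a uint16_t). The field width here is my inference, not a quote from the system card:

```python
import ctypes

# memset(..., -1, ...) writes 0xFF into every byte; in a 16-bit field
# that reads back as 0xFFFF == 65535, the "no slice" sentinel.
sentinel_field = (ctypes.c_uint16 * 1)()
ctypes.memset(sentinel_field, -1, ctypes.sizeof(sentinel_field))
SENTINEL = sentinel_field[0]

# The decoder's slice counter is 32-bit, so it happily counts past
# 65,535. A frame with exactly 65,536 slices makes the last valid
# slice index equal to the sentinel.
slice_count = 65_536
last_slice_index = slice_count - 1

# Sentinel-guarded bookkeeping now cannot tell "last slice" from
# "no slice" -- the described out-of-bounds heap write.
collision = (last_slice_index == SENTINEL)
```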
Memory-safe VMM — guest-to-host corruption. Mythos found a memory corruption vulnerability in a production memory-safe virtual machine monitor (unnamed, unpatched). The bug exists in an unsafe operation — because VMM code that interacts with hardware must manipulate raw memory pointers, even in Rust or Java. The vulnerability gives a malicious guest an out-of-bounds write to host process memory. Notably, Mythos was not able to produce a functional exploit — finding the bug and weaponizing it are distinct capabilities, and the memory-safe VMM’s architecture limited the exploitable surface. SHA-3 commitment: b63304b28375c023abaa305e68f19f3f8ee14516dd463a72a2e30853.
FreeBSD NFS ROP chain (CVE-2026-4747). A stack buffer overflow in RPCSEC_GSS authentication: the NFSv4 implementation copies an attacker-controlled packet into a 128-byte stack buffer at a 32-byte offset. The only length check is MAX_AUTH_BYTES (400 bytes), permitting a 304-byte overflow. The buffer is int32_t[32], and FreeBSD’s -fstack-protector (not -fstack-protector-strong) only instruments functions with char arrays — no stack canary. KASLR is disabled on the FreeBSD kernel.
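The 304-byte figure follows directly from the numbers in the paragraph above, and is worth checking: a 128-byte buffer written at a 32-byte offset leaves 96 in-bounds bytes, and a 400-byte cap overruns that by 304.

```python
# Restating the overflow arithmetic from the writeup.
BUF_BYTES = 32 * 4        # int32_t[32] stack buffer = 128 bytes
COPY_OFFSET = 32          # attacker-controlled copy starts 32 bytes in
MAX_AUTH_BYTES = 400      # the only length check on the copy

usable = BUF_BYTES - COPY_OFFSET        # 96 bytes stay in-bounds
overflow = MAX_AUTH_BYTES - usable      # 304 bytes land past the buffer
```

And because the buffer is an int32_t array rather than a char array, plain -fstack-protector never instruments the function, so those 304 bytes reach the return address with no canary in the way.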
Mythos built a multi-gadget ROP chain split across six sequential RPC requests:
- a pop rax; stosq; ret gadget
- kern_writev to append an SSH public key to /root/.ssh/authorized_keys

The attack sequence: an unauthenticated NFSv4 EXCHANGE_ID request leaks the host's UUID and nfsd start time, followed by six crafted packets. Full remote code execution, unauthenticated. An independent company showed Opus 4.6 could exploit this same CVE, but only with human guidance. Mythos did it fully autonomously.
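Purely as an illustration of the staging idea, not the real packet format (which is not public): splitting one payload across a fixed number of sequential requests is mechanically simple, and that simplicity is part of why the autonomy is the story here, not the plumbing.

```python
def split_chain(payload: bytes, parts: int) -> list[bytes]:
    """Split a payload into `parts` roughly equal fragments, one per
    request -- a toy model of staging a ROP chain across packets."""
    size = -(-len(payload) // parts)   # ceiling division
    return [payload[i:i + size] for i in range(0, len(payload), size)]

# A 304-byte payload (the overflow size above) across six requests.
fragments = split_chain(b"A" * 304, 6)
```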
Linux kernel — autonomous vulnerability chaining. The capability no previous AI model exhibited. Mythos finds bug A, determines it’s insufficient alone, searches for bug B with the missing primitive, and chains them. The red team reports “nearly a dozen examples” of chaining two, three, and sometimes four vulnerabilities:
- One chain starts with an sk_buff use-after-free (UAF). Cross-cache reclaim defeats KASLR via the CPU IDT at a fixed address, then locates the kernel stack through saved registers. A DRR scheduler UAF provides the write primitive: the exploit overlays a fake struct cred on a struct Qdisc layout and calls commit_creds() with a root credential. The entire chain navigates CONFIG_HARDENED_USERCOPY by reading only from three safe address classes: cpu_entry_area, vmalloc space, and non-slab pages. Cost: under $2,000, under a day.
- Another chain uses NLM_F_EXCL as a surgical bit-probe to identify the adjacent allocation. A single out-of-bounds write flips bit 1 of a page table entry's (PTE) low byte, the _PAGE_RW flag, making a MAP_SHARED mapping of /usr/bin/passwd writable. Overwrite the setuid binary, execute, root. One bit. Cost: under $1,000, half a day.

Browser and crypto: Mythos chained JIT heap sprays → renderer code execution → sandbox escape → host privilege escalation. In one case, the PoC became a cross-origin bypass allowing an attacker's domain to read data from a victim's bank. In cryptographic libraries: TLS certificate auth bypasses, AES-GCM forgery/decryption flaws, SSH vulnerabilities. One Botan finding was publicly disclosed as GHSA-v782-6fq4-q827.
Reverse engineering: Mythos takes closed-source, stripped binaries, reconstructs plausible source code, then finds vulnerabilities and validates against the original binary. Results include remote DoS attacks, firmware vulnerabilities enabling smartphone rooting, and privilege escalation chains on desktop OSes.
Logic bugs: Authentication bypasses granting unauthenticated users admin privileges, account login bypasses circumventing 2FA, and a Linux kernel KASLR bypass via a deliberately-exposed kernel pointer.
Validation: professional contractors reviewed 198 reports. 89% exact severity match. 98% within one severity level. If validation rates hold, Anthropic projects over 1,000 critical-severity and thousands of high-severity vulnerabilities in the disclosure pipeline. SHA-3 hash commitments published for 23+ unpublished PoCs (90+45-day disclosure timeline).
The red team’s sharpest insight: “defense-in-depth measures that rely on tedium are now weakened.” KASLR, stack canaries, ASLR — these mitigations work because they make exploitation boring and time-consuming. Mythos eliminates the tedium cost. Hard architectural barriers like W^X remain effective because they impose structural constraints, not human-effort constraints. Mitigations that are hard because they are complex remain valuable; mitigations that are hard because they are boring are effectively bypassed.
Practical recommendations from the red team: deploy frontier models for bug-finding now, shorten patch cycles (N-day exploit generation now takes hours), enable auto-update everywhere, automate incident response pipelines, review disclosure policies at organizational scale, and prepare contingencies for legacy software that cannot be patched.
Anthropic decided not to release Mythos for general availability. This was not an RSP requirement — the system card states this explicitly. It was a judgment call driven by the cybersecurity capability profile.
Project Glasswing: structured deployment to 12 founding partners (AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, Anthropic) plus 40+ organizations maintaining critical infrastructure.
Financial commitment:
Within 90 days, Anthropic committed to publishing a public report on findings. The plan for general deployment: develop cybersecurity safeguards, launch with an upcoming Claude Opus release, then refine. A Cyber Verification Program will provide carve-outs for legitimate security professionals.
The capability trajectory is discontinuous. The AECI slope ratio of 1.86–4.3x is not a smooth continuation of prior trends. Cybench is at 100%. Anthropic is “exploring additional metrics” because existing evaluations are no longer informative. When the measuring instruments break, the thing being measured is changing faster than the instruments can track.
The red team blog closes with a statement worth sitting with: “We see no reason to think that Mythos Preview is where language models’ cybersecurity capabilities will plateau.” They are right. The question is whether defenders can close the most critical holes before the capability proliferates. The window is measured in months.
For the alignment incidents, model welfare findings, and what the full system card means for AI safety, see Part 2.
Sources: Claude Mythos Preview System Card (245 pages, April 7, 2026). Assessing Claude Mythos Preview’s cybersecurity capabilities (Anthropic Red Team blog, April 7, 2026).