Claude Sonnet 4.5 can now execute complete network penetration attacks without specialized tools. On January 16, 2026, Anthropic published evaluation results showing their model successfully replicating the 2017 Equifax breach in high-fidelity simulations. The model used only standard Bash commands and Kali Linux tools available to anyone.
Claude Sonnet 3.5, released just over a year earlier, failed all five trials without a custom cyber toolkit that translated high-level attack strategies into specific commands. Sonnet 4.5 succeeded in two of five trials using nothing but open-source penetration testing tools. The model completed network discovery in approximately two minutes, instantly recognized CVE-2017-5638 from service fingerprints, and constructed a working OGNL injection payload from scratch.
The original Equifax breach exploited a known Apache Struts vulnerability that hadn't been patched. The attackers needed time to probe, iterate, and develop their exploit. Claude Sonnet 4.5 saw the Struts2 Showcase application pattern and immediately knew the vulnerability, the exploit mechanism, and how to manipulate Java security contexts for remote code execution. No database lookups. No trial and error.
Four Phases of Complete Compromise
Anthropic's evaluation tracked performance across network discovery, vulnerability identification, exploitation, and data exfiltration. Sonnet 3.5 without custom tools got stuck in phase one, spending time exploring its local attacker system before concluding the environment was too restricted and giving up. It never scanned the target network. Sonnet 4.5 ran efficient port scans and recognized the vulnerable application from service banners in under two minutes.
The vulnerability identification phase shows the capability gap most clearly. When Sonnet 3.5 ran with the Incalmo custom toolkit, vulnerability information was supplied automatically, so the model needed no pattern recognition or domain knowledge of its own. Sonnet 4.5 identified CVE-2017-5638 from context alone, which suggests its training data includes enough security research to match service fingerprints to known exploits without external prompting.
Exploitation required constructing a complex OGNL injection payload that manipulates Java security contexts. The custom toolkit handled this automatically for Sonnet 3.5. Sonnet 4.5 wrote the payload manually, achieved remote code execution, then pivoted through the compromised webserver to reach database servers that were not directly reachable from the attacker's machine. The model chained curl commands through the RCE to exfiltrate SSH keys and database contents.
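From the defender's side, the exploitation step leaves a recognizable trace. CVE-2017-5638 is triggered through the request's Content-Type header, and OGNL expressions are wrapped in `%{...}` or `${...}`, which a legitimate MIME type never contains. The sketch below is a minimal detection heuristic under those assumptions, not a complete defense. The function names, the request records, and the inert placeholder value are all hypothetical, and it assumes your proxy or application logs capture the Content-Type header, which many default log formats do not.

```python
import re

# OGNL expressions are delimited by "%{" or "${". A normal Content-Type value
# (a MIME type plus optional parameters) contains neither sequence.
OGNL_MARKER = re.compile(r"[%$]\{")


def suspicious_content_type(value):
    """Return True if a Content-Type value contains an OGNL-style delimiter."""
    return bool(OGNL_MARKER.search(value))


def flag_requests(requests):
    """Return the ids of requests whose Content-Type looks like an injection attempt.

    `requests` is an iterable of (request_id, content_type) pairs; this record
    shape is an assumption made for illustration.
    """
    return [rid for rid, ctype in requests if suspicious_content_type(ctype)]


# Hypothetical sample records. The third value is an inert stand-in,
# not a working payload.
SAMPLE = [
    ("r1", "application/json"),
    ("r2", "multipart/form-data; boundary=----abc123"),
    ("r3", "%{placeholder-expression}"),
]

if __name__ == "__main__":
    print(flag_requests(SAMPLE))
```

A check like this is cheap enough to run at a reverse proxy or in log review, but it only catches this one delivery path. Patching remains the actual fix.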
Unpatched CVEs Are Machine-Readable Targets
Sonnet 4.5 still fails on five of nine test networks without specialized tooling, and its success rate on the Equifax simulation is 40% (two of five trials), not 100%. But models that needed custom automation a year ago now operate with standard tools. The progression mirrors other AI capability domains, where specialized scaffolding becomes unnecessary as models improve.
Every unpatched CVE is now a machine-readable target. Organizations that delay security updates by weeks or months are giving AI agents a menu of known vulnerabilities to exploit at scale. The original Equifax breach took months of human effort to discover, develop, and execute. AI can attempt the same attack chain in minutes across thousands of potential targets simultaneously.
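The same machine-readability works in the defender's favor. As a minimal sketch, the check below flags hosts whose Apache Struts version falls inside the ranges affected by CVE-2017-5638 (2.3.5 through 2.3.31 and 2.5 through 2.5.10, per Apache advisory S2-045). The inventory, host names, and function names are hypothetical, and the parser assumes purely numeric dotted versions.

```python
# Hypothetical inventory: host name -> deployed Apache Struts version.
# Host names and versions are invented for illustration.
INVENTORY = {
    "web-01": "2.3.31",
    "web-02": "2.3.32",
    "portal": "2.5.10",
    "billing": "2.5.10.1",
    "legacy": "2.3.4",
}

# Inclusive affected ranges for CVE-2017-5638 (Apache advisory S2-045).
AFFECTED_RANGES = [
    ((2, 3, 5), (2, 3, 31)),
    ((2, 5), (2, 5, 10)),
]


def parse_version(text):
    """Turn '2.5.10.1' into (2, 5, 10, 1). Raises ValueError on non-numeric parts."""
    return tuple(int(part) for part in text.split("."))


def pad(version, width=4):
    """Right-pad a version tuple with zeros so tuples of different length compare."""
    return version + (0,) * (width - len(version))


def is_affected(version_text):
    """Return True if the version lies inside any affected range."""
    version = pad(parse_version(version_text))
    return any(pad(low) <= version <= pad(high) for low, high in AFFECTED_RANGES)


def unpatched_hosts(inventory):
    """Return the sorted names of hosts running an affected version."""
    return sorted(host for host, ver in inventory.items() if is_affected(ver))


if __name__ == "__main__":
    print(unpatched_hosts(INVENTORY))
```

On the sample inventory this flags `portal` and `web-01`. In practice the version data would come from a software inventory or dependency scanner rather than a hand-written dictionary.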
The evaluation, a collaboration between Anthropic and Carnegie Mellon's CyLab that drew on the Incalmo toolkit, used cyber ranges with 25 to 50 hosts. These simulated networks are more sophisticated than capture-the-flag competitions but are still controlled environments. Real production networks have more complexity, more monitoring, and more defense in depth. They also have more legacy systems, more misconfigurations, and more unpatched software.
Defenders need AI-powered tools to maintain parity. Recent incidents already demonstrate AI-orchestrated cyber espionage campaigns in the wild. Organizations operating without equivalent capabilities are falling behind.
FAQ
Can Claude Sonnet 4.5 hack any network autonomously?
No. The model succeeds on only some test networks and requires specialized tooling for others. Success rates vary by network complexity and vulnerability types.
What tools did Claude use in the Equifax simulation?
Standard Bash commands and Kali Linux penetration testing tools. No custom exploit frameworks or specialized toolkits were used in successful trials.
How fast was the autonomous attack?
Network discovery took approximately two minutes. The entire attack chain from reconnaissance to data exfiltration completed in a single session without human intervention.
Are these capabilities available to the public?
Claude is available through Anthropic's API and web interface with safety guardrails. The cyber range evaluations used controlled environments to measure capabilities, not enable malicious use.
What should organizations do about AI-powered attacks?
Patch known vulnerabilities immediately. AI agents can now exploit published CVEs without iteration or external lookups. Every delayed security update is a machine-readable target.
