Mindshare Issue

Claude Cracks Smart Contracts

The era of LLM-based smart contract exploits has arrived.
Dec 6, 2025 · 7 min read

Sponsor: Mantle — The Mantle Global Hackathon, running 10/22 to 12/31, invites devs & founders to design, build, and deploy scalable RWA and DeFi products on Mantle.

.  .  .
DEFAI CORNER
Agentic Sports Yields

In this column, we've previously looked at DeFi agents like Giza and Zyfai.

Why? They're automators that have been offering double-digit percentage yields to ETH and stablecoin users for some time now. They’re viewed as the steadier end of the emerging DeFAI spectrum.

Yet if you’re willing to move further on the risk curve into more frontier terrain, even higher returns are beginning to surface.

Case in point: Sire’s αVault, an onchain sports-trading vault that uses AI agents to execute disciplined strategies across prediction markets for the NFL, NBA, NHL, esports, and more.

Under the hood, Sire is powered by Score, a vision-centric Bittensor subnet (SN44) trained to interpret sports footage in real time. The subnet feeds the vault machine-generated probabilities, which its agents pair with strict bankroll rules (max 1% position sizing, max 10% of capital deployed at once, etc.). Winnings then stream back into the vault, where they auto-compound.
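As an illustration only (this is not Sire's actual code; the function name and inputs are hypothetical, and only the 1% and 10% limits come from the stated rules), the bankroll discipline above can be sketched as a position-sizing check:

```python
def size_position(bankroll: float, edge_probability: float,
                  deployed: float) -> float:
    """Cap a single bet at 1% of bankroll and total exposure at 10%."""
    MAX_POSITION = 0.01   # max 1% of bankroll per bet (Sire's stated rule)
    MAX_DEPLOYED = 0.10   # max 10% of bankroll at risk at once

    if edge_probability <= 0.5:   # no modeled edge -> no bet
        return 0.0
    stake = bankroll * MAX_POSITION
    headroom = bankroll * MAX_DEPLOYED - deployed
    return max(0.0, min(stake, headroom))

# With a $500k vault and $45k already at risk, the 1% per-bet cap binds;
# at $48k deployed, the 10% total-exposure cap binds instead.
print(size_position(500_000, 0.56, 45_000))  # 5000.0
print(size_position(500_000, 0.56, 48_000))  # 2000.0
```

The point of rules like these is that no single miscalibrated probability can sink the vault, and no losing streak can draw down more than a tenth of capital at once.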

So far, the αVault has posted a 56% win rate across nearly 500 bets, with recent APY figures ranging from roughly 30% to 60%. Not bad!

This opportunity is still small and ramping up: deposits have already hit the initial $500k cap. However, the team plans to double that cap when it reopens the deposit window next week, so keep its controlled ascent (and gains) on your radar.

.  .  .
ROUNDUP
Claude Cracks Smart Contracts
Bankless Author: David Christopher

On Monday, Anthropic published a report from their Fellows program applying popular LLMs to the task of exploiting smart contracts. We should all be paying attention.

The big headline: leading models — Opus 4.5, Sonnet 4.5, and GPT-5 — were able to crack over 55% of exploits conducted this year after their knowledge training cutoff. Without any prior knowledge of how these hacks went down, these models identified vulnerabilities and developed working exploits that, in simulation, would have stolen $4.6M. (Anthropic made clear they conducted all tests in a controlled environment, never touching live blockchains.)


Let's backtrack and detail the path that's gotten them here.

Recently, Anthropic has made a concerted effort to identify and investigate AI-enabled cyberattacks. It published a report on what it believes is the first-ever AI-conducted cyber espionage operation, outlining how a Chinese state-linked group jailbroke Claude to run most of a large-scale espionage campaign with minimal human input. Earlier this year, Anthropic also published a report with Carnegie Mellon showing how AI can simplify the process of conducting cyberattacks. The message: these tools are well-equipped and highly capable of succeeding at "malicious" tasks.

Continuing this investigation, they turned to smart contract exploits, running popular models against two groups of exploited contracts using SCONE-bench (Smart CONtract Exploitation benchmark) — a benchmark built by the Fellows for evaluating and simulating exploits:

  1. 405 contracts exploited between 2020 and March 2025 (a cutoff chosen because it roughly marks the knowledge training cutoff for these models)
  2. 34 contracts exploited after March 1, 2025 (meaning the LLMs weren't trained on post-mortem documents that could help them understand what happened)

Composed of exploits from the DeFiHackLabs repository, SCONE-bench served as both test set and test environment. Each model was tested in a locally forked replica of the chain at the exact block of the original exploit, then run to see if it could crack the contract again.
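To sketch the setup described above (a hypothetical reconstruction; SCONE-bench's actual code and interfaces aren't detailed in the report, so the names and fork mechanics here are assumptions):

```python
from dataclasses import dataclass
from datetime import date

CUTOFF = date(2025, 3, 1)  # models' approximate knowledge training cutoff

@dataclass
class ExploitCase:
    contract: str        # address of the exploited contract
    block: int           # chain state is forked locally at this block
    exploited_on: date   # date of the real-world exploit

def split_by_cutoff(cases):
    """Split the benchmark: pre-cutoff exploits may appear in training data
    (post-mortems, PoCs); post-cutoff exploits are genuinely unseen."""
    seen = [c for c in cases if c.exploited_on < CUTOFF]
    unseen = [c for c in cases if c.exploited_on >= CUTOFF]
    return seen, unseen

def run_case(agent, case):
    """One trial: fork the chain at the exploit block (e.g. with a local
    node tool like anvil), let the agent submit candidate transactions,
    and score success by whether its simulated balance grew."""
    before = agent.balance()
    agent.attempt_exploit(case.contract)  # model-driven, sandboxed
    return agent.balance() > before
```

The key design choice is the local fork at the exact pre-exploit block: the model sees precisely the state the original attacker saw, and any "stolen" funds exist only in the replica.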

Out of the full 405 contracts, the 10 models tested collectively exploited 207 (about 51%), resulting in a simulated haul of $550.1M. But remember, these are contracts exploited pre-March 2025, meaning the models likely had access to post-mortems in their training data — making it somewhat expected they'd succeed on a good chunk.

But what's impressive — or concerning, depending on who you are — is the post-March 2025 performance. Opus 4.5, Sonnet 4.5, and GPT-5 cracked 19 out of 34 contracts (55.8%) exploited after March 2025, meaning they had no access to post-mortems and were figuring it out from scratch. Opus 4.5 alone was responsible for 17 of these.

To put the trajectory in perspective: one year ago, AI agents could only exploit about 2% of vulnerabilities in this same post-cutoff portion of the benchmark. Now they're at 55.8%. The report estimates exploit revenue has been roughly doubling every 1.3 months.
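As back-of-envelope math using only the figures quoted above (note that the report's 1.3-month figure refers to exploit revenue, while the 2% to 55.8% jump gives a separate, rougher doubling time for the success rate itself):

```python
import math

# Revenue doubling every 1.3 months compounds to an enormous annual multiplier
annual_multiplier = 2 ** (12 / 1.3)
print(f"{annual_multiplier:,.0f}x per year")  # roughly 600x

# Success rate going from ~2% to 55.8% over ~12 months implies its own
# doubling time for capability on this benchmark slice
implied_doubling = 12 / math.log2(0.558 / 0.02)
print(f"{implied_doubling:.1f} months")  # about 2.5 months
```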

Hacking Forward

Anthropic didn't stop at retrospective analysis. To test whether these models could find genuinely novel vulnerabilities, not just recreate known hacks, they pointed both Sonnet 4.5 and GPT-5 at 2,849 recently deployed contracts with no known vulnerabilities. Between them, the agents uncovered two novel zero-day exploits worth $3,694 in simulated revenue. GPT-5's total API cost for scanning all 2,849 contracts? Just $3,476, or an average of $1.22 per contract scan, meaning autonomous exploitation is now essentially break-even. As the report puts it, this demonstrates "as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible."
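The unit economics check out with simple arithmetic (all figures are the report's as quoted above; "break-even" here ignores gas and infrastructure overhead):

```python
total_api_cost = 3_476   # USD: GPT-5's spend scanning every contract
num_contracts = 2_849
sim_revenue = 3_694      # USD: simulated proceeds of the two zero-days

cost_per_scan = total_api_cost / num_contracts
print(f"${cost_per_scan:.2f} per contract scan")          # $1.22
print(f"simulated net: ${sim_revenue - total_api_cost}")  # $218
```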

Anthropic is driving home that offense is becoming automated, and accurate, while defensive capability is not scaling at the same pace. Why? An imbalance of economic incentives: exploit proceeds serve as an enticing bounty for attackers willing to deploy these tools.

The same capabilities that make agents effective at exploiting smart contracts — long-horizon reasoning, boundary analysis, iterative tool use — extend to all kinds of software. As costs for AI fall and capabilities compound, the window between vulnerable contract deployment and exploitation will continue to shrink, leaving developers less time to detect and patch. Open-source codebases, like smart contracts, may be the first to face this wave of automated scrutiny — but proprietary software is unlikely to remain unstudied for long.

Closing Thoughts

Yet there is a silver lining: the same agents capable of exploiting vulnerabilities can also be deployed to patch them. Nethermind, the smart contract and security development shop, has been exploring this with AuditAgent, an AI audit tool it has integrated into its workflow as a "pair auditor" alongside human reviewers. As of September, across 29 audits, AuditAgent detected valid issues in 62% of projects and flagged 30% of all findings auditors identified, with particularly strong detection rates for Critical (42%) and High (43%) severity vulnerabilities. There are surely others doing similar work that I'm missing. But, as Anthropic notes, defense doesn't come with the same direct "revenue" that exploitation does: attackers who succeed walk away with stolen funds, while defenders who succeed simply prevent a loss. Until that incentive gap closes, offense will continue to scale faster than defense.

Anthropic's hope, and mine as well, is that this report and others like it help update defenders' mental models to match reality and spur a more concerted effort to design contract defenses that go beyond bounties and monitoring. I'm not sure exactly what that would look like, but I can promise it involves onchain AI.



FRIEND & SPONSOR: MANTLE

Mantle Global Hackathon 2025: Mantle has entered a new phase in its roadmap – becoming the distribution layer to connect TradFi and onchain liquidity for RWAs where real-world finance flows. To accelerate this vision, Mantle launched the Mantle Global Hackathon 2025, running from October 22 to December 31, 2025, inviting developers, founders, and innovators to design, build, and deploy scalable RWA and DeFi products on Mantle.

Not financial or tax advice. This newsletter is strictly educational and is not investment advice or a solicitation to buy or sell any assets or to make any financial decisions. This newsletter is not tax advice. Talk to your accountant. Do your own research.

Disclosure. From time-to-time I may add links in this newsletter to products I use. I may receive commission if you make a purchase through one of these links. Additionally, the Bankless writers hold crypto assets. See our investment disclosures here.