OpenAI and Paradigm Introduce 'EVMbench' for AI Agent Benchmarking
OpenAI and Paradigm today introduced EVMbench, a benchmark evaluation that measures how AI agents detect, patch, and exploit high-severity Ethereum Virtual Machine (EVM) smart contract vulnerabilities.
What's the Scoop?
- New Benchmark: EVMbench draws on 120 curated vulnerabilities from 40 audits (most sourced from open code audit competitions) and includes several vulnerability scenarios inspired by the security auditing process for the Paradigm-backed Tempo blockchain.
- Numerical Score: EVMbench assigns agents a percentage-based performance score intended to capture how well they can audit smart contracts, patch vulnerabilities while preserving functionality, and exploit vulnerable contracts.
- Limitations: While the vulnerabilities tested by EVMbench are realistic and high-severity, the benchmark's developers caution that it "does not represent the full difficulty of real-world smart contract security."
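For illustration, a percentage score like the one described could be built by averaging per-task success rates. The sketch below is a hypothetical construction: the task names, equal weighting, and aggregation method are assumptions for clarity, not EVMbench's published scoring rules.

```python
# Hypothetical sketch of a percentage-based agent score.
# Task names, equal weighting, and the averaging scheme are illustrative
# assumptions -- EVMbench's actual scoring methodology may differ.
def benchmark_score(results: dict[str, list[bool]]) -> float:
    """Average per-task success rates into a single 0-100 score."""
    per_task = [sum(outcomes) / len(outcomes) for outcomes in results.values()]
    return 100 * sum(per_task) / len(per_task)

# Example: an agent evaluated on detect, patch, and exploit scenarios.
results = {
    "detect":  [True, True, False, True],    # 75% of vulnerabilities found
    "patch":   [True, False, False, True],   # 50% patched, functionality preserved
    "exploit": [True, False, False, False],  # 25% successfully exploited
}
print(f"{benchmark_score(results):.1f}")  # 50.0
```

Equal weighting keeps the example simple; a real benchmark might weight tasks by difficulty or severity.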
What's the Take?
EVMbench supplies the crypto industry with a standardized way to measure how well AI can reason about real-world smart contract risk. While no test is perfect, this one establishes a measurable baseline that can be used to objectively evaluate emerging crypto-enabled AI agents.
Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH
— OpenAI (@OpenAI) February 18, 2026