Key Insights:
- EVMbench evaluates AI agents on detecting, patching, and exploiting high-severity smart contract vulnerabilities.
- GPT-5.3-Codex scored 72.2% in exploit mode, while OpenAI says coverage for vulnerability detection and patching remains incomplete.
- The tool provides a controlled environment for developers to test smart contracts without risking real funds.

OpenAI and crypto investor Paradigm have released EVMbench, a benchmark that measures how AI agents handle vulnerabilities in EVM-based systems, including Ethereum. The tool uses 120 high-severity vulnerabilities collected from 40 audits and includes tests on networks like the Tempo blockchain.
EVMbench lets AI agents operate in different modes: identifying vulnerabilities, patching contracts to fix them, and attempting exploits in a controlled blockchain environment. OpenAI said, “as AI agents improve at reading, writing, and executing code, it becomes increasingly important to measure their capabilities in economically meaningful environments.”
AI Performance in Initial Tests
Early tests show varied results across models. GPT-5.3-Codex scored 72.2% in the exploit mode, while GPT-5 scored 31.9%. OpenAI noted that “coverage for vulnerability detection and patching remains incomplete,” showing that some tasks still require more development.
The benchmark is built to test AI in realistic conditions without using live funds. OpenAI added that the system can simulate “end-to-end fund-draining attacks against deployed contracts,” offering a safe way to evaluate risks.
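To make the per-mode percentages concrete, here is a minimal Python sketch of how a success rate per mode could be tallied. The three mode names follow the article's description; the `TaskResult` record, the `pass_rates` helper, and the sample outcomes are hypothetical illustrations and do not reflect EVMbench's actual harness, scoring rules, or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """The three agent modes described in the article."""
    DETECT = "detect"
    PATCH = "patch"
    EXPLOIT = "exploit"


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical record of one agent attempt on one vulnerability."""
    mode: Mode
    vulnerability_id: str
    succeeded: bool


def pass_rates(results):
    """Return the percentage of successful attempts for each mode."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for result in results:
        totals[result.mode] += 1
        if result.succeeded:
            passed[result.mode] += 1
    return {mode: 100.0 * passed[mode] / totals[mode] for mode in totals}


# Made-up outcomes for illustration only, not benchmark data.
sample = [
    TaskResult(Mode.EXPLOIT, "vuln-001", True),
    TaskResult(Mode.EXPLOIT, "vuln-002", False),
    TaskResult(Mode.DETECT, "vuln-001", True),
    TaskResult(Mode.PATCH, "vuln-001", False),
]
rates = pass_rates(sample)
```

Under this assumed scheme, a figure such as 72.2% would simply be the share of exploit-mode tasks the agent completed; the real benchmark may weight or grade tasks differently.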
Recent DeFi Security Incidents
The launch comes after several recent attacks on DeFi protocols. Moonwell, a lending platform, suffered an exploit this month linked to vulnerabilities in its contracts. CrossCurve, a cross-chain liquidity platform, also experienced a breach, losing around $3 million across multiple networks.
These events highlight the ongoing challenges in securing smart contracts. EVMbench provides a controlled environment for testing AI agents’ ability to find and fix vulnerabilities before deployment in live systems.
Controlled Testing for Developers
The benchmark draws on audit data from open competitions and Tempo’s security reviews. OpenAI emphasized that AI agents could help secure assets, while cautioning that their vulnerability detection and patching are not yet comprehensive.
EVMbench is designed to help developers evaluate smart contracts in a safe setting. It provides a way to measure potential risks and test improvements before contracts are used in real blockchain environments.