Using LLMs to identify vulnerabilities in smart contracts
LLMs can be powerful tools in the arsenal of security researchers.
The rapid growth of Web3 has unfortunately been accompanied by a concerning trend: over $4.7 billion lost to hacks since 2020. This staggering figure not only represents significant financial losses but also damages the reputation and credibility of the entire industry. While comprehensive security audits could prevent many of these issues, they remain prohibitively expensive—often costing up to $100,000 per audit. More concerning still, even audited protocols can be breached when vulnerabilities slip through initial reviews or subsequent code changes introduce new weaknesses.
Web3's Unique Security Challenges
Unlike traditional web applications, Web3 poses distinct security risks for several critical reasons: deployed contracts are typically immutable, so bugs cannot simply be patched; source code and on-chain state are public, giving attackers complete visibility; contracts directly custody user funds, making exploits immediately profitable; and transactions are irreversible, so stolen assets are rarely recovered.
With billions already lost and limited resources for comprehensive auditing, the industry urgently needs better tools to analyze contracts, generate test cases, and assist overburdened security researchers.
How LLMs Can Help
While large language models (LLMs) can't independently identify novel exploits yet, they can significantly enhance the security workflow in several ways:
Parsing Security Tool Outputs
Static analyzers, fuzzers, and test generators produce massive outputs riddled with false positives. LLMs can help surface the findings most worth human review. With context windows now reaching 200,000 tokens (Claude 2.1), they can ingest entire codebases and potentially catch subtle bugs arising from interactions between distant pieces of code.
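As a concrete illustration, here is a minimal sketch of this triage step: it loads a Slither JSON report (produced with `slither Contract.sol --json report.json`), strips each finding down to its essentials, and asks Claude 2 to rank the findings by likelihood of being real bugs. The file names and prompt wording are illustrative, not taken from any particular tool.

```python
# Sketch: triaging Slither findings with Claude 2. Assumes a Slither JSON
# report at report.json (a hypothetical path) and the anthropic Python SDK.
import json

from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("report.json") as f:
    report = json.load(f)

# Keep only the fields the model needs; raw reports are noisy.
# Field names follow Slither's JSON report format.
findings = [
    {
        "check": d["check"],
        "impact": d["impact"],
        "confidence": d["confidence"],
        "description": d["description"],
    }
    for d in report["results"]["detectors"]
]

prompt = (
    f"{HUMAN_PROMPT} Below are static-analysis findings for a Solidity "
    "contract. Rank them by likelihood of being a real, exploitable bug "
    "rather than a false positive, and justify each ranking in one "
    f"sentence.\n\n{json.dumps(findings, indent=2)}{AI_PROMPT}"
)

completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=1024,
    prompt=prompt,
)
print(completion.completion)
```

The key design choice is pre-filtering: sending only the fields the model needs keeps the prompt well inside the context window even for large reports.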
Accelerating Test Generation
LLMs can rapidly generate unit tests to achieve higher coverage and find edge cases, complementing existing fuzzing techniques. Fine-tuning on domain data can further improve performance, leveraging resources like Smart Contract VulnDB with its 26,000 examples of vulnerabilities.
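A sketch of what this can look like in practice: prompt the model for a Foundry test suite over a contract, write it out, and gate it behind the compiler before trusting it. The contract path, the edge cases listed in the prompt, and the prompt wording itself are hypothetical.

```python
# Sketch: asking Claude 2 for Foundry unit tests, then checking that they
# at least compile. File paths are placeholders.
import subprocess

from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()
source = open("src/Vault.sol").read()  # hypothetical contract

prompt = (
    f"{HUMAN_PROMPT} Write a Foundry test contract for the Solidity code "
    "below. Cover edge cases: zero amounts, maximum uint256 values, and "
    "calls from non-owner accounts. Return only Solidity code.\n\n"
    f"{source}{AI_PROMPT}"
)
completion = client.completions.create(
    model="claude-2", max_tokens_to_sample=2000, prompt=prompt
)

with open("test/Vault.t.sol", "w") as f:
    f.write(completion.completion)

# Generated tests frequently fail to compile on the first try, so gate
# them behind the toolchain before trusting any coverage numbers.
result = subprocess.run(["forge", "build"], capture_output=True, text=True)
if result.returncode != 0:
    print("Generated tests did not compile:\n", result.stderr)
```

In a fuller loop, the compiler errors would be fed back to the model for another attempt, which is where the technique starts to complement fuzzing rather than merely imitate it.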
Summarizing Findings
LLMs can digest results from multiple tools and produce integrated summaries, reducing manual reporting needs.
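A minimal sketch of this step, assuming the raw outputs of several tools have already been captured to text files (the tool names and paths here are placeholders):

```python
# Sketch: one prompt that merges raw outputs from several tools into a
# single prioritized summary.
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()
reports = {
    "slither": open("slither.txt").read(),      # hypothetical paths
    "echidna": open("echidna.txt").read(),
    "unit-tests": open("forge-test.txt").read(),
}
body = "\n\n".join(f"## {tool}\n{out}" for tool, out in reports.items())

completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=1500,
    prompt=(
        f"{HUMAN_PROMPT} Combine the tool reports below into one "
        "prioritized summary, noting where multiple tools agree on the "
        f"same issue.\n\n{body}{AI_PROMPT}"
    ),
)
print(completion.completion)
```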
Modular Detection Models
Recent multi-task learning models show promise in detecting and classifying different bug types. Open models like Llama 2 enable community-led efforts to fine-tune specialized detectors on Web3 data.
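To keep the sketch small, the example below fine-tunes a single bug-type classifier, using CodeBERT rather than Llama 2; the same idea extends to multi-task setups on larger open models. The dataset file (a JSONL of {"code", "label"} records) and label set are hypothetical stand-ins for Web3 vulnerability data.

```python
# Sketch: fine-tuning a small open model as a bug-type classifier.
# CodeBERT stands in for Llama 2 to keep the example lightweight.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["reentrancy", "integer-overflow", "access-control", "benign"]

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=len(LABELS)
)

ds = load_dataset("json", data_files="vulns.jsonl")["train"]  # hypothetical file

def preprocess(batch):
    # Tokenize the Solidity snippets and map string labels to class ids.
    enc = tokenizer(batch["code"], truncation=True, max_length=512)
    enc["labels"] = [LABELS.index(label) for label in batch["label"]]
    return enc

ds = ds.map(preprocess, batched=True, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bug-detector", num_train_epochs=3),
    train_dataset=ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```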
Introducing Robocop: An LLM-Powered Auditing Tool
To demonstrate the potential of LLMs in auditing, I built a proof-of-concept tool called Robocop using Claude 2. It analyzes contracts, surfaces suspicious areas, generates unit tests, and summarizes its findings. While basic, this prototype achieved promising results detecting seeded vulnerabilities, showcasing how LLMs could enhance audits today.
How Robocop Works
Robocop has several components: a loader that ingests contract source, an analysis module that prompts Claude 2 to flag suspicious code, a test generator that writes unit tests for flagged areas, and a reporter that compiles everything into a summary.
The workflow follows these steps: split the contract into function-level chunks, screen each chunk for potential vulnerabilities, generate unit tests for anything flagged, and distill the results into a single report. A condensed sketch of this pipeline follows.
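This is an illustrative re-implementation, not Robocop's actual source; the regex-based chunker, the prompts, and the contract path are simplifications.

```python
# Condensed sketch of an LLM audit pipeline: chunk a contract by function,
# flag suspicious chunks, generate tests for them, and summarize.
import re

from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()

def ask(prompt: str, max_tokens: int = 1000) -> str:
    completion = client.completions.create(
        model="claude-2",
        max_tokens_to_sample=max_tokens,
        prompt=f"{HUMAN_PROMPT} {prompt}{AI_PROMPT}",
    )
    return completion.completion

def split_by_function(source: str) -> list[str]:
    # Crude split on `function` keywords; a real tool would use an AST.
    parts = re.split(r"(?=\n\s*function\s)", source)
    return [p for p in parts if p.strip()]

def run_audit(path: str) -> str:
    source = open(path).read()
    flagged = []
    for chunk in split_by_function(source):
        verdict = ask(
            "Does this Solidity function contain a potential "
            f"vulnerability? Answer YES or NO, then explain.\n\n{chunk}"
        )
        if verdict.strip().upper().startswith("YES"):
            flagged.append((chunk, verdict))
    tests = [
        ask(f"Write a Foundry unit test exercising this finding:\n\n{c}\n\n{v}")
        for c, v in flagged
    ]
    return ask(
        "Summarize these findings and tests as an audit report:\n\n"
        + "\n---\n".join(v for _, v in flagged)
        + "\n\nTests:\n" + "\n---\n".join(tests)
    )

if __name__ == "__main__":
    print(run_audit("src/Vault.sol"))  # hypothetical path
```

Chunking at function granularity keeps each prompt focused, at the cost of missing cross-function interactions; a production tool would also analyze whole files when the context window allows.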
Results and Limitations
Robocop shows promise, but false positives remain a challenge. In testing across 50 bugs from the Web3Bugs repository, it successfully identified 9 vulnerabilities—an 18% detection rate. While the results are preliminary, they suggest LLMs can be valuable components in security workflows.
The Competitive Landscape
Several companies are already building in this space.
Looking Forward
The goal isn't to create a magical tool that produces perfect audit reports without human involvement. Security audits are high-stakes and require a human-in-the-loop approach to verify outputs, assess impact, and refine mitigation strategies.
However, LLMs present an exciting opportunity to amplify human efforts in securing Web3 protocols. By integrating these tools into development workflows, we can make security analysis a continuous process rather than a one-time event, potentially preventing many of the costly exploits we've seen in recent years.
As hackers continue finding new ways to exploit contracts, the industry urgently needs more efficient tools to secure protocols. LLMs are poised to accelerate audits and unit test generation, reducing risks for users and making Web3 safer for everyone.