Using LLMs to identify vulnerabilities in smart contracts

September 28, 2023

LLMs can be powerful tools in the arsenal of security researchers.


The rapid growth of Web3 has unfortunately been accompanied by a concerning trend: over $4.7 billion lost to hacks since 2020. This staggering figure not only represents significant financial losses but also damages the reputation and credibility of the entire industry. While comprehensive security audits could prevent many of these issues, they remain prohibitively expensive, often costing up to $100,000 per audit. Worse still, even audited protocols can be breached when vulnerabilities slip through initial reviews or when code changes introduce new weaknesses.

Web3's Unique Security Challenges

Unlike traditional web applications, Web3 poses distinct security risks for several critical reasons:

  • Immutability: Smart contracts cannot be easily patched once deployed, allowing vulnerabilities to persist indefinitely.

  • Public exposure: All code is visible on-chain, giving attackers unlimited time to study potential weaknesses.

  • Financial incentives: Cryptocurrencies create direct financial motivation for hackers, who can steal funds rather than just targeting user data.

With billions already lost and limited resources for comprehensive auditing, the industry urgently needs better tools to analyze contracts, generate test cases, and assist overburdened security researchers.

How LLMs Can Help

While large language models (LLMs) can't independently identify novel exploits yet, they can significantly enhance the security workflow in several ways:

Parsing Security Tool Outputs

Static analyzers, fuzzers, and test generators produce massive outputs riddled with false positives. LLMs can help surface the genuinely problematic findings for human review. With context windows now up to 100,000 tokens on models like Claude 2, they can analyze entire codebases and potentially catch subtle bugs arising from interactions between distant parts of the code.
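As a concrete sketch, the triage step might pack analyzer findings into a single prompt, ranking high-impact findings first so they survive truncation at the context limit. The finding shape below is an assumption loosely modeled on Slither's JSON output, and the character budget is a rough stand-in for a token budget:

```python
import json
import textwrap


def build_triage_prompt(findings, max_chars=400_000):
    """Pack static-analyzer findings into one triage prompt for an LLM.

    `findings` is a list of dicts with hypothetical keys "check",
    "impact", and "description" (loosely Slither-shaped). `max_chars`
    approximates a large context window; swap in a real tokenizer
    count for production use.
    """
    header = textwrap.dedent("""\
        You are auditing a Solidity codebase. Below are raw findings from
        a static analyzer. Rank them by likelihood of being a true
        positive and explain your reasoning for the top candidates.
        """)
    # Sort highest-impact findings first so they survive truncation.
    order = {"High": 0, "Medium": 1, "Low": 2, "Informational": 3}
    ranked = sorted(findings, key=lambda f: order.get(f.get("impact"), 4))
    body, used = [], len(header)
    for finding in ranked:
        entry = json.dumps(finding, indent=2)
        if used + len(entry) > max_chars:
            break  # budget exhausted; remaining findings are dropped
        body.append(entry)
        used += len(entry)
    return header + "\n".join(body)
```

The resulting string is what you would hand to the model; the impact-first ordering is one simple heuristic for deciding what to keep when output volume exceeds the window.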

Accelerating Test Generation

LLMs can rapidly generate unit tests to achieve higher coverage and find edge cases, complementing existing fuzzing techniques. Fine-tuning on domain data can further improve performance, leveraging resources like Smart Contract VulnDB with its 26,000 examples of vulnerabilities.
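A minimal prompt builder for this might look like the following. The Foundry (forge-std) framing and the edge-case checklist are illustrative choices, not the only way to do it:

```python
def make_test_prompt(contract_source: str, target_functions: list[str]) -> str:
    """Build a prompt asking an LLM for Solidity unit tests.

    The framework (Foundry) and the edge-case checklist are
    illustrative assumptions; adapt them to your own harness.
    """
    fn_list = "\n".join(f"- {name}" for name in target_functions)
    return (
        "Write Foundry (forge-std) unit tests for the Solidity contract "
        "below. Cover boundary values, zero addresses, and overflow-prone "
        "arithmetic for these functions:\n"
        f"{fn_list}\n\n"
        "```solidity\n"
        f"{contract_source}\n"
        "```"
    )
```

The model's reply would then be written to a test file and run under the project's existing test suite, with failing or non-compiling tests discarded.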

Summarizing Findings

LLMs can digest results from multiple tools and produce integrated summaries, reducing manual reporting needs.

Modular Detection Models

Recent multi-task learning models show promise in detecting and classifying different bug types. Open models like Llama 2 enable community-led efforts to fine-tune specialized detectors on Web3 data.

Introducing Robocop: An LLM-Powered Auditing Tool

To demonstrate the potential of LLMs in auditing, I built a proof-of-concept tool called Robocop using Claude 2. It analyzes contracts, surfaces suspicious areas, generates unit tests, and summarizes its findings. While basic, this prototype achieved promising results detecting seeded vulnerabilities, showcasing how LLMs could enhance audits today.

How Robocop Works

Robocop has several components:

  • UI: Built with Streamlit, providing an intuitive interface to collect parameters and display outputs

  • LLM Integration: Leverages Langchain to easily integrate with Claude 2

  • Code Analysis: Takes in Solidity files and generates an "audit report" for selected vulnerability types

The workflow follows these steps:

  • Load code from GitHub: Crawls through repositories to select all Solidity files

  • User selects targets: Users choose which contracts and vulnerability types to analyze

  • Prompt engineering: Constructs detailed prompts combining hard-coded instructions and dynamically generated inputs

  • Verification: Uses a separate "discriminator" LLM to verify findings and reduce false positives

  • Report generation: Outputs structured reports with markdown formatting
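The verification step above can be sketched as a second-pass filter: a separate "discriminator" model is asked to confirm or reject each candidate finding before it reaches the report. The prompt wording and the YES/NO protocol are assumptions for illustration, and `discriminator` stands in for any LLM call (for example, a LangChain chain's invoke wrapped in a function):

```python
from typing import Callable


def verify_findings(findings: list[dict],
                    discriminator: Callable[[str], str]) -> list[dict]:
    """Filter candidate findings through a second 'discriminator' model.

    `discriminator` is any callable taking a prompt string and
    returning the model's text. A finding is kept only when the reply
    starts with 'YES' -- a deliberately simple protocol for this sketch.
    """
    kept = []
    for finding in findings:
        prompt = (
            "A first-pass model flagged the following issue in a Solidity "
            "contract. Answer YES if it is plausibly a real vulnerability, "
            "NO if it looks like a false positive, then justify briefly.\n\n"
            f"Issue: {finding['title']}\nDetails: {finding['details']}"
        )
        reply = discriminator(prompt)
        if reply.strip().upper().startswith("YES"):
            kept.append(finding)
    return kept
```

Because the discriminator sees each finding in isolation with fresh context, it is less likely to rubber-stamp the generator's mistakes, which is the intuition behind using a separate model for this pass.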

Results and Limitations

Robocop shows promise, but false positives remain a challenge. In testing across 50 bugs from the Web3Bugs repository, it successfully identified 9 vulnerabilities. While the results are preliminary, they suggest LLMs can be valuable components in security workflows.

The Competitive Landscape

Several companies are already building in this space:

  • AuditWizard: An integrated auditing platform that streamlines workflow with visualization, manual review, and ML-powered insights

  • Metatrust: Uses a hybrid approach combining LLMs with program analysis for automated vulnerability detection

  • Olympix: Embeds LLM guidance directly into developer workflows by scanning code as it's written

  • Salus: Building LLM-powered tools for automated pre-audit analysis and end-to-end security reviews

Looking Forward

The goal isn't to create a magical tool that produces perfect audit reports without human involvement. Security audits are high-stakes and require a human-in-the-loop approach to verify outputs, assess impact, and refine mitigation strategies.

However, LLMs present an exciting opportunity to amplify human efforts in securing Web3 protocols. By integrating these tools into development workflows, we can make security analysis a continuous process rather than a one-time event, potentially preventing many of the costly exploits we've seen in recent years.

As hackers continue finding new ways to exploit contracts, the industry urgently needs more efficient tools to secure protocols. LLMs are poised to accelerate audits and unit test generation, reducing risks for users and making Web3 safer for everyone.