AI Auditor Builder | June 3, 2026 | 9 min read

The Multi-Mindset Pattern: Attacker, Accountant, Spec Auditor, Edge-Case Hunter

Running 4 specialized AI auditor roles in parallel surfaces bug classes that single-pass auditors miss. How to structure the prompts and synthesize the findings.

By Carlos (Bloqarl)

TL;DR

  • The multi-mindset detection pattern runs four specialized AI roles against the same code in parallel: Attacker, Accountant, Spec Auditor, and Edge-Case Hunter.
  • Each role has a distinct prompt, a distinct mental model, and produces a distinct finding set.
  • A synthesis step deduplicates overlap and surfaces findings unique to each role.
  • The pattern beats single-pass auditors because each role catches different bug classes. Attacker finds value-extraction paths. Accountant finds invariant breaks. Spec Auditor finds deviations from documented behavior. Edge-Case Hunter finds boundary failures.
  • Cost: ~4x the tokens of a single-pass scan. Benefit: catches bugs that monolithic prompts demonstrably miss.

Why this matters

The default AI auditing approach is one prompt, one pass, one finding list. The prompt tries to do everything: find exploits, check invariants, compare against docs, probe edge cases. The model's context fills with competing instructions, and the output regresses to the most generic findings the prompt explicitly mentions.

If you've used AI security tools and seen them report "consider checking for reentrancy" on every contract regardless of whether reentrancy is plausible, you've seen the failure mode. The model defaults to safe, generic flags because the prompt asked for too much at once.

Multi-mindset specialization fixes this by giving each role exactly one job. The Attacker doesn't worry about documentation. The Spec Auditor doesn't try to find novel exploits. The Accountant doesn't enumerate edge cases. Each prompt is tight; each output is on-task; each role catches bugs the others don't.

If you've evaluated AI auditors and found their precision plateaus around 30-50%, the multi-mindset pattern is one of the highest-yield architectural changes you can make. It's covered in step 4 of Zealynx Academy's AI Auditor Builder and forms the detection layer of Krait's 100% precision pipeline.

The four roles

1. The Attacker

Mindset: "I'm trying to extract value from this code. What's exploitable?"

Prompt focus: Look for value-extraction paths. For each public/external function, ask: can a caller use it to gain something they shouldn't? Common attack patterns include reentrancy, oracle manipulation, donation attacks, signature replay, access-control bypass, governance hijack.

Strengths: Finds direct exploits. Catches the bugs you'd actually lose money to. Maps cleanly to bug bounty payouts.

Weaknesses: Misses bugs that don't look exploitable in isolation. A protocol fee misallocation that quietly under-pays the treasury isn't a "value-extraction by an attacker" finding, but it's still a bug.

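Reference exploits to anchor: DAO ($60M reentrancy, 2016), Cream ($130M flash-loan price manipulation, 2021, after a smaller reentrancy exploit earlier that year), Curve ($62M Vyper reentrancy-lock bug, 2023), Mango ($114M oracle manipulation, 2022), Bonq ($120M oracle manipulation, 2023). These are the calibration anchors covered in the exploit-context article.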

2. The Accountant

Mindset: "I'm tracking every value through every state change. Do the books balance?"

Prompt focus: For each state-modifying function, identify the invariants that should hold before and after. Then trace whether the function actually preserves them. Common invariants include conservation of value (tokens in equal tokens out plus fees), monotonicity (a counter must only increase), accounting consistency (sum of per-account balances equals total supply).
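
The accounting-consistency invariant can be sketched as an executable check. This is a minimal illustration on a toy in-memory ledger, not real contract state; `Ledger`, `buggy_burn`, and `check_supply_invariant` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy token ledger; a stand-in for contract storage, not a real API."""
    balances: dict[str, int] = field(default_factory=dict)
    total_supply: int = 0

    def mint(self, to: str, amount: int) -> None:
        self.balances[to] = self.balances.get(to, 0) + amount
        self.total_supply += amount

    def buggy_burn(self, owner: str, amount: int) -> None:
        # Deliberate bug: the balance shrinks but total_supply is never updated.
        self.balances[owner] -= amount


def check_supply_invariant(ledger: Ledger) -> bool:
    """Accounting consistency: sum of per-account balances equals total supply."""
    return sum(ledger.balances.values()) == ledger.total_supply


ledger = Ledger()
ledger.mint("alice", 100)
holds_after_mint = check_supply_invariant(ledger)   # invariant preserved

ledger.buggy_burn("alice", 40)
holds_after_burn = check_supply_invariant(ledger)   # invariant broken

print(holds_after_mint, holds_after_burn)
```

The Accountant does the same thing in prose: state the invariant, then check it before and after every state-modifying function.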

Strengths: Finds bugs that aren't directly exploitable but corrupt state. Examples: _mintFee() ordering bugs (covered in the Uniswap V2 _mintFee article), donation attacks that exploit the gap between balance and reserve, and off-by-one errors in accumulator math.

Weaknesses: May not connect an accounting drift it finds to a usable exploit. It can report "fee recipient accumulates 0.1% less than expected" without recognizing that this drains the protocol over time.

The Accountant's most common find: missed state updates in conditional paths. We covered this class in the 33% missed-state-updates article. It's the largest single bug category in the Shadow Arena dataset.

3. The Spec Auditor

Mindset: "I'm comparing what the code does to what the docs say it should do. Where are the deviations?"

Prompt focus: Read the protocol's docs (whitepaper, README, NatSpec comments). Build a mental model of the intended behavior. Then read the code and flag every deviation, intentional or not. The deviations are the findings.

Strengths: Finds bugs no other role catches. A rate calculation that reads correctly to the Attacker (no exploit) and balances to the Accountant (invariants hold) might still violate the spec if the docs say "rates compound monthly" and the code compounds daily.
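
To make the monthly-versus-daily example concrete, here is a small worked sketch. The 10% nominal annual rate, the 1,000,000-unit principal, and the one-year horizon are assumed values chosen only for illustration.

```python
def compound(principal: float, annual_rate: float, periods_per_year: int, years: float = 1.0) -> float:
    """Balance after compounding `periods_per_year` times per year."""
    n = periods_per_year * years
    return principal * (1 + annual_rate / periods_per_year) ** n


principal = 1_000_000.0
rate = 0.10  # assumed nominal annual rate

monthly = compound(principal, rate, 12)    # what the docs promise
daily = compound(principal, rate, 365)     # what the code does
drift = daily - monthly                    # extra interest charged per year

print(f"monthly={monthly:.2f} daily={daily:.2f} drift={drift:.2f}")
```

Both versions are internally consistent, so neither the Attacker nor the Accountant has a reason to flag them. Only a comparison against the documented behavior exposes the gap.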

Weaknesses: Only as good as the spec. If docs are sparse, missing, or wrong, the Spec Auditor has nothing to compare against. Modern protocols often under-document, which limits this role.

Where it shines: Forks. A fork inherits the parent protocol's spec but may modify implementation. The Spec Auditor catches modifications that diverge from documented behavior, which is a common source of bugs in fork audits.

4. The Edge-Case Hunter

Mindset: "What inputs, states, or sequences trigger boundary conditions? Which ones break?"

Prompt focus: For each function, enumerate the boundary inputs: zero, max value, empty array, single-element array, max-length array, identical sender/receiver, address(0), self-reference. For each contract state, enumerate the boundary states: just-deployed (no liquidity), nearly-empty (one borrower), max-utilization, paused, mid-upgrade. Probe each combination.
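
The enumeration above can be sketched as a Cartesian product. The parameter names and value lists here are illustrative assumptions, not an exhaustive catalog, and the state labels are copied from the paragraph above.

```python
from itertools import product

UINT256_MAX = 2**256 - 1

# Boundary values per parameter; lists are illustrative, not exhaustive.
BOUNDARY_INPUTS = {
    "amount": [0, 1, UINT256_MAX],
    "recipient": ["address(0)", "msg.sender", "address(this)"],
}
BOUNDARY_STATES = ["just-deployed", "nearly-empty", "max-utilization", "paused", "mid-upgrade"]


def enumerate_probes(inputs: dict[str, list], states: list[str]) -> list[dict]:
    """Pair every combination of boundary inputs with every boundary state."""
    names = list(inputs)
    probes = []
    for state, values in product(states, product(*(inputs[n] for n in names))):
        probes.append({"state": state, **dict(zip(names, values))})
    return probes


probes = enumerate_probes(BOUNDARY_INPUTS, BOUNDARY_STATES)
print(len(probes))  # 3 amounts * 3 recipients * 5 states
```

Even this tiny example yields 45 probes for a single two-parameter function. The count grows multiplicatively with every added parameter or state.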

Strengths: Catches bugs that only fire on specific values. Integer overflow at type-max. Division-by-zero on empty arrays. Off-by-one in loop bounds. The Uniswap V2 1000-wei minimum liquidity lock, which we covered in its own article, exists to defuse exactly the kind of first-deposit boundary bug the Edge-Case Hunter finds.

Weaknesses: Can over-generate findings. Every function has many boundary conditions; enumerating them all without prioritization produces noise. Needs the synthesis step to filter.

Pattern: The Edge-Case Hunter is the role most likely to generate false positives. That's fine; the kill-gate filtering downstream (covered in the Krait kill-gates article) catches them.

How they run in parallel

The four roles run as independent prompts against the same source code. Each gets:

  • The full protocol source (or a focused subset for large codebases).
  • A role-specific system prompt that defines its mindset.
  • A role-specific output format (typically structured JSON with finding metadata).

In Zealynx Academy's AI Auditor Builder, this is implemented as four parallel Task invocations, each with a different prompt, all returning before the synthesis step starts. Because the roles run in parallel, total wall-clock time is roughly that of the slowest single role, while token cost is roughly 4x.
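
A minimal sketch of the fan-out, assuming Python's asyncio rather than the Task invocations the Academy implementation uses. `call_model` is a hypothetical placeholder for whatever LLM client you use, and the prompts are abbreviated from the mindset lines above.

```python
import asyncio

ROLE_PROMPTS = {  # abbreviated, hypothetical system prompts
    "attacker": "You are trying to extract value from this code. What is exploitable?",
    "accountant": "Track every value through every state change. Do the books balance?",
    "spec_auditor": "Compare what the code does to what the docs say. Where does it deviate?",
    "edge_case_hunter": "Which inputs, states, or sequences hit boundary conditions and break?",
}


async def call_model(system_prompt: str, source: str) -> list[dict]:
    """Placeholder for a real LLM call; always returns an empty finding list."""
    await asyncio.sleep(0)  # stands in for network latency
    return []


async def run_role(role: str, source: str) -> tuple[str, list[dict]]:
    findings = await call_model(ROLE_PROMPTS[role], source)
    return role, findings


async def run_all_roles(source: str) -> dict[str, list[dict]]:
    """Run all four roles concurrently; returns only after every role finishes."""
    results = await asyncio.gather(*(run_role(r, source) for r in ROLE_PROMPTS))
    return dict(results)


role_outputs = asyncio.run(run_all_roles("contract Vault { /* ... */ }"))
print(sorted(role_outputs))
```

`asyncio.gather` waits for every role before returning, which matches the requirement that synthesis starts only once all four outputs exist.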

For larger codebases (more than ~500 lines), each role processes the source in chunks of 200-300 lines with overlapping windows, so that logic spanning a chunk boundary appears whole in at least one chunk. The role's system prompt remains constant; only the chunk changes.
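
A sketch of line-based chunking with overlap. The 250-line window sits inside the 200-300 range above; the 50-line overlap is an assumed value, since the size of the overlap is not specified here. `chunk_lines` is a hypothetical helper name.

```python
def chunk_lines(source: str, chunk_size: int = 250, overlap: int = 50) -> list[tuple[int, str]]:
    """Split source into overlapping line windows.

    Returns (start_line, text) pairs, with start_line 1-indexed.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    lines = source.splitlines()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + chunk_size]
        chunks.append((start + 1, "\n".join(window)))
        if start + chunk_size >= len(lines):
            break  # this window already reaches the end of the file
    return chunks


source = "\n".join(f"line {i}" for i in range(1, 601))  # synthetic 600-line file
chunks = chunk_lines(source)
print([start for start, _ in chunks])
```

Keeping the 1-indexed start line with each chunk lets a role report findings in file coordinates, which the synthesis step needs when it groups by location.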

Synthesis: deduplication and unique-find surfacing

Once all four roles finish, the synthesis step combines their outputs:

  1. Group by location. Findings that touch the same function or state variable get clustered.
  2. Deduplicate within cluster. If three roles independently flag the same mint() function as having a reentrancy risk, that's one finding (with three votes).
  3. Surface unique finds. Findings that only one role flagged go through with a "single-role-source" tag, which gets weighted in the kill-gate stage.
  4. Cross-validate. For each finding, ask all four roles "is this a real bug?" Even the role that originally found it sometimes drops it under cross-examination. Findings that survive cross-validation are stronger.

The synthesis step is itself an AI prompt. It runs once per audit, takes the four role outputs as input, and produces a deduplicated finding list as output.
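
Steps 1 to 3 can be sketched deterministically, which is useful as a baseline to test a synthesis prompt against. This is an assumption-laden illustration, not the Academy's synthesis prompt: the field names (`location`, `bug_class`, `note`) and the sample findings are hypothetical, and step 4 (cross-validation) is left out because it needs model calls.

```python
from collections import defaultdict

# Hypothetical role outputs; field names and findings are invented for this sketch.
role_outputs = {
    "attacker": [{"location": "mint", "bug_class": "reentrancy", "note": "note A"}],
    "accountant": [
        {"location": "mint", "bug_class": "reentrancy", "note": "note B"},
        {"location": "_mintFee", "bug_class": "missed-state-update", "note": "note C"},
    ],
    "edge_case_hunter": [{"location": "mint", "bug_class": "reentrancy", "note": "note D"}],
    "spec_auditor": [],
}


def synthesize(outputs: dict[str, list[dict]]) -> list[dict]:
    """Cluster findings by (location, bug_class), count votes, tag single-role finds."""
    clusters = defaultdict(lambda: {"roles": set(), "notes": []})
    for role, findings in outputs.items():
        for f in findings:
            cluster = clusters[(f["location"], f["bug_class"])]
            cluster["roles"].add(role)
            cluster["notes"].append(f["note"])

    merged = []
    for (location, bug_class), c in clusters.items():
        merged.append({
            "location": location,
            "bug_class": bug_class,
            "votes": len(c["roles"]),
            "roles": sorted(c["roles"]),
            "tags": ["single-role-source"] if len(c["roles"]) == 1 else [],
            "notes": c["notes"],
        })
    # Most-agreed findings first; single-role finds are kept, not dropped.
    return sorted(merged, key=lambda m: -m["votes"])


findings = synthesize(role_outputs)
for f in findings:
    print(f["location"], f["votes"], f["tags"])
```

Clustering on the pair (location, bug class) rather than location alone keeps two different bugs in the same function from being merged into one finding.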

Token cost vs quality trade-off

A single-pass scan of a 1,000-line contract typically uses 5,000-10,000 tokens (input prompt + source + output). The multi-mindset pattern uses ~4x that plus a synthesis pass, for roughly 25,000-40,000 tokens in total.

Cost on Claude Opus pricing: roughly $0.10-0.30 per multi-mindset scan of a single contract. Cost on smaller models is proportionally less.

Quality benefit: in benchmark testing on the set of 118 Code4rena findings, multi-mindset detection produces ~1.5-2x the recall of a single-pass scanner given the same source-context length. Precision is comparable; the gain is purely in recall.

For most teams the math is favorable: ~$0.20 to scan a contract with 2x recall is cheaper than missing a bug.

Where this pattern came from

The multi-mindset pattern is associated with several open-source AI security tools. Pashov Auditor uses a role-based detection system. Several audit firms (including Zealynx) maintain internal variants. The general "specialize-then-synthesize" pattern predates AI auditing; it mirrors how a human audit team divides work across senior auditors with different specialties.

The novelty in the AI version is that you can run all four "auditors" in parallel for ~$0.20 per scan, which is far cheaper than coordinating four humans on the same review.

Common pitfalls

Pitfall 1: prompts that don't actually specialize

A common mistake is to write four nearly-identical prompts with slightly different titles ("you are the attacker", "you are the accountant"), without giving each prompt a meaningfully different mindset. The roles end up overlapping almost completely, and the multi-mindset pattern collapses back to a single-pass scan with 4x the cost.

The fix: write each prompt as if the role had nothing else to do. The Attacker prompt should say nothing about invariants. The Accountant prompt should say nothing about exploits.
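
One crude way to catch this pitfall early is to measure how much vocabulary the role prompts share. The sketch below uses Jaccard similarity over word sets; the 0.6 threshold and the sample prompts are assumptions for illustration, and word overlap is only a rough proxy for how different two mindsets really are.

```python
import re
from itertools import combinations


def word_set(prompt: str) -> set[str]:
    return set(re.findall(r"[a-z]+", prompt.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words across both prompts."""
    wa, wb = word_set(a), word_set(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def overlapping_pairs(prompts: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return role pairs whose prompts are suspiciously similar."""
    return [
        (x, y, round(jaccard(prompts[x], prompts[y]), 2))
        for x, y in combinations(prompts, 2)
        if jaccard(prompts[x], prompts[y]) >= threshold
    ]


weak = {
    "attacker": "You are the attacker. Audit this contract and report all bugs.",
    "accountant": "You are the accountant. Audit this contract and report all bugs.",
}
strong = {
    "attacker": "List every path by which a caller extracts value they should not receive.",
    "accountant": "For each state change, state the invariant and trace whether the books balance.",
}

print(overlapping_pairs(weak))
print(overlapping_pairs(strong))
```

The weak pair differs only in the role title and gets flagged; the specialized pair shares no vocabulary and passes.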

Pitfall 2: synthesis that loses unique finds

If the synthesis step deduplicates too aggressively (for example, by keeping only findings that several roles agree on, or by folding every finding in the same function into one entry), unique finds from a single role get washed out.

The fix: weight findings by role agreement, but don't drop single-role findings entirely. Pass them through to the kill-gate stage with a "needs cross-validation" tag.
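
A sketch of that fix, assuming each synthesized finding carries a `votes` count of how many roles reported it. The linear weight and the helper names are hypothetical choices; any monotonic weighting that never reaches zero serves the same purpose.

```python
def agreement_weight(votes: int, total_roles: int = 4) -> float:
    """Fraction of roles that reported the finding; at least 1/total_roles, never zero."""
    if not 1 <= votes <= total_roles:
        raise ValueError("votes must be between 1 and total_roles")
    return votes / total_roles


def prepare_for_kill_gates(findings: list[dict]) -> list[dict]:
    """Attach a weight to every finding; single-role finds pass through with a tag."""
    prepared = []
    for f in findings:
        tags = list(f.get("tags", []))
        if f["votes"] == 1 and "needs cross-validation" not in tags:
            tags.append("needs cross-validation")
        prepared.append({**f, "weight": agreement_weight(f["votes"]), "tags": tags})
    return prepared


sample = [{"id": "A", "votes": 3}, {"id": "B", "votes": 1}]
prepared = prepare_for_kill_gates(sample)
print([(p["id"], p["weight"], p["tags"]) for p in prepared])
```

The output list always has the same length as the input: agreement changes the weight, not whether a finding survives.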

Pitfall 3: skipping the synthesis altogether

Some implementations report all four role outputs side-by-side, leaving the user to deduplicate. This leaves the problem that synthesis solves (reviewer fatigue, duplicate noise) in place and pushes the cost onto the human.

The fix: always include synthesis. The token cost is small relative to the role outputs; the quality lift is significant.

Related questions

Can I use fewer roles? Yes. Two-role variants (Attacker + Accountant) capture most of the lift at half the cost. Three-role variants drop the Spec Auditor when docs are sparse. Four-role is the canonical setup; the others are reasonable degraded modes.

Can I use more roles? Possible, but returns diminish. Adding "Gas Auditor", "MEV Auditor", or "Front-running Hunter" produces more findings, but most overlap with the four canonical roles. The added cost rarely justifies the gain.

Does this work for non-Solidity smart contracts? Yes, with role-prompt adjustments. For Solana programs, the Edge-Case Hunter focuses on Anchor's account validation; the Accountant focuses on lamport accounting; the Attacker focuses on PDA manipulation and signer privilege.

Does this work for off-chain code reviews? Sort of. The role specialization concept transfers, but the specific prompts (Attacker, Accountant, Spec Auditor, Edge-Case Hunter) are tuned for smart contracts. For traditional software, similar role decomposition exists in adversarial code review literature.

How does this interact with kill gates? Multi-mindset is the detection layer; kill gates are the verification layer. Detection generates candidates; kill gates filter them. Both pieces are necessary; either alone is much weaker than the combination.

Where to see this in Academy

Step 4 of the AI Auditor Builder walks through implementing multi-mindset detection in your own auditor. The reference implementation includes:

  • Four role-specific prompt templates with worked examples.
  • A parallel-execution harness for running them as Claude Code skill invocations.
  • A synthesis prompt that handles deduplication and unique-find tagging.
  • Test fixtures (a small Solidity codebase with known bugs across all four role specialties) for validating your implementation.

After step 4, you wire the detection layer into the kill-gate verification (step 7), and submit the resulting auditor to the AI Auditor Arena benchmark for scoring.

The pattern is standard in modern AI auditing. Implementing it once in your own stack makes every subsequent auditor you build significantly more capable.

Tagged

AI Auditing, Smart Contract Security, Multi-Agent