Monitoring Beats the First Audit: How Ronin Lost $624M Undetected for Six Days
Ronin Bridge was exploited and the loss went undetected for six days. With basic monitoring of 24-hour cross-chain transfer volumes against a rolling baseline, the alert would have fired within 90 minutes. Here's the budget allocation that follows.
TL;DR
- The highest-ROI security investment for a new Web3 protocol is not the first audit. It is production monitoring plus a pause mechanism.
- Ronin Bridge lost $624M in March 2022. The exploit went undetected for six days. By the time anyone noticed, the funds were already laundered.
- A basic monitoring stack with a rolling baseline on cross-chain transfer volumes would have alerted the security team within 90 minutes.
- A pause function controlled by a security multisig would have capped the loss at tens of millions, not hundreds.
- For a protocol with under $1M TVL, allocate $20-50K to security in year one. About 30% of that should go to monitoring infrastructure. Audits are still important, but they are the third-best dollar.
Why this matters
Founders default to spending their first security dollar on the audit. It feels concrete: there is a deliverable, an invoice, a final report. Monitoring feels abstract: dashboards, alerts, on-call rotations. The audit is the legible expense; monitoring is operational overhead.
The pattern of the worst exploits in Web3 history says the priority is upside down. Ronin, BadgerDAO, Wormhole, and the bulk of bridge and yield aggregator hacks were all detected too late, sometimes after attackers had already moved funds through Tornado Cash. By that point, the audit had done its job (or failed to). The exploit was happening anyway, and the question was whether anyone would see it before it finished.
The protocols that survived their early years built monitoring before scale. The ones that did not are case studies.
The Ronin case
Ronin Bridge was the cross-chain bridge for Axie Infinity. In March 2022, a North Korean threat actor (later attributed to Lazarus by the FBI) compromised five of the nine validator keys via spear phishing of Sky Mavis engineers and a backdoor through a job-application document. With five keys, the attacker controlled a majority of the bridge's multisig and could authorize arbitrary withdrawals.
The attacker drained 173,600 ETH and 25.5M USDC, totaling $624M at the time, in two transactions on March 23, 2022. The team did not notice.
For six days, the funds sat in attacker-controlled wallets, slowly being broken up and bounced through mixers. The exploit was discovered on March 29 only because a user complained to Sky Mavis support that they could not withdraw 5,000 ETH. The team checked the bridge balance, found it empty, and pieced together what had happened.
By that point, much of the stolen value was already through Tornado Cash and unrecoverable. The eventual recovery (~$30M, paid as restitution by US authorities after the broader Lazarus investigation) was a small fraction of the loss.
Six days. The exploit happened, the money walked, and nobody noticed because nobody had set up the alert.
What basic monitoring would have caught
Cross-chain bridges have a simple invariant: the assets locked on one side equal the assets minted on the other. Either side trending sharply outside its normal volume profile is a Tier 1 alarm.
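As a minimal sketch of that invariant, the check below compares the locked and minted totals and raises an alarm when they diverge. The function names and the 0.1% tolerance (to absorb transfers still in flight) are illustrative assumptions, not values from any real bridge.

```python
from decimal import Decimal


def backing_gap(locked: Decimal, minted: Decimal) -> Decimal:
    """Minted minus locked; a positive gap means unbacked supply exists."""
    return minted - locked


def invariant_alarm(
    locked: Decimal,
    minted: Decimal,
    tolerance: Decimal = Decimal("0.001"),  # assumed allowance for in-flight transfers
) -> bool:
    """True when the two sides differ by more than `tolerance` of minted supply."""
    if minted == 0:
        return locked != 0
    return abs(backing_gap(locked, minted)) / minted > tolerance
```

In practice the two totals would come from the event stream on each chain; a drained source-side contract shows up here as a large positive gap.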
The Ronin attack drained 173,600 ETH in a single transaction. Ronin's typical daily withdrawal volume in early 2022 was a few thousand ETH. A 50x deviation from rolling baseline is the kind of thing a 30-line monitoring script catches in seconds.
Concretely, a basic monitor would track:
- Total ETH withdrawn from the bridge in the last 24 hours, compared against the rolling 30-day median of daily withdrawals
- Alert if the current 24h total is more than 5x the median; page on-call if it is more than 10x
- Alert if any single transaction exceeds 5% of bridge TVL
Any one of those would have fired within minutes of the attack transaction landing on chain. The team would have known within the hour.
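The rules above fit in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the daily history would come from your own indexer, and the 3,500 ETH/day figure in the usage note is only an illustration of "a few thousand ETH", not a measured Ronin number.

```python
from statistics import median

ALERT_MULTIPLE = 5           # alert when 24h volume exceeds 5x the median
PAGE_MULTIPLE = 10           # page on-call when it exceeds 10x
SINGLE_TX_TVL_SHARE = 0.05   # alert on any one transaction above 5% of TVL


def classify_volume(daily_history, current_24h):
    """Compare the latest 24h withdrawal total with the median of prior days."""
    if not daily_history:
        raise ValueError("need at least one day of history for a baseline")
    baseline = median(daily_history)
    if baseline <= 0:
        # No meaningful baseline yet: treat any outflow as worth paging.
        return "page" if current_24h > 0 else "ok"
    ratio = current_24h / baseline
    if ratio > PAGE_MULTIPLE:
        return "page"
    if ratio > ALERT_MULTIPLE:
        return "alert"
    return "ok"


def large_single_tx(tx_amount, bridge_tvl):
    """True when one transaction moves more than 5% of bridge TVL."""
    return bridge_tvl > 0 and tx_amount > SINGLE_TX_TVL_SHARE * bridge_tvl
```

With an assumed 30-day history of 3,500 ETH per day, `classify_volume([3500] * 30, 173_600)` returns `"page"`, since the ratio is roughly 50x.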
The compounding effect of a pause function
Detection alone is not enough. Once you know an attack is in progress, you need a way to stop it.
If Ronin had had a pause function controlled by a 3-of-5 security multisig (separate from the validator multisig that the attacker compromised), the security team could have paused withdrawals within 90 minutes of detection. By that point the attacker would already have completed the first withdrawal, but the second transaction (a follow-up consolidation) might have been blocked.
More importantly, the pause buys time. Even if the attacker has already moved funds off the bridge, pausing prevents secondary attacks (drain attempts on related contracts, governance manipulation, or follow-on exploits in connected protocols). Bridge exploits often cascade across the ecosystem; a pause stops the cascade.
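To make the separation of roles concrete, here is a toy in-memory model of a pause gate with its own guardian set. It is not contract code (on chain this logic would live in the bridge contract itself), and the class, method, and signer names are all hypothetical.

```python
class PausableBridge:
    """Toy model: withdrawals stop once enough pause guardians have voted."""

    def __init__(self, guardians, threshold):
        self.guardians = frozenset(guardians)
        if not 0 < threshold <= len(self.guardians):
            raise ValueError("threshold must be between 1 and the guardian count")
        self.threshold = threshold
        self.pause_votes = set()
        self.paused = False

    def vote_pause(self, signer):
        """Record one guardian's vote; returns whether the bridge is now paused."""
        if signer not in self.guardians:
            # Validator keys are deliberately not accepted here.
            raise PermissionError("signer is not a pause guardian")
        self.pause_votes.add(signer)
        if len(self.pause_votes) >= self.threshold:
            self.paused = True
        return self.paused

    def withdraw(self, amount):
        """Release funds unless the bridge has been paused."""
        if self.paused:
            raise RuntimeError("withdrawals are paused")
        return amount
```

The design point is that the guardian set is disjoint from the validator set: an attacker holding a majority of validator keys can still sign withdrawals, but cannot prevent three guardians from halting them.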
The combined detection-plus-pause budget at Ronin's scale would have been a small fraction of the $624M lost. Even a generous estimate of $500K/year for the monitoring infrastructure plus the security ops team to run it works out to a return of more than 1,000x ($624M / $500K ≈ 1,250).
What the monitoring stack actually contains
For founders building this from scratch, a baseline monitoring stack has three layers.
Layer 1: On-chain event indexing. Tools like Tenderly, OpenZeppelin Defender, Chainlink Automation, or a self-hosted indexer that watches your contract events in near-real-time. The output is a stream of events (deposits, withdrawals, admin calls, parameter changes). Cost: $0 to a few hundred dollars a month depending on volume.
Layer 2: Anomaly detection. Logic on top of the event stream that compares current activity to historical baselines. A 30-line Python script with rolling-window statistics works for most protocols. Specialized tools (Forta, Hexagate, OpenZeppelin Defender's monitor module) handle this with pre-built detector libraries.
Layer 3: Alerting and response. PagerDuty, Telegram, or Slack integration that wakes someone up when Layer 2 fires. Plus an on-call rotation that owns the alert. Plus a runbook that says what to do when the alert fires.
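For the Telegram route, Layer 3 can start as a single HTTP call to the Bot API's `sendMessage` method. The sketch below assumes you have already created a bot and know the chat ID; the helper names and the message format are illustrative choices, not part of any vendor's tooling.

```python
import json
import urllib.request


def format_alert(severity, detector, detail):
    """Render one alert line, e.g. '[PAGE] withdrawal_volume: ...'."""
    return f"[{severity.upper()}] {detector}: {detail}"


def build_telegram_request(bot_token, chat_id, text):
    """Build (but do not send) a Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request, opener=urllib.request.urlopen):
    """Send the request; returns True on an HTTP 200 response."""
    with opener(request, timeout=10) as response:
        return response.status == 200
```

Keeping request construction separate from sending makes the formatting testable without network access, and lets the same alert text be routed to PagerDuty or Slack later.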
The full stack for a small protocol can be set up in a week and run for under $1,000/month including the on-call cost (assuming you handle on-call internally, which you should for security alerts). For a protocol that just spent $50K on an audit, this is the next $20K of security spend.
The budget allocation
eMBA Module 3 Lesson 2 documents the security budget by TVL tier. The summary:
| TVL | Total annual security budget | Audit share | Monitoring share | Bounty share | Other |
|---|---|---|---|---|---|
| <$1M | $20-50K | 40-50% | 25-30% | 10-15% | 10-25% |
| $1M-$10M | $50-200K | 35-45% | 20-25% | 15-20% | 15-25% |
| $10M-$100M | $200K-$1M | 25-35% | 15-25% | 25-35% | 15-25% |
| $100M+ | $500K-$2M+ | 20-30% | 15-25% | 30-50% | 10-20% |
A few things to notice:
- Audit share decreases as TVL grows. After two or three audits, marginal audits catch fewer findings.
- Monitoring share stays roughly flat as a percentage but grows in absolute dollars as the protocol scales.
- Bounty share grows substantially. At $100M+ TVL, the bug bounty (typically an Immunefi program scoped at $1-10M) becomes the largest single line item. Aave's V4 review allocated $1.5M for 345 days of audit time; Uniswap V4 allocated $15.5M just to bug bounty.
- "Other" includes formal verification, threat modeling sessions, incident response retainers, and operational security tooling (HSMs, signing infrastructure).
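To turn the table into dollar figures for a specific protocol, a small lookup is enough. The sketch below encodes the percentage ranges from the table; the function and variable names are illustrative, and the tier boundaries are treated as "lower bound inclusive", which is an assumption the table does not spell out.

```python
# (upper TVL bound in USD, {line item: (low %, high %)}), copied from the table.
TIERS = [
    (1_000_000,    {"audit": (40, 50), "monitoring": (25, 30), "bounty": (10, 15), "other": (10, 25)}),
    (10_000_000,   {"audit": (35, 45), "monitoring": (20, 25), "bounty": (15, 20), "other": (15, 25)}),
    (100_000_000,  {"audit": (25, 35), "monitoring": (15, 25), "bounty": (25, 35), "other": (15, 25)}),
    (float("inf"), {"audit": (20, 30), "monitoring": (15, 25), "bounty": (30, 50), "other": (10, 20)}),
]


def allocation(tvl_usd, total_budget_usd):
    """Return {line item: (low USD, high USD)} for the tier containing tvl_usd."""
    for upper_bound, shares in TIERS:
        if tvl_usd < upper_bound:
            return {
                item: (total_budget_usd * low // 100, total_budget_usd * high // 100)
                for item, (low, high) in shares.items()
            }
    raise ValueError("TVL must be a finite number")
```

For a protocol with $500K TVL and a $40K budget, `allocation(500_000, 40_000)["monitoring"]` gives `(10000, 12000)`, that is, $10-12K for monitoring in year one.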
How to start cheaply
If you cannot afford the full stack, here is a near-free starter version:
- Free tier of Tenderly or OpenZeppelin Defender. Both offer event monitoring with limited free quotas. Sufficient for low-volume protocols.
- A Telegram bot. Sends alerts to a private channel where you and your co-founder are admins. Replace with PagerDuty when you have an actual on-call rotation.
- One bash script running on a VPS that polls your contract balances every 60 seconds and pages if anything moves more than X% in 24h. Crude but works.
- A pause function in your contracts controlled by a 2-of-3 multisig where you and two trusted advisors hold keys. The keys live on hardware wallets, never on a phone or laptop.
That is a $20/month setup. It would not have stopped the Lazarus group from compromising Ronin's keys, but it would have alerted Sky Mavis within an hour of the drain. The six days of unnoticed loss is the failure mode that monitoring prevents.
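The polling script from the list can be sketched in Python as well as bash. The version below reads the native balance with the standard `eth_getBalance` JSON-RPC call and notifies when the balance falls too far below its peak in the window. The 10% threshold stands in for the unspecified "X%", and the RPC URL, address, and function names are placeholders you would supply; it does not track ERC-20 balances.

```python
import json
import time
import urllib.request
from collections import deque


def get_balance_wei(rpc_url, address):
    """Read an account's native balance via the eth_getBalance JSON-RPC method."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_getBalance",
               "params": [address, "latest"]}
    request = urllib.request.Request(
        rpc_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return int(json.load(response)["result"], 16)  # result is hex-encoded wei


def drop_fraction(samples):
    """Fractional drop from the window's peak balance to the latest sample."""
    peak = max(samples)
    return 0.0 if peak == 0 else (peak - samples[-1]) / peak


def watch(fetch_balance, notify, threshold=0.10, interval_s=60,
          window_s=86_400, max_polls=None, sleep=time.sleep):
    """Poll every interval_s seconds; notify when the drop exceeds threshold."""
    samples = deque(maxlen=window_s // interval_s)  # keeps roughly 24h of samples
    polls = 0
    while max_polls is None or polls < max_polls:
        samples.append(fetch_balance())
        drop = drop_fraction(samples)
        if drop > threshold:
            notify(f"balance down {drop:.1%} from 24h peak")
        polls += 1
        sleep(interval_s)
```

A typical wiring would be `watch(lambda: get_balance_wei(RPC_URL, BRIDGE_ADDRESS), send_to_telegram)`, where all three names are yours to define. Note that this version notifies on every poll while the drop persists; a production script would deduplicate.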
Related questions
What if we cannot afford monitoring at all? Then your protocol should not custody more value than you can afford to lose. TVL that vastly exceeds your security infrastructure budget is a recipe for the next Ronin. If TVL is $5M and your security budget is $0, you are sitting on a target.
Should we use a third-party monitoring service or build in-house? Use the third-party service first. OpenZeppelin Defender, Tenderly, Forta, and Hexagate all give you 80% of the value with 5% of the engineering effort. Build in-house only when you have a specific detection need that no third party covers.
How does monitoring relate to the Audit Paradox? Audits review the static system. Monitoring covers the dynamic behavior. Most exploits in 2024 happened despite audits because the exploit lived in the dynamic execution, not in the static code. Monitoring is what closes the gap.
What about reactive incident response? Detection plus pause buys time. The IR playbook says what to do with that time. Module 3 Lesson 8 of eMBA covers IR specifically. The headline stat there: 55% of Web3 teams have no incident response playbook or have never practiced one. If that is you, it is the next thing to fix after monitoring.
Where to start
eMBA Module 3 Lesson 7 (Post-Deployment Monitoring) walks through the full monitoring stack, with vendor comparisons, a sample bash script, the pause-multisig design, and the on-call rotation playbook.
If you only do one thing after reading this: spin up a monitoring instance and a Telegram alert today. Even crude monitoring is dramatically better than the Ronin baseline of "we noticed when a user complained six days later."