# The Auditor Operating System: Repeatable Results in a Hostile Codebase
Auditing is a craft with artifacts: models, maps, experiments, and writeups. This post is an 'operating system' for doing web3 security work that scales beyond vibes.
The biggest difference between a beginner audit and a professional audit is not “knowledge of Solidity”.
It is output quality under pressure:
- short time
- huge codebase
- adversarial environment
- ambiguous specs
- politics
If your process is “read code and hope”, you will get random results.
If your process is “produce artifacts that constrain my search space”, you will get repeatable results.
This post is an auditor operating system: a set of work products that move you from uncertainty to defensible claims.
## The four artifacts
Good audits produce four artifacts, in order:
- Model: what is protected, what is trusted, what is assumed
- Map: entry points, roles, and external call graph
- Experiments: tests, fuzzing, invariants, PoCs
- Writeup: minimal reproduction, impact, and fix guidance
If you skip an artifact, you pay for it later.
## Before you start: control your environment
This sounds mundane, but it is the difference between “audit” and “random reading”.
I do these before I read code:
| Setup step | Why it matters |
|---|---|
| pin compiler / tool versions | avoids "works on my machine" findings |
| run the full test suite once | establishes baseline and exposes flaky tests |
| build a local graph of the repo | quickly find owners of state, roles, and core flows |
| locate config and deployment scripts | tells you how the system is actually used |
If the repo does not have tests, you still want a harness (see the sketch after this list) that can:
- deploy contracts
- call state-changing entry points
- simulate adversarial behavior (weird tokens, reentrancy)
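Here is a minimal Foundry sketch of such a harness. Everything in it is a hypothetical stand-in: `Vault` plays the role of the audited system, and the adversarial mocks come later in this post.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";

// Hypothetical stand-in for the system under audit.
contract Vault {
    mapping(address => uint256) public balances;

    function deposit() external payable {
        balances[msg.sender] += msg.value;
    }

    function withdraw(uint256 amount) external {
        balances[msg.sender] -= amount; // 0.8+ reverts on underflow
        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok, "transfer failed");
    }
}

contract Harness is Test {
    Vault vault;
    address attacker = makeAddr("attacker");

    function setUp() public {
        vault = new Vault();         // deploy contracts
        vm.deal(attacker, 10 ether); // fund an adversarial actor
    }

    function test_entryPoints() public {
        vm.startPrank(attacker);     // drive state-changing entry points
        vault.deposit{value: 1 ether}();
        vault.withdraw(1 ether);
        vm.stopPrank();
    }
}
```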
## Artifact 1: a threat model that fits on one screen
I do not start by reading 10k lines.
I start by writing down the protocol in English.
Example format:
| Item | Answer |
|---|---|
| assets | what is valuable? (funds, shares, debt, admin power, liveness) |
| trust | what do we trust? (admins, oracles, tokens, keepers, sequencers) |
| entry points | which public functions change state? |
| invariants | what must always remain true? |
| failure mode | what does “unsafe” look like? (loss, freeze, inflation) |
This table is not “docs”.
It is a weapon: it tells you which lines of code can matter.
## Threat model in practice: write down the attacker budget
Most DeFi failures happen when a team assumes the attacker has a constraint they do not.
Examples of budgets attackers often have:
- flash liquidity (one-block capital)
- private ordering (bundles)
- contract-based accounts (batched calls)
- ability to revert and retry until favorable rounding
So I explicitly write:
- can the attacker borrow the capital for one block?
- can the attacker choose ordering?
- can the attacker loop an action cheaply?
If the answer is “yes”, your threat model must treat iteration as part of the attack surface.
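To make "iteration as part of the attack surface" concrete, here is a hedged sketch. `ITarget` and `smallAction` are hypothetical; the shape (one transaction, many repetitions of a cheap action) is what any contract-based attacker gets for free.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical: any action whose per-call rounding favors the caller.
interface ITarget {
    function smallAction(uint256 amount) external;
}

// One transaction, many iterations: if each call leaks a rounding-sized
// amount, the attacker repeats until the dust adds up.
contract IterationBudget {
    function drip(ITarget target, uint256 perCall, uint256 times) external {
        for (uint256 i = 0; i < times; i++) {
            target.smallAction(perCall);
        }
    }
}
```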
## Artifact 2: the external-call graph
When you are lost, find the edges.
Every serious exploit crosses an edge:
- token transfer
- oracle read
- callback/hook
- low-level call
- delegatecall
A tiny but powerful habit:
For each edge, write what the external party can lie about.
Token edge lies:
- return value is nonsense
- balance changes outside the transfer
- callback reenters you
Oracle edge lies:
- value is manipulable for one block
- value is stale
- value is expensive to update in a way you did not model
Once you do this, “the scary parts” become obvious.
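Two of the oracle lies are cheap to defend against in code. A minimal sketch against a Chainlink-style feed follows; the `MAX_STALENESS` value is an assumption you must tune to the feed's heartbeat, and note that one-block manipulability needs a different defense (such as a TWAP), not a staleness check.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Chainlink-style aggregator interface.
interface IAggregatorV3 {
    function latestRoundData()
        external
        view
        returns (uint80 roundId, int256 answer, uint256 startedAt, uint256 updatedAt, uint80 answeredInRound);
}

contract OracleConsumer {
    IAggregatorV3 public immutable feed;
    uint256 public constant MAX_STALENESS = 1 hours; // assumption: tune per feed

    constructor(IAggregatorV3 _feed) {
        feed = _feed;
    }

    function safePrice() external view returns (uint256) {
        (, int256 answer,, uint256 updatedAt,) = feed.latestRoundData();
        require(answer > 0, "nonsense price");                                // lie: value is garbage
        require(block.timestamp - updatedAt <= MAX_STALENESS, "stale price"); // lie: value is stale
        return uint256(answer);
    }
}
```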
## Mapping entry points without missing the weird ones
There are two common misses:
- “indirect entry points” through callbacks (hooks, receivers, token callbacks)
- “privileged entry points” through upgradeability or role delegation
So when I map entry points, I categorize:
| Category | Examples | What I check |
|---|---|---|
| permissionless | swaps, deposits, mints | external calls inside accounting, rounding, DoS |
| role-gated | parameter updates | role escalation, reentrancy into admin paths |
| upgrade paths | UUPS, transparent proxy | initializer correctness, upgrade auth |
| callback paths | hooks, ERC-777, receivers | phase correctness, reentrancy, stuck states |
If your protocol uses hooks (Uniswap-style), the “callback paths” are first-class entry points.
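Here is a sketch of why callbacks are entry points. `IMarket` and `buy` are hypothetical; the point is that a protocol delivering an NFT mid-call hands execution to attacker code through `onERC721Received`.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical protocol function the attacker reenters.
interface IMarket {
    function buy(uint256 id) external payable;
}

// The "indirect entry point": buy() delivers an NFT, the safe-transfer
// callback fires, and execution lands back in attacker code.
contract CallbackEntryPoint {
    IMarket public immutable market;
    bool internal reentered;

    constructor(IMarket _market) {
        market = _market;
    }

    function onERC721Received(address, address, uint256 id, bytes calldata)
        external
        returns (bytes4)
    {
        if (!reentered) {
            reentered = true;
            market.buy(id); // second entry while the first buy() is mid-flight
        }
        return this.onERC721Received.selector;
    }
}
```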
## Artifact 3: experiments as filters
Humans are bad at exhaustive reasoning.
Use experiments to filter the search space.
Three experiments that pay off quickly:
### 1) invariants (even if you only write two)
Invariants are constraints that survive sequences.
The best invariants in DeFi are usually:
- accounting conservation (no free mint)
- solvency (collateral >= debt under defined prices)
- liveness (users can withdraw under defined conditions)
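A minimal Foundry invariant sketch for the solvency case, with a toy `Pool` standing in for the audited accounting. The fuzzer drives random call sequences; the invariant must survive all of them.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";

// Toy stand-in: shares are meant to track deposited ether 1:1.
contract Pool {
    uint256 public totalShares;
    mapping(address => uint256) public shares;

    function deposit() external payable {
        shares[msg.sender] += msg.value;
        totalShares += msg.value;
    }

    function withdraw(uint256 amount) external {
        shares[msg.sender] -= amount;
        totalShares -= amount;
        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok, "transfer failed");
    }
}

contract PoolInvariants is Test {
    Pool pool;

    function setUp() public {
        pool = new Pool();
        targetContract(address(pool)); // fuzz random call sequences against it
    }

    // Solvency: every share is always backed by real ether.
    function invariant_solvency() public view {
        assertGe(address(pool).balance, pool.totalShares());
    }
}
```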
### 2) fuzz the public API, not the helpers
Attackers do not call your internal functions.
They call public entry points with weird sequences.
Your fuzzer should do the same.
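One way to do this in Foundry is a handler contract that exposes only the public API (the `IPool` interface below is a hypothetical stand-in). In the invariant test, point the fuzzer at the handler with `targetContract(address(handler))` so internal helpers are never called directly.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";

// Hypothetical public API of the system under audit.
interface IPool {
    function deposit() external payable;
    function withdraw(uint256 amount) external;
    function shares(address who) external view returns (uint256);
}

// The fuzzer calls these in random sequences, mirroring a real attacker.
contract Handler is Test {
    IPool pool;

    constructor(IPool _pool) {
        pool = _pool;
    }

    function deposit(uint256 amount) external {
        amount = bound(amount, 0, 100 ether); // keep inputs meaningful
        vm.deal(address(this), amount);
        pool.deposit{value: amount}();
    }

    function withdraw(uint256 amount) external {
        amount = bound(amount, 0, pool.shares(address(this)));
        pool.withdraw(amount);
    }

    receive() external payable {} // accept withdraw proceeds
}
```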
### 3) adversarial mocks
Replace:
- ERC-20 tokens with weird tokens
- oracles with adversarial oracles
- hooks with reentrant hooks
If you only test happy paths, you are testing your own beliefs.
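Here is a sketch of one such mock. It is deliberately not a complete ERC-20: it charges a silent transfer fee (so balance deltas disagree with `amount`) and can optionally reenter a target mid-transfer.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

interface IReentryTarget {
    function poke() external;
}

// Adversarial ERC-20 mock: silent fee plus optional reentrant callback.
contract WeirdToken {
    mapping(address => uint256) public balanceOf;
    address public reentryTarget;

    function mint(address to, uint256 amount) external {
        balanceOf[to] += amount;
    }

    function setReentryTarget(address target) external {
        reentryTarget = target;
    }

    function transfer(address to, uint256 amount) external returns (bool) {
        uint256 received = amount - amount / 100; // 1% fee: balance delta != amount
        balanceOf[msg.sender] -= amount;
        balanceOf[to] += received;
        if (reentryTarget != address(0)) {
            IReentryTarget(reentryTarget).poke(); // callback reenters the caller
        }
        return true;
    }
}
```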
## Tooling as a workflow (not as a checkbox)
Static analysis tools are good at finding:
- missing access control
- unchecked return values
- reentrancy hazards (shallow)
- dangerous low-level calls
They are not good at finding:
- protocol-specific invariants
- MEV and ordering dependence
- rounding policy mistakes
So I use tools as early filters, then I move into protocol reasoning.
Example loop:
- run a linter/static tool to find low-hanging fruit
- map the external call graph
- pick 2-3 invariants
- fuzz those invariants
- write PoCs for any invariant breaks
The key is that every tool output should feed an artifact.
If the tool output does not change your model/map/experiments, it is noise.
## A small severity rubric that improves writeups
Instead of arguing severity subjectively, I use a rubric:
| Severity | What breaks | Typical evidence |
|---|---|---|
| Critical | direct loss of funds or permanent loss of control | PoC drains or upgrade takeover |
| High | loss of funds under realistic conditions, or protocol insolvency | exploit path with plausible assumptions |
| Medium | bounded loss, partial DoS, griefing with constraints | griefing with a quantified cost, limited damage |
| Low | best-practice gap, hard-to-exploit edge | missing checks, unsafe defaults |
| Informational | clarity, hardening, documentation | improves comprehension |
This makes reports readable by engineers and leadership.
## How to write a finding that gets fixed
Most “bad findings” fail because they are not actionable.
An actionable finding has:
- a minimal reproduction path
- the impact stated as a broken invariant
- a fix direction that preserves the protocol design goals
Here is a template that tends to work:
### [H-01] Hook reentrancy lets an attacker bypass fee accounting
**Impact**
An attacker can pay less than the intended fee by reentering `afterSwap` through a token callback, breaking the invariant:
"feeGrowth increases by at least the protocol fee for every swap."
**Root cause**
The hook updates `feeGrowth` after calling `token.transfer`, allowing reentrancy into a path that reads stale state.
**Exploit sketch**
1. Swap with a callback token.
2. Token reenters into `afterSwap`.
3. Second execution observes stale `feeGrowth` and settles without the intended fee increment.
**Recommendation**
Update accounting before external calls, or enforce a phase-based reentrancy guard.
Prefer settling based on balance deltas rather than return values.
This style works because it connects:
- a broken invariant
- a concrete mechanism
- a fix that matches the threat model
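Here is a hedged sketch of what that recommendation looks like in code. The names (`FeeSettlement`, `collect`) are hypothetical; the pattern is effects-before-interactions plus settling on measured balance deltas.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

interface IERC20 {
    function balanceOf(address who) external view returns (uint256);
    function transferFrom(address from, address to, uint256 amount) external returns (bool);
}

contract FeeSettlement {
    uint256 public feeGrowth;

    function collect(IERC20 token, address payer, uint256 amount, uint256 fee) external {
        // Effects first: a reentrant callback now observes updated state.
        feeGrowth += fee;

        // Settle on the measured delta, not the caller's claim or the
        // token's return value.
        uint256 before = token.balanceOf(address(this));
        require(token.transferFrom(payer, address(this), amount), "transfer failed");
        uint256 received = token.balanceOf(address(this)) - before;
        // ... credit `received`, not `amount`, to downstream accounting.
    }
}
```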
## A note on modern account behavior (and why it matters for audits)
The line between “EOA” and “contract” behavior keeps blurring:
- account abstraction
- routers batching calls
- signature-based authorization
- delegated behaviors (emerging proposals)
Practically, this means:
- you cannot assume `msg.sender` is a user
- you cannot assume "one tx = one action"
This is why invariants and sequence-based reasoning matter.
One of the easiest audit mistakes is to miss an exploit path that requires calling the same function twice in one transaction through a router.
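That path costs an attacker only a few lines. `IRewards.claim` is hypothetical; any function that is only safe when called once per transaction breaks against this shape.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

interface IRewards {
    function claim() external;
}

// Anything a user can do once, a contract can do N times atomically.
contract BatchRouter {
    function claimTwice(IRewards rewards) external {
        rewards.claim();
        rewards.claim(); // same function, same transaction, same msg.sender
    }
}
```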
The "variant analysis" mindset (the fastest way to scale)#
When you find one bug, you should assume there are variants.
Variant analysis is simply:
- identify the pattern (e.g., "external call before accounting update")
- search for the pattern across the codebase
- test each match under the same attacker model
This is how you avoid the worst audit failure mode:
reporting a single instance while missing five more copies of the same bug.
## What I optimize for as an auditor
There is a temptation to optimize for:
- number of findings
- number of lines read
- tool outputs
I optimize for:
- invariants captured
- attack surfaces mapped
- high-impact paths tested adversarially
If you do this well, the “finding count” becomes a side effect.
## A realistic audit cadence (how I spend time)
Every engagement is different, but a cadence like this prevents you from spending 80% of time in the wrong place.
### Day 1: establish truth
- run tests
- identify deployment config
- write the one-screen threat model
- map entry points and roles
### Day 2: map edges and scary paths
- draw the external-call graph
- find the price-critical paths (oracle reads)
- find callback paths (hooks, receivers)
- pick 2-3 invariants
### Day 3+: break things on purpose
- sequence fuzz the permissionless entry points
- write one or two adversarial mocks (weird token, reentrant hook)
- build minimal PoCs for anything that looks like value creation or liveness failure
The point is not the day numbers. The point is that you move from reading to experiments as quickly as possible.
## Communication is part of security work
The fastest way to waste an audit is to deliver a report that the team cannot act on.
Two habits help:
- ask clarifying questions early when assumptions matter (oracle choice, upgrade authority, pause powers)
- share one high-risk hypothesis mid-audit so the team can confirm or deny the design intent
This is not “being nice”. It is reducing uncertainty.
If you discover the team intended an invariant that the code does not enforce, that is often the highest value finding you can deliver.
## Upgradeability and admin risk: the boring part that breaks protocols
Even if a protocol is “mathematically correct”, upgradeability and admin actions can break it.
So I always answer these questions explicitly:
| Question | Why it matters |
|---|---|
| Who can upgrade? | a single key can be a single point of failure |
| Is there a timelock? | converts instant takeover into detectable takeover |
| Are initializers protected? | uninitialized implementations are a recurring incident class |
| Are roles revocable and enumerable? | role mistakes are hard to recover from |
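For the initializer row, this is one common protection pattern, sketched with OpenZeppelin's upgradeable library. It is what I look for during review, not necessarily how a given project does it.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Initializable} from "@openzeppelin/contracts-upgradeable/proxy/utils/Initializable.sol";

contract MyModule is Initializable {
    address public admin;

    // Lock the implementation contract itself: nobody can call
    // initialize() on it directly.
    constructor() {
        _disableInitializers();
    }

    // `initializer` guarantees this runs exactly once per proxy.
    function initialize(address _admin) external initializer {
        admin = _admin;
    }
}
```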
If the project says “we are not upgradeable”, I still check:
- emergency pause powers
- parameter setters
- external module registries
Those are upgrades in disguise.
## Notes that scale: write what you would need to prove it later
When you are mid-audit, it is easy to write vague notes like:
- “looks safe”
- “probably ok”
Those notes are useless.
Write notes as claims you could defend:
- “function X is permissionless and makes an external call to token Y before updating Z”
- “oracle price is read from pool P spot price and can be moved with one-block liquidity”
- “rounding differs between mint path and redeem path”
This style makes it easier to:
- turn notes into tests
- turn tests into findings
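For example, the rounding note above translates almost mechanically into a Foundry fuzz test. `IShareVault` is a hypothetical stand-in; wire `vault` to the audited deployment in `setUp`.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";

// Hypothetical interface -- substitute the audited vault.
interface IShareVault {
    function mintShares(uint256 assets) external returns (uint256 shares);
    function redeemShares(uint256 shares) external returns (uint256 assets);
}

contract RoundingNoteTest is Test {
    IShareVault vault;

    function setUp() public {
        // vault = IShareVault(address(...)); // wire to the audited deployment
    }

    // The note becomes a claim: a mint/redeem round trip never profits.
    function testFuzz_roundTripNeverProfits(uint256 assets) public {
        assets = bound(assets, 1, 1e24);
        uint256 shares = vault.mintShares(assets);
        uint256 back = vault.redeemShares(shares);
        assertLe(back, assets, "round trip created value");
    }
}
```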
## Audit taste: calibrating what matters
If you are training your taste (what matters, what is noise), the pieces linked under Further reading below are good calibration material: one on auditor discipline and what to optimize for, and one on what an ideal audit report looks like.
## A template you can reuse
When I open a new repo, I create a single file with these headings:
# Threat model
## Assets
## Trust assumptions
## Invariants
## Entry points (state changing)
## External calls (edges)
# Attack surfaces
## MEV
## Oracles
## Reentrancy / callbacks
## Rounding / precision
# Experiments
## Fuzz targets
## Invariants
## PoCs
# Findings
The point is not the headings. The point is that you can always answer: “where am I in the audit?”
## Further reading
- Auditor discipline and what to optimize for: https://www.zellic.io/blog/the-auditooor-grindset/
- Audit reporting quality: https://www.dylandavis.net/blog/2022/06/12/the-ideal-audit-report/
- Uniswap hook threat modeling notes: https://docs.openzeppelin.com/uniswap-hooks