
The Auditor Operating System: Repeatable Results in a Hostile Codebase

Auditing is a craft with artifacts: models, maps, experiments, and writeups. This post is an “operating system” for doing web3 security work that scales beyond vibes.

Crypto Training · 2026-02-14 · 11 min read

*Auditor OS diagram*

The biggest difference between a beginner audit and a professional audit is not “knowledge of Solidity”.

It is output quality under pressure:

  • short time
  • huge codebase
  • adversarial environment
  • ambiguous specs
  • politics

If your process is “read code and hope”, you will get random results.

If your process is “produce artifacts that constrain my search space”, you will get repeatable results.

This post is an auditor operating system: a set of work products that move you from uncertainty to defensible claims.

The four artifacts#

Good audits produce four artifacts, in order:

  1. Model: what is protected, what is trusted, what is assumed
  2. Map: entry points, roles, and external call graph
  3. Experiments: tests, fuzzing, invariants, PoCs
  4. Writeup: minimal reproduction, impact, and fix guidance

If you skip an artifact, you pay for it later.

MERMAID
flowchart LR
    M[Model<br/>assets, assumptions, invariants] --> A[Map<br/>entry points, roles, call graph]
    A --> E[Experiments<br/>fuzz, invariants, PoCs]
    E --> W[Writeup<br/>impact, repro, fix guidance]

Before you start: control your environment#

This sounds mundane, but it is the difference between “audit” and “random reading”.

I do these before I read code:

| Setup step | Why it matters |
| --- | --- |
| pin compiler / tool versions | avoids "works on my machine" findings |
| run the full test suite once | establishes baseline and exposes flaky tests |
| build a local graph of the repo | quickly find owners of state, roles, and core flows |
| locate config and deployment scripts | tells you how the system is actually used |

If the repo does not have tests, you still want a harness that can:

  • deploy contracts
  • call state-changing entry points
  • simulate adversarial behavior (weird tokens, reentrancy)
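
A minimal sketch of such a harness, assuming a Foundry project with forge-std available; `Vault` and its payable `deposit` are hypothetical stand-ins for whatever the audited repo actually exposes:

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";
import {Vault} from "../src/Vault.sol"; // hypothetical audited contract

// Minimal harness: deploy the system and drive its state-changing entry points
// from an untrusted address. Adversarial mocks (weird tokens, reentrant hooks)
// get swapped in here later.
contract AuditHarness is Test {
    Vault internal vault;
    address internal attacker;

    function setUp() public {
        attacker = makeAddr("attacker");
        // Deploy the way the project's own deployment scripts would.
        vault = new Vault();
    }

    function test_untrustedCallerCanReachEntryPoints() public {
        hoax(attacker, 10 ether);        // fund and impersonate an arbitrary caller
        vault.deposit{value: 1 ether}(); // hypothetical payable entry point
    }
}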

Artifact 1: a threat model that fits on one screen#

I do not start by reading 10k lines.

I start by writing down the protocol in English.

Example format:

| Item | Answer |
| --- | --- |
| assets | what is valuable? (funds, shares, debt, admin power, liveness) |
| trust | what do we trust? (admins, oracles, tokens, keepers, sequencers) |
| entry points | which public functions change state? |
| invariants | what must always remain true? |
| failure mode | what does “unsafe” look like? (loss, freeze, inflation) |

This table is not “docs”.

It is a weapon: it tells you which lines of code can matter.

Threat model in practice: write down the attacker budget#

Most DeFi failures happen when a team assumes the attacker operates under a constraint the attacker does not actually have.

Examples of budgets attackers often have:

  • flash liquidity (one-block capital)
  • private ordering (bundles)
  • contract-based accounts (batched calls)
  • ability to revert and retry until favorable rounding

So I explicitly write:

  • can the attacker borrow the capital for one block?
  • can the attacker choose ordering?
  • can the attacker loop an action cheaply?

If the answer is “yes”, your threat model must treat iteration as part of the attack surface.
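
Here is the shape that takes in a test, written as an addition to a harness like the one sketched above (Foundry assumed; `asset`, `vault`, and their `deposit`/`withdraw` signatures are hypothetical):

SOLIDITY
// Added to a harness like the one above. `asset` is an ERC-20 and `vault`
// is the audited contract; both names and signatures are hypothetical.
function test_iterationWithOneBlockCapital() public {
    // Budget item: flash-loan-sized capital. In a test, just grant it.
    deal(address(asset), attacker, 50_000_000e18);

    vm.startPrank(attacker);
    asset.approve(address(vault), type(uint256).max);

    uint256 balanceBefore = asset.balanceOf(attacker);

    // Budget item: cheap iteration. Repeat the round trip and check whether
    // rounding or fee accounting drifts in the attacker's favor.
    for (uint256 i = 0; i < 50; i++) {
        uint256 shares = vault.deposit(1e18);
        vault.withdraw(shares);
    }
    vm.stopPrank();

    // If the attacker comes out ahead, iteration is part of the attack surface.
    assertLe(asset.balanceOf(attacker), balanceBefore);
}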

Artifact 2: the external-call graph#

When you are lost, find the edges.

Every serious exploit crosses an edge:

  • token transfer
  • oracle read
  • callback/hook
  • low-level call
  • delegatecall

A tiny but powerful habit:

For each edge, write what the external party can lie about.

Token edge lies:

  • return value is nonsense
  • balance changes outside the transfer
  • callback reenters you

Oracle edge lies:

  • value is manipulable for one block
  • value is stale
  • value is expensive to update in a way you did not model

Once you do this, “the scary parts” become obvious.
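
For the oracle edge, a defensive read makes the assumptions explicit. A minimal sketch, assuming a Chainlink-style push feed; `maxAge` is a per-asset choice (roughly the feed's heartbeat plus a margin):

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Chainlink-style feed interface (assumes the protocol reads a push oracle).
interface AggregatorV3Interface {
    function latestRoundData()
        external
        view
        returns (uint80 roundId, int256 answer, uint256 startedAt, uint256 updatedAt, uint80 answeredInRound);
}

library SafeOracleRead {
    error InvalidPrice();
    error StalePrice();

    /// @notice Read a price, refusing values that are non-positive or stale.
    /// @param maxAge Freshness bound chosen per asset (roughly heartbeat + margin).
    function readPrice(AggregatorV3Interface feed, uint256 maxAge) internal view returns (uint256) {
        (, int256 answer,, uint256 updatedAt,) = feed.latestRoundData();
        if (answer <= 0) revert InvalidPrice();
        if (block.timestamp - updatedAt > maxAge) revert StalePrice();
        return uint256(answer);
    }
}

Note what this does not defend against: a fresh but manipulable price. That lie needs a different control (TWAPs, deviation bounds, or a second price source).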

Mapping entry points without missing the weird ones#

There are two common misses:

  1. “indirect entry points” through callbacks (hooks, receivers, token callbacks)
  2. “privileged entry points” through upgradeability or role delegation

So when I map entry points, I categorize:

| Category | Examples | What I check |
| --- | --- | --- |
| permissionless | swaps, deposits, mints | external calls inside accounting, rounding, DoS |
| role-gated | parameter updates | role escalation, reentrancy into admin paths |
| upgrade paths | UUPS, transparent proxy | initializer correctness, upgrade auth |
| callback paths | hooks, ERC-777, receivers | phase correctness, reentrancy, stuck states |

If your protocol uses hooks (Uniswap-style), the “callback paths” are first-class entry points.

Artifact 3: experiments as filters#

Humans are bad at exhaustive reasoning.

Use experiments to filter the search space.

Three experiments that pay off quickly:

1) invariants (even if you only write two)#

Invariants are constraints that survive sequences.

The best invariants in DeFi are usually:

  • accounting conservation (no free mint)
  • solvency (collateral >= debt under defined prices)
  • liveness (users can withdraw under defined conditions)
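
A minimal sketch of the first two as Foundry invariant tests (forge-std assumed). `Vault`, its accessors, and the ghost variables are hypothetical; the handler is sketched in the next section:

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";
import {Vault} from "../src/Vault.sol";          // hypothetical audited contract
import {VaultHandler} from "./VaultHandler.sol"; // handler sketched in the next section

contract VaultInvariantTest is Test {
    Vault vault;
    VaultHandler handler;

    function setUp() public {
        vault = new Vault();
        handler = new VaultHandler(vault);

        // Foundry's invariant fuzzer will call random sequences of the handler's functions.
        targetContract(address(handler));
    }

    // Accounting conservation: the vault never holds less than the handler put in,
    // net of withdrawals (relax to a bound if the vault charges fees or earns yield).
    function invariant_conservation() public {
        assertGe(vault.totalAssets(), handler.ghost_netDeposited());
    }

    // No free mint: outstanding shares never exceed the shares actually credited
    // for deposits the handler made.
    function invariant_noFreeMint() public {
        assertLe(vault.totalSupply(), handler.ghost_sharesOutstanding());
    }
}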

2) fuzz the public API, not the helpers#

Attackers do not call your internal functions.

They call public entry points with weird sequences.

Your fuzzer should do the same.
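
A sketch of the handler used by the invariant test above. It wraps only the public, state-changing entry points, because that is all an attacker can call, and it records ghost values for the invariants to check (again, `Vault` and its signatures are hypothetical):

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Vault} from "../src/Vault.sol"; // hypothetical audited contract

// Ghost variables record what "should" have happened, so the invariant test
// above can compare them against the vault's own accounting.
contract VaultHandler {
    Vault public immutable vault;

    uint256 public ghost_netDeposited;      // assets sent in minus assets taken out
    uint256 public ghost_sharesOutstanding; // shares minted minus shares burned

    constructor(Vault _vault) {
        vault = _vault;
    }

    // Hypothetical signatures; mirror the audited contract's real entry points.
    function deposit(uint256 assets) external {
        assets = 1 + (assets % 1_000_000e18); // crude bound so sequences stay meaningful
        uint256 shares = vault.deposit(assets);
        ghost_netDeposited += assets;
        ghost_sharesOutstanding += shares;
    }

    function withdraw(uint256 shares) external {
        uint256 held = vault.balanceOf(address(this));
        if (held == 0) return;
        shares = 1 + (shares % held);
        uint256 assets = vault.withdraw(shares);
        ghost_netDeposited -= assets;
        ghost_sharesOutstanding -= shares;
    }
}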

3) adversarial mocks#

Replace:

  • ERC-20 tokens with weird tokens
  • oracles with adversarial oracles
  • hooks with reentrant hooks

If you only test happy paths, you are testing your own beliefs.
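
A minimal sketch of the first replacement: an ERC-20-shaped test double that reenters a target of your choice mid-transfer. Everything here is hypothetical test scaffolding, not production code:

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// "Weird token" for adversarial tests: ERC-20-shaped, but it replays an
// arbitrary call into the protocol in the middle of every transfer, which is
// enough to probe reentrancy and balance-delta assumptions.
contract ReentrantToken {
    string public constant name = "Weird";
    string public constant symbol = "WEIRD";
    uint8 public constant decimals = 18;

    mapping(address => uint256) public balanceOf;
    mapping(address => mapping(address => uint256)) public allowance;

    address public reenterTarget; // contract to reenter (e.g. the pool or hook under test)
    bytes public reenterCalldata; // call to replay mid-transfer
    bool private entered;

    function mint(address to, uint256 amount) external {
        balanceOf[to] += amount;
    }

    function setReentrancy(address target, bytes calldata data) external {
        reenterTarget = target;
        reenterCalldata = data;
    }

    function approve(address spender, uint256 amount) external returns (bool) {
        allowance[msg.sender][spender] = amount;
        return true;
    }

    function transfer(address to, uint256 amount) external returns (bool) {
        return _move(msg.sender, to, amount);
    }

    function transferFrom(address from, address to, uint256 amount) external returns (bool) {
        allowance[from][msg.sender] -= amount;
        return _move(from, to, amount);
    }

    function _move(address from, address to, uint256 amount) internal returns (bool) {
        balanceOf[from] -= amount;
        balanceOf[to] += amount;

        // The lie: reenter the protocol in the middle of the transfer.
        if (reenterTarget != address(0) && !entered) {
            entered = true;
            (bool ok, ) = reenterTarget.call(reenterCalldata);
            if (!ok) { /* swallow: an attacker can also choose to ignore failures */ }
            entered = false;
        }
        return true;
    }
}

Point `setReentrancy` at the function you suspect, run the flow, and assert on the protocol's accounting afterwards.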

Tooling as a workflow (not as a checkbox)#

Static analysis tools are good at finding:

  • missing access control
  • unchecked return values
  • reentrancy hazards (shallow)
  • dangerous low-level calls

They are not good at finding:

  • protocol-specific invariants
  • MEV and ordering dependence
  • rounding policy mistakes

So I use tools as early filters, then I move into protocol reasoning.

Example loop:

  1. run a linter/static tool to find low-hanging fruit
  2. map the external call graph
  3. pick 2-3 invariants
  4. fuzz those invariants
  5. write PoCs for any invariant breaks

The key is that every tool output should feed an artifact.

If the tool output does not change your model/map/experiments, it is noise.

A small severity rubric that improves writeups#

Instead of arguing severity subjectively, I use a rubric:

| Severity | What breaks | Typical evidence |
| --- | --- | --- |
| Critical | direct loss of funds or permanent loss of control | PoC drains or upgrade takeover |
| High | loss of funds under realistic conditions, or protocol insolvency | exploit path with plausible assumptions |
| Medium | bounded loss, partial DoS, griefing with constraints | costed griefing, limited damage |
| Low | best-practice gap, hard-to-exploit edge | missing checks, unsafe defaults |
| Informational | clarity, hardening, documentation | improves comprehension |

This makes reports readable by engineers and leadership.

How to write a finding that gets fixed#

Most “bad findings” fail because they are not actionable.

An actionable finding has:

  1. a minimal reproduction path
  2. the impact stated as a broken invariant
  3. a fix direction that preserves the protocol design goals

Here is a template that tends to work:

MD
### [H-01] Hook reentrancy lets an attacker bypass fee accounting

**Impact**
An attacker can pay lower fees than intended by reentering `afterSwap` through a token callback, breaking the invariant:
"feeGrowth increases by at least the protocol fee for every swap."

**Root cause**
The hook updates `feeGrowth` after calling `token.transfer`, allowing reentrancy into a path that reads stale state.

**Exploit sketch**
1. Swap with a callback token.
2. Token reenters into `afterSwap`.
3. Second execution observes stale `feeGrowth` and settles without the intended fee increment.

**Recommendation**
Update accounting before external calls, or enforce a phase-based reentrancy guard.
Prefer settling based on balance deltas rather than return values.

This style works because it connects:

  • a broken invariant
  • a concrete mechanism
  • a fix that matches the threat model

A note on modern account behavior (and why it matters for audits)#

The line between “EOA” and “contract” behavior keeps blurring:

  • account abstraction
  • routers batching calls
  • signature-based authorization
  • delegated behaviors (emerging proposals)

Practically, this means:

  1. you cannot assume `msg.sender` is a user
  2. you cannot assume “one tx = one action”

This is why invariants and sequence-based reasoning matter.

One of the easiest audit mistakes is to miss an exploit path that requires calling the same function twice in one transaction through a router.
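
A sketch of why, using the smallest possible router. From the protocol's perspective, `msg.sender` is this contract and one transaction carries arbitrarily many calls (the `batch` function is hypothetical, but Multicall-style routers behave exactly like this):

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Minimal batching router: to the protocol, msg.sender is this contract and
// "one transaction" contains as many calls as the attacker wants.
contract BatchRouter {
    function batch(address target, bytes[] calldata calls) external {
        for (uint256 i = 0; i < calls.length; i++) {
            (bool ok, ) = target.call(calls[i]);
            require(ok, "batched call failed");
        }
    }
}

Any check that quietly assumes “one call per sender per transaction” breaks against a dozen lines like these.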

The "variant analysis" mindset (the fastest way to scale)#

When you find one bug, you should assume there are variants.

Variant analysis is simply:

  • identify the pattern (e.g., "external call before accounting update")
  • search for the pattern across the codebase
  • test each match under the same attacker model

This is how you avoid the worst audit failure mode:

reporting a single instance while missing five more copies of the same bug.

What I optimize for as an auditor#

There is a temptation to optimize for:

  • number of findings
  • number of lines read
  • tool outputs

I optimize for:

  • invariants captured
  • attack surfaces mapped
  • high-impact paths tested adversarially

If you do this well, the “finding count” becomes a side effect.

A realistic audit cadence (how I spend time)#

Every engagement is different, but a cadence like this prevents you from spending 80% of time in the wrong place.

Day 1: establish truth#

  • run tests
  • identify deployment config
  • write the one-screen threat model
  • map entry points and roles

Day 2: map edges and scary paths#

  • draw the external-call graph
  • find the price-critical paths (oracle reads)
  • find callback paths (hooks, receivers)
  • pick 2-3 invariants

Day 3+: break things on purpose#

  • sequence fuzz the permissionless entry points
  • write one or two adversarial mocks (weird token, reentrant hook)
  • build minimal PoCs for anything that looks like value creation or liveness failure

The point is not the day numbers. The point is that you move from:

  • reading

to:

  • experiments

as quickly as possible.

Communication is part of security work#

The fastest way to waste an audit is to deliver a report that the team cannot act on.

Two habits help:

  1. ask clarifying questions early when assumptions matter (oracle choice, upgrade authority, pause powers)
  2. share one high-risk hypothesis mid-audit so the team can confirm or deny the design intent

This is not “being nice”. It is reducing uncertainty.

If you discover the team intended an invariant that the code does not enforce, that is often the highest value finding you can deliver.

Upgradeability and admin risk: the boring part that breaks protocols#

Even if a protocol is “mathematically correct”, upgradeability and admin actions can break it.

So I always answer these questions explicitly:

| Question | Why it matters |
| --- | --- |
| Who can upgrade? | a single key can be a single point of failure |
| Is there a timelock? | converts instant takeover into detectable takeover |
| Are initializers protected? | uninitialized implementations are a recurring incident class |
| Are roles revocable and enumerable? | role mistakes are hard to recover from |

If the project says “we are not upgradeable”, I still check:

  • emergency pause powers
  • parameter setters
  • external module registries

Those are upgrades in disguise.
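
For the initializer question in the table above, here is a sketch of the check as a Foundry test. `ToyImplementation` stands in for the audited implementation; in OpenZeppelin-based code the same property comes from the `initializer` modifier plus `_disableInitializers()` in the implementation's constructor:

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";

// Stand-in for the audited implementation contract behind the proxy.
contract ToyImplementation {
    address public owner;
    bool private initialized;

    function initialize(address _owner) external {
        require(!initialized, "already initialized");
        initialized = true;
        owner = _owner;
    }
}

contract InitializerTest is Test {
    ToyImplementation internal impl;
    address internal attacker;

    function setUp() public {
        attacker = makeAddr("attacker");
        impl = new ToyImplementation();
        // The raw implementation must be initialized (or have its initializers
        // disabled) at deploy time, before an attacker can reach it.
        impl.initialize(address(this));
    }

    function test_attackerCannotTakeOverImplementation() public {
        vm.expectRevert("already initialized");
        vm.prank(attacker);
        impl.initialize(attacker);
        assertEq(impl.owner(), address(this));
    }
}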

Notes that scale: write what you would need to prove it later#

When you are mid-audit, it is easy to write vague notes like:

  • “looks safe”
  • “probably ok”

Those notes are useless.

Write notes as claims you could defend:

  • “function X is permissionless and makes an external call to token Y before updating Z”
  • “oracle price is read from pool P spot price and can be moved with one-block liquidity”
  • “rounding differs between mint path and redeem path”

This style makes it easier to:

  • turn notes into tests
  • turn tests into findings

Audit taste: calibrating what matters#

If you are training your taste (what matters, what is noise), this is a good calibration read:

And if you want a structured “audit brain” to compare against:

A template you can reuse#

When I open a new repo, I create a single file with these headings:

MD
# Threat model

## Assets

## Trust assumptions

## Invariants

## Entry points (state changing)

## External calls (edges)

# Attack surfaces

## MEV

## Oracles

## Reentrancy / callbacks

## Rounding / precision

# Experiments

## Fuzz targets

## Invariants

## PoCs

# Findings

The point is not the headings. The point is that you can always answer: “Where am I in the audit?”

Further reading#