
The Auditor Operating System: Repeatable Results in a Hostile Codebase

Auditing is a craft with artifacts: models, maps, experiments, and writeups. This post is an “operating system” for doing web3 security work that scales beyond vibes.

Crypto Training · 2026-02-14 · 11 min read

*Auditor OS diagram*

The biggest difference between a beginner audit and a professional audit is not “knowledge of Solidity”.

It is output quality under pressure:

  • short time
  • huge codebase
  • adversarial environment
  • ambiguous specs
  • politics

If your process is “read code and hope”, you will get random results.

If your process is “produce artifacts that constrain my search space”, you will get repeatable results.

This post is an auditor operating system: a set of work products that move you from uncertainty to defensible claims.

The four artifacts#

Good audits produce four artifacts, in order:

  1. Model: what is protected, what is trusted, what is assumed
  2. Map: entry points, roles, and external call graph
  3. Experiments: tests, fuzzing, invariants, PoCs
  4. Writeup: minimal reproduction, impact, and fix guidance

If you skip an artifact, you pay for it later.

MERMAID
flowchart LR
    M[Model<br/>assets, assumptions, invariants] --> A[Map<br/>entry points, roles, call graph]
    A --> E[Experiments<br/>fuzz, invariants, PoCs]
    E --> W[Writeup<br/>impact, repro, fix guidance]

Before you start: control your environment#

This sounds mundane, but it is the difference between “audit” and “random reading”.

I do these before I read code:

| Setup step | Why it matters |
| --- | --- |
| pin compiler / tool versions | avoids "works on my machine" findings |
| run the full test suite once | establishes baseline and exposes flaky tests |
| build a local graph of the repo | quickly find owners of state, roles, and core flows |
| locate config and deployment scripts | tells you how the system is actually used |

If the repo does not have tests, you still want a harness that can:

  • deploy contracts
  • call state-changing entry points
  • simulate adversarial behavior (weird tokens, reentrancy)
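
A minimal sketch of such a harness, assuming a Foundry project with forge-std available; `Vault` and its payable `deposit` are hypothetical stand-ins for whatever the audited repo actually exposes:

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";
import {Vault} from "../src/Vault.sol"; // hypothetical audited contract

// Minimal harness: deploy the system and drive its state-changing entry points
// from an untrusted address. Adversarial mocks (weird tokens, reentrant hooks)
// get swapped in here later.
contract AuditHarness is Test {
    Vault internal vault;
    address internal attacker;

    function setUp() public {
        attacker = makeAddr("attacker");
        // Deploy the way the project's own deployment scripts would.
        vault = new Vault();
    }

    function test_untrustedCallerCanReachEntryPoints() public {
        hoax(attacker, 10 ether);        // fund and impersonate an arbitrary caller
        vault.deposit{value: 1 ether}(); // hypothetical payable entry point
    }
}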

Artifact 1: a threat model that fits on one screen#

I do not start by reading 10k lines.

I start by writing down the protocol in English.

Example format:

| Item | Answer |
| --- | --- |
| assets | what is valuable? (funds, shares, debt, admin power, liveness) |
| trust | what do we trust? (admins, oracles, tokens, keepers, sequencers) |
| entry points | which public functions change state? |
| invariants | what must always remain true? |
| failure mode | what does “unsafe” look like? (loss, freeze, inflation) |

This table is not “docs”.

It is a weapon: it tells you which lines of code can matter.

Threat model in practice: write down the attacker budget#

Most DeFi failures happen when a team assumes the attacker operates under a constraint the attacker does not actually have.

Examples of budgets attackers often have:

  • flash liquidity (one-block capital)
  • private ordering (bundles)
  • contract-based accounts (batched calls)
  • ability to revert and retry until favorable rounding

So I explicitly write:

  • can the attacker borrow the capital for one block?
  • can the attacker choose ordering?
  • can the attacker loop an action cheaply?

If the answer is “yes”, your threat model must treat iteration as part of the attack surface.
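
Here is the shape that takes in a test, written as an addition to a harness like the one sketched above (Foundry assumed; `asset`, `vault`, and their `deposit`/`withdraw` signatures are hypothetical):

SOLIDITY
// Added to a harness like the one above. `asset` is an ERC-20 and `vault`
// is the audited contract; both names and signatures are hypothetical.
function test_iterationWithOneBlockCapital() public {
    // Budget item: flash-loan-sized capital. In a test, just grant it.
    deal(address(asset), attacker, 50_000_000e18);

    vm.startPrank(attacker);
    asset.approve(address(vault), type(uint256).max);

    uint256 balanceBefore = asset.balanceOf(attacker);

    // Budget item: cheap iteration. Repeat the round trip and check whether
    // rounding or fee accounting drifts in the attacker's favor.
    for (uint256 i = 0; i < 50; i++) {
        uint256 shares = vault.deposit(1e18);
        vault.withdraw(shares);
    }
    vm.stopPrank();

    // If the attacker comes out ahead, iteration is part of the attack surface.
    assertLe(asset.balanceOf(attacker), balanceBefore);
}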

Artifact 2: the external-call graph#

When you are lost, find the edges.

Every serious exploit crosses an edge:

  • token transfer
  • oracle read
  • callback/hook
  • low-level call
  • delegatecall

A tiny but powerful habit:

For each edge, write what the external party can lie about.

Token edge lies:

  • return value is nonsense
  • balance changes outside the transfer
  • callback reenters you

Oracle edge lies:

  • value is manipulable for one block
  • value is stale
  • value is expensive to update in a way you did not model

Once you do this, “the scary parts” become obvious.
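
For the oracle edge, a defensive read makes the assumptions explicit. A minimal sketch, assuming a Chainlink-style push feed; `maxAge` is a per-asset choice (roughly the feed's heartbeat plus a margin):

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Chainlink-style feed interface (assumes the protocol reads a push oracle).
interface AggregatorV3Interface {
    function latestRoundData()
        external
        view
        returns (uint80 roundId, int256 answer, uint256 startedAt, uint256 updatedAt, uint80 answeredInRound);
}

library SafeOracleRead {
    error InvalidPrice();
    error StalePrice();

    /// @notice Read a price, refusing values that are non-positive or stale.
    /// @param maxAge Freshness bound chosen per asset (roughly heartbeat + margin).
    function readPrice(AggregatorV3Interface feed, uint256 maxAge) internal view returns (uint256) {
        (, int256 answer,, uint256 updatedAt,) = feed.latestRoundData();
        if (answer <= 0) revert InvalidPrice();
        if (block.timestamp - updatedAt > maxAge) revert StalePrice();
        return uint256(answer);
    }
}

Note what this does not defend against: a fresh but manipulable price. That lie needs a different control (TWAPs, deviation bounds, or a second price source).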

Mapping entry points without missing the weird ones#

There are two common misses:

  1. “indirect entry points” through callbacks (hooks, receivers, token callbacks)
  2. “privileged entry points” through upgradeability or role delegation

So when I map entry points, I categorize:

| Category | Examples | What I check |
| --- | --- | --- |
| permissionless | swaps, deposits, mints | external calls inside accounting, rounding, DoS |
| role-gated | parameter updates | role escalation, reentrancy into admin paths |
| upgrade paths | UUPS, transparent proxy | initializer correctness, upgrade auth |
| callback paths | hooks, ERC-777, receivers | phase correctness, reentrancy, stuck states |

If your protocol uses hooks (Uniswap-style), the “callback paths” are first-class entry points.

Artifact 3: experiments as filters#

Humans are bad at exhaustive reasoning.

Use experiments to filter the search space.

Three experiments that pay off quickly:

1) invariants (even if you only write two)#

Invariants are constraints that survive sequences.

The best invariants in DeFi are usually:

  • accounting conservation (no free mint)
  • solvency (collateral >= debt under defined prices)
  • liveness (users can withdraw under defined conditions)
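
A minimal sketch of the first two as Foundry invariant tests (forge-std assumed). `Vault`, its accessors, and the ghost variables are hypothetical; the handler is sketched in the next section:

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";
import {Vault} from "../src/Vault.sol";          // hypothetical audited contract
import {VaultHandler} from "./VaultHandler.sol"; // handler sketched in the next section

contract VaultInvariantTest is Test {
    Vault vault;
    VaultHandler handler;

    function setUp() public {
        vault = new Vault();
        handler = new VaultHandler(vault);

        // Foundry's invariant fuzzer will call random sequences of the handler's functions.
        targetContract(address(handler));
    }

    // Accounting conservation: the vault never holds less than the handler put in,
    // net of withdrawals (relax to a bound if the vault charges fees or earns yield).
    function invariant_conservation() public {
        assertGe(vault.totalAssets(), handler.ghost_netDeposited());
    }

    // No free mint: outstanding shares never exceed the shares actually credited
    // for deposits the handler made.
    function invariant_noFreeMint() public {
        assertLe(vault.totalSupply(), handler.ghost_sharesOutstanding());
    }
}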

2) fuzz the public API, not the helpers#

Attackers do not call your internal functions.

They call public entry points with weird sequences.

Your fuzzer should do the same.
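
A sketch of the handler used by the invariant test above. It wraps only the public, state-changing entry points, because that is all an attacker can call, and it records ghost values for the invariants to check (again, `Vault` and its signatures are hypothetical):

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Vault} from "../src/Vault.sol"; // hypothetical audited contract

// Ghost variables record what "should" have happened, so the invariant test
// above can compare them against the vault's own accounting.
contract VaultHandler {
    Vault public immutable vault;

    uint256 public ghost_netDeposited;      // assets sent in minus assets taken out
    uint256 public ghost_sharesOutstanding; // shares minted minus shares burned

    constructor(Vault _vault) {
        vault = _vault;
    }

    // Hypothetical signatures; mirror the audited contract's real entry points.
    function deposit(uint256 assets) external {
        assets = 1 + (assets % 1_000_000e18); // crude bound so sequences stay meaningful
        uint256 shares = vault.deposit(assets);
        ghost_netDeposited += assets;
        ghost_sharesOutstanding += shares;
    }

    function withdraw(uint256 shares) external {
        uint256 held = vault.balanceOf(address(this));
        if (held == 0) return;
        shares = 1 + (shares % held);
        uint256 assets = vault.withdraw(shares);
        ghost_netDeposited -= assets;
        ghost_sharesOutstanding -= shares;
    }
}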

3) adversarial mocks#

Replace:

  • ERC-20 tokens with weird tokens
  • oracles with adversarial oracles
  • hooks with reentrant hooks

If you only test happy paths, you are testing your own beliefs.
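
A minimal sketch of the first replacement: an ERC-20-shaped test double that reenters a target of your choice mid-transfer. Everything here is hypothetical test scaffolding, not production code:

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// "Weird token" for adversarial tests: ERC-20-shaped, but it replays an
// arbitrary call into the protocol in the middle of every transfer, which is
// enough to probe reentrancy and balance-delta assumptions.
contract ReentrantToken {
    string public constant name = "Weird";
    string public constant symbol = "WEIRD";
    uint8 public constant decimals = 18;

    mapping(address => uint256) public balanceOf;
    mapping(address => mapping(address => uint256)) public allowance;

    address public reenterTarget; // contract to reenter (e.g. the pool or hook under test)
    bytes public reenterCalldata; // call to replay mid-transfer
    bool private entered;

    function mint(address to, uint256 amount) external {
        balanceOf[to] += amount;
    }

    function setReentrancy(address target, bytes calldata data) external {
        reenterTarget = target;
        reenterCalldata = data;
    }

    function approve(address spender, uint256 amount) external returns (bool) {
        allowance[msg.sender][spender] = amount;
        return true;
    }

    function transfer(address to, uint256 amount) external returns (bool) {
        return _move(msg.sender, to, amount);
    }

    function transferFrom(address from, address to, uint256 amount) external returns (bool) {
        allowance[from][msg.sender] -= amount;
        return _move(from, to, amount);
    }

    function _move(address from, address to, uint256 amount) internal returns (bool) {
        balanceOf[from] -= amount;
        balanceOf[to] += amount;

        // The lie: reenter the protocol in the middle of the transfer.
        if (reenterTarget != address(0) && !entered) {
            entered = true;
            (bool ok, ) = reenterTarget.call(reenterCalldata);
            if (!ok) { /* swallow: an attacker can also choose to ignore failures */ }
            entered = false;
        }
        return true;
    }
}

Point `setReentrancy` at the function you suspect, run the flow, and assert on the protocol's accounting afterwards.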

Tooling as a workflow (not as a checkbox)#

Static analysis tools are good at finding:

  • missing access control
  • unchecked return values
  • reentrancy hazards (shallow)
  • dangerous low-level calls

They are not good at finding:

  • protocol-specific invariants
  • MEV and ordering dependence
  • rounding policy mistakes

So I use tools as early filters, then I move into protocol reasoning.

Example loop:

  1. run a linter/static tool to find low-hanging fruit
  2. map the external call graph
  3. pick 2-3 invariants
  4. fuzz those invariants
  5. write PoCs for any invariant breaks

The key is that every tool output should feed an artifact.

If the tool output does not change your model/map/experiments, it is noise.

A small severity rubric that improves writeups#

Instead of arguing severity subjectively, I use a rubric:

| Severity | What breaks | Typical evidence |
| --- | --- | --- |
| Critical | direct loss of funds or permanent loss of control | PoC drains or upgrade takeover |
| High | loss of funds under realistic conditions, or protocol insolvency | exploit path with plausible assumptions |
| Medium | bounded loss, partial DoS, griefing with constraints | costed griefing, limited damage |
| Low | best-practice gap, hard-to-exploit edge | missing checks, unsafe defaults |
| Informational | clarity, hardening, documentation | improves comprehension |

This makes reports readable by engineers and leadership.

How to write a finding that gets fixed#

Most “bad findings” fail because they are not actionable.

An actionable finding has:

  1. a minimal reproduction path
  2. the impact stated as a broken invariant
  3. a fix direction that preserves the protocol design goals

Here is a template that tends to work:

MD
### [H-01] Hook reentrancy lets an attacker bypass fee accounting

**Impact**
An attacker can pay lower fees than intended by reentering `afterSwap` through a token callback, breaking the invariant:
"feeGrowth increases by at least the protocol fee for every swap."

**Root cause**
The hook updates `feeGrowth` after calling `token.transfer`, allowing reentrancy into a path that reads stale state.

**Exploit sketch**
1. Swap with a callback token.
2. Token reenters into `afterSwap`.
3. Second execution observes stale `feeGrowth` and settles without the intended fee increment.

**Recommendation**
Update accounting before external calls, or enforce a phase-based reentrancy guard.
Prefer settling based on balance deltas rather than return values.

This style works because it connects:

  • a broken invariant
  • a concrete mechanism
  • a fix that matches the threat model

A note on modern account behavior (and why it matters for audits)#

The line between “EOA” and “contract” behavior keeps blurring:

  • account abstraction
  • routers batching calls
  • signature-based authorization
  • delegated behaviors (emerging proposals)

Practically, this means:

  1. you cannot assume `msg.sender` is a user
  2. you cannot assume “one tx = one action”

This is why invariants and sequence-based reasoning matter.

One of the easiest audit mistakes is to miss an exploit path that requires calling the same function twice in one transaction through a router.
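
A sketch of why, using the smallest possible router. From the protocol's perspective, `msg.sender` is this contract and one transaction carries arbitrarily many calls (the `batch` function is hypothetical, but Multicall-style routers behave exactly like this):

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Minimal batching router: to the protocol, msg.sender is this contract and
// "one transaction" contains as many calls as the attacker wants.
contract BatchRouter {
    function batch(address target, bytes[] calldata calls) external {
        for (uint256 i = 0; i < calls.length; i++) {
            (bool ok, ) = target.call(calls[i]);
            require(ok, "batched call failed");
        }
    }
}

Any check that quietly assumes “one call per sender per transaction” breaks against a dozen lines like these.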

The "variant analysis" mindset (the fastest way to scale)#

When you find one bug, you should assume there are variants.

Variant analysis is simply:

  • identify the pattern (e.g., "external call before accounting update")
  • search for the pattern across the codebase
  • test each match under the same attacker model

This is how you avoid the worst audit failure mode:

reporting a single instance while missing five more copies of the same bug.

What I optimize for as an auditor#

There is a temptation to optimize for:

  • number of findings
  • number of lines read
  • tool outputs

I optimize for:

  • invariants captured
  • attack surfaces mapped
  • high-impact paths tested adversarially

If you do this well, the “finding count” becomes a side effect.

A realistic audit cadence (how I spend time)#

Every engagement is different, but a cadence like this prevents you from spending 80% of time in the wrong place.

Day 1: establish truth#

  • run tests
  • identify deployment config
  • write the one-screen threat model
  • map entry points and roles

Day 2: map edges and scary paths#

  • draw the external-call graph
  • find the price-critical paths (oracle reads)
  • find callback paths (hooks, receivers)
  • pick 2-3 invariants

Day 3+: break things on purpose#

  • sequence fuzz the permissionless entry points
  • write one or two adversarial mocks (weird token, reentrant hook)
  • build minimal PoCs for anything that looks like value creation or liveness failure

The point is not the day numbers. The point is that you move from:

  • reading

to:

  • experiments

as quickly as possible.

Communication is part of security work#

The fastest way to waste an audit is to deliver a report that the team cannot act on.

Two habits help:

  1. ask clarifying questions early when assumptions matter (oracle choice, upgrade authority, pause powers)
  2. share one high-risk hypothesis mid-audit so the team can confirm or deny the design intent

This is not “being nice”. It is reducing uncertainty.

If you discover the team intended an invariant that the code does not enforce, that is often the highest value finding you can deliver.

Upgradeability and admin risk: the boring part that breaks protocols#

Even if a protocol is “mathematically correct”, upgradeability and admin actions can break it.

So I always answer these questions explicitly:

| Question | Why it matters |
| --- | --- |
| Who can upgrade? | a single key can be a single point of failure |
| Is there a timelock? | converts instant takeover into detectable takeover |
| Are initializers protected? | uninitialized implementations are a recurring incident class |
| Are roles revocable and enumerable? | role mistakes are hard to recover from |

If the project says “we are not upgradeable”, I still check:

  • emergency pause powers
  • parameter setters
  • external module registries

Those are upgrades in disguise.
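
For the initializer question in the table above, here is a sketch of the check as a Foundry test. `ToyImplementation` stands in for the audited implementation; in OpenZeppelin-based code the same property comes from the `initializer` modifier plus `_disableInitializers()` in the implementation's constructor:

SOLIDITY
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";

// Stand-in for the audited implementation contract behind the proxy.
contract ToyImplementation {
    address public owner;
    bool private initialized;

    function initialize(address _owner) external {
        require(!initialized, "already initialized");
        initialized = true;
        owner = _owner;
    }
}

contract InitializerTest is Test {
    ToyImplementation internal impl;
    address internal attacker;

    function setUp() public {
        attacker = makeAddr("attacker");
        impl = new ToyImplementation();
        // The raw implementation must be initialized (or have its initializers
        // disabled) at deploy time, before an attacker can reach it.
        impl.initialize(address(this));
    }

    function test_attackerCannotTakeOverImplementation() public {
        vm.expectRevert("already initialized");
        vm.prank(attacker);
        impl.initialize(attacker);
        assertEq(impl.owner(), address(this));
    }
}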

Notes that scale: write what you would need to prove it later#

When you are mid-audit, it is easy to write vague notes like:

  • “looks safe”
  • “probably ok”

Those notes are useless.

Write notes as claims you could defend:

  • “function X is permissionless and makes an external call to token Y before updating Z”
  • “oracle price is read from pool P spot price and can be moved with one-block liquidity”
  • “rounding differs between mint path and redeem path”

This style makes it easier to:

  • turn notes into tests
  • turn tests into findings

Audit taste: calibrating what matters#

If you are training your taste (what matters, what is noise), this is a good calibration read:

And if you want a structured “audit brain” to compare against:

A template you can reuse#

When I open a new repo, I create a single file with these headings:

MD
# Threat model

## Assets

## Trust assumptions

## Invariants

## Entry points (state changing)

## External calls (edges)

# Attack surfaces

## MEV

## Oracles

## Reentrancy / callbacks

## Rounding / precision

# Experiments

## Fuzz targets

## Invariants

## PoCs

# Findings

The point is not the headings. The point is that you can always answer: “Where am I in the audit?”

Further reading#