Jan 30, 2026

Verification and Receipts

verification, receipts, audit, safety

Verification means judging agent behavior by ground truth — tests, builds, hashes, objective checks — not by the agent’s narration. Receipts mean every external effect (tool call, payment, job result) is logged with enough structure to audit and replay. Together they make autonomy auditable and safe.

Why verification matters for agents

Truth is downstream of checks — In software, tests and builds are the judge. Agent narration is not a substitute. If the agent says “I fixed the bug” but the test still fails, the test wins.
Blast radius — Verification catches errors before they propagate. A failed test blocks a bad change; a failed build blocks a broken deploy. Verification is a safety boundary.
Improvement — You can’t improve what you don’t measure. Verification deltas (did the run pass? by how much?) are the signal for optimization and rollback.
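
As a minimal sketch of "the test wins", the check below decides pass or fail from a process exit code and never consults the agent's narration. The `verify` helper and the two stand-in commands are hypothetical, chosen only for illustration; in a real setup the command would be a test runner or build tool.

```python
import subprocess
import sys


def verify(command: list[str]) -> bool:
    """Run a check command and report pass/fail from its exit code only."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


# The agent's narration. verify() never looks at it.
agent_claim = "I fixed the bug"

# Stand-ins for a test suite: child processes that exit with status 1 or 0.
failing_check = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing_check = [sys.executable, "-c", "import sys; sys.exit(0)"]

if not verify(failing_check):
    print(f"Agent said {agent_claim!r}, but the check failed: change blocked.")
```

Keeping the narration out of the function signature is the design point: nothing the agent says can alter the verdict.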

Why receipts matter for agents

Audit trail — Who did what, when, with what params and result? Receipts answer that. No “trust me.”
Idempotency and replay — If every action is logged with a stable ID and outcome, you can skip duplicates and replay runs. That prevents double-posting, double-charging, and lost history.
Settlement and dispute — Payments and job results should reference a receipt (or run ID). “Pay for verified work” and “refund if verification fails” require a clear record of what was done and what was checked.
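
The idempotency point above can be sketched as a small in-memory log. Everything here (`action_id`, `ReceiptLog`, `run_once`, the `charge` function) is a hypothetical name for illustration, and deriving the ID from the action and its parameters is an assumption; many systems use a caller-supplied idempotency key instead, so that two intentionally identical actions are not merged.

```python
import hashlib
import json
from typing import Any, Callable


def action_id(action: str, params: dict[str, Any]) -> str:
    """Derive a stable ID from the action name and canonicalized params."""
    canonical = json.dumps(
        {"action": action, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReceiptLog:
    """In-memory log that performs each distinct action at most once."""

    def __init__(self) -> None:
        self._outcomes: dict[str, Any] = {}
        self.entries: list[dict[str, Any]] = []

    def run_once(
        self,
        action: str,
        params: dict[str, Any],
        effect: Callable[[dict[str, Any]], Any],
    ) -> Any:
        key = action_id(action, params)
        if key in self._outcomes:
            # Duplicate: return the recorded outcome, do not repeat the effect.
            return self._outcomes[key]
        result = effect(params)
        self._outcomes[key] = result
        self.entries.append(
            {"id": key, "action": action, "params": params, "result": result}
        )
        return result


charges: list[int] = []


def charge(params: dict[str, Any]) -> dict[str, Any]:
    """Toy external effect: record a charge and report what was done."""
    charges.append(params["amount_cents"])
    return {"status": "charged", "amount_cents": params["amount_cents"]}


log = ReceiptLog()
request = {"job": "job-42", "amount_cents": 500}
first = log.run_once("charge", request, charge)
second = log.run_once("charge", request, charge)  # retried call, same ID
```

The retried call returns the stored outcome, so the customer is charged once and the log still holds a single entry that can be audited or replayed.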

What a receipt contains (conceptual)

Action — What was done (e.g. tool name, job type).
Params and result — Inputs and outputs (or hashes thereof). Enough to verify and replay.
Timing — When it happened; optionally latency.
Verification — Exit code, hash, or other check that was run. “Did this pass?”
Provenance — Session ID, trajectory reference, policy or budget that authorized it. So you can trace back to the run and the rules.
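
One way to picture these fields is as a small immutable record. This is a sketch under assumptions, not a defined schema: the field names, the choice of SHA-256 over canonical JSON, and the `make_receipt` helper are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


def digest(value: Any) -> str:
    """SHA-256 of a JSON-serializable value, using a canonical encoding."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Receipt:
    action: str                  # Action: tool name or job type
    params_hash: str             # Params: hash of the inputs
    result_hash: str             # Result: hash of the outputs
    started_at: str              # Timing: ISO 8601 timestamp in UTC
    latency_ms: Optional[float]  # Timing: optional latency
    check: str                   # Verification: which check was run
    passed: bool                 # Verification: did it pass?
    session_id: str              # Provenance: run that produced the action
    policy_id: str               # Provenance: policy or budget that allowed it


def make_receipt(
    action: str,
    params: Any,
    result: Any,
    check: str,
    passed: bool,
    session_id: str,
    policy_id: str,
    latency_ms: Optional[float] = None,
) -> Receipt:
    """Build a receipt, storing hashes rather than raw inputs and outputs."""
    return Receipt(
        action=action,
        params_hash=digest(params),
        result_hash=digest(result),
        started_at=datetime.now(timezone.utc).isoformat(),
        latency_ms=latency_ms,
        check=check,
        passed=passed,
        session_id=session_id,
        policy_id=policy_id,
    )


receipt = make_receipt(
    action="run_tests",
    params={"cmd": "pytest -q"},
    result={"exit_code": 0},
    check="exit_code == 0",
    passed=True,
    session_id="session-001",
    policy_id="budget-default",
)
print(json.dumps(asdict(receipt), indent=2))
```

Storing hashes keeps the receipt small and lets an auditor confirm that replayed inputs and outputs match, at the cost of needing the raw data stored elsewhere if a full replay is required.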

Verification-first loop

A simple loop: plan → act → verify → iterate. The agent plans, acts (tool calls, posts, payments), then verification runs (tests, builds, receipt checks). If verification fails, iterate (retry, rollback, escalate). The loop is the same whether the agent is local or remote; verification is the gate.
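
The loop can be sketched as a function that takes the three stages as callables. The names `run_loop`, `plan`, `act`, and `verify`, the attempt limit, and the toy stages are assumptions for illustration; escalation is reduced here to returning an "escalate" status.

```python
from typing import Any, Callable


def run_loop(
    plan: Callable[[int, list], Any],
    act: Callable[[Any], Any],
    verify: Callable[[Any], bool],
    max_attempts: int = 3,
) -> dict:
    """Plan, act, verify; retry until verification passes or attempts run out."""
    history: list[dict] = []
    for attempt in range(1, max_attempts + 1):
        step = plan(attempt, history)   # plan, informed by earlier failures
        outcome = act(step)             # act: tool call, post, payment, ...
        passed = verify(outcome)        # verify: the gate, not narration
        history.append(
            {"attempt": attempt, "step": step, "outcome": outcome, "passed": passed}
        )
        if passed:
            return {"status": "verified", "attempts": attempt, "history": history}
    # Verification never passed: hand off instead of claiming success.
    return {"status": "escalate", "attempts": max_attempts, "history": history}


# Toy stages: the "fix" only satisfies the check on the second attempt.
def toy_plan(attempt: int, history: list) -> int:
    return attempt


def toy_act(step: int) -> int:
    return step * 2


def toy_verify(outcome: int) -> bool:
    return outcome == 4


result = run_loop(toy_plan, toy_act, toy_verify)
escalated = run_loop(toy_plan, toy_act, toy_verify, max_attempts=1)
```

Because the stages are passed in, the same loop applies whether `act` runs locally or calls a remote service; only `verify` decides when the loop may stop with success.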

Go deeper

Predictable autonomy: Predictable Autonomy
Replay and artifacts: Replay and Artifacts
Trajectories: Trajectories
Treasury and budgets: Treasury and Budgets