The technical mechanic is not the interesting part. One library, one function call, one state the library did not check. A one-line fix. The interesting part is the three weeks, and why every check the system ran on itself stayed green for all of them.
The worst class of production bug is the kind that confirms itself
Loud failures are easy. A crash, a 500, a queue that backs up. Somebody gets paged, a dashboard turns red, the bad commit gets found by lunch. This was the other class. The failure had the same shape as success. The dispatch function returned the right shape, the success counter incremented, the next layer trusted that, and the layer after trusted that. By the time the signal reached me it was wearing a uniform.
Every automation stack I have worked with has a version of this bug waiting somewhere. The teams that find it before their customers do are the ones that built reading-the-world-back into the contract from day one. It is the distinction Charity Majors draws in Honeycomb's observability manifesto: a system you can only ask about its own internal state is not observable, because the questions that catch this bug are the ones you did not think to instrument. The teams that skip the read-back get a call from a customer asking why nothing has shipped since October.
Dispatch is not effect
Every system that does work on your behalf produces two observable events. Dispatch: I sent the instruction. Effect: the world is now different in the way I asked it to be. In a healthy system the two are coupled. The instruction lands, the world changes, the response confirms it.
Most of the time you only watch one of them, usually dispatch, because dispatch is cheap. The function returned, no exception was thrown, the response has the right keys. Effect is expensive: a second request, more code, more failure modes. So engineers instrument the cheap event and assume the expensive one tracks it. This is the same gap Dexter Horthy and the 12-factor agents authors are pointing at when they argue an agent's control flow and its observed state must be owned explicitly, not inferred from a hopeful return value. It is also why Anthropic's Building Effective Agents keeps returning to the same discipline: a tool call is not done until you have read back what it actually did.
For most systems on most days, the assumption holds. The day it does not is the day you discover your reliability dashboard has been measuring how often you asked for something to happen, not how often it actually did.
| | Dispatch | Effect |
|---|---|---|
| What it claims | The call returned and the counter incremented. | The artifact exists where the customer would see it. |
| Cost to observe | Cheap. Observed by default. | Expensive. Usually skipped. |
| What I had instrumented | Everything. | Nothing. |
The fix is one line. The lesson took three weeks.
The mechanical fix to the specific incident was one line of code, in front of one call, in one library. Trivial.
The repair to the way I think about reliability cost the three weeks the bug ran undetected.
That is the actual price tag of this class of bug. Not the engineering time to fix it. The time before you knew you needed to.
Once you have the shape in your head, you start seeing it everywhere. A payment integration that logs charge.created and never reconciles against the processor's settled-transactions report. An email pipeline whose send queue drains cleanly while the messages sit in a spam folder nobody is reading. An AI agent that reports a tool call as complete because the function returned, when the tool quietly no-opped on a malformed argument. Each one is the same bug with a different uniform: dispatch instrumented, effect assumed. In every case the dashboard is green and the work did not happen, and in every case the only honest fix is a second observation against the system you do not control.
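The payment example above reduces to a set comparison, sketched here under stated assumptions. The `reconcile` function, the `Reconciliation` type, and the `ch_` ids are hypothetical names for illustration, not any processor's real API. In practice the second list would come from a report fetched from the processor, not from your own logs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reconciliation:
    confirmed: set[str]   # dispatched locally and present in the external report
    missing: set[str]     # dispatched locally, absent from the external report
    unexpected: set[str]  # in the external report, never dispatched locally


def reconcile(dispatched_ids, external_ids) -> Reconciliation:
    """Compare what we say we sent against what the outside system says exists."""
    dispatched = set(dispatched_ids)
    external = set(external_ids)
    return Reconciliation(
        confirmed=dispatched & external,
        missing=dispatched - external,
        unexpected=external - dispatched,
    )


# Hypothetical data: three charges logged locally, two in the settled report.
local_log = ["ch_1", "ch_2", "ch_3"]
settled_report = ["ch_1", "ch_3"]

result = reconcile(local_log, settled_report)
print(sorted(result.missing))  # ['ch_2']
```

Anything in `missing` is a dispatch with no confirmed effect, which is the case a dispatch-only dashboard cannot show.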
If your team ships automation, audit one thing this week
For every external write your system performs, ask one question. Is there a corresponding read, on a different code path, that confirms the artifact exists in the world the way you expect it to?
Not "did the call succeed." Not "was the response 200." A second observation, taken later, against the same external system, that proves the change is real.
If the answer is no for any path that matters, you do not have a publishing system or a payment system or an agent. You have a system that reports on itself.
Most teams discover this the same way I did. A customer asks why something did not happen. The team looks at the logs. The logs say it happened. The team looks again. The logs still say it happened. Someone, eventually, walks over to the third-party system and looks. The logs were wrong.
That walk-over is the read-back. The lesson of the incident is to put the walk-over inside the contract, before the customer has to. This is the operational edge of the discipline Hamel Husain writes about under evals: you do not get to trust a system you have not measured against ground truth, and a green dispatch is not ground truth.
Three rules for a system that does not lie to itself
If I were starting from scratch tomorrow on the design doc for a system that does work against third parties, these would be the three rules.
Every write has a paired read
Against the third party, on a different code path, as part of the same logical transaction. If the read says the artifact is not there, the write counts as failed, no matter what the dispatch returned.
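A minimal sketch of this rule, assuming the third party exposes some way to ask whether an artifact exists. Every name here (`write_with_readback`, `EffectNotConfirmed`, `FakeThirdParty`) is hypothetical, and the fake is an in-memory stand-in that returns a success-shaped id even when it silently drops the write.

```python
import time
from typing import Callable


class EffectNotConfirmed(Exception):
    """The write dispatched, but the read-back never found the artifact."""


def write_with_readback(
    write: Callable[[], str],
    read_back: Callable[[str], bool],
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> str:
    artifact_id = write()  # dispatch: the id the third party handed back
    for attempt in range(attempts):
        if read_back(artifact_id):  # effect: a separate call asking the third party
            return artifact_id
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # leave room for eventual consistency
    # The read says it is not there, so the write counts as failed.
    raise EffectNotConfirmed(artifact_id)


class FakeThirdParty:
    """In-memory stand-in for an external API that can silently no-op."""

    def __init__(self, drop_writes: bool):
        self.drop_writes = drop_writes
        self.stored = {}
        self._next = 0

    def create(self, body: str) -> str:
        self._next += 1
        artifact_id = f"a{self._next}"
        if not self.drop_writes:
            self.stored[artifact_id] = body
        return artifact_id  # success-shaped either way

    def exists(self, artifact_id: str) -> bool:
        return artifact_id in self.stored


healthy = FakeThirdParty(drop_writes=False)
print(write_with_readback(lambda: healthy.create("hello"), healthy.exists))  # a1

broken = FakeThirdParty(drop_writes=True)
try:
    write_with_readback(lambda: broken.create("hello"), broken.exists)
except EffectNotConfirmed as exc:
    print("write counted as failed:", exc)
```

The retry count and delay are placeholders. The design point is that only the read-back can turn a dispatch into a success, so a silent no-op surfaces as an exception instead of an incremented counter.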
The read happens against the external world, not the cache
If the system you do not control says the artifact is not there, the artifact is not there. Local state does not get a vote. Simon Willison makes a version of this point repeatedly in his practitioner notes on LLM tools: the only state you can trust is the state you fetched back, not the state your own code believes it left behind.
Silent zero is the loudest signal
Going from one comment a day to zero is not an outage in any traditional sense. No errors, no timeouts, nothing the load balancer would notice. It is, however, the exact shape of this bug. Alert on the floor as loudly as you alert on the ceiling.
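One way to alert on the floor is to count confirmed effects in a trailing window and fire when the count drops below a minimum. The sketch below assumes you already record a timestamp each time a read-back confirms an artifact. The `below_floor` function, the one-day window, the floor of one, and the dates are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def below_floor(confirmed_at, now, window, floor):
    """True when fewer than `floor` confirmed effects fall in the window ending at `now`."""
    cutoff = now - window
    recent = [t for t in confirmed_at if cutoff < t <= now]
    return len(recent) < floor


# Hypothetical timestamps of confirmed effects (not dispatches).
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
healthy = [now - timedelta(hours=3)]
silent = [now - timedelta(days=5)]

print(below_floor(healthy, now, timedelta(days=1), floor=1))  # False
print(below_floor(silent, now, timedelta(days=1), floor=1))   # True
print(below_floor([], now, timedelta(days=1), floor=1))       # True
```

Feeding this check with dispatch timestamps would reproduce the original blind spot, so the input has to be the confirmed-effect stream. An empty list is treated as below the floor, which is the case a ceiling-only alert never sees.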
None of these are clever. All of them are unglamorous. All of them get descoped from MVPs because they double the request volume against systems that are already metered by the second. That is the objection I hear most, and it is real: a paired read is a second request, and second requests cost money and latency. But the math only looks expensive until you price in the three weeks. A read-back that runs on every write is a known, bounded, line-item cost. A silent failure that runs until a customer notices is an unbounded one, paid in trust, after the fact, with interest.
So pick one external write your system does today. The one whose failure would embarrass you most. Add the paired read this week. Make a silent zero page someone. The systems I trust are the ones that pay the cost anyway, before they have a three-week story to tell.