Essays

Operating notes from my production agents.

Architecture decisions, failure taxonomies, and the bugs I have actually paid for, from running my production agents every day. The system has run daily since October 2025; its current shape (Claude in the loop, with n8n as the primary tool) dates from the May 2026 rewrite. The numbers are pulled from the running system. Where I am extrapolating, I say so.

On framing: earlier essays describe the system as a fleet of agents ("500 agents", "300+ workers"). Through 2025 the field's consensus moved toward one agent in a loop with many tools, a position articulated independently by Anthropic, Cognition, and Simon Willison. I have kept the older articles for their operational data, but their headlines reflect the earlier framing, not my current position.

Latest

If a model edits a system you run, make it show you the diff first
I let an AI rewrite the automations that run my house. The day it quietly deleted a step and I found out three days later, I stopped trusting it to just do the work. Now it has to show me a plain-English preview of every change, and nothing goes live until I say yes.
Architecture
JUN 2026 · 1 min
My agent has a terminal. It doesn't need MCP.
The standard way to give an AI agent its tools loads every tool's full description into the model's memory at the start of every session, used or not. Anthropic's own engineers watched that menu hit 134,000 tokens. My agent skips it: it gets a command line and reads the manual only when it needs it, about 200 tokens versus 14,000 for the same tool. It runs that way a few hundred times a day. The one place that genuinely can't open a terminal already has a door I built in thirty lines.
Architecture
JUN 2026 · 4 min
Cookie transplant: how my agent posts as a real account
A website can tell a robot is driving the browser before you type a single word, and it quietly turns you away. Instead of disguising the robot, I let it borrow a session a real person already logged into by hand. One trick gets it through the door on three platforms. The catch: a post can report success and still not exist, so nothing counts until a second check walks back and finds it on the page.
Failure Postmortem
JUN 2026 · 1 min
A five-dollar AI scored 76. The professionals scored 84.
There is a little robot in my group chat that runs on a five-dollar-a-month AI. I gave it one real task from each of forty-four occupations, graded against the professionals' own work, and it came within eight points of the pros. Then I ran the same model through my own homemade bot and it dropped six points, even though the model never changed. This is what an AI evaluation is, and why the same model can be worth three different numbers.
Field notes
JUN 2026 · 4 min
Cognitive surrender happens at the approval gate
When a Wharton study put a wrong AI answer in front of 1,372 people, they agreed 73 percent of the time, and felt more confident doing it. I run a system that asks me to approve real actions all day. This is what I learned about the moment that actually goes wrong: not the agent's decision, but mine.
Position Response
JUN 2026 · 1 min
Your group chat is dying. I built a member that won't let it.
A lot of friendships live in a group chat now, and that is also where a lot of them quietly end. No one is paid to keep yours loud, so when people get busy it goes silent and stays silent. I gave one group chat a member that fights that: it reads everything and says almost nothing, and when the room goes quiet for a week it reaches into the chat's own history, finds a day worth remembering, and hands the room back its own words. It is live in one chat with 86,000 messages and almost four years of history.
Field notes
JUN 2026 · 4 min

By topic

A five-dollar AI scored 76. The professionals scored 84.
There is a little robot in my group chat that runs on a five-dollar-a-month AI. I gave it one real task from each of forty-four occupations, graded against the professionals' own work, and it came within eight points of the pros. Then I ran the same model through my own homemade bot and it dropped six points, even though the model never changed. This is what an AI evaluation is, and why the same model can be worth three different numbers.
Field notes
JUN 2026 · 4 min
How I keep one face consistent across AI-generated portraits
I wanted professional photos of myself in different outfits and settings without a photographer, and every AI tool gave me a face that was a few percent off and instantly read as a stranger. The fix was a rule: never let the model draw the face. Generate the outfit, the room, and the light around an empty space, then paste a real photo of my face into it and finish in code. Field notes on the method and why my eye was right every time.
Method
JUN 2026 · 4 min
Work with the agent until it works. Then make it a workflow.
I run one AI agent that drafts emails, publishes essays, and trades a small account every day. When it solves a task the same way twice, I freeze those steps into a fixed routine and stop letting it improvise that job. The agent is how you learn. The routine is what you keep.
Method
JUN 2026 · 4 min
The model is the orchestrator, the workflow engine is the hands
One automated assistant has run in production since October, doing 6,442 jobs for about $205 a month. It works because exactly one part decides and everything else only acts. When I tested it, the brain mattered 37 times more than the tools around it. So I add hands, never a second brain.
Position Response
JUN 2026 · 4 min
The control plane became the product: 5,079 jobs of operational truth
I kept a 37-day log of one AI system: 5,079 jobs. The hard part was never the model. It was all the unglamorous work that kept the model honest.
Architecture
MAR 2026 · 4 min
My agent has a terminal. It doesn't need MCP.
The standard way to give an AI agent its tools loads every tool's full description into the model's memory at the start of every session, used or not. Anthropic's own engineers watched that menu hit 134,000 tokens. My agent skips it: it gets a command line and reads the manual only when it needs it, about 200 tokens versus 14,000 for the same tool. It runs that way a few hundred times a day. The one place that genuinely can't open a terminal already has a door I built in thirty lines.
Architecture
JUN 2026 · 4 min
Your group chat is dying. I built a member that won't let it.
A lot of friendships live in a group chat now, and that is also where a lot of them quietly end. No one is paid to keep yours loud, so when people get busy it goes silent and stays silent. I gave one group chat a member that fights that: it reads everything and says almost nothing, and when the room goes quiet for a week it reaches into the chat's own history, finds a day worth remembering, and hands the room back its own words. It is live in one chat with 86,000 messages and almost four years of history.
Field notes
JUN 2026 · 4 min
How my agent earns the word done: evals and certification gates
My own agent made my daughter's first-birthday book and put her name on the cover, in gold, spelled wrong. It was the kind of error a spellchecker lets through, because what it printed was a correctly spelled name, just not hers. So I stopped letting the agent decide when its own work was done, and made a second model sign off first. Here is how it works and what it costs.
Architecture
JUN 2026 · 4 min
LetMeCheckThatBot: a group-chat agent that remembers
I run a Telegram bot that lives inside a group chat and answers when you call it. The part that took the most work was not its personality. It was a memory that silently saves every photo, voice note, and link people drop, and hands the right one back weeks later when someone asks. The personality was a day of work. The memory was the whole project.
Architecture
JUN 2026 · 4 min
One offhand message, a research paper by dinnertime
I sent one offhand message and got a finished research paper back in an afternoon, with no research assistants. Three choices made it work: the agent broke the job into pieces itself, ran each piece on the model that fit it, and let a second model reject the result. The gate is the part most people skip, and it is the part that keeps the work honest.
Architecture
JUN 2026 · 4 min
Inside a Claude Code setup running 6,442 jobs: the completion gate
An AI assistant ran 6,442 jobs for me and never marked a single one done on its own. The rule that made it safe to trust: every piece of work has to pass a second model before it counts as finished.
Architecture
MAR 2026 · 4 min
Inside one production AI agent: routing and the failure log
One AI assistant ran 5,252 jobs from a home server in Philadelphia and finished 93.7% of them. The surprise is in the failures: not one was the AI saying something wrong. They were all plumbing. Part one of two.
Architecture
MAR 2026 · 4 min