Field notes
JUN 2026
Your group chat is dying. I built a member that won't let it.
Most bots are on-call: summoned, they answer, they vanish. The harder half is ambient: present without being summoned, and choosing almost always not to speak. letmecheckbot does both, and the ambient half has one job that matters most to me. When a group chat goes quiet, it reaches into the room's own past, finds the day worth remembering, and hands it back. Field notes on the mechanism, and on why a member that remembers is the only kind that can keep a room alive.
Architecture
JUN 2026
How my agent earns the word done: evals and certification gates
My own agent made my daughter's first-birthday book and misspelled her name on the cover: the kind of error a spellchecker passes because it is a correctly spelled name, just not hers. The eval layer and per-type certification gates that came out of it, with the production numbers behind them.
Architecture
JUN 2026
LetMeCheckThatBot: a group-chat agent that remembers
LetMeCheckThatBot is a second production agent I run, a different shape from Penny: multi-user, conversational, real-time. It sits in a Telegram thread, wakes on a keyword, and does real work with three pieces: a tool loop, a single cheap chat model with no fallback, and a dedicated vision model. The part that took the most work was the part nobody expects: a memory that silently ingests every artifact dropped in the chat and recalls it by meaning weeks later.
Architecture
JUN 2026
Auto-decomposition, multi-model review, and quality gates
A real architecture pattern for handing a large task to one agent: let it decompose the work into a dependency graph, route the pieces across model families, and gate the output with a review pass run by a different model. The pattern that made a 2,102-record corpus job ship in one afternoon.
Architecture
MAY 2026
Your agent needs routines, not just skills
Model, agent, orchestrator, framework, skill, workflow engine. Six words the field uses interchangeably. They are not the same thing, and the confusion costs you architecture.
Architecture
MAR 2026
Inside a Claude Code setup running 6,442 jobs: the completion gate
How I wired Claude Code to run as a job queue instead of a chat: a thin router, forked workers, injectable skills, and a completion gate that refuses to call work done after a single pass.
Architecture
MAR 2026
Inside one production AI agent: routing and the failure log
What one production AI agent actually looks like after 5,252 jobs: multi-model routing, an explicit fallback chain, a durable SQLite job engine, and the five failure classes that account for the breakage. Part one of two.