Field notes
JUN 2026
A five-dollar AI scored 76. The professionals scored 84.
There is a little robot in my group chat that runs on a five-dollar-a-month AI. I gave it the real work of 44 jobs, graded it against the professionals' own work, and it came within eight points of the pros. This piece covers what an evaluation is, what the setup around a model changes and what it doesn't, and why the same model can be worth three different numbers depending on who measures it.
Method
JUN 2026
How I keep one face consistent across AI-generated portraits
Every text-to-image path I tried invented a face that was close but not mine, and my eye caught the drift every time. The breakthrough was a rule: never let the model render the identity-bearing pixels. Generate the outfit, the background, the light; lift the face itself from a real photo; composite and grade in code. Field notes on the pipeline and the principles behind it.
Method
JUN 2026
Work with the agent until it works. Then make it a workflow.
An agent is how you explore a task. A workflow is the result you keep. I work with an agent until it reliably works, then distill that into a deterministic workflow. One operator's field notes on the line Anthropic drew between agents and workflows, the research it sits in, and why the workflows are the part worth sharing.
Position Response
JUN 2026
The model is the orchestrator, the workflow engine is the hands
Harrison Chase says n8n is a workflow builder, not an agent. He is right, and that is exactly why I run it. The model is the orchestrator that decides. The workflow engine is the hands that act. Decoupling those two, not adding a second agent, is the split that survived 6,442 production jobs.
Milestone
MAY 2026
From a fleet of agents to one agent: what changed and why
I stopped designing fleets of specialized agents and committed to one agent in a loop with many tools. Here is what I used to believe, what changed my mind, and what I publish from here on.
Study
APR 2026
6,442 jobs later: model selection beats harness choice 37 to 1
Across 6,442 production jobs and a 197-run, 19-model benchmark, swapping the model moves quality 37x more than swapping the harness. The harness buys optionality, not performance.
Architecture
MAR 2026
The control plane became the product: 5,079 jobs of operational truth
Across one 37-day operating log of 5,079 jobs, the hard part of production AI was never the model. It was the control plane wrapped around it.