The harness thesis

The spine of the argument: one agent, many tools, and the freedom to route models is the moat. Start here if you want the position and the proof.

Field notes
A five-dollar AI scored 76. The professionals scored 84.
There is a little robot in my group chat that runs on a five-dollar-a-month AI. I gave it the real work of 44 jobs, graded against the professionals' own work, and it came within eight points of the pros. This is what an evaluation is, what the setup around a model changes and what it doesn't, and why the same model can be worth three different numbers depending on who measures it.
Read · 7 min
Method
How I keep one face consistent across AI-generated portraits
Every text-to-image path I tried invented a face that was close but not mine, and my eye caught the drift every time. The breakthrough was a rule: never let the model render the identity-bearing pixels. Generate the outfit, the background, the light; lift the face itself from a real photo; composite and grade in code. Field notes on the pipeline and the principles behind it.
Read · 7 min
Method
Work with the agent until it works. Then make it a workflow.
An agent is how you explore a task. A workflow is the result you keep. I work with an agent until it reliably works, then distill that into a deterministic workflow. One operator's field notes on the line Anthropic drew between agents and workflows, the research it sits in, and why the workflows are the part worth sharing.
Read · 8 min
Position Response
The model is the orchestrator, the workflow engine is the hands
Harrison Chase says n8n is a workflow builder, not an agent. He is right, and that is exactly why I run it. The model is the orchestrator that decides. The workflow engine is the hands that act. Decoupling those two, not adding a second agent, is the split that survived 6,442 production jobs.
Read · 7 min
Milestone
From a fleet of agents to one agent: what changed and why
I stopped designing fleets of specialized agents and committed to one agent in a loop with many tools. Here is what I used to believe, what changed my mind, and what I publish from here on.
Read · 6 min
Study
6,442 jobs later: model selection beats harness choice 37 to 1
Across 6,442 production jobs and a 197-run, 19-model benchmark, swapping the model moves quality 37x more than swapping the harness. The harness wins optionality, not performance.
Read · 18 min
Architecture
The control plane became the product: 5,079 jobs of operational truth
Across one 37-day operating log of 5,079 jobs, the hard part of production AI was never the model. It was the control plane wrapped around it.
Read · 5 min

Other topics

Tooling reviews
Failure postmortems
Architecture deep-dives
Context and memory engineering
Start here