Essays

Field notes from production AI.

Architecture decisions, failure taxonomies, and operational patterns from 7,600+ autonomous jobs. Every article is grounded in data from the running system. No hypotheticals.

Featured

Harness Architecture

The Harness Is the Moat: 409 Runs, 22 Models, One Finding

Model selection drives 37x more variance than harness choice. But without the harness, you're locked to one vendor's pricing. The moat isn't quality; it's optionality.

[Figure: Any model / Harness layer (tool loop, cache, checkpoints) / Files, terminal, browser. Caption: The moat is the architecture.]

All Essays

Benchmark Study

The Harness Is the Moat: 6,442 Jobs Later

293 scored runs across 19 models. Same model, different harness: 0.19-point delta. Different models, same harness: 37x the variance. The harness enables model choice, which is the real cost driver.

Claude Code Deep Dive

Claude Code in Production: What Nobody Tells You After 5,000 Jobs

Lessons from running Claude Code agents across 5,000+ real production jobs. The failures, fixes, and patterns nobody documents.

AI Engineering

How I Run 500 AI Agents

Inside the orchestration system running hundreds of autonomous AI workers 24/7. Architecture, failures, and what actually scales.

AI Code Tools

Claude Code vs Codex vs Open Source: A Practitioner's Honest Breakdown

A working practitioner's comparison of Claude Code, Codex CLI, and open-source alternatives. Benchmarks from real production use.

Model Selection

The Models Were the Easy Part

After 6,000+ production jobs, the hardest problems aren't model quality. They're orchestration, memory, cost management, and failure recovery.

Harness Architecture

Build Your Own AI Harness

The architecture that turns any model into a production coding agent. Tool loops, caching strategy, checkpoint systems, and the design that leaked.

AI Strategy

The Harness Is the Product

Model selection matters 37x more than harness quality, but only if your harness lets you select. This is the claim piece; the data and architecture are in the full study.

Head-to-Head

6,363 Jobs Later: Claude Code vs Codex vs Open-Source

A practitioner's comparison of Claude Code, OpenAI Codex, and open-source coding agents after running them all in the same production job queue. Real numbers, real failures, real recommendations.

System Architecture

Anatomy of a Production Claude Code Setup

A real Claude Code system running 500+ workers, multi-model review pipelines, LanceDB memory, and cron schedules. Built for daily production use, not a demo.

Context Engineering

Context Engineering in Practice: 3-Tier Memory for AI Agents

The architecture, code patterns, and production numbers from building a 3-tier memory system for AI agents. 230K vectors, 6,370 jobs, $0.003/month embedding cost.

Case Study

Six Agents, One Instruction: Automated Corpus Analysis at Scale

One Telegram message triggered six coordinated AI agents that collected, embedded, and analyzed 2,102 NYT obituaries in under two hours. The orchestration architecture behind it.

"Every article here is grounded in data from the running system. Real job counts, real failure rates, real cost figures. The production floor, not the conference stage."
— D. Nakhla