The Harness Is the Moat
293 scored runs across 19 models. Model selection creates 37x more variance than harness choice. The moat isn't better quality - it's the freedom to route to the right model.
Lessons from running Claude Code agents across 5,000+ real production jobs. The failures, fixes, and patterns nobody documents.
Inside the orchestration system running hundreds of autonomous AI workers 24/7. Architecture, failures, and what actually scales.
A working practitioner's comparison of Claude Code, Codex CLI, and open-source alternatives. Benchmarks from real production use.
"8,100+ jobs: context loss beats model quality every time. Ground truth from production: 27% of failures are agents forgetting what they're doing. Not capability. Orchestration." - Danny Nakhla
Context loss is the single largest failure class in this deployment. Agents lose track of what they're doing mid-task. No vendor talks about this.
Rate limit cascades come next: one throttled request triggers a retry storm that burns through your budget before you notice.
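The compute your system spends managing itself. Memory, scheduling, error recovery. Every cost model misses this. With context loss at 27%, rate limit cascades at 22%, and optimistic completion at 19%, the remaining 32% of failures fall outside these three classes.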
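One common way to keep a single throttled request from turning into a retry storm is to combine capped exponential backoff with a hard retry budget. The sketch below is illustrative only and is not the system's actual implementation: `RateLimited`, `RetryBudget`, and `call_with_backoff` are hypothetical names, and the delay values are arbitrary assumptions.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a client raises when the provider throttles a request."""


class RetryBudget:
    """Caps the total number of retries so one throttle cannot fan out.

    Assumption: a single-threaded caller. A shared budget across threads
    would need a lock around take().
    """

    def __init__(self, max_retries: int) -> None:
        self.remaining = max_retries

    def take(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def call_with_backoff(fn, budget, max_attempts=5, base=0.5, cap=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimited while attempts and budget remain."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            is_last = attempt == max_attempts - 1
            if is_last or not budget.take():
                raise  # give up instead of adding to the storm
            # Full jitter: wait a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

When the budget runs out, the call fails fast rather than retrying, which bounds the spend a cascade can cause. The `sleep` parameter is injectable so the behaviour can be tested without real delays.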
A 24/7 orchestration system coordinating research, trading, content production, and operational tasks across 20+ concurrent workers.
SQLite job queue with priority routing, model-specific fallback chains, and three-layer persistent memory.
Failure taxonomy from real operations: context loss (27%), rate limit cascades (22%), optimistic completion (19%).
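As a rough illustration of the SQLite job queue with priority routing and model-specific fallback chains mentioned above, here is a minimal single-process sketch. The table schema, the model names in `FALLBACKS`, and every function name are assumptions made for this example and are not taken from the production system. The three-layer memory is not shown.

```python
import sqlite3

# Hypothetical model names and fallback order, for illustration only.
FALLBACKS = {
    "large-model": ["large-model", "medium-model", "small-model"],
    "small-model": ["small-model"],
}

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id       INTEGER PRIMARY KEY,
    payload  TEXT NOT NULL,
    model    TEXT NOT NULL,
    priority INTEGER NOT NULL DEFAULT 0,
    status   TEXT NOT NULL DEFAULT 'queued'
);
"""


def open_queue(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, payload, model, priority=0):
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO jobs (payload, model, priority) VALUES (?, ?, ?)",
            (payload, model, priority),
        )
    return cur.lastrowid


def claim_next(conn):
    """Claim the highest-priority queued job, oldest first on ties.

    Single-process sketch: concurrent workers would need the select and
    update to run inside one write transaction.
    """
    with conn:
        row = conn.execute(
            "SELECT id, payload, model FROM jobs WHERE status = 'queued' "
            "ORDER BY priority DESC, id ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row


def run_with_fallback(job, run_model):
    """Try each model in the job's fallback chain until one succeeds."""
    job_id, payload, model = job
    errors = []
    for candidate in FALLBACKS.get(model, [model]):
        try:
            return candidate, run_model(candidate, payload)
        except Exception as exc:  # broad on purpose: any failure moves down the chain
            errors.append(f"{candidate}: {exc}")
    raise RuntimeError(f"job {job_id} failed on every model: {errors}")
```

Here `run_model` stands in for whatever callable actually invokes a model. Keeping the fallback chain as data rather than code means routing can change without touching the worker loop.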