OpenClaude Benchmark: The Harness Is the Moat
Factorial benchmark across 19 models and 19 production tasks. Tests whether harness architecture or model selection drives quality. Model selection produces 37x more variance than harness choice.
Factorial benchmark across 19 models and 19 production tasks. Tests whether harness architecture or model selection drives quality. Model selection produces 37x more variance than harness choice.
Comparative analysis of routing strategies: static, confidence-based, and learned. Cost-quality tradeoffs across 10,000+ routing decisions.
In ProgressArchitecture documentation for a hub-and-spoke orchestration system processing 6,400+ autonomous jobs. Covers model routing, fallback chains, three-tier memory, and failure taxonomy from production operations.
ReadOperational patterns and failure classes from running Claude Code in production. What breaks, how to fix it, and the infrastructure decisions that actually matter.
Read