About
I GOT TIRED OF
AI DEMOS THAT
DON'T DO ANYTHING.
So I built a system that does. It runs 24/7. It handles real work. And it breaks in ways nobody warned me about.
The Short Version
I've spent twenty years building software. Enterprise systems, data infrastructure, the full stack from database to deployment. For the last several years, I've been focused on AI - not the conference-talk version, but the version where you're debugging a failed job at 2am because the context window overflowed and the agent lost its memory mid-task.
The gap between what AI tools promise and what they actually deliver in production became the thing I couldn't stop thinking about. So I built an orchestration system to find out where the real boundaries are. Not a prototype. Not a weekend project. A production system that runs continuously, dispatches work across multiple model families, manages its own failures, and has processed over 6,000 autonomous jobs since October 2025.
Most of those jobs worked. The ones that didn't taught me more than any benchmark or whitepaper ever could.
What I Operate
The system I run is a multi-model orchestration engine. It routes tasks across Claude, GPT, Gemini, and open-source models based on cost, capability, and context requirements. It manages persistent memory through vector stores. It handles scheduling, error recovery, and session lifecycle without manual intervention. When something fails - and things fail regularly - it logs the failure class and either retries with a different strategy or escalates.
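The routing logic described above can be sketched roughly like this. This is a minimal illustration, not the actual implementation: the model names, prices, and capability tags are placeholders, and the real system also weighs failure history and session state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_1k_tokens: float   # USD per 1K input tokens (illustrative figures)
    max_context: int            # context window in tokens
    capabilities: frozenset     # e.g. {"code", "reasoning", "summarize"}

# Hypothetical catalog -- names and numbers are placeholders, not real pricing.
CATALOG = [
    ModelProfile("frontier-large", 0.015, 200_000, frozenset({"code", "reasoning"})),
    ModelProfile("mid-tier",       0.003, 128_000, frozenset({"code", "summarize"})),
    ModelProfile("open-small",     0.0005, 32_000, frozenset({"summarize"})),
]

def route(required_capability: str, context_tokens: int) -> ModelProfile:
    """Pick the cheapest model that satisfies capability and context requirements."""
    candidates = [
        m for m in CATALOG
        if required_capability in m.capabilities and context_tokens <= m.max_context
    ]
    if not candidates:
        raise ValueError("no model satisfies the request")  # escalate instead of retrying blindly
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)
```

The key design point is that cost is a tiebreaker, not the primary filter: capability and context fit come first, which is what keeps a cheap model from being handed a task it will fail.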
This isn't a toy. The system autonomously executes research studies, manages a live trading portfolio, produces and publishes content, triages email, coordinates multi-step workflows, and runs its own infrastructure maintenance. It has 25+ integrations, from financial APIs to publishing platforms to smart home controls.
I built it because I wanted to know - empirically, not theoretically - what happens when you give AI systems real autonomy over real tasks with real consequences. The answer turned out to be more interesting and more humbling than I expected.
What I've Learned
FROM 6,000+ JOBS
Context loss
The single largest failure class. Agents lose track of what they're doing mid-task. No vendor talks about this.
Rate limit cascades
One throttled request triggers a retry storm that burns through your budget before you notice.
Overhead tax
The compute your system spends managing itself. Memory, scheduling, error recovery, coordination. Every cost model misses this.
The biggest lesson: the models are the easy part. The hard problems are orchestration, memory, cost management, and building systems that recover from failure without human intervention. I've documented the failure taxonomy, the architecture decisions that survived, and the ones I'd reverse if I could start over. That documentation is most of what this site is.
What This Site Is
This is a field notebook from the production floor.
I write about orchestration architecture, multi-model routing, agent memory systems, the economics of running AI at scale, and the failure modes that don't show up in documentation or demos. Every article is grounded in data from the running system - real job counts, real failure rates, real cost figures. No hypotheticals.
I also publish original research. I've built and analyzed a corpus of 86,000+ news headlines for computational linguistics work, studying how different outlets frame the same events. That research uses the same infrastructure - same orchestration engine, same multi-model pipeline, same production tooling.
The writing here is practitioner-to-practitioner. If you're building production AI systems and want to know what actually happens past the demo stage - the architecture decisions, the failure classes, the cost surprises, the things that work and the things that definitely don't - that's what I document.
Currently
APR 2026
Context Engineering
Developing the three-tier memory architecture that lets long-running agents maintain identity, learn from past work, and operate without context drift.
Failure Taxonomy
Cataloging and classifying production failure modes across 6,000+ jobs. Building toward a public dataset of what actually goes wrong in autonomous AI systems.
Media Framing Research
Computational analysis of how news outlets frame identical events. 86K+ headlines, semantic axis projection, quantitative divergence measurement.
Algorithmic Trading
Operating a prediction market system that identifies mispriced probabilities using news sentiment analysis. Real money, real risk, real data.
If you're building something similar, I'd like to hear about it.