About
I GOT TIRED OF
AI DEMOS THAT
DON'T DO ANYTHING.
So I built a system that does. It runs 24/7. It handles real work. And it breaks in ways nobody warned me about.
The Short Version
I've spent twenty years building software. Enterprise systems, data infrastructure, the full stack from database to deployment. For the last several years, I've been focused on AI - not the conference-talk version, but the version where you're debugging a failed job at 2am because the context window overflowed and the agent lost its memory mid-task.
The gap between what AI tools promise and what they actually deliver in production became the thing I couldn't stop thinking about. So I built an orchestration system to find out where the real boundaries are. Not a prototype. Not a weekend project. A production system that runs continuously, dispatches work across multiple model families, manages its own failures, and has processed over 6,000 autonomous jobs since October 2025.
Most of those jobs worked. The ones that didn't taught me more than any benchmark or whitepaper ever could.
What I Operate
The system I run is a multi-model orchestration engine. It routes tasks across Claude, GPT, Gemini, and open-source models based on cost, capability, and context requirements. It manages persistent memory through vector stores. It handles scheduling, error recovery, and session lifecycle without manual intervention. When something fails - and things fail regularly - it logs the failure class and either retries with a different strategy or escalates.
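The routing logic described above can be sketched roughly like this. This is a minimal illustration, not the actual implementation: the model names, prices, and capability tags are placeholders, and the real system also weighs failure history and session state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_1k_tokens: float   # USD per 1K input tokens (illustrative figures)
    max_context: int            # context window in tokens
    capabilities: frozenset     # e.g. {"code", "reasoning", "summarize"}

# Hypothetical catalog -- names and numbers are placeholders, not real pricing.
CATALOG = [
    ModelProfile("frontier-large", 0.015, 200_000, frozenset({"code", "reasoning"})),
    ModelProfile("mid-tier",       0.003, 128_000, frozenset({"code", "summarize"})),
    ModelProfile("open-small",     0.0005, 32_000, frozenset({"summarize"})),
]

def route(required_capability: str, context_tokens: int) -> ModelProfile:
    """Pick the cheapest model that satisfies capability and context requirements."""
    candidates = [
        m for m in CATALOG
        if required_capability in m.capabilities and context_tokens <= m.max_context
    ]
    if not candidates:
        raise ValueError("no model satisfies the request")  # escalate instead of retrying blindly
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)
```

The key design point is that cost is a tiebreaker, not the primary filter: capability and context fit come first, which is what keeps a cheap model from being handed a task it will fail.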
This isn't a toy. The system autonomously executes research studies, manages a live trading portfolio, produces and publishes content, triages email, coordinates multi-step workflows, and runs its own infrastructure maintenance. It has 25+ integrations, from financial APIs to publishing platforms to smart home controls.
I built it because I wanted to know - empirically, not theoretically - what happens when you give AI systems real autonomy over real tasks with real consequences. The answer turned out to be more interesting and more humbling than I expected.
What I've Learned
FROM 6,000+ JOBS
Context loss
The single largest failure class. Agents lose track of what they're doing mid-task. No vendor talks about this.
Rate limit cascades
One throttled request triggers a retry storm that burns through your budget before you notice.
Overhead tax
The compute your system spends managing itself. Memory, scheduling, error recovery, coordination. Every cost model misses this.
The biggest lesson: the models are the easy part. The hard problems are orchestration, memory, cost management, and building systems that recover from failure without human intervention. I've documented the failure taxonomy, the architecture decisions that survived, and the ones I'd reverse if I could start over. That documentation is most of what this site is.
What This Site Is
This is a field notebook from the production floor.
I write about orchestration architecture, multi-model routing, agent memory systems, the economics of running AI at scale, and the failure modes that don't show up in documentation or demos. Every article is grounded in data from the running system - real job counts, real failure rates, real cost figures. No hypotheticals.
I also publish original research. I've built and analyzed a corpus of 86,000+ news headlines for computational linguistics work, studying how different outlets frame the same events. That research uses the same infrastructure - same orchestration engine, same multi-model pipeline, same production tooling.
The writing here is practitioner-to-practitioner. If you're building production AI systems and want to know what actually happens past the demo stage - the architecture decisions, the failure classes, the cost surprises, the things that work and the things that definitely don't - that's what I document.
Currently
APR 2026
Context Engineering
Developing the three-tier memory architecture that lets long-running agents maintain identity, learn from past work, and operate without context drift.
Failure Taxonomy
Cataloging and classifying production failure modes across 6,000+ jobs. Building toward a public dataset of what actually goes wrong in autonomous AI systems.
Media Framing Research
Computational analysis of how news outlets frame identical events. 86K+ headlines, semantic axis projection, quantitative divergence measurement.
Algorithmic Trading
Operating a prediction market system that identifies mispriced probabilities using news sentiment analysis. Real money, real risk, real data.
If you're building something similar, I'd like to hear about it.