D. Nakhla · essays

Daniel Nakhla helps people and teams adopt and use AI. Four years building agentic AI and RAG systems for Fortune 500 teams, plus side projects and explorations run daily in the open, numbers public.
https://dnakhla.com/ 2026-06-16T00:00:00Z
Daniel Nakhla https://dnakhla.com/about/ dnakhla@gmail.com
© 2026 Daniel Nakhla

Cookie transplant: how my agent posts as a real account https://dnakhla.com/writing/cookie-transplant-browser-automation/ 2026-06-16T00:00:00Z 2026-06-16T00:00:00Z

Reddit's bot detection does not like fresh Chromium sessions. So I stopped giving it fresh sessions.

A five-dollar AI scored 76. The professionals scored 84. https://dnakhla.com/writing/can-a-5-dollar-ai-do-your-job/ 2026-06-14T00:00:00Z 2026-06-14T00:00:00Z

I gave a five-dollar, open-weight AI the real work of 44 jobs and graded it against the professionals' own work. It scored 76; the professionals scored 84. A field guide to AI evaluations: what they are, what the setup around a model changes, and why a single score never tells the whole story.

Cognitive surrender happens at the approval gate https://dnakhla.com/writing/cognitive-surrender/ 2026-06-11T00:00:00Z 2026-06-11T00:00:00Z

Everyone will run agent fleets eventually. The dangerous part is the human at the approval gate, nodding work through without reading it.

Your group chat is dying. I built a member that won't let it. https://dnakhla.com/writing/keeping-the-group-chat-alive/ 2026-06-06T00:00:00Z 2026-06-06T00:00:00Z

A group chat has no one paid to keep it alive, so it fades. letmecheckbot pushes back: it stays quiet while people talk, and when the room goes silent it speaks first, drawing on the room's own history.

How I keep one face consistent across AI-generated portraits https://dnakhla.com/writing/consistent-face-ai-portraits/ 2026-06-02T00:00:00Z 2026-06-02T00:00:00Z

I spent months trying to get AI to render my own face the same way twice, and it never worked. The fix was to stop letting the model render the face at all. Here is the pipeline I built: generate the world, lift the real pixels, and finish in code.

How my agent earns the word done: evals and certification gates https://dnakhla.com/writing/eval-discipline-certification-gates/ 2026-06-02T00:00:00Z 2026-06-02T00:00:00Z

My agent built my daughter's first-birthday book and spelled her name wrong on the cover. A spellchecker would have passed it. Why an agent cannot grade its own work, and the eval gates I built.

LetMeCheckThatBot: a group-chat agent that remembers https://dnakhla.com/writing/letmecheckbot-group-chat-agent/ 2026-06-02T00:00:00Z 2026-06-02T00:00:00Z

A Telegram bot that lives in a group chat as the extra member: eighteen tools, with every message, image, voice note, and link indexed for multimodal RAG over the chat. The hard engineering was the retrieval.

Work with the agent until it works. Then make it a workflow. https://dnakhla.com/writing/make-it-a-workflow/ 2026-06-02T00:00:00Z 2026-06-02T00:00:00Z

Field notes from running one agent in production. I work with it until I find what reliably works, then distill that into a workflow. The practice sits exactly where Anthropic draws the line between agents and workflows, with the actual workflows attached.

Auto-decomposition, multi-model review, and quality gates https://dnakhla.com/writing/one-agent-2102-obituaries/ 2026-06-02T00:00:00Z 2026-06-02T00:00:00Z

Three principles that turn one plain-language instruction into a dependency-aware pipeline, plus a corpus job that ran them: 2,102 records, six steps, two model families, twelve issues caught.

The model is the orchestrator, the workflow engine is the hands https://dnakhla.com/writing/the-model-is-the-orchestrator/ 2026-06-02T00:00:00Z 2026-06-02T00:00:00Z

Anthropic says decouple the brain from the hands. In my production system the model is the brain that decides and n8n is the hands that act. Here is why that split, not an agent fleet, is the load-bearing one.

Your agent needs routines, not just skills https://dnakhla.com/writing/agent-routines-not-just-skills/ 2026-05-26T00:00:00Z 2026-05-26T00:00:00Z

Model, agent, orchestrator, framework, skill, workflow engine. Six words the field uses interchangeably. They are not the same thing, and the confusion costs you architecture.

The $50k that used to pay people now pays for tokens https://dnakhla.com/writing/the-50k-used-to-feed-someone/ 2026-05-26T00:00:00Z 2026-05-26T00:00:00Z

I am a tokenmaxxer. To learn whether a fleet of AI agents could do a team's work, I spent more than the engineers would have cost, and the work was worse. The honest part is who used to get the money.

The API said success. The work never happened. https://dnakhla.com/writing/api-said-success-work-never-happened/ 2026-05-22T00:00:00Z 2026-05-22T00:00:00Z

For three weeks our automated publishing system reported a hundred percent success while posting nothing. A note on the worst class of production bug, and the one rule that prevents it.

Why Chrome hidden tabs silently corrupt fetch results, and the one-line fix https://dnakhla.com/writing/chrome-hidden-tab-fetch-corruption/ 2026-05-22T00:00:00Z 2026-05-22T00:00:00Z

A Chrome call kept returning empty strings until the tab moved to the foreground. The fix was one line. The lesson was about which browser runtime guarantees actually hold when a tab is hidden, and why that bites any automation running many tabs in one browser, not just tools built on the Chrome DevTools Protocol.

From a fleet of agents to one agent: what changed and why https://dnakhla.com/writing/from-fleet-to-one-agent/ 2026-05-22T00:00:00Z 2026-05-22T00:00:00Z

A milestone note. The thinking on this site graduated from agent-fleet architectures to a single agent in a loop with many tools. Here is what I used to think, what triggered the change, and what I publish from here on.

Split reads from writes: how I cut my agent Chrome cost 90% https://dnakhla.com/writing/split-reads-from-writes/ 2026-05-22T00:00:00Z 2026-05-22T00:00:00Z

For a few weeks this spring my agent was paying full price for a headless Chrome session every fifteen minutes just to read comments. Then I named the problem.

6,442 jobs later: model selection beats harness choice 37 to 1 https://dnakhla.com/studies/harness-production-6442-jobs/ 2026-04-06T00:00:00Z 2026-04-06T00:00:00Z

Across 6,442 production jobs and a 197-run, 19-model benchmark, swapping the model moves quality 37x more than swapping the harness. The harness buys optionality, not performance.

Composite scoring: fixing stale agent recall https://dnakhla.com/writing/composite-scoring-stale-recall/ 2026-04-05T00:00:00Z 2026-04-05T00:00:00Z

Pure cosine similarity has no concept of time, so a four-month-old memory can overwrite the right context. The composite-scoring formula, anti-stale penalty, and 10-step retrieval pipeline that fixed it, from one production AI agent over 230K vectors.

Context engineering in practice: a 3-tier memory system, 230K vectors https://dnakhla.com/writing/context-engineering-in-practice/ 2026-04-05T00:00:00Z 2026-04-05T00:00:00Z

The architecture, code patterns, and production numbers behind a 3-tier memory system for one AI agent: a 75-line core file, a 230K-vector LanceDB archival store, and self-teaching procedural memory. 6,370 jobs, $0.003/month embedding cost.

Claude Code in production: what breaks after 5,000 jobs https://dnakhla.com/writing/claude-code-production-lessons/ 2026-03-29T00:00:00Z 2026-03-29T00:00:00Z

Five thousand Claude Code jobs on a home server. The model was never the hard part. Rate limits, context bloat, and session state were, and the fixes were all infrastructure.

What 12 months of daily AI coding in production actually cost https://dnakhla.com/writing/ai-code-tools-2026/ 2026-03-28T00:00:00Z 2026-03-28T00:00:00Z

Twelve months running Claude Code, GPT-5.4, Grok, Windsurf, and local Llama daily. What I actually paid, where each tool earned its place, and what the bill teaches you that benchmarks don't.

Inside a Claude Code setup running 6,442 jobs: the completion gate https://dnakhla.com/writing/anatomy-claude-code-setup/ 2026-03-28T00:00:00Z 2026-03-28T00:00:00Z

The architecture of a Claude Code system that dispatches jobs at production volume: a triage-router CLAUDE.md, a forked-worker job queue, injectable skills, and a completion gate that bans single-pass work.

The memory system and composite scoring https://dnakhla.com/writing/production-ai-memory/ 2026-03-28T00:00:00Z 2026-03-28T00:00:00Z

How one production AI agent remembers: a three-layer memory store, an LLM judge that gates writes, and a 197KB identity layer assembled at boot. Composite recall scoring gets its own essay.

Inside one production AI agent: routing and the failure log https://dnakhla.com/writing/production-ai-orchestration/ 2026-03-28T00:00:00Z 2026-03-28T00:00:00Z

What one production AI agent actually looks like after 5,252 jobs: multi-model routing, an explicit fallback chain, a durable job engine, and the five failure classes that account for the breakage.

The control plane became the product: 5,079 jobs of operational truth https://dnakhla.com/writing/control-plane-is-the-product/ 2026-03-27T00:00:00Z 2026-03-27T00:00:00Z

Across one 37-day operating log of 5,079 jobs, the hard part of production AI was never the model. It was the control plane wrapped around it.