Here is the whole argument in one table, before any of the vocabulary. A skill and a routine are not two flavors of the same thing. They sit on opposite sides of the agent loop and they fail in opposite ways. Everything after this is me earning the distinction one word at a time.
| | Skill | Routine (workflow) |
|---|---|---|
| Where it runs | Inside the agent loop, in context. | Outside the loop, called by name. |
| How it behaves | Applied with fresh judgment, re-derived on every pass. | Runs the same concrete steps every time. |
| What it survives | Only the current context window. | Context windows, model swaps, multi-hour pauses. |
| What it is for | Making a decision better. | Making a recurring decision unnecessary. |
A model predicts tokens and nothing more
The model is the neural network. It takes tokens in and predicts tokens out. That is the whole job. It has no loop, no memory between calls, and no way to act on the world. Claude is a model. So is GPT. Ask one a question and it answers once, then stops and forgets you. A model is a brilliant, amnesiac contractor who does exactly one thing when you hand it a page of text.
Everything people find exciting about "AI agents" is built on top of that single, stateless prediction. The model is the engine. It is not the car.
An agent is a model put in a loop with tools
The agent is what you get when you wrap the model in a loop and give it tools. It reads a result, decides the next step, calls a tool, reads what came back, and goes again. Anthropic's Building Effective Agents reduces it to one line: an agent is a model "using tools in a loop." The loop is the entire difference between a model and an agent. Take the loop away and you are back to a contractor who answers once.
My system, Penny, is an agent in exactly this sense. One model, in a loop, with a set of tools it can call. It has run continuously since October 2025. It owns the loop, and that single fact decides almost everything else about the architecture.
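A minimal sketch of "a model in a loop with tools" may make the difference concrete. This is not Penny's code and not any vendor's API: `fake_model`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stub so the example runs offline. The stub has no memory of its own; the loop carries the transcript and feeds it back in on every pass.

```python
from typing import Callable

# Hypothetical stand-in for a real model call. It is stateless: everything it
# knows arrives in `transcript`, and it returns exactly one decision.
def fake_model(transcript: list[str]) -> dict:
    already_added = any(line.startswith("tool:add") for line in transcript)
    if not already_added:
        return {"type": "tool", "name": "add", "args": (2, 3)}
    last_result = transcript[-1].split("=")[-1].strip()
    return {"type": "final", "text": "The sum is " + last_result}

# The tools are the only way the agent can act. One toy tool is enough here.
TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}

def run_agent(task: str, model=fake_model, tools=TOOLS, max_steps: int = 5) -> str:
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        decision = model(transcript)          # the model decides the next step
        if decision["type"] == "final":
            return decision["text"]
        result = tools[decision["name"]](*decision["args"])  # the loop executes it
        transcript.append(f"tool:{decision['name']} = {result}")  # and feeds it back
    raise RuntimeError("agent did not finish within max_steps")
```

Calling `fake_model` once gives you a single decision and nothing else. Calling `run_agent("add 2 and 3")` gives you the decision, the tool call, and the follow-up, which is the part the loop contributes.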
The orchestrator is a role the agent plays, not a box in your diagram
This is the word that starts the hour-long arguments, because the field uses it for two incompatible things. The first meaning is the one Anthropic uses: in their orchestrator-workers pattern, the orchestrator is "a central LLM" that breaks a task down and delegates the pieces. The orchestrator is the model, doing the deciding. The second meaning, the one most vendors are selling, is a framework: a queue, a scheduler, a piece of coordination software that shuffles work between components.
When I say the orchestrator is the model, I mean the deciding lives in the model and nowhere else. The thing that chooses what happens next is an act of reasoning, not a line of routing code. Harrison Chase of LangChain draws the same line from the other side when he points out that tools like n8n are visual workflow builders, not agent builders. A workflow builder runs the path you drew. An agent picks the path. Calling your workflow builder an "orchestrator" does not move the intelligence into it.
So yes, an "agent orchestrator" and an "orchestrator agent" are the same thing, and that is the point. The orchestrator is a role an agent plays. It is not a separate product you buy.
The framework and the harness are plumbing
The framework is the code that wires the model to its tools and runs the loop. It might be something you import, or something you wrote yourself. Either way it is plumbing. It does not decide anything.
The harness is the bigger version of the same idea: the runtime that keeps an agent alive across many context windows, manages its state, decides which tools exist, and recovers it when it falls over. Anthropic's recent work calls this decoupling the brain from the hands. The brain is the model and its reasoning. The hands are the tools and the execution. The harness is the body that holds the two together. None of it is smart. All of it is necessary. Mistaking the harness for the orchestrator is like mistaking the nervous system for the decision.
A workflow engine executes. It does not decide.
A workflow engine runs predefined, repeatable procedures. You draw the steps once, and it runs them the same way every time, durably, with retries and a log you can read. For me that engine is n8n. Anything that has to survive a context window, a model swap, or a five-hour pause lives there as a workflow the agent calls. Temporal makes the cleanest version of this argument: in their model the workflow is the deterministic blueprint and the unpredictable work happens inside it, never the other way around.
Think about how a person works. You wake up and run a morning routine. You do not re-derive how to make coffee from first principles every day. The routine is a workflow: a fixed sequence you can call by name that runs the same way each time. You still decide whether to run it and when, and that decision is yours to improvise. But the routine itself is concrete, and that is exactly why you keep it. Improvising a known thing every morning is fragile and slow. A distilled routine holds up precisely because its steps are fixed, observable, and identical every time you call them.
This is the part of my stack that does the most work and gets the least credit. n8n is the hands. It is the biggest single tool the agent has. But it executes what the agent decided. It is not deciding anything itself, and the day I start letting it decide is the day the system gets harder to reason about, not easier.
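Here is a rough sketch of what "executes, does not decide" looks like in code. It is an in-memory toy, not n8n's or Temporal's API, and nothing in it is durable the way a real engine is. `run_workflow` and `make_flaky` are hypothetical names I am using for illustration. The point is only the shape: a fixed list of named steps, run in order, with retries and a readable log, and no branch anywhere that chooses what to do next.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]

def run_workflow(
    steps: list[tuple[str, Step]], payload: Any, max_retries: int = 2
) -> tuple[Any, list[str]]:
    """Run each named step in order, retrying failures, and return (result, log)."""
    log: list[str] = []
    for name, step in steps:
        for attempt in range(1, max_retries + 2):  # first try plus max_retries retries
            try:
                payload = step(payload)
            except Exception as exc:
                log.append(f"{name}: failed (attempt {attempt}): {exc}")
            else:
                log.append(f"{name}: ok (attempt {attempt})")
                break
        else:
            # Every attempt failed. Stop rather than improvise a different path.
            raise RuntimeError(f"step {name!r} exhausted retries")
    return payload, log

def make_flaky(step: Step, failures: int) -> Step:
    """Wrap a step so it fails `failures` times first, to exercise the retries."""
    remaining = {"count": failures}

    def wrapped(value: Any) -> Any:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("temporary failure")
        return step(value)

    return wrapped
```

Running `run_workflow([("trim", str.strip), ("upper", make_flaky(str.upper, 1))], "  coffee ")` returns the transformed value plus a three-line log showing one failed attempt and one successful retry. The agent decides whether to call it. The runner never decides anything.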
A reasoning step inside a workflow answers one scoped question
Workflows are not all deterministic plumbing. A node in one can stop and ask the model to judge something: classify this reply, rank these three drafts, decide whether this one is good enough to send. The workflow hands over a bounded question, sometimes to a single model call, sometimes to a whole sub-agent, and waits for the answer. That is real reasoning, happening inside the routine.
But the model in that seat is not the orchestrator. It is a decision-maker the workflow called for one scoped judgment, the way a recipe tells you to taste and adjust the salt. The orchestrator decided to run the workflow at all, and decides what to do with whatever comes back. The reasoning node decides the one thing it was asked, then hands control back. Same model, different seat. The orchestrator owns the loop. The decision-maker answers a question and sits back down.
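The same idea as a sketch, under loud assumptions: `stub_judge` stands in for a single scoped model call and simply picks the shortest draft so the example runs offline, and `pick_and_send` is a hypothetical workflow, not one of mine. What matters is where control sits. The workflow asks one bounded question, checks that the answer stays inside the options it offered, and carries on with its fixed steps.

```python
from typing import Callable

Judge = Callable[[str, list[str]], str]

# Hypothetical judge. In a real system this would be one scoped model call.
# Placeholder rule so the sketch is deterministic: prefer the shortest draft.
def stub_judge(question: str, options: list[str]) -> str:
    return min(options, key=len)

def pick_and_send(drafts: list[str], judge: Judge = stub_judge) -> dict:
    # Deterministic step: clean the inputs the same way every time.
    cleaned = [d.strip() for d in drafts if d.strip()]
    if not cleaned:
        raise ValueError("no usable drafts")

    # Reasoning step: one bounded question, one answer, then control returns.
    choice = judge("Which draft is best to send?", cleaned)

    # The workflow keeps control: an answer outside the options is an error.
    if choice not in cleaned:
        raise ValueError("judge answered outside the options it was given")

    # Deterministic step: "send" is simulated by returning a record.
    return {"sent": choice, "candidates": len(cleaned)}
```

Swapping `stub_judge` for a real model call changes the quality of the answer, not the shape of the workflow. Whatever decided to call `pick_and_send` in the first place is still the orchestrator.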
Why not just use agents and skills?
This is the question I get the moment I say "workflow," because skills are the fashionable answer right now. A skill is packaged know-how the model loads into its context when the work calls for it. Anthropic ships Agent Skills as exactly this: a folder of instructions and scripts that makes the agent better at a class of task. A skill is the difference between a contractor who has read the manual and one who has not. It sharpens the judgment the model brings to a decision it is making right now.
A workflow is the opposite kind of thing. A skill makes a decision better. A workflow makes a recurring decision unnecessary, because you already distilled it into concrete steps. Cooking is a skill. Your Tuesday-night dinner is closer to a routine. The skill lives in the moment, in the agent's head, applied fresh. The routine lives outside the moment, called by name, run the same way every time. That is the split the table at the top was drawing.
The answer is in where a skill runs. A skill executes inside the agent's loop, in its context window, with the model freehanding the work on every pass. That is what you want when the work needs judgment. It is not what you want when the work is repeatable and you need it to come out identical on the hundredth call, long after the conversation that triggered it has scrolled out of memory. My test is one sentence: if I would be annoyed to find the model did this a slightly different way today, it belongs in a workflow, not a skill. Skills make my agent smarter. Workflows make my agent's work boring, in the way production systems are supposed to be boring.
Why the words are load-bearing
I learned this by getting it backwards. For six months I ran a fleet of agents with a coordination layer in the middle, and I told myself the coordination layer was the orchestrator. It was not. It was a framework pretending to be intelligence, and every week it produced a fresh class of bug where one agent could not tell what another had decided. I tore the whole thing down and rebuilt around one model in a loop. The teardown was a vocabulary correction. Once I stopped calling the framework an orchestrator, the second, third, and fortieth agents stopped looking necessary. Cognition reached the same place from the data side in Don't Build Multi-Agents: coordination between agents is where the decisions silently conflict.
The shape that survived is the one I still run. The model orchestrates. The harness runs the loop. The workflow engine executes. The intelligence sits in exactly one layer, and everything else is bookkeeping.
If you are sketching your own system this week, label every box with a single question: does this box decide, or does it execute? Exactly one of them should decide, and it should be the model. The day you cannot tell your orchestrator from your workflow engine is the day you start adding agents you do not need.
Some operational details in these essays have been changed for narrative or privacy reasons. The arguments, the numbers, and the lessons are real.