Architecture · 2026-06-23

Editing n8n by hand fights the canvas. I made it show me a diff.

n8n's canvas is a great way to read a workflow and a miserable way to change one. To move a step you drag a box. To rewire it you drag a wire. To rename one field you open a panel, scroll to the field, and hope you got the expression syntax right. For a five-node workflow that is fine. My agent runs on workflows that are not five nodes, and the change I want is almost always one sentence long: run this twelve hours later, log to memory before it sends, drop the retry on that branch.

A typed plain-language request on the left, a workflow graph on the right with nodes marked added and removed, and a closed gate between them showing nothing commits until it opens.

These workflows are the agent's hands. One agent owns the loop; the durable, scheduled, branching work lives in n8n, and over time the agent distills its own repeated work into more of them. The more of them there are, the more often I need to change one, and the less I want to do it a box at a time.

So the obvious move is to let the model rewrite the workflow JSON for me. Type the sentence, let it emit the new file, push the file through the n8n REST API. I tried that. It works right up until the model quietly drops a node you needed, and you find out three days later when the thing that node did stops happening. A workflow edit made by a model has the same failure shape as any other model write: the call returns, the dashboard looks healthy, and you have no idea whether the result is the one you asked for. You cannot see what changed.

The version that shipped fixes that by making every change reviewable before it commits. I tap "Ask to edit" on a workflow, type the change in plain English, and get back a diff: nodes added, nodes removed, nodes modified, connections rewired. The live workflow has not moved. I read the diff. If it is what I meant, I approve, and only then does it commit. The model proposes a change; nothing touches production until I say so.

Three decisions made that loop trustworthy.

The model reads the live workflow, not the one I remember

The first cut read each workflow from the agent's own memory, a stored description of what the workflow did. That was a mistake, and a specific one. The descriptions had drifted. Some of them described a workflow the way I had meant to build it rather than the way it was actually deployed, and a couple described steps that were never wired up at all. A diff computed against a fiction is worse than no diff, because it arrives looking authoritative and you approve it on trust.

So the editor stopped reading memory. It now fetches the current workflow JSON straight from the n8n API at the moment you ask, and the model proposes its change against that exact object. The remembered version gets no vote. A workflow you have not opened in a month is the one whose description you trust most and check least, which is exactly the one where a diff against memory will lie to you without ever looking wrong. This is the same split I lean on everywhere in this system: the read that grounds a decision has to come from the live external system, never the local cache that believes it already knows the state. I wrote up the cost side of that same read-versus-write split in split reads from writes. The model's job here is narrow and well posed: given this workflow and this sentence, return the workflow the sentence describes. That is the model doing the one thing it is reliably good at, translation, with the runtime holding the state of record. It is the division of labor I argue for in the model is the orchestrator: the model decides, the engine holds the truth.

The diff is the product, so it has to survive a refresh

The output of this tool is not the new workflow. It is the diff between the live workflow and the proposed one, and that diff is the entire point. So the proposal cannot be a fire-and-forget response that scrolls off the screen. It has to sit somewhere I can come back to and study.

The diff itself is more than a node count. It marks which nodes are new, which are gone, which kept their place but changed a parameter, and how the wires between them moved, because a workflow that keeps every node and reroutes one connection is still a different workflow. The sidecar holds each proposal in memory for thirty minutes. Inside that window the original JSON and the proposed JSON both live on the server under a proposal id, so the diff renders the same after a page refresh or a walk to get coffee. Refresh a normal chat response and it is gone. Refresh this one and it is still there, because the thing under review is a pending change to a production system, and a pending change you can lose by reloading a tab is a change you will eventually commit without reading. Only an explicit approval turns the proposal into a PUT against the workflow API. Letting the thirty minutes lapse is itself a safe outcome: the proposal expires, the live workflow is untouched, and the worst case of indecision is that nothing happens. This is the human checkpoint Anthropic keeps returning to in Building Effective Agents, and the one the 12-factor agents authors make a named principle: own your control flow, and keep a human on the writes that matter.

The browser never holds the key

The model call could have gone straight from the browser to OpenRouter. It would have been less code. It would also have meant shipping an API key to every tab that loads the dashboard, a credential sitting in client-side JavaScript for anyone who opens the network panel.

Instead the call goes to a small sidecar (penny-edit-bridge, a container that talks to Claude Sonnet 4.5 through OpenRouter) that runs behind the same nginx that serves the dashboard. The browser posts to a path on its own origin. nginx proxies that path to the sidecar. The key lives on the server and never crosses to the client. Same origin also buys me no second CORS surface and no separate auth domain to reason about. The browser gets a diff back; it never gets a secret. Simon Willison has made the version of this point that stuck with me across his practitioner notes: a secret that reaches the client is a secret you have already leaked, you just have not been told yet.

If a model edits a system you run, make it show you the diff first

None of this is really about n8n. The same trap shows up anywhere you let a model change a system you run: it edits faster than you can check, so you quietly stop checking. What keeps it honest is boring. Show me the change as a diff against what is actually live, keep that diff where I cannot miss it, and make a person say yes before anything ships. You still read every change. You just stop dragging boxes to make them.

Some operational details in these essays have been changed for narrative or privacy reasons. The arguments, the numbers, and the lessons are real.