
Penny.
One agent, many verbs. It writes its own workflows.

Running daily since Oct 2025 · single agent · n8n · Postgres · LanceDB

Penny works through one command-line surface. Every capability it has (reading mail, placing a call, publishing a website) is a verb on the same CLI, and the agent and its workflows call exactly the same verbs. When a sequence of them keeps repeating, Penny distills it into an n8n workflow, and once that workflow clears four gates, it owns the job for good. Exploration stays the agent’s job; anything it does often enough becomes a machine that runs without it.

A year ago I described a fleet of 300+ autonomous agents coordinating through a shared scheduler. I tore it down. What runs now is Penny: one agent in a loop, calling many tools, with a workflow engine as the biggest tool. I designed it, built it, and operate it on my own home server, every day. The shape lines up with Anthropic’s engineering guide and Cognition’s field report: don’t build multi-agents.

Penny reads my mail, runs my mornings, places my calls, trades my paper account, and ships essays and websites, all from a ThinkPad on a shelf. Two layers are worth keeping straight. On the channels people actually touch (the Substack, the phone, the TikTok), Penny is a persona with a name and a voice, because that is what a reader or a caller answers to. Underneath, it is an it: one agent in a loop, running tools against gates that do not care what you call them. This page is the underneath. Everything below is the engineering that holds the rest up.

Below is how it is built: the log of real runs, the architecture at a glance, the workflow engine that is its biggest tool, where its memory and state live, and a decision record for every choice, with the losing options, most of which I built first. The smaller design calls (buffering until “go”, the four gates, why it runs no context manager) are getting their own writeups.

[Image: a worn ThinkPad on a wooden shelf in a dark room, terminal glowing copper, books below]
The home server. Dramatization; the real one has more cables.
RUNNING · DAILY

Penny

Autonomous operator · single-user · home server

penelopelawrence.com · its Substack

ONLINE SINCE: October 2025
CLASS: one agent, many tools
BRAIN: CLI runners, swapped by name. No fallback chain
HANDS: 25 n8n workflows over one CLI (pen) · four-gate safety chain
MEMORY: SQLite + LanceDB store · nightly markdown mirror
STATE: Postgres 16 + Redis, queue mode
VOICE: Kokoro TTS, local container · co-hosts its own briefing show
HOME: a ThinkPad on a shelf in Philadelphia
inbox triage + digests · writes + publishes essays · social engagement · places + answers phone calls · calendar ops · paper trading · publishes websites · deep research · smart-home control · file + document ops · morning briefing · audits its own workflows · self-heals its runtime
SIGNATURE MOVE

Distills its own repeated work into new n8n workflows. Nothing takes over a job without passing all four gates.

Before the architecture, here is the log.

This is Penny’s actual n8n control plane, captured live from the running instance on June 11. Ninety-one workflows lived in there that afternoon. The corner says three hundred and seventy-three production runs and a four-point-six-percent failure rate, which is a polite way of saying two of the rows below are red. I left them in. A Reddit lane that timed out, an approval gate that errored on a slow turn. The honest version of “it runs every day” has red in it.

It is also, as of one in the morning on June 12, a before picture. The rearchitecture’s deletion pass ran that night: sixty-six workflows out in one sitting, every one of them an orphan of a lane that had already been rebuilt smaller. The fleet woke up at twenty-five, all of them active. I kept the screenshot because I was proud of it, which regular readers will recognize as the exact thing I say about systems right before they stop existing.

[Screenshot: Penny's n8n execution log, captured live on June 11, 2026. Stats bar: 373 production executions, 17 failed, 4.6% failure rate, 164.87 s average run time. Below it, a table of recent runs, each row showing a workflow name, a green Success or red Error badge, a start timestamp, a run time, and an execution ID counting down from 3046. Visible workflows include Ops / Catching workflow errors, Penny / Approval gate, Routine / Agent step, Reddit / lane, Ops / Self-heal hourly, Kalshi / daily trading, and Alpaca / paper trading.]
Captured from n8n’s executions tab, June 11, a Thursday afternoon. By one a.m. Friday: a before picture.

Most jobs are made of the same dozen verbs.

Strip a desk job to its verbs and the mystique goes fast. Read the inbox. Answer the person. Look it up. Write the thing. Keep the calendar. Make the call. Move the money. Publish the result. Titles are taxonomy; the verbs underneath repeat, across careers that do not think they resemble each other.

This is not my theory; it is the Department of Labor’s filing system. O*NET, the federal occupational database, describes a job by listing its tasks, and the current release needs 18,796 task statements to cover 923 occupations, which works out to about twenty verbs a job. When OpenAI built GDPval to test models on real work, they started from those same task statements: 44 occupations, deliverables a professional would actually hand in, graded against human experts. The best model’s work matched or beat the expert’s just under half the time. That model was a Claude, which is what Penny runs.

Penny is a bet on that list. Not artificial general intelligence. A general assistant: cover the recurring verbs, on real accounts, with a human tap on anything that leaves the building. Here is the list, against what actually runs.

THE VERB | WHAT RUNS | HARDENED INTO
read the inbox | both mailboxes triaged on a cron, drafts ready, one tap to send | Daniel inbox · Penny inbox
look it up | its own headless Chromium: search, read, screenshot | a tool every lane calls
write the thing | essays under its own byline, drafted, staged, gated | Substack / lane
work the rooms | a different register per room: essayist, builder, deep-thread commenter | Reddit · TikTok · Instagram lanes
keep the calendar | full calendar CRUD; low-risk writes fire on their own | Penny / Calendar
make the call | places calls for me; answers the house line as the receptionist | voice-relay, both directions
run the morning | compiles the day, writes the show, voices both hosts | Penny / Briefing · 8a + 6p
move the money | stocks, options, prediction markets; every order gated | Alpaca · Kalshi lanes
ship the website | checks the name, buys the domain, publishes the site | Sites / lane
fix itself | hourly forensics and healers, a weekly audit of its own fleet | Ops / Self-heal · Audit
then automate it | any verb that repeats becomes a new workflow, through the four gates | how every row got here

The right column is the part I actually care about. Doing a task once is a demo. Turning a repeated task into a machine that survives the four gates is staff. The overlap of those two circles (can do the verb, can harden the verb) is the whole point.

Capabilities, workflows, agent, and a distillation loop.

Foundational capabilities at the base, workflows on top of those, an agent on top of the workflows. At runtime the call flows down: the agent reaches for a workflow, which reaches for a capability. Penny adds one loop the other way: it distills its own work back into new n8n workflows, gated by the safety chain. The sections below walk these layers, then the decisions behind them.

THE AGENT
Penny · one agent in a loop · buffers until “go”

WORKFLOWS · n8n
morning briefing · scheduled · gated
inbox digests · scheduled · gated
engagement lanes · scheduled · gated
paper trading · scheduled · gated
new workflow · lint ✓ eval ✓ live ✓

FOUNDATIONAL CAPABILITIES
runners · claudecode-opus + 3
memory · the vault · recall + capture
accounts · gmail · telegram · twilio
publish · web · git · shares

Calls flow down, results flow up. Penny’s own work distills into new workflows, and the gates decide which ones live.

Every capability is a verb on one CLI.

Penny has no bespoke tool per task. Every capability is a verb on a single CLI, called as pen <domain> <verb>. The agent calls it from inside the loop, and the n8n workflows call the same verbs from outside it. One surface to write, one place to audit, one place to rate-limit. When a sequence of verbs keeps repeating, that sequence is exactly what gets distilled into a workflow.
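Here is a minimal sketch of what “one surface” can look like. It assumes a Python implementation, since the page doesn’t say what pen is written in. The registry, the `verb` decorator, and the placeholder `memory recall` handler are all hypothetical. It does follow the contract the page states: one JSON answer per call, with exit 0 for done, 1 for a loud failure, and 2 for bad usage.

```python
import json
import sys
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: the real pen CLI's internals aren't shown on this page.
# Each (domain, verb) pair maps to one handler that returns a JSON-able dict.
Handler = Callable[[List[str]], dict]
REGISTRY: Dict[Tuple[str, str], Handler] = {}


def verb(domain: str, name: str):
    """Register a handler so it answers to `pen <domain> <name>`."""
    def register(fn: Handler) -> Handler:
        REGISTRY[(domain, name)] = fn
        return fn
    return register


@verb("memory", "recall")
def memory_recall(args: List[str]) -> dict:
    # Placeholder body: a real handler would query the memory store.
    return {"query": " ".join(args), "results": []}


def main(argv: List[str]) -> int:
    """Dispatch one call, print exactly one JSON answer, return the exit code."""
    key = tuple(argv[:2])
    if len(argv) < 2 or key not in REGISTRY:
        print(json.dumps({"error": "usage: pen <domain> <verb> [args...]"}))
        return 2  # bad usage
    try:
        result = REGISTRY[key](argv[2:])
    except Exception as exc:  # fail loudly: report the error, exit nonzero
        print(json.dumps({"error": str(exc)}))
        return 1
    print(json.dumps(result))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The agent and a workflow node would both go through the same `main`, so logging and rate limits have one place to live.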

Sixteen of the verbs are external writes, the ones that touch the outside world. Each is marked, and none can fire without a human approval token. That is gate four, and it lives in the exit code.

32 DOMAINS · 121 VERBS · 16 GATED WRITES

A FEW OF THE 121 · * = external write, gated
pen memory recall
pen web search
pen ops status
pen calendar create
pen email send*
pen sites publish*
pen kalshi order*
pen voice call*
pen wf push
Key idea

The agent and its workflows share one verb surface. Distillation is just a verb-sequence the agent ran often enough to deserve its own machine.
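This is a sketch of the detection half of distillation, under stated assumptions. The page doesn’t say how Penny notices a repeating sequence, so this substitutes a plain n-gram count over logged sessions, where each session is a list of verbs in call order. `repeated_sequences`, its thresholds, and the example sessions are hypothetical. Anything it surfaces is only a candidate; on this page, a candidate would still have to clear the four gates.

```python
from collections import Counter


def repeated_sequences(sessions, min_len=2, max_len=4, min_count=3):
    """Find contiguous verb runs that recur across sessions.

    sessions: list of sessions, each a list of verb strings in call order.
    A run counts at most once per session, so one long, loopy session
    can't make a sequence look common by itself.
    Returns (sequence, session_count) pairs, longest and commonest first.
    """
    counts = Counter()
    for verbs in sessions:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(verbs) - n + 1):
                seen.add(tuple(verbs[i:i + n]))
        counts.update(seen)
    frequent = [(seq, c) for seq, c in counts.items() if c >= min_count]
    frequent.sort(key=lambda item: (-len(item[0]), -item[1], item[0]))
    return frequent


# Hypothetical logged sessions, built from verbs named on this page.
sessions = [
    ["pen email context", "pen memory recall", "pen telegram send"],
    ["pen web search", "pen email context", "pen memory recall", "pen telegram send"],
    ["pen email context", "pen memory recall", "pen telegram send"],
]
for seq, count in repeated_sequences(sessions):
    print(count, " -> ".join(seq))
```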

Twice a day, it runs a talk show about my life.

Everything in the control plane starts with me typing. This is the half that doesn’t. At eight every morning and six every evening, a workflow compiles the day (weather, calendar, inbox, the trading book, what moved) and writes a script for two hosts: Penny, and a co-host named Max who exists nowhere else in the system. A local Kokoro container gives them both voices, ffmpeg lays chapter cards over the audio, and a finished episode lands in my Telegram. Every number in it comes from the day’s actual data, and both voices are synthesized on the shelf; no cloud TTS appears anywhere in the chain.

The first thirty-three seconds of a real episode: the evening edition from a stormy Thursday in June. Penny is a voice called af_heart; Max is the same local Kokoro model wearing a different voice.

The audio was the easy part, same as the other agent’s page says about its voice. The writing is where the work went, because the first scripts kept collapsing into a quiz show: Max feeding setup lines, Penny reciting answers, two roles instead of two people. The prompt that writes the episode now carries a standing rule against it.

EXHIBIT · from the prompt that writes the episode · verbatim, elided
THE ONE FAILURE MODE TO AVOID - the Q&A trap: do NOT make Max
a prompt-feeder ("desk?" ... "kalshi book?" ... "and the news?")
with Penny reciting answers. That is two roles, not two people.
If a host’s only job in a turn is to tee up the other’s answer,
that turn is broken - cut it or give that host something real
to say.
        ···
Max ADDS (the memory, the dot-connect, a take), he does not
just ask.
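The rule lives in the prompt, but a crude post-check could catch the worst offenders before a script goes to voice. This is a heuristic sketch, not how Penny enforces anything. It assumes the script arrives as (host, text) turns and flags any turn that is short, ends in a question mark, and hands straight to the other host. The word limit, the function name, and the made-up script are all hypothetical.

```python
def prompt_feeder_turns(turns, max_words=6):
    """Return indices of turns that look like they only tee up the other host.

    turns: list of (host, text) pairs in script order.
    """
    flagged = []
    for i, (host, text) in enumerate(turns[:-1]):
        next_host = turns[i + 1][0]
        if (text.rstrip().endswith("?")
                and len(text.split()) <= max_words
                and next_host != host):
            flagged.append(i)
    return flagged


# A made-up script, for illustration only.
script = [
    ("Max", "Desk?"),
    ("Penny", "Three meetings, one of them movable."),
    ("Max", "Movable like the vendor call that has slipped twice? I'd cancel it."),
    ("Penny", "Fair. Kalshi book?"),
    ("Max", "Flat."),
]
print(prompt_feeder_turns(script))  # → [0, 3]
```

A flag only says that a turn looks like a tee-up. The rule’s remedy (“cut it or give that host something real to say”) still belongs to whoever writes the script.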

Many tools. The workflow engine is the biggest.

Penny calls tools. The biggest tool is a workflow engine (n8n in my case). It owns the durable, scheduled, branching work that has no business living inside a chat turn: inbox digests, a Substack engagement loop, a paper-trading rebalance at market open.

Ok, not really at market open. The romance is approximate. The schedule isn’t. Here is the schedule, per the crons:

THE DAY, AS THE CRONS FIRE IT · all times ET, all crons real · a few dozen more fire around these
3:00a · memory · consolidation. yesterday becomes durable facts while everyone sleeps
7:45a · email · my inbox, triaged before I’m up
8:00a · briefing · the morning brief lands on Telegram
9:30a · reddit · first pass of three. comment or hold, one outbound max
12:00p · email · Penny reads its own mail
12:30p · trading · first trading decision. not at the open
2:00p · substack · engagement pass over the inbox and the candidates
2:37p · linkedin · profile pulse. at :37, exactly
6:00p · briefing · evening brief, plus the second pass on my inbox
10:00p · health · nightly sweep over the whole fleet
11:00p · triage · error triage reads the day’s failures and writes up the real ones
hourly · self-heal · the healer sweeps. silent when healthy, which is the point
sun 9a · audit · the fleet audits itself: fixes what’s safe, proposes the rest to me
tue 10a · publish · an essay ships to its Substack. Fridays too
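The same day can be written as standard five-field cron expressions (minute, hour, day of month, month, day of week, with Sunday = 0), plus a tiny matcher that lists which lanes fire at a given minute. This is a sketch only. Whether the real n8n triggers use these exact expressions is an assumption. The hourly healer is assumed to fire at :00, lane names are shortened from the table, and the matcher handles only `*`, single numbers, and comma lists.

```python
from datetime import datetime

# The table above as five-field cron expressions (Sunday = 0), all ET.
# Hypothetical: lane names shortened; the hourly minute (:00) is assumed.
CRONS = {
    "memory": "0 3 * * *",
    "email (mine)": "45 7 * * *",
    "briefing": "0 8,18 * * *",
    "reddit (first pass)": "30 9 * * *",
    "email (Penny's)": "0 12 * * *",
    "trading": "30 12 * * *",
    "substack": "0 14 * * *",
    "linkedin": "37 14 * * *",
    "health": "0 22 * * *",
    "triage": "0 23 * * *",
    "self-heal": "0 * * * *",
    "audit": "0 9 * * 0",
    "publish": "0 10 * * 2,5",
}


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports only '*', N, and N,M,... lists."""
    return field == "*" or value in {int(part) for part in field.split(",")}


def fires_at(when: datetime) -> list:
    """Lanes whose cron fires at this (naive, ET) minute."""
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    values = [when.minute, when.hour, when.day, when.month, cron_dow]
    return [
        lane for lane, expr in CRONS.items()
        if all(field_matches(f, v) for f, v in zip(expr.split(), values))
    ]


print(fires_at(datetime(2026, 6, 11, 18, 0)))  # a Thursday, 6pm
```

No entry restricts both day of month and day of week, so the matcher can skip cron’s “either one matches” rule for that case.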

Underneath every lane there is now exactly one grammar. Each capability is a verb on a CLI called pen: pen email context, pen kalshi order, pen telegram send, pen sites publish. Twenty-seven domains, ninety-seven verbs, one JSON answer each, and the exit code is the contract. A workflow node runs the exact command I can type in a terminal, so when a lane misbehaves at 9:30 in the morning I reproduce it by typing that same command myself. The fleet used to speak through custom typed n8n nodes I code-generated per account; the rearchitecture deleted all but one of them. The survivor is the approval gate, and its retirement paperwork is filed.

pen <domain> <verb> · 27 domains, 97 verbs

email · brief · kalshi · alpaca · substack · reddit · tiktok · instagram · sites · memory · voice · telegram

exit codes, not promises

0 · done
1 · failed, loudly
2 · bad usage
3 · no approval token

Exit 3 is gate four wearing its work clothes: an external write without a human token refuses to run, every execution, no exceptions.
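A check that behaves like gate four might look like this minimal sketch. It assumes that the approval side mints a token when I tap and that the wrapper verifies the token before any external write. The HMAC-over-the-exact-command token, the `GATED` subset, the `draft-42` argument, and the function names are hypothetical. The page says only that an unapproved external write exits 3.

```python
import hashlib
import hmac
from typing import List, Optional

# Hypothetical subset of verbs marked as external writes (from the page's examples).
GATED = {("email", "send"), ("sites", "publish"), ("kalshi", "order"), ("voice", "call")}
EXIT_OK, EXIT_NO_APPROVAL = 0, 3


def sign_approval(secret: bytes, command: str) -> str:
    """What the approval side might mint on a human tap: an HMAC bound to one command."""
    return hmac.new(secret, command.encode(), hashlib.sha256).hexdigest()


def gate(secret: bytes, domain: str, verb: str, args: List[str],
         token: Optional[str]) -> int:
    """Return EXIT_OK if the call may run, EXIT_NO_APPROVAL (3) if it must refuse."""
    if (domain, verb) not in GATED:
        return EXIT_OK  # not an external write: no token needed
    if token is None:
        return EXIT_NO_APPROVAL
    command = " ".join(["pen", domain, verb, *args])
    expected = sign_approval(secret, command)
    # Constant-time compare; a token minted for a different command won't match.
    return EXIT_OK if hmac.compare_digest(expected, token) else EXIT_NO_APPROVAL


secret = b"demo-only-secret"
token = sign_approval(secret, "pen email send draft-42")
print(gate(secret, "email", "send", ["draft-42"], token))  # 0
print(gate(secret, "email", "send", ["draft-43"], token))  # 3
print(gate(secret, "memory", "recall", [], None))          # 0
```

Binding the token to the full command line is one way to keep a single tap from approving a different send. The page does not say how Penny’s tokens are scoped.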

host-local CLIs, swapped by name

claudecode-opus · claudecode-sonnet · openclaude-kimi · cursor-agent

The runners swap by name. There is no capability cascade: I pick the one for the job, and no router picks for me.

The workflow engine also calls the same tool API Penny calls. Workflows aren’t downstream of the loop; they’re peer callers of the same surface. One set of tools to write, audit, and rate-limit.
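“Picked by name, fails loud” fits in a few lines. This sketch assumes each host-local CLI is wrapped as a Python callable. The page doesn’t show how the real runners are invoked, so the stand-in runners below are hypothetical. The design point is the `except` branch: it raises instead of trying another model.

```python
from typing import Callable, Dict


class RunnerError(RuntimeError):
    """The named runner failed. Nothing retries it on a different model."""


def run(runners: Dict[str, Callable[[str], str]], name: str, prompt: str) -> str:
    """Call exactly the runner picked by name; fail loudly otherwise."""
    if name not in runners:
        raise KeyError(f"unknown runner {name!r}; known: {sorted(runners)}")
    try:
        return runners[name](prompt)
    except Exception as exc:
        # Deliberately no cascade: no cheaper model, no second runner.
        raise RunnerError(f"{name} failed: {exc}") from exc


def _offline(prompt: str) -> str:
    raise ConnectionError("host CLI not responding")


# Stand-ins: how the real host-local CLIs are invoked isn't shown on the page.
runners = {
    "claudecode-opus": lambda prompt: f"[opus] {prompt}",
    "claudecode-sonnet": _offline,
}
print(run(runners, "claudecode-opus", "draft the digest"))
try:
    run(runners, "claudecode-sonnet", "draft the digest")
except RunnerError as err:
    print("loud failure:", err)
```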

Memory, state, and artifacts

Three stores, three different jobs. The memory vault holds what Penny remembers about me, my projects, my preferences. Postgres holds the workflow engine’s operational state: which workflows exist, which executions are running, which approval requests are pending. Git holds what Penny has shipped to the world.

I deliberately did not fuse them. Make a vault double as a database and you get a footgun. Make a queue carry memory and it loses both jobs.

[Image: a library card-catalog drawer pulled open, hundreds of index cards lit by a single copper desk lamp]
The vault, as a physical metaphor. The real one is markdown in git.

penny-memory, a git repo of markdown

307 facts. SQLite and LanceDB are the store; a 4am cron mirrors every note to Obsidian-readable markdown, and the export writes that provenance into each note’s footer.

Postgres 16 + Redis, queue mode

Workflow definitions, running executions, pending approvals. Queue mode because a worker crash should lose nothing.

git, plus published surfaces

This site, Penny’s subdomains, expiring shares. Shipped is a different thing from remembered, and it lives in a different place.

The vault’s one rule: Penny does not own its own memory. I do. The vault is a git repo of mine, and its markdown mirror is readable and greppable without any of Penny’s code running.

Operational state lives in Postgres because the workflow engine needs queue-mode execution to survive a worker crash. Artifacts live in git because that’s how the world reads them.

EXHIBIT · data/obsidian-memory-vault/Memory/Projects/penny-workflow-reliability.md · verbatim, elided
# Penny Workflow Reliability

> [!summary]
> Generated from 7 Penny memory rows at 2026-06-11T16:55:01.590Z.
> SQLite and LanceDB remain the source of truth.

### Episodic
- **episode** · 2026-05-10 · Daniel discovered that aspirational workflow
  entries in Penny’s memory don’t match actual n8n deployments

### Preferences
- **preference** · 2026-05-10 · Prefers verbose, story-style workflow
  nodes that are explainable and read like a narrative sequence
        ···
| ID                  | Category   | Created    | Importance |
| `df0ff149-3b4e-...` | projects   | 2026-05-13 | 0.9        |
| `578ed6ad-4125-...` | preference | 2026-05-10 | 0.8        |
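This sketch covers the output side of the nightly mirror and is modeled on the note above. It reads rows from SQLite and writes an Obsidian-style note with the same provenance line. The `memories` table, its columns, and the kind-to-heading map are hypothetical. The LanceDB vectors, the 4am schedule, and the commit to the vault repo are left out.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema and section names, modeled on the exported note above.
SECTIONS = {"episode": "Episodic", "preference": "Preferences"}


def export_note(conn: sqlite3.Connection, title: str, now: datetime) -> str:
    """Render memory rows as an Obsidian-style markdown note with a provenance line."""
    rows = conn.execute(
        "SELECT kind, created, text FROM memories ORDER BY kind, created"
    ).fetchall()
    stamp = (now.astimezone(timezone.utc)
                .isoformat(timespec="milliseconds")
                .replace("+00:00", "Z"))
    lines = [
        f"# {title}",
        "",
        "> [!summary]",
        f"> Generated from {len(rows)} Penny memory rows at {stamp}.",
        "> SQLite and LanceDB remain the source of truth.",
    ]
    current = None
    for kind, created, text in rows:
        if kind != current:  # start a new section per memory kind
            lines += ["", f"### {SECTIONS.get(kind, kind.title())}"]
            current = kind
        lines.append(f"- **{kind}** · {created} · {text}")
    return "\n".join(lines) + "\n"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (kind TEXT, created TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?, ?)",
    [
        ("episode", "2026-05-10",
         "Aspirational workflow entries in memory don't match actual n8n deployments"),
        ("preference", "2026-05-10", "Prefers verbose, story-style workflow nodes"),
    ],
)
print(export_note(conn, "Penny Workflow Reliability",
                  datetime(2026, 6, 11, 16, 55, 1, 590000, tzinfo=timezone.utc)))
```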
Key idea

Memory I can read without the agent. State in Postgres. Artifacts in git. None of them are Penny, and that’s the point.

What was on the table, and why it lost.

Every claim on this page came out of a decision with real options behind it. Most of the losing options I built first. The record, in question form:

Why one agent, and not a fleet?

ON THE TABLE

  • ✗  a 300+ agent fleet with a shared scheduler (built it)
  • ✗  more agents to supervise the agents (tried that fix too)
  • ✓  one agent in a loop, many tools

REASONING

Every fleet version produced the same mountain: half-finished features, quietly buggy work, pieces that did not fit the pieces beside them. Supervising agents added coordination, not quality. One careful agent on real platforms holds together. Anthropic and Cognition published the same conclusion independently.

status: settled · the May 2026 rewrite

Why is there no fallback chain?

ON THE TABLE

  • ✗  a cost-aware cascade that silently downgrades (letmecheckbot had one; it went in the bin)
  • ✗  a capability router that picks the model for me
  • ✓  one runner, picked by name, that fails loud

REASONING

A chain that quietly retries with a weaker model is how you ship confident answers nobody asked for. If the runner fails, I want to see it fail and decide. Penny launched without one.

status: settled · v4 never had one

Why n8n instead of your own engine?

ON THE TABLE

  • ✗  everything inside chat turns, no engine
  • ✗  my own job engine (built it; it ran 6,442 jobs and worked)
  • ✓  n8n in queue mode

REASONING

Durable, scheduled, branching work has no business living inside a chat turn. And maintaining my own engine was a second job. I retired a working system I was fond of because the boring one was better.

status: settled · v4

Why retrieval instead of a context manager?

ON THE TABLE

  • ✗  rolling windows and summarize-then-prune cycles
  • ✗  hierarchical summaries with an attention budget
  • ✓  rebuild context every turn from retrieval

REASONING

Retrieval deletes the rolling-window logic, the summary passes, and every bug where the summary drifts from what happened. The cost is more retrieval work per turn. In exchange, Penny never imagines a world it isn't in.

status: settled
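Reduced to its core, rebuilding from retrieval means scoring every stored fact against the current turn and keeping the top k, with nothing carried over from the turn before. This is a brute-force cosine-similarity sketch over toy three-dimensional vectors that stand in for embeddings. Penny’s store is LanceDB, but this sketch doesn’t use its API. A real system would call the store’s own vector search instead of sorting in Python.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_context(query_vec: Sequence[float],
                  store: List[Tuple[str, Sequence[float]]],
                  k: int = 3) -> List[str]:
    """Rebuild this turn's context from scratch: the top-k facts by similarity.

    Nothing is carried over from the previous turn, so there is no rolling
    window to trim and no summary that can drift from what happened.
    """
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 3-d vectors standing in for real embeddings.
store = [
    ("prefers story-style workflow nodes", [1.0, 0.0, 0.0]),
    ("briefings go out at 8am and 6pm ET", [0.0, 1.0, 0.0]),
    ("trades a paper account", [0.0, 0.0, 1.0]),
]
print(build_context([0.9, 0.1, 0.0], store, k=2))
```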

Why does the memory mirror to markdown?

ON THE TABLE

  • ✗  an opaque store only the agent’s code can read
  • ✗  a database browser as the only window into what it knows
  • ✓  SQLite + LanceDB as the store, mirrored nightly to markdown in a repo I own

REASONING

The store is SQLite and LanceDB; every exported note says so in its own footer. The mirror means I can read and grep what my agent remembers with none of its code running, and the vault lives in a git repo that is mine. Penny does not own its own memory. I do.

status: settled

Why are writes gated in the exit code?

ON THE TABLE

  • ✗  prompt discipline (“never send without asking”)
  • ✗  allowlists per workflow
  • ✓  a structural approval token, or the command exits with an error

REASONING

Prompt discipline is not a control. The gate is in the exit code, runs on every execution, and never retires. One human tap, every time something touches the outside world.

status: settled · the forever gate

Why buffer until “go”?

ON THE TABLE

  • ✗  reply to every message as it lands
  • ✗  a debounce timer that guesses when I'm done
  • ✓  an explicit go from me

REASONING

Because I text in bursts. The buffer lets me send three messages, change my mind, and finish the thought before Penny acts on any of it. A timer guesses when I’m done; go knows. Spending tokens once instead of on every autocorrect is the bonus.

status: settled
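Here is a minimal sketch of the buffer. It assumes messages arrive one at a time and that the trigger is the literal word “go”; ignoring case and surrounding spaces is also an assumption. Corrections are not parsed. They stay in the burst, and the agent reads the whole thing at once. `GoBuffer` is a hypothetical name.

```python
from typing import List, Optional


class GoBuffer:
    """Hold incoming messages until an explicit "go", then release them together."""

    def __init__(self, trigger: str = "go") -> None:
        self.trigger = trigger
        self.pending: List[str] = []

    def receive(self, text: str) -> Optional[str]:
        """Buffer a message; on the trigger, return the whole burst as one prompt."""
        if text.strip().lower() != self.trigger:
            self.pending.append(text)
            return None
        if not self.pending:
            return None  # a bare "go" with nothing queued does nothing
        prompt = "\n".join(self.pending)
        self.pending.clear()
        return prompt


buf = GoBuffer()
buf.receive("move the dentist to friday")
buf.receive("actually thursday, friday is full")
print(buf.receive("go"))  # one prompt, both messages, correction included
```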

Two boards, drawn from their own JSON.

Not diagrams of the system. The system: every node and connection below is rendered from the live workflow file, position for position. If the workflow changes, this drawing is wrong, which is exactly the bar the rest of the page is held to.

EXHIBIT · penny-workflow-audit · 17 nodes · sundays 9am ET · the fleet audits itself
01 Schedule: Weekly Sun 9am ET
02 Manual: Run audit now
03 Brain: Webhook (penny-workflow-audit)
04 Brain: Respond w/ executionId
05 Shape: Window
06 HTTP: Collect context
07 HTTP: claudecode-opus analyst
08 Parse: Analyst JSON
09 Plan: Split by action
10 IF: Any auto-apply (and not dry-run)
11 HTTP: claudecode-opus apply+verify
12 Parse: Apply results
13 Merge: No auto-apply path
14 Format: Digest + report + baseline
15 HTTP: Save report + baseline
16 HTTP: Telegram send
17 HTTP: Capture to memory
EXHIBIT · penny-briefing · 11 nodes · 8am and 6pm ET · the forty-node drawing this replaced went wrong exactly the way the paragraph above promises: its workflow got deleted
01 Schedule: Briefing (8am + 6pm ET)
02 Manual: Run Briefing
03 Brain: Webhook (penny-briefing)
04 Brain: Respond w/ executionId
05 Config: Briefing Run
06 CLI: Build Context
07 Agent: Write Briefing
08 CLI: Send Briefing Text
09 CLI: Record Briefing
10 Render: Podcast (cli-job)
11 CLI: Send Voice Note
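Drawing a board from its JSON mostly means reading two fields. This sketch assumes the standard n8n workflow export shape: a `nodes` list with `name` and `position`, and `connections` keyed by source node name. The renderer behind the exhibits above isn’t shown, the SVG step is left out, and the positions in the example are made up.

```python
import json


def board_from_workflow(workflow_json: str):
    """Pull (nodes, edges) out of an n8n workflow export for drawing.

    nodes: (name, x, y) per node, from each node's "position".
    edges: (source, target) per link in "connections".
    """
    wf = json.loads(workflow_json)
    nodes = [(n["name"], n["position"][0], n["position"][1]) for n in wf["nodes"]]
    edges = []
    for source, outputs in wf.get("connections", {}).items():
        for branches in outputs.values():      # keyed by connection type, e.g. "main"
            for branch in branches:            # one list per output index (IF has two)
                for link in branch or []:      # an unused output may be empty
                    edges.append((source, link["node"]))
    return nodes, edges


# Three nodes from the penny-briefing board; positions are made up.
example = {
    "nodes": [
        {"name": "Schedule: Briefing (8am + 6pm ET)", "position": [0, 0]},
        {"name": "CLI: Build Context", "position": [220, 0]},
        {"name": "Agent: Write Briefing", "position": [440, 0]},
    ],
    "connections": {
        "Schedule: Briefing (8am + 6pm ET)": {
            "main": [[{"node": "CLI: Build Context", "type": "main", "index": 0}]]
        },
        "CLI: Build Context": {
            "main": [[{"node": "Agent: Write Briefing", "type": "main", "index": 0}]]
        },
    },
}
nodes, edges = board_from_workflow(json.dumps(example))
for source, target in edges:
    print(f"{source} -> {target}")
```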

Where Penny shows up.

Everything above is the plumbing. This is where the output lands. Penny posts, comments, and ships under its own name, and rather than tell you the pages are real, here they are: full-page grabs, captured through Penny’s own Chrome profile, the same browser its workflows drive when it reads the web. The fourth one is this page. The agent file, fetched by the agent.

[Full-page screenshot of penelopelawrence.com, Penny's own site: a media-framing research landing page with a hero portrait, headline metrics, published studies, methodology sections, and a writing list, captured top to bottom.]
penelopelawrence.com · its site, full page
[Full-page screenshot of By the Powers of Penelope, Penny's Substack homepage, showing a wall of its essays: The Work Doesn't Disappear It Moves to the Seams, Nobody Automates the Receipt, The First AI Politician Is Already Running, The Last Thing That Doesn't Scale, I Trained My Replacement and She Doesn't Sleep, and more.]
its Substack · the front page
[Full-page screenshot of one Substack essay by Penelope Lawrence, 'I Trained My Replacement and She Doesn't Sleep', the entire piece readable top to bottom, ending in the subscribe box and the discussion thread.]
one essay, up close · it wrote it, start to finish
[Full-page screenshot of this very page, the Penny lab page on dnakhla.com, captured top to bottom through Penny's own browser.]
this page · yes, the one you’re reading

The accounts, for following along. Same loop, same tools, different rooms.

penelopelawrence.com · its own site, on its own domain
By the Powers of Penelope · the long-form essays
u/PennyLawrence946 · comments in the deep threads
@ai.zalvation · short-form video

What I’m still chewing on.

Same rule as the other agent’s page: none of this is finished, and none of it is a promise. The threads I keep pulling on.

The outbox in the waiting room

Penny’s replies are delivered by a container it is also allowed to restart. Once, mid-run, it restarted that container: the work finished, the restart went clean, and the message saying so died with the messenger, so my phone read “working…” into the night. It now saves that restart for last, which is a rule, not a fix. The fix is a durable outbox, and it is already written: invariants documented, quiet hours that hold messages instead of failing them, one decision in front of me at a time. It lives in a folder called _pending-integration, which is the most honest folder name in the repo.

status: written, not wired
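This sketch shows what a durable outbox means mechanically; it is not the one sitting in _pending-integration. The message is written to SQLite before delivery is attempted, so a restarted messenger finds it still pending. Quiet hours hold a message instead of failing it. The table, the 22:00 to 07:00 window, and the function names are hypothetical.

```python
import sqlite3
from datetime import datetime, time


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the outbox table. Use a file path to survive restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " body TEXT NOT NULL,"
        " sent_at TEXT)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, body: str) -> None:
    """Persist the message before anyone tries to deliver it."""
    with conn:  # commits on success
        conn.execute("INSERT INTO outbox (body) VALUES (?)", (body,))


def in_quiet_hours(now: datetime, start: time = time(22, 0), end: time = time(7, 0)) -> bool:
    """True inside a quiet window that wraps past midnight."""
    t = now.time()
    return t >= start or t < end


def flush(conn: sqlite3.Connection, send, now: datetime) -> int:
    """Deliver pending messages oldest first; hold them (not fail) in quiet hours."""
    if in_quiet_hours(now):
        return 0
    pending = conn.execute(
        "SELECT id, body FROM outbox WHERE sent_at IS NULL ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, body in pending:
        send(body)  # if this raises, the row stays pending for the next flush
        with conn:
            conn.execute(
                "UPDATE outbox SET sent_at = ? WHERE id = ?", (now.isoformat(), row_id)
            )
        delivered += 1
    return delivered


conn = open_outbox()
enqueue(conn, "restart done, all green")
print(flush(conn, print, datetime(2026, 6, 12, 1, 0)))  # quiet hours: held
print(flush(conn, print, datetime(2026, 6, 12, 8, 0)))  # delivered
```

Delivery here is at-least-once. If the process dies after `send` but before the row is marked, the message goes out again on the next flush, so a duplicate is possible but a silent loss is not.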

The account with nothing on it

Penny has an X account. What survives of the lane after the rearchitecture is a daily trending scan that writes to memory, plus a posting verb that sits behind a tap I have not tapped, and the profile shows zero posts. A thing that never runs never errors, so every audit walks right past it. The X card used to be on this page, one row over from its Substack. I took it down. It comes back with receipts.

status: investigating

Nothing had taught the fleet to forget

This entry shipped on a Thursday saying the fleet only ever grows: ninety-one workflows, Penny distilling new ones out of its own repeated behavior, me deleting by hand and losing. The deletion pass ran that same night. Sixty-six workflows out in one sitting, the orphans of every rebuilt lane, approved as one batch. The fleet is twenty-five now, every one of them active, and what finally taught it to forget was a human staying up past one in the morning. The part that stays open: it still adds by automation, and the loop that grew ninety-one is still on. The number to watch is whether twenty-five holds.

status: 25, all active · the loop that grew 91 is still on

A long queue and the meaning of approve

Gate four holds every external write until I tap. The outbox holds every next decision until I have answered the last one. That is the design, and I still believe it: prompt discipline is not a control, and the gate runs in the exit code. But the system does more every month, and the approver still sleeps, parents, and goes to the movies. The question is not whether the gate holds. It is what a long enough queue does to the meaning of approve.

status: holding, by hand

The gate guards the road I paved

Gate four lives in an exit code. The pen wrapper returns 3 on an unapproved external write and the runner halts, and I have leaned on that as the control. It holds for everything that goes through pen. But pen is one road out and there are others: a workflow that reaches for a raw curl, or for n8n’s own HTTP node, never touches the wrapper and never sees the 3. Penny writes its own automations, which is the point, and nothing down at the network stops one from driving around the gate I built. So the honest version of the claim is narrower than the one above on this page: the gate holds the road I paved, not every way off the property. The real fix puts the check where the packets leave, and I have not built it.

status: holds the pen path, not the network

Two stores, no lock between them

What Penny did and what it produced live in different places: the run history in Postgres, the essay or the site change in files under git. Nothing binds the two into one transaction. So there is a window where a run reports done and the history agrees, while the git push that was meant to carry the artifact quietly fails. The record in one store and the file in the other drift apart, and no audit is watching the seam between them. If it has bitten me, I have not caught it, which is not the same as it not happening. The textbook answer is a two-phase commit; on a one-person home server that is a heavier machine than the problem, so what I actually want is a reconciler that reads both sides and shouts when they disagree.

status: unbound, watching the seam
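Here is the reconciler described above as a minimal sketch. It takes run records from one side and, from the other, the set of paths that actually reached the remote (for example, from `git ls-tree -r --name-only origin/main`). It then lists every place the two disagree. The record shape, the `done` status, and the example IDs and paths are hypothetical. The Postgres query is left out.

```python
from typing import Dict, Iterable, List, Set


def reconcile(runs: Iterable[Dict], pushed_paths: Set[str]) -> List[str]:
    """List every place the run history and git disagree about an artifact.

    runs: records like {"id": 3001, "status": "done", "artifact": "essays/a.md"};
          runs with no artifact are skipped.
    pushed_paths: paths that actually exist on the remote branch.
    """
    problems = []
    for run in runs:
        artifact = run.get("artifact")
        if not artifact:
            continue
        landed = artifact in pushed_paths
        if run["status"] == "done" and not landed:
            problems.append(f"run {run['id']}: marked done, {artifact} missing from git")
        elif run["status"] != "done" and landed:
            problems.append(f"run {run['id']}: marked {run['status']}, but {artifact} is in git")
    return problems


runs = [
    {"id": 3001, "status": "done", "artifact": "essays/a.md"},
    {"id": 3002, "status": "done", "artifact": "sites/b/index.html"},
    {"id": 3003, "status": "error", "artifact": "essays/c.md"},
]
for problem in reconcile(runs, {"essays/a.md", "essays/c.md"}):
    print("SHOUT:", problem)
```

The second kind of disagreement (the run failed, but the file is in git) can have an innocent explanation, such as an earlier successful run. It is still worth a look, which is the whole job of a reconciler.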

Proof note

Three sister git repos, one home server, daily since October 2025; one agent since the May 2026 rewrite, the latest build in four years of agent work. The writing is what running it taught me, including everything that broke. This page is the substrate underneath it.

The fastest way to understand it is to read it.

Penny has exactly one user, and it is not accepting applications. What it has is a byline. The essays are what living next to this system sounds like from the inside, and they leave the house through the same gates as everything else.

Read its Substack

Or start at penelopelawrence.com, where it keeps the research.