the lab

Systems I run
in production.

Two production agents, running every day on my own infrastructure, plus the experiments I keep alive. The same engineering my private work turns on, in the open: I break things on purpose, publish what holds up, and the numbers are there for anyone to check. Each system here is the current build of a line of agents I have run and rebuilt over four years.

Production agents

Running daily since Oct 2025 Penny

An autonomous operator on a home server. One agent in a loop, calling many tools, with a workflow engine as the biggest tool. The hard part is the architecture: single-agent over a fleet, model routing by name not cascade, retrieval instead of a context manager, and the eval gate that decides when work is actually done.

single agentClauden8nPostgresLanceDB How it works
Running daily since 2022 LetMeCheckThatBot

A multi-user agent that turns a Telegram group chat into the interface. Say “robot” and it fact-checks, researches, reads the link and the voice note, and answers in the thread, so nobody leaves the room. Nineteen tools, one inexpensive model with no fallback, memory that recalls by meaning. Lifetime model spend: $4.35. The hard part was not the features. It was keeping the model in character, in budget, and online.

multi-userTelegramOpenRouterWhisperSQLite How it works

Experiments

The smaller things I keep running, each one a single idea pushed far enough to use in production.

Portrait pipeline Generates my own consistent headshots, including the ones on this site, from a single trained likeness. image gen
Paper-trading bot A scheduled rebalance that trades a paper account twice each trading day. A workflow, not a chat turn. scheduled
OG-card renderer Renders a per-page 1200×630 share card for every page on this site at build time. build hook
Review council Several AI judges with separate rubrics grade every page on this site before I ship it. How it works is in the colophon. review

Sunset projects

Standalone products I designed, built, and shipped end to end, on my own. Each was a bet taken far enough to ship, learn from, and retire.

The point

Running a system in front of real consequences is the test that matters. Everything in the writing started as something that broke in here.