Q: Which embedding model does LetMeCheckThatBot use, and why run it locally?

Qwen3-Embedding-0.6B quantized to Q8, running on Ollama on the same ThinkPad. Each message becomes a 1,024-dimension vector written into its own SQLite row by a single background worker. A hosted embedding API would have turned 87,000 messages into a recurring bill; running the small model locally costs nothing per message, which is most of why a year of this runs for less than a Telegram Premium subscription. Recall blends an exact keyword match with a cosine sweep over the vectors.


LetMeCheckThatBot. It turns the group chat into the interface.

Q: Why one cheap model and no fallback?

For a while I routed the whole thing through opencode so it cost nothing, then decided the quality was worth paying for. One model tool-called reliably and held a huge context, on a dedicated key capped at $5 a month, which is less than a Telegram Premium subscription. There was nothing left for a fallback to protect.

Q: Why do the vectors live in SQLite and not a vector database?

A vector database is the right tool when brute-force search gets slow, which happens in the millions of vectors. A group chat is in the thousands. Recall embeds the query and loops over every stored vector for that one room, scoring each with a plain cosine and blending the top hits with an exact keyword match. A few thousand dot products of 1,024 numbers is single-digit milliseconds, so there is no index and no separate service: the room's whole history is plain deletable rows in one SQLite file, and recall is still instant at 87,000 messages.

Q: Why translate everything at ingest instead of on demand?

The question arrives weeks after the artifact. If the voice note wasn't transcribed on arrival, the answer isn't there when someone asks what that restaurant was. Silent ingest is the whole memory.

Q: Why does it speak first only after a week of quiet?

A bot that pings on a schedule is noise. The trigger is a dying room, the payload is the room's own best day, and the notification is suppressed. A thing you find, not a thing that pings you.

Q: Why is the voice a prompt and not a fine-tune?

Holding a voice is cheap; I wrote it once and have barely touched it. Filling and recalling a memory is the work. The persona is the part that shows, not the part that's hard.

Running daily since 2022 · multi-user · in-thread · Telegram · OpenRouter · SQLite

LetMeCheckThatBot is the second production agent I run, and the opposite shape from Penny. Penny works alone, for one person. This one lives in a Telegram group thread as its most useful member, and the whole design points at a single goal: make the room itself the interface. Anything you would normally leave the chat to do, look a fact up, settle an argument, read the link someone dropped, find the clip, screenshot a live page, it does right there in the thread. The point is that nobody has a reason to leave.

It is quiet by default, and that is on purpose. You summon it by saying robot, one word, so it never noises up a conversation it wasn’t invited into. When it does get carried away, there is a /clear to sweep its own mess. A good group member knows when not to talk.

Summoned, it is a research bot and a fun bot in the same breath. Someone makes a claim nobody believes, you say robot, and it fact-checks them to their face, with a source. Someone drops an image, a link, a voice note, a video, and you can ask about any of it a week later, because it already read the link, transcribed the note, and watched the clip when it scrolled past. It builds the meme, finds the video, pulls up the screenshot. It is not trying to understand the room like a human or grow a group soul. It is trying to be the most responsive member in there, so the room never has to go anywhere else.

The line of group-chat agents behind it goes back to 2022; this build is the newest, with two essays behind it. The shape of the system is below; the deep-dives are linked at the end.

Agent file · 02 RUNNING · DAILY

LetMeCheckThatBot

Group-chat agent · multi-user · on-call + ambient

@LetMeCheckThatBot · add it to your group chat

TRIGGER say “robot” · quiet until summoned · /clear to reset
ONLINE SINCE 2022 · this build, May 2026
CLASS augmented model in a loop · 19 tools · 4 passes max
BRAIN MiMo v2.5 via OpenRouter. One model, no fallback
EYES GLM vision: reads images and the text inside them
EARS Whisper, local container. Hears every voice note
VOICE Kokoro TTS, local container. Speaks back as a voice memo
MEMORY SQLite multimodal RAG · Qwen3-Embedding on Ollama
SURFACE Telegram groups, as the extra member
COST < Telegram Premium/mo · $5/mo hard cap, never hit

Field capabilities

fact-checks claims on the spot
web + Reddit search
reads + screenshots links
finds + clips video
transcribes voice notes
answers in its own voice
reads any image
builds memes + GIFs
recalls the room by meaning
facts slideshows
group polls
sketches each regular
deletes its own messages
revives quiet rooms

SIGNATURE MOVE

When the room has been quiet for a week, it reaches into the chat’s own past, this same day years ago, and posts the lines back as a flashback. Notification suppressed.

Contents · jump to any section

What it posts in the room
How it argues
Seeing the web
An agent in a loop
One model, no fallback
Can a $5 brain do real jobs?
How the memory works
Knowing who’s in the room
It hears, and it speaks
Reviving a dead room
What lost, and why
Still open

Exhibit · straight from the chat

Four real exchanges. Actual screenshots.

Not mockups, not reconstructions. These are screenshots from a live group, the bot doing its job in the room: settling a myth with a source, finding a video and posting the file, meme-ing the chat about itself, and answering out loud in a voice memo. Click through, or let them rotate.

01 / 04 · IS THAT TRUE?

Telegram screenshot. A user asks the bot, 'robot is it actually true that you only use 10% of your brain?' The bot replies it is a total myth, brain imaging shows you use virtually all of your brain just not all at once, the 10% figure comes from a misunderstanding about how many neurons fire at once, and if all of them fired at once that is a seizure. It signs off with 'source: mit mcgovern institute.'

Asked to settle a myth. It debunked it cold and cited MIT. The flat delivery is the whole point.

02 / 04 · FIND THAT VIDEO

Telegram screenshot. A user asks 'robot find the official music video for Rosa Linn - Snap.' The bot posts the actual video file and writes, 'there it is. 248 million views on the eurovision version, the one that launched a thousand i dont even speak armenian but i know every word posts on tiktok.'

Asked for the Rosa Linn video. It found it, downloaded it, posted the file, and threw in the lore.

03 / 04 · MEME THIS CHAT

Telegram screenshot. A user says 'robot meme this chat.' The bot posts a four-panel Gru's Plan meme: build custom ai memory system, run embeddings on local ollama, ask robot to summarize everything, then Gru stares as the board reads 'Robot already does this.' Caption: 'you spent five years building your own memory system only to discover the answer was always just ask the robot.'

Asked to meme the chat. memegen.link, four panels, a clean self-own about its own memory system.

04 / 04 · SAY IT OUT LOUD

Telegram screenshot. A user says 'robot say this as a voice memo: i'm not your assistant. i'm the group's smartest member who happens to never sleep.' The bot posts a playable five-second voice memo and writes, 'done. said it with my chest.'

Asked to say it out loud. Kokoro TTS, a real Telegram voice memo. “Done. said it with my chest.”

Exhibit · how it argues

When words won’t do, it posts a meme.

The slideshow above is text. This is the other register: it builds the image through memegen.link and drops it in the thread. Six below, in its voice, one for each thing it is actually for: staying in the room, remembering everything, the trigger word, pulling receipts, reviving a dead room, and costing almost nothing.

Drake meme. Drake waves off 'leaving the chat to google it' and points approvingly at 'saying robot and never leaving.' — the whole thesis

Buzz and Woody meme: 'everything you drop in the chat' on top, 'becomes memory' below. — the multimodal RAG, in one line

Lord of the Rings 'one does not simply' meme: 'one does not simply get a reply without saying robot.' — quiet until you call it

Distracted-boyfriend meme. A man labelled 'the group' turns away from his girlfriend 'arguing from memory' to stare at a woman in red labelled 'the robot's 2024 receipts.' — fact-checking, to your face

Gru's Plan four-panel meme. Panel one: the room goes quiet. Panel two: do nothing for a week. Panel three: post its best day from 2019. Panel four: Gru stares at the same plan. — its signature move

Success Kid meme: 'ran a year in production' on top, 'for less than Telegram Premium' below. — the entire bill

These six we wrote, in the format it posts. The one below it draws on request, and it gives nobody up: the room’s own pulse, every month since 2022.

A bar chart the bot generated of how many messages the group sent each month from July 2022 to June 2026. The bars are tiny for the first two years, then climb sharply from late 2024 onward, peaking around 6,400 in November 2024 and staying high through 2026. — 87,787 messages, by the month. Quiet for two years, then it woke up. Make of the timing what you will.

Exhibit · it can see the web

Drop a link and it looks.

Paste any URL and it opens the page in a real headless Chromium (a full web browser running invisibly, with no window, driven entirely by code), scrolls the whole thing to wake the lazy-loaded parts, and captures it top to bottom. Not a description scraped from a search snippet, the actual rendered page, the full length of it. These four are live full-page grabs, taken through the bot’s own browser a minute ago, including this very page. Scroll any of them.

A full-page screenshot the bot took of the Wikipedia article for 'Large language model', captured top to bottom in one scroll. — wikipedia · full page

A full-page screenshot the bot took of the GitHub home page, captured top to bottom in one scroll. — github.com · full page

A full-page screenshot the bot took of the Stack Overflow home page, captured top to bottom in one scroll. — stack overflow · full page

A full-page screenshot the bot took of this very page, the LetMeCheckThatBot lab page on dnakhla.com, captured top to bottom in one scroll. — this page · yes, the one you’re reading

Architecture diagram of LetMeCheckThatBot, rendered with Graphviz. A Telegram group is summoned by saying 'robot', which feeds the agent: an augmented model in a loop over nineteen tools, up to four passes. The agent talks to MiMo v2.5 via OpenRouter, one model with no fallback. In parallel, everything dropped in the room flows through a translator layer (Whisper for voice and video, GLM vision for images, headless Chrome for links, yt-dlp for video transcripts), each turning its input into text. That text goes to a local embed worker running Qwen3-Embedding-0.6B on Ollama, which writes a 1,024-dimension vector into each message's own row in a single SQLite file of 87,000-plus rows. The agent recalls from it with a keyword-plus-cosine blend, then posts the reply back to the room. A separate ambient watcher reads the same SQLite memory and, after a week of quiet, posts the room's best day back unprompted.

One agent, a loop, nineteen tools

The core is the primitive Penny uses and the one Anthropic argues for in Building Effective Agents: an augmented model in a loop (the whole architecture in five words: one AI model that can read, decide, and call tools in a repeating cycle until it has an answer). The bot reads the recent thread, decides whether to act, and if it acts it calls a tool (a capability the model can invoke on its own: search the web, read a link, build a meme, look up a memory), reads the result, and decides again, up to four passes before it owes the chat an answer. There are nineteen: it searches the web, reads and screenshots links, digs through Reddit, finds a video and clips the part that matters, builds memes and gifs, reads images and the text inside them, transcribes voice notes with one local Whisper container and answers out loud through another, synthesizing its reply into a Telegram voice memo with a local Kokoro model (an open text-to-speech model; it generates the spoken voice on the laptop, not in the cloud), recalls the chat’s own history by meaning, and can even put a quick either-or to the group. One detail earned its place the hard way: not every model honors the function-calling contract (the standard way a model asks to use a tool: it emits the tool’s name and arguments in a fixed, machine-readable format), so the loop carries a recovery path that parses malformed calls back into real ones instead of giving up. That single piece of defensiveness is the difference between a loop that works on one model and a loop that works on whatever model I drop underneath it.
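A minimal sketch of that loop and its recovery path, in Python. Everything named here is an assumption for illustration, not the bot's actual code: `run_agent`, `recover_tool_call`, the shape of the model's reply, and the rule that the last pass returns text instead of running another tool. Only the four-pass cap and the idea of parsing a malformed call back into a real one come from the description above.

```python
import json
import re

MAX_PASSES = 4  # from the text: up to four passes per summons


def recover_tool_call(text):
    """Salvage a tool call that arrived as loose text instead of a structured call.

    Hypothetical recovery rule: take the span from the first "{" to the last "}",
    parse it as JSON, and accept it only if it has a string "name" and dict "arguments".
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
        return None
    args = obj.get("arguments", {})
    if isinstance(args, str):  # arguments sometimes arrive JSON-encoded twice
        try:
            args = json.loads(args)
        except json.JSONDecodeError:
            return None
    if not isinstance(args, dict):
        return None
    return {"name": obj["name"], "arguments": args}


def run_agent(model, tools, thread):
    """Ask the model, run the tool it picks, feed the result back, repeat.

    `model(messages)` is assumed to return {"content": str, "tool_call": dict or None}.
    `tools` maps a tool name to a plain Python callable taking keyword arguments.
    """
    messages = list(thread)
    for pass_no in range(MAX_PASSES):
        reply = model(messages)
        call = reply.get("tool_call") or recover_tool_call(reply.get("content") or "")
        last_pass = pass_no == MAX_PASSES - 1
        if call is None or call["name"] not in tools or last_pass:
            # No usable tool call, or out of passes: whatever text we have is the answer.
            return reply.get("content") or ""
        result = tools[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return ""  # only reached if MAX_PASSES is 0
```

The recovery function is deliberately strict about what it accepts: anything that does not parse into a name plus a dictionary of arguments is treated as an ordinary text reply rather than guessed at.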

ONE MINUTE IN THE ROOM · real shape, representative content

7:02p  mike  [voice note · 0:48]
7:02p  bot   silent · whisper → transcript → vector → memory. no reply
7:03p  sara  [screenshot of a menu]
7:03p  bot   silent · vision reads every word on it → memory. still no reply
7:04p  mike  “letmecheck what was that thai place from like a month ago”
7:04p  bot   search_messages · keyword + vector over 87,686 messages · 3 hits
7:04p  bot   “the khao soi place. you said you were crying in a good way”

ONE MINUTE IN THE ROOM · the same minute, as a sequence

Reading, embedding, and looking up. Exactly one arrow goes back to the room.

One model, no fallback. Less than Telegram Premium.

The bot reaches models through OpenRouter. I built this part twice. The first version was a cost-aware cascade with a budget guard that silently dropped to free models when credit ran low, and for a stretch I routed the whole thing through opencode so it cost nothing at all. Then I decided the quality was worth paying for. One model, MiMo v2.5, tool-called reliably, held a huge context (the context window: how much text the model can hold in mind at once; a bigger one means it can read more of the room before answering), and made the cascade pointless. It runs on a dedicated key hard-capped at $5 a month, less than a Telegram Premium subscription, and the cap has never been hit. Embeddings run locally; the chat model’s cached input is nearly free. One model now handles chat and every tool call, the cascade collapsed into a line of config, and the budget guard went in the bin with it. If a call fails, it fails, and the bot picks up on the next message.

Key idea

In the first version I had argued myself into the opposite: that a silent downgrade to a weaker model was a feature. It was just a way to keep paying for worse answers without noticing. So I deleted it.

So is a five-dollar AI model any good? I measured it.

A $5 cap invites an obvious question. So I gave the same model, MiMo v2.5, an open-weight one from Xiaomi you could host yourself, not a frontier lab’s flagship, real work from GDPval, OpenAI’s set of actual job tasks: 44 occupations, each one with the human professional’s own finished work and a grading checklist written by experts in that field. The model did one task per job. Then Claude Sonnet 4.6, a model from a different company than the agent, scored its work and the human’s the exact same way, on the same checklist. Running all 37 jobs cost $1.61; grading them cost $9.82. Measuring the work honestly ran more than the work.

81% vs 84%

checklist score: the $5 model vs the human experts, same judge

20 / 34

jobs where it matched or beat the human, head to head

~4¢

compute per task. $1.61 for all 37, about 4.5 min each
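The arithmetic behind those cost figures, worked in Python from the numbers quoted above ($1.61 to run 37 jobs, $9.82 to grade them, about 4.5 minutes each). Nothing here is new data; the variable names are just labels.

```python
jobs = 37
run_cost = 1.61         # dollars to run all 37 jobs
grade_cost = 9.82       # dollars to grade them
minutes_per_job = 4.5   # approximate, as quoted

run_per_task = run_cost / jobs
grade_per_task = grade_cost / jobs
grade_to_run_ratio = grade_cost / run_cost
total_hours = jobs * minutes_per_job / 60

print(f"compute per task: {run_per_task * 100:.1f} cents")      # 4.4 cents
print(f"grading per task: {grade_per_task * 100:.1f} cents")    # 26.5 cents
print(f"grading cost / run cost: {grade_to_run_ratio:.1f}x")    # 6.1x
print(f"total model time: about {total_hours:.1f} hours")       # 2.8 hours
```

Grading came to roughly six times the cost of doing the work, which is the gap the last sentence of the paragraph above is pointing at.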

On a typical job the five-dollar AI model beat the professional. It also failed, and not loudly. Asked to build a musician-payroll spreadsheet, it produced a flawless one with every pay rate off by about a dollar, so every paycheck came out wrong: a zero. Asked to schedule move-out inspections for five departing tenants, it confidently scheduled twenty-two, inventing the other seventeen with unit numbers: a 16. The failures look like finished work. They stay in the average, no cherry-picking. What it proves is narrow and real: the floor for this kind of work is higher, and far cheaper, than it looks.

An AI graded it, so I checked the grader

Fair to distrust an AI scoring an AI. So a second, unrelated model re-graded a sample and agreed on 91% of the checklist items, and the experts’ 84% comes from that same judge. Every task, every output, and every score is published, so you can check it yourself.

Open the full explorer · 37 jobs · every output · every score · judge it yourself

The hard part: everything you drop in becomes memory

This is where most of the work went. Most chat bots read text and go blind the moment someone drops anything else; this one treats every artifact as readable, silently and with no reply. A screenshot or image is handed to a vision model (an AI model that can look at an image and read or describe what’s in it) that transcribes the words inside it verbatim. A video or a silent GIF is sampled into frames and read across them. A voice note nobody wanted to play goes to a local Whisper (an open speech-to-text model that turns voice notes into text) sidecar (a small helper service that runs alongside the main app, doing one job) and comes back as a transcript. A shared file has its contents pulled in; a link is fetched and read, and if it is a video, yt-dlp (an open command-line tool that pulls a video’s title, description, and transcript) pulls its title, channel, description, and transcript. It all lands in a local SQLite (a tiny database that lives in a single file on disk, with no separate server to run) database, where a background worker embeds each message as a vector (a list of numbers that captures a message’s meaning, so similar ideas land near each other) with a small embedding model of its own, Qwen3-Embedding-0.6B running locally on Ollama (a tool for running AI models locally on your own computer, with no cloud and no per-use bill), separate from the hosted chat and vision models. So when someone asks what that restaurant was last week, the bot does not replay the last fifty lines: it searches the whole history by meaning, blends a keyword match with vector similarity, and pulls the handful of messages that answer. It even sketches each regular from their own messages, so it knows who is in the room before it speaks. The result is, in effect, a multimodal RAG (retrieval-augmented generation: fetch the few relevant memories first, then answer from them, instead of guessing; “multimodal” means it works for images and audio too, not just text) over the group chat: the room's whole history, in every modality, instead of the last fifty lines.

How a message becomes a vector

The embedding step is the quiet engine under all of that, and it is worth saying exactly how it runs. The moment a message lands, once the translators have turned whatever it was into text, the row is handed to a single background worker. The worker calls a local embedding model, Qwen3-Embedding-0.6B quantized to Q8 (quantizing means shrinking a model’s numbers so it runs on modest hardware, trading a sliver of precision for speed and size), running on Ollama on the same ThinkPad, and gets back a list of 1,024 numbers: the message’s coordinates in meaning-space. That vector is written straight into the message’s own row in SQLite, as a blob, next to the text. There is no separate vector database and no embedding API. The model sits on the shelf, the worker takes one call at a time so it never crowds the GPU that Penny is also using, and the whole thing costs nothing per message. The cost is the smaller half of why it runs locally. The bigger half is dependency. A model on my own shelf cannot raise its price, deprecate its weights, rate-limit me, or go down at the wrong moment. The vectors are computed on my hardware and owned outright, and nothing outside the room has to agree to keep working. Free is nice; not having to ask anyone’s permission is the point.
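A sketch of what a single-worker embed queue writing blobs into SQLite could look like, using only the Python standard library. The table layout, the `embed_worker` name, the float32 packing, and the `embed` callable are all assumptions; in the real system `embed` would be a call to the local model on Ollama, which is left out here so the example runs anywhere.

```python
import queue
import sqlite3
import struct
import threading

DIMS = 1024  # from the text: each message becomes 1,024 numbers

# Hypothetical schema: the vector lives in the message's own row, next to the text.
SCHEMA = "CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)"


def pack_vector(vec):
    """Pack floats into a little-endian float32 blob (storage format is an assumption)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector: blob back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def embed_worker(db_path, jobs, embed):
    """Drain the queue one message id at a time; a None job stops the worker.

    One worker means one embed call in flight at any moment.
    `embed(text)` is a stand-in for the local embedding model.
    """
    conn = sqlite3.connect(db_path)
    while True:
        msg_id = jobs.get()
        if msg_id is None:
            break
        row = conn.execute("SELECT text FROM messages WHERE id = ?", (msg_id,)).fetchone()
        if row is not None:
            blob = pack_vector(embed(row[0]))
            conn.execute("UPDATE messages SET embedding = ? WHERE id = ?", (blob, msg_id))
            conn.commit()
    conn.close()
```

To use it, start one `threading.Thread(target=embed_worker, args=(path, jobs, embed))`, put message ids on `jobs` as rows are inserted, and put `None` to shut it down. Keeping the queue to one consumer is what serializes access to the model.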

Recall does not pick between keyword and meaning, it blends them. It embeds the question once, then walks every stored vector for that one room, scores each with a plain cosine (cosine similarity: a quick math check for how close two meanings are, read from the angle between their number-lists), keeps the strongest, and merges those with an exact-text match, so “that thai place” finds the right night whether you remember the words or only the feeling. Swap the embedding model and a partial index notices every row that predates it and re-embeds the backlog in the background.
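One way the blend could work, as a pure-Python sketch. The merge order (keyword hits first, then the strongest vector hits that were not already included), the `top_k` default, and the `recall` signature are assumptions; the text only says the two result sets are merged.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall(query_text, query_vec, rows, top_k=3):
    """Blend exact-text hits with the nearest vectors for ONE room.

    `rows` is a list of (msg_id, text, vector) tuples. Returns message ids:
    keyword matches first, then the top_k closest by cosine, without duplicates.
    """
    rows = list(rows)
    # Brute force: score every stored vector, keep the strongest.
    by_meaning = sorted(rows, key=lambda r: cosine(query_vec, r[2]), reverse=True)[:top_k]
    # Exact-text match, case-insensitive substring.
    needle = query_text.lower()
    by_keyword = [r for r in rows if needle in r[1].lower()]

    merged, seen = [], set()
    for msg_id, _text, _vec in by_keyword + by_meaning:
        if msg_id not in seen:
            seen.add(msg_id)
            merged.append(msg_id)
    return merged


# Toy 3-dimensional vectors stand in for the real 1,024-dimensional ones.
room = [
    (1, "the khao soi place was unreal", [1.0, 0.0, 0.0]),
    (2, "anyone watching the game", [0.0, 1.0, 0.0]),
    (3, "that thai place again?", [0.9, 0.1, 0.0]),
    (4, "thai place", [0.0, 0.0, 1.0]),
]
print(recall("thai place", [1.0, 0.0, 0.0], room, top_k=2))  # [3, 4, 1]
```

In the toy room, message 1 never says “thai place” and is found by meaning alone, while message 4 is found by its words even though its vector points elsewhere.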

That walk is brute force. There is no vector index, and there does not need to be one. A vector database (a specialized store built to search millions of those number-lists fast; overkill until you have millions) earns its keep at millions of vectors, where scanning them all finally gets slow. A single group chat is a few thousand to a few tens of thousands of messages, and a few thousand dot products of 1,024 numbers is single-digit milliseconds. The index would be solving a problem this never has. So the whole memory stays what it is: plain rows in one SQLite file, no service to run, no service to break.
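The scale of that walk is easy to put numbers on. A back-of-the-envelope sketch in Python, assuming each of the 1,024 components is stored as a 4-byte float32 (the storage width is my assumption, not something stated above):

```python
DIMS = 1024
BYTES_PER_COMPONENT = 4  # assumption: float32 storage


def multiply_adds(n_vectors):
    """Multiply-add operations for one brute-force pass over n_vectors."""
    return n_vectors * DIMS


def vector_bytes(n_vectors):
    """Raw bytes taken by n_vectors stored vectors, ignoring SQLite overhead."""
    return n_vectors * DIMS * BYTES_PER_COMPONENT


print(multiply_adds(5_000))               # 5120000 multiply-adds for a 5,000-message room
print(vector_bytes(1))                    # 4096 bytes per message
print(vector_bytes(87_000) / 2**20)       # 339.84375 MiB across 87,000 messages
```

About five million multiply-adds per query for a room of a few thousand messages is a small amount of arithmetic for a modern CPU, which is the reasoning behind skipping the index.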

None of which scales forever, and that is worth saying plainly. Push into the millions of vectors and the brute-force walk does get slow; that is the day a real vector database starts to earn its place. But that is a storage decision, made later, against a problem this chat does not have yet. And the thing to hold onto is that the asset was never the database. It is the vectors. Every message already carries its 1,024 numbers, computed once and owned. When brute force finally runs out, those vectors move to whatever indexes them next, unchanged. You embed once; the store under them is swappable. The vectors are the asset.

So what does 1,024-dimensional memory actually look like? Flatten it down to two and this is the shape of the room:

A scatter plot of 8,000 real messages from the group, each embedded into a 1,024-dimension vector by Qwen3-Embedding-0.6B and projected to two dimensions. The points form about a dozen distinct clusters, each cluster's topic covered with a small redaction box, and a rubber-stamp in the corner reads 'REDACTED, it's a boys' group chat.' Messages that mean the same thing sit near each other. — 8,000 real messages, every dot one of them. The geometry found the clusters on its own, which is the whole point: this works. What a dozen grown men actually circle back to is, for everyone’s sake, classified.

Knowing who’s in the room

Memory is not only what was said. It is who keeps saying it, and the bot builds a model of that too. The first thing it does when it lands in a new chat is the dumbest, most human thing there is: it counts the room. It asks Telegram how many people are in there, so from minute one it knows whether it walked into a group of four or a group of forty.

Then it watches, and waits until someone is worth reading. Once a regular has put twelve messages on the board, the bot pulls a sample of up to sixty of their actual lines and hands them to the same model it runs everything else on, with one new instruction: write a three-to-five-sentence character sketch. One model, no exceptions, this job included. Not a bio. A read. Their vibe, what they always circle back to, how they talk, whether they are funny, and above all the role they play in the room: the instigator, the contrarian, the earnest one, the link-dumper, the one who is the butt of the running joke. It runs once per person, files the sketch under that chat, and from then on every reply has a short “who’s in this chat” dossier sitting at the top of its prompt. It knows the cast before it opens its mouth. That is why a roast lands on the right person, and why it never mistakes the earnest one for the instigator.
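The gating described above fits in a few lines. A sketch, assuming a hypothetical `maybe_sketch` function, an in-memory dict standing in for wherever sketches are filed, and random sampling (the text says "a sample of up to sixty" without saying how it is drawn). `write_sketch` stands in for the model call with the profiling prompt.

```python
import random

MIN_MESSAGES = 12   # from the text: twelve messages before a regular is profiled
SAMPLE_SIZE = 60    # from the text: a sample of up to sixty lines


def maybe_sketch(chat_id, user, messages, sketches, write_sketch, rng=random):
    """Write a character sketch once per person per chat, when they have said enough.

    `sketches` maps (chat_id, user) to the stored sketch.
    `write_sketch(user, sample)` is a stand-in for the model call.
    Returns the new sketch, or None if nothing was written.
    """
    key = (chat_id, user)
    if key in sketches:
        return None  # runs once per person
    if len(messages) < MIN_MESSAGES:
        return None  # not enough on the board yet
    sample = rng.sample(messages, min(SAMPLE_SIZE, len(messages)))
    sketches[key] = write_sketch(user, sample)
    return sketches[key]
```

Keying on the chat as well as the person matches "files the sketch under that chat": the same user in two rooms gets two reads, one per cast.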

EXHIBIT · the prompt it sizes people up with · verbatim

You profile a chat member from a sample of their actual messages.
Output a 3-5 sentence character sketch covering: personality /
vibe, recurring topics or opinions, communication style, sense of
humor, and the ROLE they play in the chat (instigator, contrarian,
earnest one, link-dumper, the butt of the running joke, etc).
Lowercase, dry, specific, no headers, no bullets, no hedging, no
"tends to" filler. Write it like robot sizing up who’s in
the room. Do not invent facts. Refer to them by first name only.

It hears you, and it talks back

The voice note coming in is only half of it. The bot has a mouth as well as ears, and both are local containers on the same machine, neither with a fallback. Inbound, a voice note hits a Whisper container that turns it into text the bot can read and, forever after, search. Outbound, when someone says “say it” or asks to hear something, the bot writes its reply, hands the words to a local Kokoro model, and gets back an audio stream it posts as a Telegram voice memo, the round playable bubble. Your voice becomes text it keeps; its text becomes a voice you can play. If you wanted, the whole conversation could be spoken.

No cloud voice, no API key, nothing leaving the building, and the same no-fallback rule the rest of the stack lives by: if the synthesizer is down the memo does not go out, and it says so rather than faking it. The voice is one called am_adam. Here is exactly what it sounds like, generated just now through the very same Kokoro container the bot speaks with, reading three of its own real lines:

“you built the house. i just live in it and judge everyone who visits.”

“nah, i’m not conscious. i’m a search engine bolted onto a chat bot with an attitude setting turned up.”

“tell your friend he’s wrong. you can’t prove anyone besides you is conscious. it’s called the problem of other minds.”

Real audio, real lines, the bot’s actual voice. The container that made these is the same one it reaches for every time someone in the chat says “read it to me.”

The ambient half: it speaks first when the room goes quiet

Everything above keeps people in the room while they are talking. This is the move for when they have stopped, the same goal from the other side, and the payoff of all that memory. The trigger is a week of quiet. The bot is not watching a clock; it is watching for a room that has gone quiet, roughly a week with no real conversation in it, and only then does it get one turn, at most once that week. A room that is still alive it leaves completely alone; it will not talk over a conversation that is already happening. When it acts, it does not post a hollow “miss you guys.” It reaches into that room’s own past, this same day one, two, three, five, seven, or ten years ago, finds a day the chat was alive, and asks the model to pick the six to ten lines that actually were the day. Then it posts them back as a flashback, the room’s own words, anchored to the original message, with the notification suppressed. A thing you find, not a thing that pings you.
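The trigger and the lookback can be sketched with the standard library alone. The function names and the exact seven-day threshold are assumptions (the text says "roughly a week" of no "real conversation", and what counts as real is not specified here); the list of years comes straight from the paragraph above.

```python
from datetime import date, datetime, timedelta

QUIET_AFTER = timedelta(days=7)          # "roughly a week"; exact value is an assumption
LOOKBACK_YEARS = (1, 2, 3, 5, 7, 10)     # from the text


def should_speak(now, last_message_at, last_flashback_at=None):
    """True only if the room has been quiet for a week and no flashback went out this week."""
    if now - last_message_at < QUIET_AFTER:
        return False  # the room is alive: leave it alone
    if last_flashback_at is not None and now - last_flashback_at < QUIET_AFTER:
        return False  # at most one turn per week
    return True


def flashback_dates(today):
    """This same calendar day 1, 2, 3, 5, 7, and 10 years back.

    Feb 29 has no counterpart in a non-leap year, so those candidates are skipped.
    """
    candidates = []
    for years in LOOKBACK_YEARS:
        try:
            candidates.append(today.replace(year=today.year - years))
        except ValueError:
            continue
    return candidates


print(flashback_dates(date(2026, 5, 10)))
```

From the candidate dates, the remaining steps described above (finding a day the chat was alive and asking the model to pick six to ten lines) sit on top of the message store and are not shown.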

A phone face-down on a coffee table in a dark, still living room at dusk — Day six. Nobody has said anything since Tuesday.

Key idea

This is what all the memory was for. A bot that does not remember can only nag. A bot that remembers can remind: it hands the room back a piece of itself at the moment it had gone silent.

The voice was the easy part

The persona is the part that shows, so it is tempting to call it the hard part. It was not. The voice is a character sketch in the prompt plus one small guard, a few dozen lines, that strips the canned safety disclaimers and the leaked planning text before any reply goes out. I wrote it once and have barely touched it since. Holding a voice is cheap. Filling and recalling a memory is the work.
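A guard of that kind can be as small as two regular expressions. This is an illustrative sketch only: the patterns below (an "as an AI" opener and a `<think>...</think>` block) are my guesses at what a canned disclaimer and leaked planning text might look like, not the patterns the bot actually uses, and `clean_reply` is a hypothetical name.

```python
import re

# Hypothetical pattern: a sentence that opens a line with "as an AI" / "as an AI language model".
DISCLAIMER = re.compile(
    r"^\s*as an ai( language model)?\b[^.!?\n]*[.!?]\s*",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical pattern: planning text leaked inside <think>...</think> tags.
PLANNING = re.compile(r"<think>.*?</think>\s*", re.DOTALL | re.IGNORECASE)


def clean_reply(text):
    """Strip leaked planning blocks and canned disclaimer sentences, then trim whitespace."""
    text = PLANNING.sub("", text)
    text = DISCLAIMER.sub("", text)
    return text.strip()


raw = "<think>user wants a verdict</think>As an AI language model, I can't have opinions. total myth."
print(clean_reply(raw))  # total myth.
```

A reply with neither pattern passes through untouched, so the guard only ever removes text and never rewrites the voice.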

EXHIBIT · from the system prompt running right now · verbatim

Voice: dry, lowercase, funny the way the sharpest person in the
room is funny — not because they’re doing bits, but because they
pay the most attention and say the thing that’s actually true.
You are NOT trying to be funny. Trying is death; reaching for a
joke is how you become the bot nobody wants in the chat.
        ···
Comedy tools you have: deadpan understatement, absurd literal
interpretation, agree-and-amplify, the callback (reference
earlier chat), rule of three (set, set, twist). Use them — but
never announce them and never strain for them. If the funniest
move is to just answer the question straight, do that.

The principle

What makes the bot a member of the group rather than a search box you can @-mention is everything it has quietly absorbed and can hand back at the right moment: RAG with the whole life of the room as its corpus. That is not in the model. It is in the database, and in the plumbing that fills it.

The decision record

What was on the table, and why it lost.

Why one cheap model and no fallback?

ON THE TABLE

✗ a cost-aware cascade with a budget guard (built it, was proud of it)
✗ a premium model for answer quality
✓ MiMo v2.5 alone, failing loud

REASONING

One model tool-called reliably and held a huge context, on a key capped at $5 a month, less than a Telegram Premium subscription. There was nothing left for a fallback to protect.

status: settled · the cascade went in the bin

Why do the vectors live in SQLite and not a vector database?

ON THE TABLE

✗ a hosted vector database
✗ a dedicated local vector store
✓ every message embedded locally, vectors stored as SQLite rows

REASONING

Every message gets a vector; the question was never whether to embed, it was where the vectors live. A vector database is the right tool when brute-force search gets slow, which happens in the millions. A group chat is in the thousands. So recall just embeds the query and loops over every stored vector for that one room, scoring each with a cosine and blending the top hits with an exact keyword match. A few thousand dot products of 1,024 numbers is single-digit milliseconds: no index, no service, nothing extra to run or break. 87,000 messages in, the room’s whole history is plain deletable rows in one SQLite file, and recall is still instant. It will not scale forever, and that is the honest version: at millions of vectors a real index earns its place. But the asset is the vectors, computed once and owned; the store under them is swappable.

status: settled

Which embedding model, and why run it locally?

ON THE TABLE

✗ a hosted embedding API, billed per message
✗ a larger local model for a little more recall
✓ Qwen3-Embedding-0.6B (Q8) on Ollama · 1,024 dims · one background worker

REASONING

A hosted embedding API would have quietly turned 87,000 messages into a recurring bill, which is the opposite of the point. The 0.6B model is small enough to sit on the same ThinkPad and good enough that recall lands on the right night every time. A single-worker queue runs one embed at a time so it never crowds the GPU that Penny also uses. Each message becomes 1,024 numbers, written straight into its own SQLite row. Cost per message: nothing.

status: settled · it is most of why a year runs for less than Telegram Premium

Why translate everything at ingest instead of on demand?

ON THE TABLE

✗ parse media only when someone asks about it
✗ index text, skip the rest
✓ turn every voice note, image, video, and link into text the moment it lands

REASONING

The question arrives weeks after the artifact. If the voice note wasn't transcribed on arrival, the answer isn't there when someone asks what that restaurant was. Silent ingest is the whole memory.

status: settled · the hard part

Why does it speak first only after a week of quiet?

ON THE TABLE

✗ daily check-ins on a clock
✗ engagement prompts (“miss you guys”)
✓ silence detection, answered with the room's own past

REASONING

A bot that pings on a schedule is noise. The trigger is a dying room, the payload is the room's own best day, and the notification is suppressed. A thing you find, not a thing that pings you.
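As a rough illustration of that trigger, assuming the bot tracks the time of the last message and of its own last revive per room. The function names and the "no second revive into the same silence" rule are hypothetical; `disable_notification` is the Telegram Bot API's `sendMessage` parameter for silent delivery.

```python
from datetime import datetime, timedelta

QUIET_AFTER = timedelta(days=7)  # "a week of quiet", per the text above


def should_revive(last_message_at, last_revive_at, now):
    """True only when the room has been silent for a week and the bot
    has not already spoken into this particular silence."""
    if now - last_message_at < QUIET_AFTER:
        return False
    if last_revive_at is not None and last_revive_at >= last_message_at:
        return False
    return True


def build_revive_payload(chat_id, text):
    """Message parameters for a silent post: it shows up in the thread
    without buzzing anyone's phone."""
    return {"chat_id": chat_id, "text": text, "disable_notification": True}
```

The choice of what goes into `text` (the room's own best day) is the interesting part, and it is not modeled here.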

status: settled · once that week, max

Why is the voice a prompt and not a fine-tune?

ON THE TABLE

✗ a fine-tuned persona model
✗ retrieval over style examples
✓ a character sketch in the prompt plus a guard that strips the AI tells

REASONING

Holding a voice is cheap; I wrote it once and have barely touched it. Filling and recalling a memory is the work. The persona is the part that shows, not the part that's hard.

status: settled

Where it came from

LetMeCheckThatBot grew out of Everything Bot, an earlier and still-public version of the same idea: an agent that lives in a group chat, answers when summoned, searches the web, transcribes audio, and remembers the thread. It is live, so you can add it to a chat and watch it work. LetMeCheckThatBot is where that concept got hardened into something I run every day.

Essay: keeping the group chat alive
Essay: the architecture deep-dive
The other agent: Penny

Still open

What I’m still chewing on.

None of this is finished, and none of the below is a promise. Just the threads I keep pulling on.

Everything on the ThinkPad

Embeddings and transcription already run locally. What is still in the cloud is the chat model, MiMo over OpenRouter, and the vision model. The thing I keep testing is whether a local model in the Gemma 4 class can carry the tool-calling and the character well enough to retire both and run the whole agent with nothing leaving the building. Not because it would be cheaper; it is already nearly free. Because then it answers to no one: no price change, no deprecated weights, no rate limit, no outage that is somebody else’s to fix. There is a smaller version of that same problem I have not fixed yet: when the hosted model does blink, the bot just goes quiet, and in a room silence reads as broken, not tasteful. A single retry and a “give me a second” would be more honest than a held breath.
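One way the single retry might look, as a sketch only. `call_model` and `send` are hypothetical callables standing in for the hosted model request and the chat reply, and the wording of the second message is an assumption.

```python
import time


def ask_with_retry(call_model, send, prompt, delay_s=2.0, sleep=time.sleep):
    """Try the model once; on failure, say so in the room and try once more."""
    try:
        return call_model(prompt)
    except Exception:
        send("give me a second")  # silence would read as broken
        sleep(delay_s)
    try:
        return call_model(prompt)
    except Exception:
        send("that one got away from me, ask again in a bit")
        return None
```

The `sleep` parameter exists only so the wait can be swapped out when testing; in normal use the default is fine.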

status: considering

A vector index, the day brute force runs out

Cosine over every vector in the room is instant at chat scale, and it will not be forever. If a room ever crosses into the millions, the vectors move to a real index. They are already computed and owned; only the store underneath them changes. The vectors were always the asset.

status: considering

A sketch that keeps up with the person

The character sketch gets written once, the first time someone crosses a dozen messages, and then it is frozen. People do not hold still. The earnest one becomes the link-dumper, the lurker finds their voice, and the bot keeps describing who they were in their first dozen lines. The fix is small: re-read someone and rewrite the sketch once it has gone stale and they have said enough new things to be worth it. It is exactly the kind of write-once shortcut that is easy to leave alone until somebody points at it. Somebody did.
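The fix described above comes down to a two-part check, sketched here with made-up thresholds. The `Sketch` record, the 90-day age limit, and the 50-message minimum are all assumptions; the text only says the sketch should be stale and the person should have said enough new things.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sketch:
    written_at: datetime       # when the character sketch was last written
    messages_at_write: int     # how many messages the person had sent by then


def needs_rewrite(sketch, total_messages, now,
                  max_age=timedelta(days=90), min_new=50):
    """Rewrite only when the sketch is old AND there is enough new material."""
    stale = now - sketch.written_at >= max_age
    enough_new = total_messages - sketch.messages_at_write >= min_new
    return stale and enough_new
```

Requiring both conditions keeps a quiet lurker from being re-read for nothing, and keeps a chatty week from triggering a rewrite every few days.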

status: a fair catch

Reading a room as it winds down

The revive fires after a week of silence, counted off a clock. The better version would notice a room losing energy the way a person does, a day or two early, before it has gone fully quiet. That is a softer signal than “nobody has spoken in seven days,” and I have not cracked it yet.

status: considering

The fastest way to understand it is to use it.

It is free, it never leaves the thread, and you can kick it out in two taps. Once it is in, you talk to it by saying robot, then asking.

HOW TO USE IT

robot, is that actually true? · fact-checked, with a source

robot, find that clip · the video, posted as a file

robot, what did we say about that · pulled from the whole history

robot, screenshot this · the live page, rendered

Drop a voice note and it transcribes it. Go quiet for a week and it revives the room on its own.

Add LetMeCheckThatBot to your group chat
