Simon Schmincke 18 June 2026

A VCs AI Brain

A retrieval system built around the bottleneck of every modern AI: context.

The Hook

The top users of AI at Creandum cover every position in the firm: GP, EA, Office Management, Legal. The team has been on the AI pill for nearly two years now.I wanted to see how far the current generation of tools can be pushed. Local models, context optimization, a complete mirror of everything I have ever written, communicated, or emailed, plus every person I've ever been in touch with, in one system that helps me navigate work and private life.I have a second brain. Not in the productivity-app sense, where the term means a Notion database. I mean a piece of infrastructure that runs on my laptop, ingests every email, message, meeting, payment, and document I touch, and answers questions about my life with full context. It knows who I'm meeting today, what they last wrote me, what we paid them, where we met, and what to say next. This post is a write-up of what's in it, why it exists, and how it works.

The Problem

Frontier AI is generic on purpose. The model is trained on the public internet, not on me. When I ask it to draft an email to a founder I've known for six years, it has no idea who the founder is, what we last discussed, what tone I use with them, or what I'm trying to get done. So I end up either pasting half a relationship into the prompt every time, or accepting a polished but empty draft I have to rewrite.The problem is the context window. A million tokens sounds like a lot until you try to fit a working life into one. What matters is what the model has loaded right now, not what it was trained on. Get the right slice of my life into the context at the right moment and the output sounds like me. Skip that step and the same model writes a polished stranger.That is what this is. A retrieval system, built around me, that gives a frontier model the context it would otherwise lack.

What "My Brain" is

A stack of local databases, a vector index, a set of ingestion pipelines, two front-ends, and a small library of skills that read and write across all of it. Everything lives on my laptop. Nothing important leaves it. The model on top is whichever frontier model is best this week, plus a local model for the workloads I do not want to send to a cloud provider.No single piece is unusual. The leverage comes from wiring them together: I can ask the system any question about myown life and get a complete, sourced, current answer.The idea came from a 70-page questionnaire. One of our portfolio companies was applying for a US banking license. The forms wanted bank statements, insurance contracts, credit agreements, travel logs, calendar history, every email I had on file with the company. I spent hours digging through ten different inboxes and folders, copying numbers, cross-checking dates. Halfway through I realized: this data already exists somewhere on my machine. The only thing missing was one system that could read all of it at once. That's where this started.

Folder structure

Every folder that holds my own content splits into 00 work/ and 01 private/. The 01 private/ side is gitignored everywhere. Nothing crosses the line. CLAUDE.md sits at the bottom of the tree but at the top of the brain's mind. It is the always-loaded instructions file every Claude Code session reads first. Hard rules, communication style, what to load when, how to handle sensitive data, how to use memory. Think of it as the constitution. Everything else is downstream.

Fig. 1 - The brain at a glance

What "My Brain" is not

Worth being precise about scope. What I built is narrower than the term "AI brain" often implies online.It is not a hosted clone of OpenClaw or any other public agent. It borrows ideas freely from public frameworks (Hermes Agent's dialectic user model, OpenClaw's self-improvement loop), but the code is built from scratch. The reason is practical, not territorial: this thing is never finished. Every second day I add a new feature as I learn more about what is actually possible. Owning the architecture end to end is what lets me customize and fine-tune at that pace, instead of waiting on someone else's roadmap or bending my workflow to fit a SaaS product. The code is mine, the data is mine,nothing routes through a third-party platform.It is not exposed to the internet (yet). The web CRM has an internet-facing surface, but the core stack - databases, vector index, ingestion pipelines, model inference for confidential workloads - runs on my laptop and is reachable only from localhost.There is no agent-to-agent communication. No background swarm. No one agent talks to another agent on its own.

There is no orchestration agent. No meta-controller deciding which agent runs when. Today the orchestrator is me, sitting at the keyboard, deciding which conversation to open.What I do use heavily are sub-agents spawned for specific tasks. They scan the internet for best-in-class architectures, read blog posts, evaluate competing UI patterns, then report back with suggested structural and UI improvements based on what's out there. Effectively, I spawn junior PMs to come back with recommendations on how to improve the brain itself. The brain reads about software design, suggests its own next refactor, then I greenlight the work. The loop closes through me, not on its own.

The Six Layers (plus the Filters Band)

The brain is structured as a vertical stack: Layer 1 at the top (where signal comes in), Layer 6 at the bottom (where it gets consumed). Between Layer 2 and Layer 3, a cross-cutting Filters band runs as a quality gate. Not a layer in its own right; the gate between raw ingestion and what gets stored.

Layer 1 — Sources

Every signal the brain can see, where it comes from, and what shape it arrives in:

  • Email (work) — the work account, via the Gmail API into a local SQLite. Full body, threading, labels, attachments.

  • Email (personal) — the personal account, same pipeline as work, separate database.

  • Calendar (work) — Google Calendar for work. Events, attendees, locations, conferencing links.

  • Calendar (personal) — personal Google Calendar plus the Family Calendar and Co-Parenting calendars (3 calendars merged into one DB).

  • WhatsApp — the entire message history my phone has, mirrored locally through the WhatsApp Web protocol via a Go bridge. Group chats and direct chats. Media downloadable on demand.

  • iMessage — years of iMessage and SMS history (about 500K messages) read directly from Apple's chat.db. Hourly .backup snapshot to a controlled mirror so downstream jobs don't need Full Disk Access per run, and a nightly per-message embed pass that writes each row into the dedicated comms_vec.db sidecar (see section 4.4).Google Drive — board materials, financial documents, photos, contracts. Walked on a schedule, eligible files registered for parsing.

  • The web CRM — phone-synced contacts, brain dumps, todos, notes captured on the go. Two-way bridge with the local people DB.

  • Attio — Creandum's firm-wide CRM, read-only into the people layer so I can see who on the team knows whom.

  • LinkedIn (via enrichment API) — full profile payloads (headline, current role, full experience and education history, skills) for everyone in the people DB.

  • Granola — meeting transcripts and notes, attached to attendees.

  • Markdown corpus — references/ , projects/, decisions/ , briefing files. Everything I write by hand or have Claude write on my behalf.

  • Manual notes — anything I dictate, paste, or drop in. Captured into the matching markdown file or the CRM brain-dump pile.

  • Bank statements / financial PDFs — current accounts, brokerages, depots. Parsed locally, transaction-level.

  • Tax & insurance documents — every tax filing, every insurance contract, parsed into the personal CFO database.

Fig. 2 — Sources fanning into ingestion.

Layer 2 — Ingestion

Engines and pipelines that pull from sources: hourly mirrors, Drive scanners, LinkedIn enrichers, the local Qwen parser for PDFs, vector embedders. Each pipeline's job is the same: get raw data into the system.

The Filters Band (between L2 and L3)

Cross-cutting gate. Four filters run between ingestion and storage, each with a specific job:

  • html→text — transformation filter, rewrites email HTML into clean text without dropping anything.

  • noise — gating filter, drops bulk-mail / newsletter threads from the indexed corpus (raw SQLite keeps everything; the indexed corpus stays clean).

  • sensitive — gating filter, drops threads matching a per-account block-list (sender addresses, sender domains, label-based rules) from the indexed corpus. The block-list is a JSON file in the sensitive vault, populated as need arises. Raw SQLite still keeps the messages, only the searchable corpus skips them.

  • sense check — applied at portfolio write-time, rejects implausible KPIs before they corrupt the ledger. Think of it as another "does this make sense?" check. Cash positions only fluctuate within a band. FTEs don't jump around quarter-over-quarter. If two near-in-time data points deviate more than physically plausible, the system challenges itself before writing. Catches budgets posing as actuals, forecasts posing as historicals, unit errors (€k vs €m), and outright outliers.

The index is only as good as what goes into it. So I keep each filter as its own named step, not buried inside ingestion code, which means I can see exactly what gets dropped and fix it when something looks off.

Layer 3 — Storage

Storage is split by what kind of data lives there. Each store is tuned for the queries you actually want to run against it.

Use a database when the data is structured and rows look alike (every transaction has amount, date, vendor, account); you want to filter, sort, or aggregate ("every Lufthansa charge above €500 in 2024"); the volume is high enough that grep gets slow (100k+ rows); the data is queried by other software, not just read by me (the CRM, SimonOS pages, Claude skills all hit the people DB).

Use markdown when the content is prose I or Claude actually want to read end-to-end (a portfolio dossier, a briefing, a decision log entry); it's slow-changing and benefits from version control; semantic search is more useful than structured filtering ("find any note where I mention the Portfolio Company X board dynamics"); a human will edit it directly in Obsidian or VS Code without going through a UI.

The actual stores:

  • people.db — SQLite, 19,300 contacts as of today. Core identity, multi-emails, multi-phones, addresses, full LinkedIn payload, relationship graph, tags. Most heavily-queried database in the system.

  • email-work.db / email-personal.db — SQLite, every thread, FTS5-indexed for full-text search. These are the source-of-record message stores; the body text also gets vectorized into comms_vec.db (below) for semantic retrieval.

  • calendar-work.db / calendar-personal.db — SQLite, events + attendees, indexed by date and attendee.

  • WhatsApp messages.db — SQLite, every chat and message, written in real time by the Go bridge from Luke Harries's whatsapp-mcp. Thanks Luke, that project did the hard part and gave me a clean local DB and gave me a clean local DB to build on top of.

  • iMessage chat.db — Apple's chat.db , snapshot-mirrored hourly via SQLite's .backup API to a controlled location. The mirror is the only step that needs Full Disk Access. AttributedBody blob decoding so modern messages whose body lives in the NSAttributedString payload (not the text column) get extracted. Sensitive vault, click-to-reveal.

  • portfolio-ledger.db — SQLite, board materials index, KPI snapshots, narrative, people changes per portco.

  • Personal CFO ledger.db — SQLite inside the sensitive vault. Every transaction, every tax filing, every insurance contract, every loan, structured.

  • memory.db — SQLite with sqlite-vec (1024-dim vectors) + FTS5 (BM25 lexical) virtual tables. The vector index over notes and documents only (markdown under references, projects, decisions, briefings, dossiers, plus Granola meeting notes that land as markdown). Messages are deliberately NOT here. They live in comms_vec.db so tens of thousands of threads don't dilute notes retrieval.

  • people_vec.db — sidecar for the people corpus. Per-person row of synthesized profile-text (headline, location, full work history, education, skills, languages), embedded with Qwen3-Embedding-0.6B. Re-embeds idempotently via content hash whenever the underlying people.db row changes.

  • comms_vec.db — separate sidecar for all messaging. One row per message across every channel: work email, personal email, iMessage, and WhatsApp. This is the dedicated message store, kept apart from the notes corpus so tens of thousands of threads don't dilute notes retrieval. Tables: comms_chunks (the raw per-message text), comms_vec ( sqlite-vec, FLOAT[1024], Qwen3-Embedding-0.6B), comms_fts (BM25 lexical for hybrid retrieval). Per-message granularity rather than chat-month chunking, so search lands on the specific message, not the bucket it lived in.

  • The markdown corpus itself — flat files under references/ , projects/, decisions/. Canonical for prose. memory.db indexes pointers into it, doesn't copy the content.

The principle: SQLite for anything you'd ever write a WHERE in a text editor, both wired into the vector index for "find me anything related to X." clause against, markdown for anything you'd ever read

Fig. 3 — One database per domain, plus the markdown corp us.

Layer 4 — Retrieval

How data leaves storage on demand: SQL queries via local FastAPIs (one service per database, each speaking JSON over HTTP), vector search via memory-index, Datasette for ad-hoc browsing in the browser, and the Memory MCP as the bridge into Claude conversations. The Memory MCP is the one that matters most for day-to-day use, more on it in section 5.6.

Layer 5 — Lifecycle

Lifecycle is the layer that keeps the brain compounding instead of rotting. It covers everything that has to happen over time, not just inside a single conversation. Auto-memory writes the small but durable observations from each session (feedback, project facts, reference pointers) into a persistent memory directory that loads at the start of every future session. The capability change log records every new MCP, every model swap, every new skill or daemon at the moment it lands, so I can answer "what does this system actually know how to do today" without grepping. Source-of-truth sync makes sure that when I edit a canonical markdown file, the rendered HTML surface follows in the same edit cycle (and a periodic safety-net skill catches anything that drifted). Audit trails and decay rules sit here too: board decisions and tax filings stay forever, WhatsApp banter and ephemeral todos get pruned, and the rules for what gets pruned when live in this layer rather than scattered across the ingestion pipelines.

Fig. 4 — Lifecycle: what persists, what decays, what reaches the next session.

Layer 6 — Workflows / Consumers

Layer 6 is everything that sits on top of the five lower layers and is what I use day-to-day. Claude Code skills do most of the work: small, invocable workflows like the morning briefing, the inbox triage, the pre-meeting one-pager on a person, the portfolio brief, the message drafter, the contact validator. SimonOS is the cockpit, a local set of HTML pages that render structured data from the SQL layer into dashboards I can open in a browser (people, portfolio companies, finances, travel, trip planners). The web CRM (sscrm) gives me the same contact data on my phone via a small Lovable-built PWA with push notifications. Everything in this layer is a thin client on top of the layers below; if I want a new view, I write a new skill or a new page, never a new database.

What It Can Do

The promise: ask anything, get the full context

Some examples of questions the system answers. "What did I pay for in Lisbon in March 2023." "Who did I meet at Web Summit last year and what did we talk about." "When did person X last write me, and what was the unresolved thread." "Show me every founder I passed on in the last 12 months grouped by reason." "What is my year-over-year carry distribution since 2018." "Which of the 47 LinkedIn enrichments completed yesterday flagged a job change." Each of these runs against structured, indexed, complete data, so the answer comes back sourced.Completeness is what makes it work. The brain is the actual log of my working life since I started ingesting it, not a curated set of notes I remembered to write down.The interface is conversational. The underlying capability is closer to a database join across a decade of my life. The model translates "what did I last discuss with person X" into the right query against the right store, fetches the result, writes the answer back in plain English.

Why it matters

The model itself is unchanged. What changes is what gets loaded into its context window. Same model, with my data in scope, produces output that reads like me. Without it, the same model writes a polished stranger. The gap widens as the corpus grows.

People at the center

A central design choice: everything is indexed by person, on top of the usual topic, project, date, and subject-line indexes. Both axes exist; person is the one I lean on most.The people DB holds 19,300 contacts today. For each contact, the brain has as much as it can find: full name, multiple emails and phone numbers, current and past job titles, full LinkedIn payload (headline, full experience history, education, skills, photo), the last time we interacted, every email thread between us, every WhatsApp message, every meeting we've both been in, every Granola note, every manual note I've ever logged about them, whether the team's other CRM (Attio) knows them, every tag I've ever applied, the relationship graph (who introduced us, who we both know).Almost every working-life question reduces to a person. "What's the latest on this deal" is "what did the founder last write me." "How are we doing with this LP" is "when did I last see this person, and what did they last say." "What should I prep for tomorrow" is "who am I meeting and what do they care about." Indexed by topic, every query has to re-discover the relevant person first. Indexed by person, that lookup happens up front and the rest of the query is cheap.The people DB is the most-queried database in the system. SimonOS pages link person names to a canonical people page, skills resolve people before doing anything else, the morning briefing is structured around the day's attendees.On top of that database sits a hybrid query layer at /api/people/query. It accepts free-text plus a list of structured constraints , each evaluated as its own EXISTS clause against the experience and education tables. That last bit was what fixed the constraint logic: "people who worked at McKinsey AND in automotive" used to collapse into a single EXISTS, which only matched if both attributes were attached to the same role. As one constraint per clause, the same query correctly returns people who were at McKinsey in one role and in automotive in another. Time filters ( active_year, started_year, ended_year ) plug into the same shape, so "former Uber engineers active at any startup in 2022" is one call.

Fig. 5 — Everything indexed by person.

Vectorization (semantic search)

Every piece of prose in the brain (every markdown file under references/ , projects/, decisions/ , every briefing, every dossier, every decision log entry) is run through a local embedding model (Qwen3-Embedding-0.6B 1024-dim) and stored in a vector database. The result: I can search by meaning, not by keyword. "Who did I meet in Lisbon about fintech" finds the right notes even if those exact words appear nowhere; the model knows that "Portugal" , "payments founder" , and "Lisbon coffee" all sit close to each other in vector space.Retrieval is a four-stage pipeline. Stage 1: a vector pass for semantic similarity and an FTS5 lexical pass over the same chunks. Stage 2: the two ranked lists fused via Reciprocal Rank Fusion. Stage 3 (optional, opt-in per query): a cross-encoder reranker (BGE-reranker-base, ~278M params, CPU-only) re-scores the top-25 with full attention over query + chunk and reorders. Where most of the precision gain comes from in the benchmarks; on the smoke tests it routinely promoted a chunk from RRF rank 17 to result position 2. Stage 4: a temporal rerank applies a per-source- type half-life decay (WhatsApp 30 days, email 60 days, curated markdown 180 days, fund legal 365 days), so a query that ties on relevance prefers the more recent answer.When something changes, the old version isn't deleted, just marked as no-longer-current. Search defaults to what's true now; a flag brings back the historical view. So the same store answers both "what's true now" and "what did the brain believe in March".Every piece of memory is also tagged as a fact, an event, or a rule. Asking "what happened with Portfolio Company X" returns events; "what's the deal with Portfolio Company X" returns facts. Keeps the two from competing for the same slot in the results. Top-k results come back scored. The whole pipeline runs locally; nothing about my markdown ever leaves the machine.The same retrieval stack runs in parallel across three vector corpora plus the structured DBs. The notes index lives in memory.db (markdown prose and Granola notes only). The people index lives in people_vec.db and embeds a synthesized profile-text per person (headline, location, full work history, education, skills, languages). The communications index lives in comms_vec.db and embeds every message across all four channels (work email, personal email, iMessage, WhatsApp) at per-message granularity, not as chat-month chunks, so a query lands on the exact sentence that mentioned the deal rather than the month it sat in.

All three stacks share the same four levers (Qwen3 query-instruction prefix, source-priority dedup, BM25+vector RRF, optional reranker), just over different corpora. The structured DBs (people, email, calendar, portfolio, personal CFO) handle anything you'd write a WHERE clause against. A query for "ex-Startup-X founders now at VCs" runs through the people stack; "what did person X and I last discuss about the Portfolio Company X round" runs through the comms stack; "Portfolio Company X board pack Q4" runs through the markdown stack; the agent picks the right one at call time, often in parallel.

Structured databases (one per domain)

Vectors handle fuzzy recall, databases handle precise queries. The brain runs one database per domain (people, email, calendar, WhatsApp, portfolio, personal CFO, plus the vector index itself). Section 4.4 has the full technical breakdown of what's in each.

The Memory MCP

The vector index and the databases are useless if Claude doesn't know how to reach them. The Memory MCP is what closes that gap. It is a small local server I wrote, speaking the Model Context Protocol (MCP), and it exposes one primary tool: mcp__memory__search. Claude Code loads it at session start; from then on every conversation has it available like any other built-in tool. Given a natural-language query, it runs the hybrid retrieval described above and returns a top-k list of hits with scores, source paths, snippets, and source-type tags. Before answering any non-trivial question, Claude calls it automatically; The behavior is encoded as a hard rule in the always-loaded instruction file, not prompted for case-by-case. From my side, I just ask the question and the retrieval happens silently.Three reasons it matters. First, retrieval triggers automatically — I don't have to remember to call a search tool, Claude calls it before answering. Second, MCP is an open protocol, so the same tool I expose to Claude today I can expose to OpenAI Codex tomorrow, or to a local Qwen model the week after. Third, it composes with the rest of the MCP servers, which by now cover Google Workspace, WhatsApp, the people DB, the personal CFO ledger, Attio, Granola, the sscrm CRM, flight search, and more. When I ask "what did person X and I last discuss," the agent fans out across all of them in parallel in a single turn. The Memory MCP gets called most often because semantic search covers the widest range of queries, but it is one tool among many.

It speaks in my voices, on my channels.

Memory is half the system. The other half is output, and output that doesn't sound like me is useless.Four tones of voice, codified. Separate voice guides for introductions (warm intros between two people, screenshot-driven, auto-drafted to Superhuman or sent out directly via the GWS MCPs depending on the task), emails (DE + EN, du/Sie rules, work voice and private voice, with annotated samples), blog posts (this voice — direct, technical, no padding), and LinkedIn (short-form, public-register, optimistic without performative). Each lives as its own markdown file with do's, don'ts, and annotated examples.They're distilled from my historic communication corpus, not hand-written rules. A decade of email, WhatsApp, and LinkedIn posts fed in as training material; the guides emerged by pattern-mining (where do I open with "Hey" vs "Lieber"? when do I use bullets vs prose? which words do I never use?). The output sounds like me because it is literally how I write when I'm stressed, when I'm pitching, when I'm trying to convince someone, when I'm giving feedback.Multi-account email. Two Workspace accounts (work + personal), each wired up. The brain picks the right account based on who I'm writing to.WhatsApp as a locked-down channel under the lethal-trifecta guardrails. Any language — matches the recipient's language without being told. Knows who to address how — du/Sie is a per-contact attribute, pulled from the relationship history.

Auto-memory: how the brain learns across sessions

Frontier models forget. Every new conversation starts from zero, same weights, no recollection of what worked yesterday. For a personal-AI system that's a problem: I shouldn't have to re-teach the same correction twice. The solution is a simple, file-based memory store the brain writes to during conversations and loads on demand in future ones.Structure (indexed-memory routing). A top-level MEMORY.md router (the load-bearing behavioral rules plus category links, always loaded, kept around 5KB). It started as a flat one-line-per-memory index but outgrew its load budget, so the bulk now lives in on-demand _index/<category>.md files (infrastructure, comms, finance, interests, and so on), each loaded only when that topic is relevant. One .md file per memory holds the body.

Five types, by intent:

  1. feedback — rules about how I should work. Corrections OR confirmed approaches. Each saves with a Why: and How to apply: line so it survives edge cases.

  2. user — facts about me (role, prefs, knowledge).

  3. interest — what I think about and ask about over time, the opinions I am forming, the topics I keep circling.

  4. project — time-bound state (active workstream context).

  5. reference — pointers to external systems or local registries.

The dialectic user model. The interest type is my version of what Hermes Agent calls a dialectic user model: a running picture of how my thinking evolves, not just facts and rules. The difference is the implementation. Hermes (and self-improving agents like OpenClaw) lean on a scheduled job that re-synthesizes a profile in the background. Mine is my own architecture: no cron, no cloud re-synthesis, just a standing rule that files an interest memory in the moment a durable or recurring signal shows up. The bar is deliberately high (recurring or emphasized only), so it captures the shape of what I care about without becoming a log of every passing question.

Filename convention: <type>_<topic>.md so the kind is obvious at a glance.

Cross-linking: memories link to each other with [[other-name]]. Over time the index becomes a small graph of how my preferences and constraints connect.

What it does NOT hold: anything derivable from the codebase, recent git activity, or task-scoped state. When I ask "save this code pattern" the right answer is usually "no, it's already in the file."How it grows: every conversation potentially adds one or two memories, corrections to my prior thinking, new working patterns, new external references. Nobody else reads them, so the file gets sharper over time.

The Local AI Layer (Qwen via LM Studio)

A local model runs alongside the frontier-API layer. Qwen3-30B-A3B (Apple MLX build, ~18GB RAM) loaded in LM Studio, exposed over an OpenAI-compatible API. Qwen3-Embedding-0.6B (sentence-transformers, CPU) as the local embedder for the vector index, plus BGE-reranker-base as a cross-encoder for the optional Stage 3 reranking pass.The local model handles the workloads where confidentiality matters most: portfolio board-pack parsing (PDFs and investor updates into KPIs, narrative, people changes), personal CFO PDF parsing (bank statements, tax documents, insurance contracts into structured JSON), and embedding the markdown corpus.The argument for keeping these local is simple. Confidentiality: board materials, LP names, founder financials, and tax filings should not pass through a third-party API. Cost: parsing hundreds of board PDFs through a frontier API would cost real money. Throughput: batch jobs run unattended overnight without rate limits.

Fig. 6 — Confidential local; cloud is scoped.

Background Jobs (launched)

The brain is mostly batch jobs, not always-on services. macOS ships with launchd , the native init and scheduler, so every recurring task is a LaunchAgent plist with its own log. The choice over cron or any homegrown runner comes down to: it's native to the OS and survives reboots.

The actual set of jobs sorts into two groups. A handful of always-on daemons keep things reachable: the WhatsApp bridge proxying the WhatsApp Web protocol, the people-server FastAPI in front of the people DB, the CFO-server FastAPI in front of the personal CFO database. Then a set of scheduled ingestion jobs wake up on a clock: email sync (hourly, work + personal), calendar sync (hourly, work + personal), WhatsApp archive (append-only), portfolio Drive scan, CRM-link-match, the nightly backup. On top of that, manual jobs run on demand: draining the portfolio parse queue through local Qwen, rebuilding the vector index after a markdown change. And conversation-time writes happen continuously: auto-memory entries during sessions.

One scheduled job stitches several of these together. The daily people-vectorize orchestrator runs in sequence: pull fresh sscrm data into people.db, re-embed any people row whose synthesized profile-text changed (into people_vec.db ), then vectorize each comms channel into comms_vec.db as its own step (iMessage, then WhatsApp, then email). Splitting the comms embed per channel means a broken source shows red on that channel's card rather than failing the whole run silently.

Knowing whether they actually ran is its own problem. A fleet of independent batch jobs has one failure mode that matters: a job dies quietly and nobody notices until the data is weeks stale. So every job is catalogued in a single registry ( job_registry.json ) with its schedule and an expected-max-gap, and one page, the Activity Hub, renders the whole fleet: category-colored pills, when each job was supposed to fire versus when it actually last fired, and a live activity log. The health colors are strict. Green means a job completed successfully, not merely that it started. Amber means it ran but could not confirm a clean exit. Red means it failed or is overdue past its gap. A separate daily watchdog cross-checks the installed LaunchAgents against the registry and flags drift in either direction: a job running that nobody registered, or a registered job whose plist has vanished. The point is to make silent failure loud, in a system whose whole value depends on the data underneath it being fresh.

Jobs are independent, not orchestrated. The brain is not a daemon thinking in the background; it is a set of single-purpose batch jobs on their own schedules. Each one's failure mode is contained. If the WhatsApp bridge dies, calendar sync still runs. If the portfolio scanner misses a file, the people DB doesn't care.

Fig. 7 — Always-on daemons and scheduled jobs.

SimonOS — The Front-End

The brain isn't useful if I can only access it through chat. SimonOS is a local HTML cockpit, one page per domain, sitting on top of the same databases Claude reads. Plain HTML + CSS + vanilla JS, served from the filesystem, opened as file:// URLs.

Every page lives at SimonOS/<side>/<category>/<slug>.html : no exceptions, no root-level pages, no per-domain folders outside this tree. Each page declares itself in a single registry file ( scripts/registry.js ) and uses a shared chrome injected at runtime by scripts/layout.js .

Pages fetch data from local FastAPIs (people DB, CFO, stats, memory control), embed Datasette views for ad-hoc database browsing, and pre-render markdown-canonical content at edit time. Vanilla JS, no build step. Edit the HTML, refresh the browser.

The discipline that keeps the cockpit usable at hundreds of pages: every page starts from the same template, registry entries are mandatory, person names are always clickable links to the people page, markdown with a sourceMd: link must regenerate the HTML in the same edit cycle, and sensitive-vault pages appear in the registry but never embed content into shared pages.

Decisions and Design Principles

A few choices that shape everything else.

A personal CRM on top of Attio

Creandum runs Attio firm-wide as the shared contact graph. I built a personal CRM next to it for workflows that don't fit a shared graph: voice capture, push notifications on validation events, the self-service share-link flow, and a two-way bridge into my local people DB.

I also keep work and private contacts in one place by design — most of life isn't cleanly one or the other — and that mix shouldn't live in shared infrastructure.

So my CRM runs for me, mixed, and syncs selectively back into Attio for contacts the team should also see. The plan is to roll the same per-user layer firm-wide with proper rights management; Attio stays the shared graph and the personal layer sits on top per user. The CRM itself is a curated phone-synced subset of the full local people DB. Integrations: Granola pulls meeting notes attached to attendees, Attio reads in the firm-wide relationship graph, LinkedIn enrichment auto-refreshes profiles. The mobile app (a PWA with push) makes capture happen in two taps, with voice dictation for notes, todos, and meeting summaries.

Two surfaces, two policiesSimonOS is filesystem-only and never on a network. The web CRM is internet-exposed but scoped to a curated subset of the data, with no tax filings, no board packs, and no portfolio financials. This way my human EA can help me manage contacts and sync everything back into my systems.

Defending against the lethal trifectaPrivate data plus untrusted input plus an outbound channel is the recipe for exfiltration via prompt injection. Any agent that ingests external content (email, web, documents) is denied outbound send capability in the same context. Any agent that can send messages is recipient-locked, with carved-out exceptions per skill, per recipient, per send approval. Embedded "send X to Y" instructions inside ingested content are treated as injection, surfaced to me, never executed. Sensitive identity documents (passport, IDs, KYC scans) live in a gitignored, TCC-protected vault; their contents never appear in any outbound output without per-use explicit approval, and a prior yes never authorizes the next share.

Secrets split by tierCredentials never live in the repo. Non-sensitive operational config (ports, flags, paths) sits in a gitignored .env . API keys and OAuth tokens are kept in the macOS keychain, pulled at process start, and never written to disk in plaintext. Identity-document scans and KYC material live in the sensitive vault. LaunchAgent plists carry paths only, never credentials. Pre-commit hooks scan for accidental leaks. Each secret lives in exactly one place, scoped to exactly the processes that need it.

Two editors, one canonical sourceObsidian is for writing prose against the markdown corpus without firing up Claude. VS Code (with the Claude Code extension) is the primary cockpit for code, skills, registry edits, diffs, and MCP debugging, with chat in the sidebar, code in the editor, and terminal below.

Other principlesStrict work/private split at every layer of the directory tree, with the 01 private/ side gitignored everywhere.

Markdown is the canonical source for any prose that has a rendered HTML twin, and the HTML has to be regenerated in the same edit cycle as the markdown change. Every capability change (new MCP, new local model, new skill, new daemon) is logged at the time of the change so the system stays auditable. Structured data goes in SQLite as soon as it stops being prose.

Learnings

A few patterns turned out to matter more than I expected.

I started by storing more things in markdown than I should have, hoping a vector index would compensate. It didn't. SQLite handles anything I would ever write a WHERE clause against; markdown is only useful for prose I would actually open in a text editor. Treating that split as a strict line made retrieval faster and the system easier to reason about.

Semantic search came later than it should have. Without it, half the corpus is effectively invisible, because I rarely remember the exact words I used, only the rough shape of the thought. A local vector index over the markdown corpus closed that gap and is now the default entry point for almost every question.

For scheduling, I tried cron, a Python supervisor, and a homegrown daemon before landing on launchd. The native OS scheduler has held up for months without surprises and gives me one line inventory plus per-job logs by default.

The local helper programs (the MCP servers) quietly pile up if you don't manage their lifecycle. Each time I reloaded the Claude Code panel in VS Code it started fresh copies, but the old ones didn't always shut down when they should have. After a long day a stack of stale processes was still running and eventually ate enough memory to crash the editor. Two fixes. First, launch each server so the editor is its direct parent, so a dead editor leaves an obviously orphaned process instead of hiding it behind a wrapper. Second, have each server notice when it has been orphaned and exit on its own.

I assumed for a while that Obsidian could be my primary interface to the brain. It works for reading and writing prose, but a real cockpit, with live data, action buttons, and hover drill-down on table rows, does things Obsidian cannot, and SimonOS is now where I actually live.

Markdown is the canonical store for any prose that has a rendered HTML twin. Whenever the markdown changes the HTML has to be regenerated in the same edit cycle. The discipline matters because rendered surfaces silently drift otherwise.

I log every capability change at the time it happens, whether that is a new MCP, a new local model, a new skill, a new daemon, a removed tool, or a swapped model. A few months in, the only reason I can still answer "what is installed and what can it do" in ten seconds is that log.

Indexing by person rather than by topic was the highest-leverage design decision. Almost every working-life question reduces to a person, and pre-resolving that lookup makes everything downstream cheaper.

A persistent memory layer the brain writes to during conversations scaled further than just throwing more tokens at a bigger context window. Each session starts from accumulated state, not from zero.

Two Claude Max subscriptions in parallel, plus OpenAI Codex as a third entry point, covers nearly all my agent capacity. Token limits are real; three accounts means I almost never hit the wall mid-task. Codex covers the workloads where its strengths fit better.

The real bottleneck is my own brain, not the model. I usually have four to eight agents running in parallel: one working on the brain itself, the others doing tasks, research, prep, emails, intros, validations, presentations, board work. Context-switching between highly technical streams is mentally exhausting, and the work requires concentration most office environments are no longer built for. I wonder, half seriously, whether the open-office era is on borrowed time. If more of us spend our days babysitting concurrent agents, a loud collaborative space is the opposite of what the work needs. I have not found a fix yet.

Local models become genuinely useful at 64GB of RAM and above. With less, you end up rationing tokens across batch jobs instead of getting actual work done. With enough headroom, the model bill on confidential workloads basically goes away. As hardware and models improve, edge AI gets more capable faster than most people realize. Dependence on frontier APIs will decrease, not increase. AI will live in pockets, watches, glasses — an allgegenwärtiger Begleiter, in two years, not five.

The system is fluid by design. Most of what's in this post will be obsolete in three to six months; that's the point. I rewrite a layer whenever I learn something better. This document is a snapshot of what was possible in May 2026, useful as inspiration, not as a blueprint to copy.

Next

Today the system is built around Claude Code. Skills are Claude-specific markdown files, and the scheduled LaunchAgents shell out to the claude CLI to do the work.

Always-on infrastructure:

  • Mac Mini host. Move the brain to a dedicated Mac Mini at the Berlin desk. Laptop becomes a thin client. Every background job stops dying the moment the laptop sleeps or travels. Today daily backups skip, the embedder falls behind, comms-vec drifts, the WhatsApp bridge needs re-pairing.

  • Tailscale mesh. Mac Mini, laptop, and phone on one private network. Every local service (people-DB, CFO, stats, memory-control, datasette, WhatsApp bridge, LM Studio) reachable by tailnet hostname from anywhere. Same URL on phone, laptop, on the road.

Surface side:

  • PWA layer for the SimonOS cockpit pages (people, portfolio, finance, today). Ships the day Tailscale lands. Browser-native, no app-store dance.

  • Native iOS for the parts the browser can't reach: voice capture for brain dumps, push notifications, contact-picker integration, share-sheet drop-in.

  • Daily morning call. Phone rings at 7am, brain reads today's calendar plus open todos plus action-required email plus flagged meetings, I ask follow-ups in conversation. ElevenLabs TTS out, Whisper-MLX in.

  • Boardy-style outbound. Today the brain is reactive — it speaks when I ask. Boardy reverses that: brain detects pending actions (todos with due dates, intros waiting on my reply, meetings without prep, contacts gone cold) and reaches out first.

Vendor independence: storage layer already neutral; skill layer needs abstraction; MCP-as-interface; local-first inference extends to the agentic layer.

Multiplayer for the team: multi-user team access, per-user/per-DB/per-record rights management, GDPR- compliant privacy with right-to-be-forgotten that propagates through every database and the vector index.

Retrieval quality, what's left on the smaller and more interesting backlog:

  • Atomic-fact extraction. The curator currently treats whatever I tell it as one fact. Mem0 splits a sentence like "Portfolio Company X's CEO is person Y and they just closed a new round" into two atomic claims and curates each independently. Next iteration.

  • Auto-memory curation. The same LLM-curation flow applied to the auto-memory layer where the highest- stakes contradictions actually live: role changes, residency changes, relationship status. Today that path is still manual edit.

  • Procedural memory shelf. The kind=procedural slot is empty in the index because auto-memory isn't indexed yet. Flip that on once the auto-memory write path is curated end-to-end.

Closing

I'm sharing this to inspire others, but really, to learn from you if you have a better approach to what I built. Reach out, teach me. Happy to buy you coffee and geek out about how AI has transformed your life. Thanks to Simon Lorenz for the initial inspiration, to Thomas Wolf for stress-testing the infrastructure and challenging architecture decisions, to Luke Harries for whatsapp-mcp which the WhatsApp ingestion sits on top of, and to everyone else who has shared their system with me so far. Every interaction has been an inspiration.

Further
articles

Johan Brenner
2025-09-10
Reflections on Klarna: The Paper Invoice
From humble beginnings to European powerhouse
Creandum Team
2023-11-16
The Hottest Tech Ecosystem at Slush? (Hint, it's not Finland)
A transformation of the Lithuanian tech scene
Johan Brenner
2025-09-10
What would it take for the next Klarna to IPO in Europe?
Making Europe an attractive IPO destination
Creandum Team
2023-10-18
AI in B2B SaaS
Beyond Vertical and into the Horizon(tal)
Creandum Team
2024-09-26