In April, Andrej Karpathy published a 75-line idea file he called the "LLM Wiki." Thousands of engineers forked it. Nobody built it for product managers. So I did — and open-sourced it.
The pattern is simple to state: instead of an AI re-retrieving your documents for every question, an AI agent maintains a knowledge wiki, integrating every new source, updating cross-references, and flagging contradictions. Knowledge compounds instead of evaporating. Reading it, I kept thinking the same thing: this describes a PM's job better than an engineer's.
The loop every PM runs
A customer says something in a call. A metric moves. A stakeholder takes a position. A competitor ships. Six weeks later you're writing a PRD or defending a roadmap, and you're reconstructing all of it from memory, Slack scrollback, and half-remembered decks.
Three specific failures hide inside that loop:
1. Nothing compounds. Every document starts from scratch. The synthesis you did for last quarter's review is buried in a deck nobody will open again.
2. Evidence and claims get separated. "Users want X" survives; which users, how many, and what they actually said doesn't. PRDs become plausible-sounding instead of defensible.
3. Decision amnesia. "Why did we choose X over Y?" gets asked forever, and the rationale (the options weighed, the evidence, what would change our mind) lives nowhere.
We already know the two standard fixes, and we know why they fail. Traditional wikis (Confluence, Notion) die because the cost of maintaining them grows faster than their value. RAG-style "chat with your docs" tools don't accumulate anything; they re-retrieve fragments for every question. The fix is an agent that maintains the knowledge base itself, because for an LLM the cost of updating fifteen cross-referenced pages is near zero.
The PM version: four things, tracked forever
The engineering forks of Karpathy's idea track codebases and papers. The PM version needed its own, PM-shaped entity model. I kept it deliberately minimal: four entity types, each with a job.
- Problem pages. One per pain point, stated in the user's language, recording who has it, its severity, a running mention count, and the evidence chain: every source that supports (or contradicts) it. Every claim links to who said it, when, and in what words.
- A decision log. Context, the options considered with the evidence for each, the rationale, the decider, and reversal conditions: the observable facts that would reopen the decision. Entries are never deleted; superseded decisions link forward to their replacements.
- An assumption register. Every load-bearing belief, with a status — untested / validated / weakening / invalidated — and a dated history of every status flip with the source that caused it. This is your risk register.
- An open-questions queue. Every contradiction and evidence gap the weekly health check finds becomes a tagged research question: ask users, check data, ask a stakeholder, run an experiment. Your next interview script generates itself from this queue.
Personas, competitors, metrics, and bets emerge later — from evidence, not from an empty template. That restraint is the point: the schema stays small enough that the agent can hold the whole system honest.
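The real system contains no code: the schema is a prose rule file that the agent reads. Purely to make the four entity types concrete, here is a hypothetical Python sketch of their shape. Every class and field name is illustrative rather than taken from the repo, and the choice to count only supporting sources in the mention count is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    # The four assumption statuses named in the article.
    UNTESTED = "untested"
    VALIDATED = "validated"
    WEAKENING = "weakening"
    INVALIDATED = "invalidated"


@dataclass
class Evidence:
    source: str            # hypothetical source identifier, e.g. a transcript name
    when: date
    quote: str             # the user's own words
    supports: bool = True  # False means this source contradicts the claim


@dataclass
class ProblemPage:
    pain_point: str
    who: str
    severity: str
    evidence: list[Evidence] = field(default_factory=list)

    @property
    def mention_count(self) -> int:
        # Assumption: the running count covers supporting sources only.
        return sum(1 for e in self.evidence if e.supports)


@dataclass
class Decision:
    context: str
    options: dict[str, list[Evidence]]  # option name -> evidence for that option
    rationale: str
    decider: str
    reversal_conditions: list[str]      # observable facts that would reopen it
    superseded_by: Decision | None = None  # never deleted; links forward instead


@dataclass
class Assumption:
    belief: str
    status: Status = Status.UNTESTED
    history: list[tuple[date, Status, str]] = field(default_factory=list)

    def set_status(self, new: Status, source: str, when: date, approved: bool) -> None:
        # Mirrors the rule that the agent may not flip a status without PM approval.
        if not approved:
            raise PermissionError("assumption status change needs PM approval")
        self.history.append((when, new, source))  # dated record of the flip and its cause
        self.status = new


# Hypothetical tag names for the four kinds of follow-up research.
QUESTION_TAGS = {"ask-users", "check-data", "ask-stakeholder", "run-experiment"}


@dataclass
class OpenQuestion:
    question: str
    tag: str

    def __post_init__(self) -> None:
        if self.tag not in QUESTION_TAGS:
            raise ValueError(f"unknown research tag: {self.tag}")
```

The `approved` flag on `Assumption.set_status` is the one piece of enforced discipline in the sketch: a status never changes silently, and every change leaves a dated trail pointing at the source that caused it.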
Deliverables become queries
The division of labor across three layers is strict. Sources (interview transcripts, feedback digests, analytics readouts, meeting notes, competitor intel) are immutable; you curate what goes in. The wiki is the compounding artifact the agent maintains: every ingest updates cross-references, bumps evidence counts, and flags contradictions, so Thursday's synthesis already includes Tuesday's interview. Outputs are generated on demand.
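In practice the agent performs ingest by reading and editing markdown pages, not by running code. The hypothetical Python sketch below models only the bookkeeping contract: a frozen `Source` cannot be edited after it goes in, and each ingest appends to the evidence chain, bumps the count, and queues a contradiction as an open question. It assumes the claims have already been extracted from the source (the LLM's job) and leaves cross-reference updates out; all names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# (problem key, supports the problem?, verbatim quote) -- a hypothetical claim shape
Claim = tuple[str, bool, str]


@dataclass(frozen=True)
class Source:
    # frozen=True: assigning to any field after creation raises an error,
    # modelling "sources are immutable".
    source_id: str
    kind: str                    # e.g. "interview", "analytics readout"
    claims: tuple[Claim, ...]    # assumed to be pre-extracted by the agent


@dataclass
class Wiki:
    evidence: dict[str, list[tuple[str, bool, str]]] = field(default_factory=dict)
    open_questions: list[str] = field(default_factory=list)

    def mention_count(self, problem: str) -> int:
        # Counts supporting sources only (an assumption, as in the entity sketch).
        return sum(1 for _, supports, _ in self.evidence.get(problem, []) if supports)

    def ingest(self, src: Source) -> None:
        for problem, supports, quote in src.claims:
            chain = self.evidence.setdefault(problem, [])
            chain.append((src.source_id, supports, quote))  # evidence chain grows
            # Once a problem has evidence on both sides, queue one research question.
            sides = {s for _, s, _ in chain}
            question = f"Contradiction on '{problem}': ask users or check data"
            if sides == {True, False} and question not in self.open_questions:
                self.open_questions.append(question)
```

Ingesting Tuesday's interview and then Thursday's analytics readout into the same `Wiki` leaves one evidence chain per problem. Any disagreement between them is already sitting in `open_questions` by the time you ask for a synthesis.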
And that last layer is where the payoff lands. A PRD or stakeholder brief that used to take days of re-gathering context is generated in minutes from evidence that's already synthesized, cross-referenced, and cited. "Why did we choose X over Y?" is answered in seconds, forever, with the original options, evidence, and reversal conditions attached.
"PRDs and stakeholder briefs stop being documents you write. They become queries against evidence you've already accumulated." (TheGlocalPM)
The pressure test
Before publishing, I pressure-tested the system the only way that counts: I handed the rulebook to a completely fresh AI agent with a one-line prompt and no other context. It checked a decision's reversal conditions before touching it, refused to change an assumption's status without my approval, and generated a fully-cited PRD from the worked example. The rulebook held.
That matters because the schema — one file of rules, conventions, and workflows — is the real product. It's what turns a generic chatbot into a disciplined wiki maintainer. The repo ships it as a copy-paste template, alongside a full worked example (a fictional fitness app with a churn problem) and a quick-start guide any PM can follow in about fifteen minutes. No code involved anywhere: if you can use a chat tool and a folder, you can run this.
The honest cost
The system isn't free. It costs roughly thirty focused minutes a week: feeding in the sources you decide matter, and running a weekly health check in which the agent surfaces contradictions, weakening assumptions, and gaps. The agent does the bookkeeping that kills every Confluence wiki; you keep the judgment. That trade (bookkeeping to the machine, judgment to the human) is exactly the shape I think AI-augmented product work should take. It's the same argument I made in The Future of AI in Product Management: the PM's job shifts from coordinator to orchestrator.
Clone it, fork it, run it
Everything is free and open source under the MIT License: the schema, the entity model, the rituals, and the worked example. To the best of my knowledge it's the first open-source instantiation of the LLM Wiki pattern purpose-built for product management, and PM work is arguably the best fit for the pattern anywhere.
Credit where it belongs: the pattern is Andrej Karpathy's. The PM instantiation is my contribution back.