Memory
How the agent remembers across sessions. LanceDB-backed semantic memory, the recorder pattern, and the recall tool.
The agent doesn't carry conversation history in its context. The database does. Memory is queried on demand.
Three pieces: a recorder that extracts structured records from each chat turn, a LanceDB store with local embeddings, and a recall tool the agent calls when it needs prior context.
The pieces
LanceDB store
Embedded vector database at /data/memory inside the api container, volume-mounted for persistence. Embeddings computed locally via fastembed (ONNX, ~80MB model).
Schema:
{
  "id": str,                  # uuid
  "type": str,                # decision | lesson | preference | skill_idea | topic | fact
  "content": str,             # 1-3 sentences
  "vector": float[384],       # auto-computed from content
  "project_tags": list[str],
  "source_thread_id": str,
  "ts": float,
  "why": str                  # one-line rationale
}
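A minimal sketch of writing one record, assuming a table named records and fastembed's default bge-small-en-v1.5 model (384-dim, matching the vector field above):

import time, uuid

import lancedb
from fastembed import TextEmbedding

db = lancedb.connect("/data/memory")
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")  # ~80MB ONNX model, 384-dim output

record = {
    "id": str(uuid.uuid4()),
    "type": "preference",
    "content": "User prefers terse responses with no trailing summaries.",
    "project_tags": ["agents", "workflow"],
    "source_thread_id": "thread-abc",
    "ts": time.time(),
    "why": "Guides how the agent should format all future responses.",
}
# The vector is computed from content, as the schema notes.
record["vector"] = next(iter(embedder.embed([record["content"]]))).tolist()

# Creates the table on first write; "records" is an assumed table name.
table = db.create_table("records", data=[record], exist_ok=True)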
The recorder
After every chat call, FastAPI fires a BackgroundTask that:
- Sends (user_message, agent_response) to Claude Haiku 4.5
- Haiku extracts 0-3 typed records per turn (preferences, decisions, lessons, etc.)
- Records inserted into LanceDB with embeddings
Errors are logged, not raised — recorder failures don't block the chat response.
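A sketch of what that task might look like; the extraction prompt, the model ID, and the table name are assumptions, not the project's exact code:

import json, logging, time, uuid

import anthropic
import lancedb
from fastembed import TextEmbedding

log = logging.getLogger("recorder")
db = lancedb.connect("/data/memory")
table = db.open_table("records")                    # assumed table name
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")
client = anthropic.Anthropic()

EXTRACT_PROMPT = (
    "Extract 0-3 memory records from this exchange as a JSON list. Each record: "
    '{"type": ..., "content": ..., "project_tags": [...], "why": ...}. '
    "Types: decision | lesson | preference | skill_idea | topic | fact. "
    "Return [] if nothing is worth remembering."
)

def record_turn(user_message: str, agent_response: str, thread_id: str) -> None:
    try:
        msg = client.messages.create(
            model="claude-haiku-4-5",               # assumed model ID for Haiku 4.5
            max_tokens=1024,
            system=EXTRACT_PROMPT,
            messages=[{"role": "user",
                       "content": f"USER: {user_message}\n\nAGENT: {agent_response}"}],
        )
        for r in json.loads(msg.content[0].text):
            r["vector"] = next(iter(embedder.embed([r["content"]]))).tolist()
            table.add([{**r, "id": str(uuid.uuid4()),
                        "source_thread_id": thread_id, "ts": time.time()}])
    except Exception:
        # Logged, never raised: the chat response has already been sent.
        log.exception("recorder failed for thread %s", thread_id)

Wired in from the chat endpoint with background_tasks.add_task(record_turn, user_message, agent_response, thread_id).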
The recall tool
Exposed to the agent as recall(query, limit, record_type, project_tag). The agent calls it when it might benefit from prior context:
> recall("how should I respond to this user")
[preference] User prefers terse responses with no trailing summaries.
- tags: agents, workflow
- why: Guides how the agent should format all future responses.
The agent decides WHEN to recall; the schema decides WHAT comes back.
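A sketch of the tool body under the same assumptions as the store sketch. Over-fetching and filtering tags in Python is one simple way to honor project_tag, since list columns are awkward to filter in SQL:

import lancedb
from fastembed import TextEmbedding

db = lancedb.connect("/data/memory")
table = db.open_table("records")
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")

def recall(query: str, limit: int = 5,
           record_type: str | None = None,
           project_tag: str | None = None) -> list[dict]:
    vec = next(iter(embedder.embed([query]))).tolist()
    q = table.search(vec).limit(limit * 4)          # over-fetch, then post-filter
    if record_type:
        q = q.where(f"type = '{record_type}'")
    hits = q.to_list()
    if project_tag:
        hits = [h for h in hits if project_tag in h.get("project_tags", [])]
    return hits[:limit]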
Backfill from existing transcripts
The recorder only sees new chats. To populate memory from prior Claude Code sessions:
python scripts/backfill-claude-code.py
python scripts/backfill-claude-code.py --project c--Users-Liz-Make-Skills --limit 50
python scripts/backfill-claude-code.py --dry-run
Reads JSONL transcripts at ~/.claude/projects/<project>/<session>.jsonl, pairs user→agent turns, and posts each pair to /memory/ingest for re-extraction via Haiku. Approximately $0.05 per 50-turn session.
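A sketch of the pairing loop; the transcript field names and the ingest payload shape are assumptions about the JSONL format:

import json, pathlib

import requests

INGEST_URL = "http://localhost:8000/memory/ingest"  # assumed local API address

def backfill(session_path: pathlib.Path, dry_run: bool = False) -> None:
    pending_user = None
    for line in session_path.read_text().splitlines():
        if not line.strip():
            continue
        turn = json.loads(line)
        if turn.get("type") == "user":
            pending_user = turn["message"]["content"]   # assumed field layout
        elif turn.get("type") == "assistant" and pending_user:
            pair = {"user_message": pending_user,
                    "agent_response": turn["message"]["content"],
                    "source_thread_id": session_path.stem}
            if dry_run:
                print(str(pair["user_message"])[:80])
            else:
                requests.post(INGEST_URL, json=pair, timeout=30)
            pending_user = None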
In the UI
humancensys.com/memory (and localhost:3000/memory in self-host) is a search page. Queries return records ranked by semantic similarity. Filters: type, project_tag.
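The same search can be approximated from a script; the endpoint path and parameter names here are assumptions based on the page's filters:

import requests

resp = requests.get(
    "http://localhost:8000/memory/search",          # hypothetical endpoint
    params={"q": "response formatting", "type": "preference",
            "project_tag": "agents", "limit": 10},
    timeout=10,
)
for rec in resp.json():
    print(f"[{rec['type']}] {rec['content']}")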
Two-mode notes
- Self-host: memory lives in your local LanceDB volume. Single user, no scoping needed.
- Hosted-multitenant: memory is tenant_id-scoped — each user sees only their own records (see the sketch after this list). Public knowledge commons (Pillar 3c) will be a separate, opt-in shared store.
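A sketch of what that scoping might look like, assuming a tenant_id column is added to the schema in hosted mode and set from the authenticated request:

import lancedb
from fastembed import TextEmbedding

db = lancedb.connect("/data/memory")
table = db.open_table("records")
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")

def recall_scoped(tenant_id: str, query: str, limit: int = 5) -> list[dict]:
    vec = next(iter(embedder.embed([query]))).tolist()
    return (table.search(vec)
                 .where(f"tenant_id = '{tenant_id}'")  # every query is tenant-scoped
                 .limit(limit)
                 .to_list())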
What's next
- Two modes — how isolation works in hosted mode
- Architecture — the layered model