Memory
How the agent remembers across sessions. LanceDB-backed semantic memory, the recorder pattern, and the recall tool.
The agent doesn't carry conversation history in its context. The database does. Memory is queried on demand.
Three pieces: a recorder that extracts structured records from each chat turn, a LanceDB store with local embeddings, and a recall tool the agent calls when it needs prior context.
The pieces
LanceDB store
Embedded vector database at /data/memory inside the api container, volume-mounted for persistence. Embeddings computed locally via fastembed (ONNX, ~80MB model).
Schema:
{
  "id": str,                  # uuid
  "type": str,                # decision | lesson | preference | skill_idea | topic | fact
  "content": str,             # 1-3 sentences
  "vector": float[384],       # auto-computed from content
  "project_tags": list[str],
  "source_thread_id": str,
  "ts": float,
  "why": str                  # one-line rationale
}
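A minimal sketch of writing one record, assuming a table named records and fastembed's default bge-small-en-v1.5 model (384-dim, matching the vector field above):

import time, uuid

import lancedb
from fastembed import TextEmbedding

db = lancedb.connect("/data/memory")
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")  # ~80MB ONNX model, 384-dim output

record = {
    "id": str(uuid.uuid4()),
    "type": "preference",
    "content": "User prefers terse responses with no trailing summaries.",
    "project_tags": ["agents", "workflow"],
    "source_thread_id": "thread-abc",
    "ts": time.time(),
    "why": "Guides how the agent should format all future responses.",
}
# The vector is computed from content, as the schema notes.
record["vector"] = next(iter(embedder.embed([record["content"]]))).tolist()

# Creates the table on first write; "records" is an assumed table name.
table = db.create_table("records", data=[record], exist_ok=True)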
The recorder
After every chat call, FastAPI fires a BackgroundTask that:
- Sends (user_message, agent_response) to Claude Haiku 4.5
- Haiku extracts 0-3 typed records per turn (preferences, decisions, lessons, etc.)
- Records inserted into LanceDB with embeddings
Errors are logged, not raised — recorder failures don't block the chat response.
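A sketch of what that task might look like; the extraction prompt, the model ID, and the table name are assumptions, not the project's exact code:

import json, logging, time, uuid

import anthropic
import lancedb
from fastembed import TextEmbedding

log = logging.getLogger("recorder")
db = lancedb.connect("/data/memory")
table = db.open_table("records")                    # assumed table name
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")
client = anthropic.Anthropic()

EXTRACT_PROMPT = (
    "Extract 0-3 memory records from this exchange as a JSON list. Each record: "
    '{"type": ..., "content": ..., "project_tags": [...], "why": ...}. '
    "Types: decision | lesson | preference | skill_idea | topic | fact. "
    "Return [] if nothing is worth remembering."
)

def record_turn(user_message: str, agent_response: str, thread_id: str) -> None:
    try:
        msg = client.messages.create(
            model="claude-haiku-4-5",               # assumed model ID for Haiku 4.5
            max_tokens=1024,
            system=EXTRACT_PROMPT,
            messages=[{"role": "user",
                       "content": f"USER: {user_message}\n\nAGENT: {agent_response}"}],
        )
        for r in json.loads(msg.content[0].text):
            r["vector"] = next(iter(embedder.embed([r["content"]]))).tolist()
            table.add([{**r, "id": str(uuid.uuid4()),
                        "source_thread_id": thread_id, "ts": time.time()}])
    except Exception:
        # Logged, never raised: the chat response has already been sent.
        log.exception("recorder failed for thread %s", thread_id)

Wired in from the chat endpoint with background_tasks.add_task(record_turn, user_message, agent_response, thread_id).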
The recall tool
Exposed to the agent as recall(query, limit, record_type, project_tag). The agent calls it when it might benefit from prior context:
> recall("how should I respond to this user")
[preference] User prefers terse responses with no trailing summaries.
- tags: agents, workflow
- why: Guides how the agent should format all future responses.
The agent decides WHEN to recall; the schema decides WHAT comes back.
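A sketch of the tool body under the same assumptions as the store sketch. Over-fetching and filtering tags in Python is one simple way to honor project_tag, since list columns are awkward to filter in SQL:

import lancedb
from fastembed import TextEmbedding

db = lancedb.connect("/data/memory")
table = db.open_table("records")
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")

def recall(query: str, limit: int = 5,
           record_type: str | None = None,
           project_tag: str | None = None) -> list[dict]:
    vec = next(iter(embedder.embed([query]))).tolist()
    q = table.search(vec).limit(limit * 4)          # over-fetch, then post-filter
    if record_type:
        q = q.where(f"type = '{record_type}'")
    hits = q.to_list()
    if project_tag:
        hits = [h for h in hits if project_tag in h.get("project_tags", [])]
    return hits[:limit]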
Backfill from existing transcripts
The recorder only sees new chats. To populate memory from prior Claude Code sessions:
python scripts/backfill-claude-code.py
python scripts/backfill-claude-code.py --project c--Users-Liz-Make-Skills --limit 50
python scripts/backfill-claude-code.py --dry-run
Reads JSONL transcripts at ~/.claude/projects/<project>/<session>.jsonl, pairs user→agent turns, and posts each pair to /memory/ingest for re-extraction via Haiku. Approximately $0.05 per 50-turn session.
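A sketch of the pairing loop; the transcript field names and the ingest payload shape are assumptions about the JSONL format:

import json, pathlib

import requests

INGEST_URL = "http://localhost:8000/memory/ingest"  # assumed local API address

def backfill(session_path: pathlib.Path, dry_run: bool = False) -> None:
    pending_user = None
    for line in session_path.read_text().splitlines():
        if not line.strip():
            continue
        turn = json.loads(line)
        if turn.get("type") == "user":
            pending_user = turn["message"]["content"]   # assumed field layout
        elif turn.get("type") == "assistant" and pending_user:
            pair = {"user_message": pending_user,
                    "agent_response": turn["message"]["content"],
                    "source_thread_id": session_path.stem}
            if dry_run:
                print(str(pair["user_message"])[:80])
            else:
                requests.post(INGEST_URL, json=pair, timeout=30)
            pending_user = None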
In the UI
humancensys.com/memory (and localhost:3000/memory in self-host) is a search page. Queries return records ranked by semantic similarity. Filters: type, project_tag.
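The same search can be approximated from a script; the endpoint path and parameter names here are assumptions based on the page's filters:

import requests

resp = requests.get(
    "http://localhost:8000/memory/search",          # hypothetical endpoint
    params={"q": "response formatting", "type": "preference",
            "project_tag": "agents", "limit": 10},
    timeout=10,
)
for rec in resp.json():
    print(f"[{rec['type']}] {rec['content']}")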
Two-mode notes
- Self-host: memory lives in your local LanceDB volume. Single user, no scoping needed.
- Hosted-multitenant: memory is tenant_id-scoped — each user sees only their own records (see the sketch after this list). Public knowledge commons (Pillar 3c) will be a separate, opt-in shared store.
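A sketch of what that scoping might look like, assuming a tenant_id column is added to the schema in hosted mode and set from the authenticated request:

import lancedb
from fastembed import TextEmbedding

db = lancedb.connect("/data/memory")
table = db.open_table("records")
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")

def recall_scoped(tenant_id: str, query: str, limit: int = 5) -> list[dict]:
    vec = next(iter(embedder.embed([query]))).tolist()
    return (table.search(vec)
                 .where(f"tenant_id = '{tenant_id}'")  # every query is tenant-scoped
                 .limit(limit)
                 .to_list())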
What's next
- Two modes — how isolation works in hosted mode
- Architecture — the layered model