Memory & Sessions
Session persistence, PostgreSQL storage, and semantic long-term memory
Tutti has two kinds of memory:
- Session memory — conversation history within a single session (short-term)
- Semantic memory — facts the agent remembers across sessions (long-term)
Session memory
When you call tutti.run(), the runtime creates a session automatically. The session stores the full message history.
const r1 = await tutti.run("assistant", "My name is Alice.");
// r1.session_id = "abc-123"
const r2 = await tutti.run("assistant", "What's my name?", r1.session_id);
// r2.output = "Your name is Alice."
Without a session_id, the agent starts fresh with no memory.
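For contrast, a minimal sketch of the same question asked without a session_id, reusing the runtime from the example above:
const r3 = await tutti.run("assistant", "What's my name?");
// No session_id was passed, so this is a brand-new session:
// the agent has no record of the earlier "Alice" exchange.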
In-memory store (default)
Sessions live in memory. They’re lost when the process exits.
const tutti = new TuttiRuntime(score);
// sessions are in-memory by default
PostgreSQL store
Sessions persist across process restarts:
npx tutti-ai add postgres
DATABASE_URL=postgres://user:pass@localhost:5432/tutti
export default defineScore({
provider: new AnthropicProvider(),
memory: { provider: "postgres" },
agents: { /* ... */ },
});
Use the async factory — it creates the tutti_sessions table on first run:
const tutti = await TuttiRuntime.create(score);
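As an illustration (not taken from the docs above), a sketch of a session surviving a restart once the Postgres store is configured. It assumes the session id from the first process is handed back to the caller and passed in again later, shown here as a literal:
const tutti = await TuttiRuntime.create(score);
const r1 = await tutti.run("assistant", "My name is Alice.");
// Persisted to Postgres; return r1.session_id (e.g. "abc-123") to the caller.

// Later, in a new process using the same DATABASE_URL:
const resumed = await TuttiRuntime.create(score);
const r2 = await resumed.run("assistant", "What's my name?", "abc-123");
// The history is loaded from tutti_sessions, so the agent recalls "Alice".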
The table schema:
| Column | Type | Description |
|---|---|---|
| id | TEXT PRIMARY KEY | Session UUID |
| agent_name | TEXT | Agent that owns the session |
| messages | JSONB | Full conversation history |
| created_at | TIMESTAMPTZ | Session creation time |
| updated_at | TIMESTAMPTZ | Last update time |
:::tip
For production, always use TuttiRuntime.create() instead of new TuttiRuntime() — it handles the async database initialization.
:::
Semantic memory (long-term)
Semantic memory lets agents remember facts across sessions. When a user tells the coder agent “I prefer 2-space indentation”, the agent still remembers that preference in the next session.
Enable it per agent
{
coder: {
name: "Coder",
system_prompt: "You are a TypeScript developer.",
memory: {
semantic: {
enabled: true,
max_memories: 5, // inject up to 5 relevant memories (default)
inject_system: true, // append to system prompt (default)
curated_tools: true, // expose remember/recall/forget tools to the agent (default)
max_entries_per_agent: 1000, // LRU cap per agent (default)
},
},
voices: [new FilesystemVoice()],
permissions: ["filesystem"],
},
}
How it works
- Before each LLM call, the runner searches semantic memory using the user’s input as a query
- The top N relevant memories are appended to the system prompt:
  Relevant context from previous sessions:
  - User prefers 2-space indentation
  - Project uses ESM modules
- The LLM sees these as context and can act on them (a sketch of the injection step follows below)
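A rough sketch of that injection step (illustrative only, not the runtime's actual code). It reuses the search() call shown later on this page; buildSystemPrompt is a hypothetical helper name:
// Illustrative only: how the augmented system prompt could be composed.
async function buildSystemPrompt(basePrompt: string, userInput: string) {
  // Same search(query, agent_name, limit) call shown in "Storing memories from your code" below.
  const memories = await tutti.semanticMemory.search(userInput, "coder", 5);
  if (memories.length === 0) return basePrompt;
  const bullets = memories.map((m) => `- ${m.content}`).join("\n");
  return `${basePrompt}\n\nRelevant context from previous sessions:\n${bullets}`;
}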
Storing memories from tools
Tools receive context.memory helpers when semantic memory is enabled:
execute: async (input, context) => {
// Store a fact
await context.memory?.remember("User prefers dark mode");
// Search for relevant memories
const prefs = await context.memory?.recall("UI preferences");
// → [{ id: "abc", content: "User prefers dark mode" }]
// Delete a memory
await context.memory?.forget("abc");
return { content: "Preferences updated." };
}
Agent-curated memory tools
When curated_tools is on (the default), the runtime exposes
remember, recall, and forget as Tools the model itself can call
across turns. Entries written by the model are tagged
source: "agent", and a per-agent cap (max_entries_per_agent,
default 1000) evicts the least-recently-used entry first when the
cap is reached. Both surfaces — the context.memory helpers above
and the agent-callable tools — share one enforcement pipeline, so
the cap, LRU eviction, and memory:write / memory:read /
memory:delete events fire exactly once per logical operation.
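For intuition, here is a standalone sketch of the least-recently-used policy described above. It is not Tutti's implementation, only an illustration of the eviction rule:
// Standalone illustration of LRU eviction (not Tutti's code).
interface Entry { id: string; content: string; lastUsedAt: number }

function addWithCap(entries: Entry[], incoming: Entry, cap: number): Entry[] {
  const next = [...entries, incoming];
  if (next.length <= cap) return next;
  // Over the cap: evict whichever entry was used least recently.
  const lru = next.reduce((a, b) => (a.lastUsedAt <= b.lastUsedAt ? a : b));
  return next.filter((e) => e.id !== lru.id);
}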
Subscribe to the events to observe what the agent is curating:
tutti.events.on("memory:write", (e) => {
console.log(`agent ${e.agent_name} stored ${e.entry_id} (${e.source})`);
});
A two-turn end-to-end example lives at
examples/curated-memory.ts.
Storing memories from your code
Access the semantic memory store directly on the runtime:
const tutti = new TuttiRuntime(score);
await tutti.semanticMemory.add({
agent_name: "coder",
content: "User prefers 2-space indentation",
metadata: { source: "onboarding" },
});
const memories = await tutti.semanticMemory.search(
"code style",
"coder",
5,
);
How search works (v1)
The InMemorySemanticStore uses keyword overlap scoring — no embeddings needed. It tokenises the query and each stored entry into word sets, scores by overlap ratio, and returns the top N.
This is simple and predictable. Future versions will support embedding-based search via custom SemanticMemoryStore implementations.
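For illustration, a sketch of that scoring idea (not the InMemorySemanticStore source; the exact tokenisation and ratio denominator are assumptions here):
// Illustrative keyword-overlap scoring, as described above.
function overlapScore(query: string, entry: string): number {
  const tokenize = (s: string) =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const q = tokenize(query);
  const e = tokenize(entry);
  if (q.size === 0) return 0;
  let shared = 0;
  for (const word of q) if (e.has(word)) shared++;
  return shared / q.size; // fraction of query words found in the entry
}

// overlapScore("indentation style", "User prefers 2-space indentation") → 0.5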
Memory lifecycle
| Method | What it does |
|---|---|
| store.add({ agent_name, content, metadata }) | Store a new memory |
| store.search(query, agent_name, limit) | Search by keyword overlap |
| store.delete(id) | Delete one memory |
| store.clear(agent_name) | Delete all memories for an agent |
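A short usage sketch of the delete and clear methods, assuming the tutti.semanticMemory store from the earlier example and that search() returns entries with an id, as recall() does above:
// Assumes the runtime store shown in "Storing memories from your code".
const [hit] = await tutti.semanticMemory.search("ESM modules", "coder", 1);
if (hit) {
  await tutti.semanticMemory.delete(hit.id); // remove one memory
}
await tutti.semanticMemory.clear("coder"); // wipe everything stored for the agent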
Multi-turn sessions
Each run() is one conversational turn (which may involve multiple LLM calls if tools are used). Across multiple run() calls with the same session, the full history accumulates:
const r1 = await tutti.run("coder", "Create hello.ts");
const r2 = await tutti.run("coder", "Add tests for it", r1.session_id);
// r2 has full context of r1
Token usage
Usage is reported per run(), not per session:
result.usage.input_tokens // tokens sent to the LLM
result.usage.output_tokens // tokens received
result.turns // agentic loop iterations
:::tip
Long sessions accumulate large message histories, increasing input token counts. Start fresh sessions for unrelated tasks, and use semantic memory for facts that should persist.
:::