Memory & Sessions
Session persistence, PostgreSQL storage, and semantic long-term memory
Tutti has two kinds of memory:
- Session memory — conversation history within a single session (short-term)
- Semantic memory — facts the agent remembers across sessions (long-term)
Session memory
When you call tutti.run(), the runtime creates a session automatically. The session stores the full message history.
const r1 = await tutti.run("assistant", "My name is Alice.");
// r1.session_id = "abc-123"
const r2 = await tutti.run("assistant", "What's my name?", r1.session_id);
// r2.output = "Your name is Alice."
Without a session_id, the agent starts fresh with no memory.
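For contrast, a minimal sketch of the same question asked without a session_id, reusing the runtime from the example above:
const r3 = await tutti.run("assistant", "What's my name?");
// No session_id was passed, so this is a brand-new session:
// the agent has no record of the earlier "Alice" exchange.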
In-memory store (default)
Sessions live in memory. They’re lost when the process exits.
const tutti = new TuttiRuntime(score);
// sessions are in-memory by default
PostgreSQL store
Sessions persist across process restarts:
npx tutti-ai add postgres
DATABASE_URL=postgres://user:pass@localhost:5432/tutti
export default defineScore({
provider: new AnthropicProvider(),
memory: { provider: "postgres" },
agents: { /* ... */ },
});
Use the async factory — it creates the tutti_sessions table on first run:
const tutti = await TuttiRuntime.create(score);
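As an illustration (not taken from the docs above), a sketch of a session surviving a restart once the Postgres store is configured. It assumes the session id from the first process is handed back to the caller and passed in again later, shown here as a literal:
const tutti = await TuttiRuntime.create(score);
const r1 = await tutti.run("assistant", "My name is Alice.");
// Persisted to Postgres; return r1.session_id (e.g. "abc-123") to the caller.

// Later, in a new process using the same DATABASE_URL:
const resumed = await TuttiRuntime.create(score);
const r2 = await resumed.run("assistant", "What's my name?", "abc-123");
// The history is loaded from tutti_sessions, so the agent recalls "Alice".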
The table schema:
| Column | Type | Description |
|---|---|---|
| id | TEXT PRIMARY KEY | Session UUID |
| agent_name | TEXT | Agent that owns the session |
| messages | JSONB | Full conversation history |
| created_at | TIMESTAMPTZ | Session creation time |
| updated_at | TIMESTAMPTZ | Last update time |
:::tip
For production, always use TuttiRuntime.create() instead of new TuttiRuntime() — it handles the async database initialization.
:::
Semantic memory (long-term)
Semantic memory lets agents remember facts across sessions. When a user tells the coder agent “I prefer 2-space indentation”, the agent still remembers that preference in the next session.
Enable it per agent
{
coder: {
name: "Coder",
system_prompt: "You are a TypeScript developer.",
memory: {
semantic: {
enabled: true,
max_memories: 5, // inject up to 5 relevant memories (default)
inject_system: true, // append to system prompt (default)
curated_tools: true, // expose remember/recall/forget tools to the agent (default)
max_entries_per_agent: 1000, // LRU cap per agent (default)
},
},
voices: [new FilesystemVoice()],
permissions: ["filesystem"],
},
}
How it works
- Before each LLM call, the runner searches semantic memory using the user’s input as a query
- The top N relevant memories are appended to the system prompt:
  Relevant context from previous sessions:
  - User prefers 2-space indentation
  - Project uses ESM modules
- The LLM sees these as context and can act on them (a sketch of the injection step follows below)
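A rough sketch of that injection step (illustrative only, not the runtime's actual code). It reuses the search() call shown later on this page; buildSystemPrompt is a hypothetical helper name:
// Illustrative only: how the augmented system prompt could be composed.
async function buildSystemPrompt(basePrompt: string, userInput: string) {
  // Same search(query, agent_name, limit) call shown in "Storing memories from your code" below.
  const memories = await tutti.semanticMemory.search(userInput, "coder", 5);
  if (memories.length === 0) return basePrompt;
  const bullets = memories.map((m) => `- ${m.content}`).join("\n");
  return `${basePrompt}\n\nRelevant context from previous sessions:\n${bullets}`;
}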
Storing memories from tools
Tools receive context.memory helpers when semantic memory is enabled:
execute: async (input, context) => {
// Store a fact
await context.memory?.remember("User prefers dark mode");
// Search for relevant memories
const prefs = await context.memory?.recall("UI preferences");
// → [{ id: "abc", content: "User prefers dark mode" }]
// Delete a memory
await context.memory?.forget("abc");
return { content: "Preferences updated." };
}
Agent-curated memory tools
When curated_tools is on (the default), the runtime exposes
remember, recall, and forget as Tools the model itself can call
across turns. Entries written by the model are tagged
source: "agent", and a per-agent cap (max_entries_per_agent,
default 1000) evicts the least-recently-used entry first when the
cap is reached. Both surfaces — the context.memory helpers above
and the agent-callable tools — share one enforcement pipeline, so
the cap, LRU eviction, and memory:write / memory:read /
memory:delete events fire exactly once per logical operation.
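For intuition, here is a standalone sketch of the least-recently-used policy described above. It is not Tutti's implementation, only an illustration of the eviction rule:
// Standalone illustration of LRU eviction (not Tutti's code).
interface Entry { id: string; content: string; lastUsedAt: number }

function addWithCap(entries: Entry[], incoming: Entry, cap: number): Entry[] {
  const next = [...entries, incoming];
  if (next.length <= cap) return next;
  // Over the cap: evict whichever entry was used least recently.
  const lru = next.reduce((a, b) => (a.lastUsedAt <= b.lastUsedAt ? a : b));
  return next.filter((e) => e.id !== lru.id);
}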
Subscribe to the events to observe what the agent is curating:
tutti.events.on("memory:write", (e) => {
console.log(`agent ${e.agent_name} stored ${e.entry_id} (${e.source})`);
});
A two-turn end-to-end example lives at
examples/curated-memory.ts.
Storing memories from your code
Access the semantic memory store directly on the runtime:
const tutti = new TuttiRuntime(score);
await tutti.semanticMemory.add({
agent_name: "coder",
content: "User prefers 2-space indentation",
metadata: { source: "onboarding" },
});
const memories = await tutti.semanticMemory.search(
"code style",
"coder",
5,
);
How search works (v1)
The InMemorySemanticStore uses keyword overlap scoring — no embeddings needed. It tokenises the query and each stored entry into word sets, scores by overlap ratio, and returns the top N.
This is simple and predictable. Future versions will support embedding-based search via custom SemanticMemoryStore implementations.
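For illustration, a sketch of that scoring idea (not the InMemorySemanticStore source; the exact tokenisation and ratio denominator are assumptions here):
// Illustrative keyword-overlap scoring, as described above.
function overlapScore(query: string, entry: string): number {
  const tokenize = (s: string) =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const q = tokenize(query);
  const e = tokenize(entry);
  if (q.size === 0) return 0;
  let shared = 0;
  for (const word of q) if (e.has(word)) shared++;
  return shared / q.size; // fraction of query words found in the entry
}

// overlapScore("indentation style", "User prefers 2-space indentation") → 0.5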
Memory lifecycle
| Method | What it does |
|---|---|
| store.add({ agent_name, content, metadata }) | Store a new memory |
| store.search(query, agent_name, limit) | Search by keyword overlap |
| store.delete(id) | Delete one memory |
| store.clear(agent_name) | Delete all memories for an agent |
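A short usage sketch of the delete and clear methods, assuming the tutti.semanticMemory store from the earlier example and that search() returns entries with an id, as recall() does above:
// Assumes the runtime store shown in "Storing memories from your code".
const [hit] = await tutti.semanticMemory.search("ESM modules", "coder", 1);
if (hit) {
  await tutti.semanticMemory.delete(hit.id); // remove one memory
}
await tutti.semanticMemory.clear("coder"); // wipe everything stored for the agent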
Multi-turn sessions
Each run() is one conversational turn (which may involve multiple LLM calls if tools are used). Across multiple run() calls with the same session, the full history accumulates:
const r1 = await tutti.run("coder", "Create hello.ts");
const r2 = await tutti.run("coder", "Add tests for it", r1.session_id);
// r2 has full context of r1
Token usage
Usage is reported per run(), not per session:
result.usage.input_tokens // tokens sent to the LLM
result.usage.output_tokens // tokens received
result.turns // agentic loop iterations
:::tip
Long sessions accumulate large message histories, increasing input token counts. Start fresh sessions for unrelated tasks, and use semantic memory for facts that should persist.
:::