Core Concepts

Understand agents, voices, scores, and the Tutti runtime

Score

A score is the top-level configuration file (tutti.score.ts). It defines which LLM provider to use and what agents are available.

import { AnthropicProvider, defineScore } from "@tuttiai/core";

export default defineScore({
  name: "my-project",
  provider: new AnthropicProvider(),
  default_model: "claude-sonnet-4-20250514",
  agents: {
    assistant: { /* ... */ },
    coder: { /* ... */ },
  },
});

The defineScore() function is a typed identity function — it gives you autocomplete and type checking with zero runtime overhead. The score is Zod-validated when loaded.

Agents

An agent is an LLM-powered worker. Each agent has:

| Field | Required | Description |
| --- | --- | --- |
| name | Yes | Display name |
| system_prompt | Yes | Instructions for the LLM |
| voices | Yes | Array of voice instances (can be empty) |
| model | No | Overrides default_model from the score |
| permissions | No | Permissions granted to this agent’s voices |
| max_turns | No | Max agentic loop iterations (default: 10) |
| max_tool_calls | No | Max tool calls per run (default: 20) |
| tool_timeout_ms | No | Per-tool timeout in ms (default: 30000) |
| budget | No | Token + per-run / daily / monthly USD limits for this agent |
| memory | No | Long-term memory ({ semantic?, user_memory? }). See Memory. |
| streaming | No | Enable token-by-token streaming (default: false) |
| delegates | No | Agent IDs this orchestrator can delegate to |
| role | No | "orchestrator" or "specialist" |
| durable | No | Checkpoint between turns to Redis/Postgres so crashed runs can resume |
| schedule | No | Cron / interval / one-shot trigger — see the scheduler guide |
| outputSchema | No | Zod schema; the agent returns a validated typed object |
| allow_human_input | No | Agent can emit hitl:requested events to ask a human |
| requireApproval | No | Gate specific tool calls behind an interrupt that must be approved |
| beforeRun / afterRun | No | Guardrail hooks (validation, PII redaction, topic blocking) |

{
  coder: {
    name: "Coder",
    model: "claude-sonnet-4-20250514",
    system_prompt: "You are a senior TypeScript developer.",
    voices: [new FilesystemVoice()],
    permissions: ["filesystem"],
    max_turns: 15,
    budget: { max_tokens: 50_000, warn_at_percent: 80 },
  },
}

budget accepts:

| Field | Effect |
| --- | --- |
| max_tokens | Soft stop — emits budget:exceeded and returns the partial result. |
| max_cost_usd | Hard stop — throws BudgetExceededError with scope: 'run' once the run’s accumulated cost crosses the cap. |
| max_cost_usd_per_day | Hard daily cap — aggregates across every run that started since 00:00 UTC. Requires a RunCostStore on the runtime. |
| max_cost_usd_per_month | Hard monthly cap — aggregates across every run that started since the 1st of the current UTC month. Requires a RunCostStore. |
| warn_at_percent | Threshold (default: 80) at which budget:warning fires. Applied per scope. |

Wire the store on the runtime to enable daily/monthly enforcement:

import { TuttiRuntime, InMemoryRunCostStore, PostgresRunCostStore } from "@tuttiai/core";

// Single-process / dev:
const devRuntime = new TuttiRuntime(score, {
  runCostStore: new InMemoryRunCostStore(),
});

// Multi-process / prod — every worker shares one daily total:
const prodRuntime = new TuttiRuntime(score, {
  runCostStore: new PostgresRunCostStore({
    connection_string: process.env.DATABASE_URL!,
  }),
});

Without a store, daily/monthly limits log a one-time warning per run and are skipped (the per-run cap still applies). Use Postgres in any deployment with more than one worker — the in-memory backend cannot coordinate across processes.
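
To react to a hard stop in code, catch the error around run(). A minimal sketch, assuming BudgetExceededError is exported from @tuttiai/core and exposes the scope field described above:

import { BudgetExceededError } from "@tuttiai/core";

try {
  await devRuntime.run("coder", "Refactor the billing module");
} catch (err) {
  if (err instanceof BudgetExceededError) {
    // Assumption: err.scope reports which cap tripped ('run' | 'day' | 'month').
    console.error(`Budget cap hit (scope: ${err.scope})`);
  } else {
    throw err;
  }
}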

Voices

A voice is a pluggable package that gives agents tools. Think of it as a capability module — filesystem access, GitHub integration, browser control, or anything you build.

Each voice declares:

  • name — identifier
  • tools — array of tool definitions (name, description, Zod schema, execute function)
  • required_permissions — what permissions the agent must grant

import { FilesystemVoice } from "@tuttiai/filesystem";

// This voice provides: read_file, write_file, list_directory,
// create_directory, delete_file, move_file, search_files
const fs = new FilesystemVoice();
fs.name;                  // "filesystem"
fs.required_permissions;  // ["filesystem"]
fs.tools.length;          // 7

Official voices:

| Voice | Package | Permissions | Tools |
| --- | --- | --- | --- |
| Filesystem | @tuttiai/filesystem | filesystem | 7 tools |
| GitHub | @tuttiai/github | network | 10 tools |
| Playwright | @tuttiai/playwright | network, browser | 12 tools |
| Web | @tuttiai/web | network | 3 tools |
| Sandbox | @tuttiai/sandbox | shell | 4 tools |
| RAG | @tuttiai/rag | filesystem, network | ingest / chunk / embed / search |
| MCP Bridge | @tuttiai/mcp | network | dynamic (any MCP server) |

See the Voices Overview for details on each.

Runtime

The runtime (TuttiRuntime) is the engine. It takes a score, creates the event bus and session store, and runs the agentic loop.

const tutti = new TuttiRuntime(score);

// Run an agent
const result = await tutti.run("coder", "Fix the bug in index.ts");

// Continue a conversation
const result2 = await tutti.run("coder", "Now add tests", result.session_id);

The agentic loop works like this:

  1. Send the conversation to the LLM
  2. If the LLM returns text — done
  3. If the LLM returns tool calls — execute them, append results, go to step 1
  4. Repeat until max_turns or budget is exhausted
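
A simplified sketch of that loop, for intuition only (the real runtime also wires in permissions, caching, budgets, and events):

// Illustrative sketch only; not the runtime's actual source.
async function agenticLoop(
  llm: { chat(messages: unknown[]): Promise<{ text: string; tool_calls: { name: string; input: unknown }[] }> },
  tools: Record<string, (input: unknown) => Promise<string>>,
  messages: unknown[],
  maxTurns = 10,
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await llm.chat(messages);      // 1. send conversation to the LLM
    if (response.tool_calls.length === 0) {
      return response.text;                         // 2. plain text: done
    }
    for (const call of response.tool_calls) {       // 3. execute tools, append results
      const result = await tools[call.name](call.input);
      messages.push({ role: "tool", name: call.name, content: result });
    }
  }                                                 // 4. repeat until max_turns
  throw new Error("max_turns exhausted");
}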

Events

Every step emits a typed event on the event bus:

tutti.events.on("tool:start", (e) => {
  console.log(`Using tool: ${e.tool_name}`);
});

tutti.events.on("budget:warning", (e) => {
  // e.scope is 'run' | 'day' | 'month' (absent on token-only warnings).
  console.log(`Budget warning [${e.scope ?? "run"}]: $${e.cost_usd} of $${e.limit ?? "?"}`);
});

tutti.events.onAny((e) => {
  console.log(`[${e.type}]`, JSON.stringify(e));
});

Available events: agent:start, agent:end, llm:request, llm:response, tool:start, tool:end, tool:error, turn:start, turn:end, delegate:start, delegate:end, parallel:start, parallel:complete, cache:hit, cache:miss, security:injection_detected, budget:warning, budget:exceeded, token:stream, hitl:requested, hitl:answered, hitl:timeout.

Event handlers are isolated — a throwing handler is logged and siblings keep firing, so a bad telemetry subscriber can’t crash an agent run.
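
A quick illustration of that guarantee:

tutti.events.on("agent:end", () => {
  throw new Error("buggy telemetry");   // logged by the runtime, then swallowed
});

tutti.events.on("agent:end", (e) => {
  console.log("still fires:", e.type);  // sibling handlers are unaffected
});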

Sessions

Sessions track conversation history. The runtime creates a session automatically on the first run() call and returns a session_id. Pass it back to continue the conversation.

const r1 = await tutti.run("assistant", "Hello");
const r2 = await tutti.run("assistant", "What did I just say?", r1.session_id);
// r2 has full context of the prior turn

By default, sessions live in memory. Add memory: { provider: "postgres" } to your score to persist them to PostgreSQL.
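
For example, extending the score from earlier:

export default defineScore({
  name: "my-project",
  provider: new AnthropicProvider(),
  memory: { provider: "postgres" },   // sessions now persist to PostgreSQL
  agents: { assistant: { /* ... */ } },
});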

Memory

Tutti has three kinds of memory, each configured separately:

| Type | Scope | Backend | Config |
| --- | --- | --- | --- |
| Session | One conversation | In-memory / Postgres | score.memory.provider |
| Semantic | All sessions for an agent | In-memory / Postgres | agent.memory.semantic |
| User | All sessions for an end-user, across agents | Postgres | agent.memory.user_memory |

Semantic memory lets agents remember facts about their own work — project context, past decisions, recurring preferences. Two surfaces share the same backing store: relevant entries are auto-injected into the system prompt at the start of each turn, and the agent can call remember / recall / forget as tools to curate memory itself. Enable per agent:

{
  coder: {
    memory: {
      semantic: {
        enabled: true,
        max_memories: 5,        // entries injected per turn (default 5)
        max_entries_per_agent: 1000, // LRU cap per agent (default 1000)
        curated_tools: true,    // expose remember/recall/forget tools (default true)
      },
    },
    // ...
  },
}

User memory attaches facts to an end-user identifier and auto-injects them into the system prompt on every run, so agents remember who they’re talking to across sessions. Optionally, memories can be auto-extracted from conversation. The store is backed by Postgres via the TUTTI_PG_URL connection string.

{
  assistant: {
    memory: {
      user_memory: { enabled: true, auto_extract: true, max_memories: 20 },
    },
  },
}

// At runtime — pass the end-user id so memories are scoped correctly:
await tutti.run("assistant", "Hi again", undefined, { user_id: "alice" });

Inspect or edit user memories via the tutti-ai memory CLI.

Tools can explicitly store semantic memories via context.memory.remember(). See the Memory & Sessions guide for full details.
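
A sketch of such a tool. The tool shape (name, description, Zod schema, execute) follows the voice contract above, but the execute context and remember() signature are assumptions; see the Memory & Sessions guide for the real API:

import { z } from "zod";

// Hypothetical tool; the context and remember() shapes below are assumptions.
const recordDecision = {
  name: "record_decision",
  description: "Persist an architectural decision for future runs",
  schema: z.object({ decision: z.string() }),
  async execute(
    input: { decision: string },
    context: { memory: { remember(text: string): Promise<void> } },
  ) {
    await context.memory.remember(input.decision);
    return `Remembered: ${input.decision}`;
  },
};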

Tool result caching

Repeated tool calls — same tool, same input — can be served from an in-memory cache instead of re-executing. Opt in per agent:

{
  researcher: {
    voices: [new FilesystemVoice()],
    cache: {
      enabled: true,
      ttl_ms: 60_000,                  // optional: default 5 min
      excluded_tools: ["run_migration"] // in addition to built-in write-tool exclusions
    },
  },
}

Known write tools (write_file, delete_file, move_file, create_issue, comment_on_issue) and errored results are never cached. Cache keys are scoped per agent, so a poisoned result from one agent can’t be served to another with a different trust model. Observe with the cache:hit / cache:miss events. See the Tool Result Caching guide for details, including custom cache backends.

Parallel execution

Fan one input out to several agents simultaneously by setting entry to a parallel config:

defineScore({
  provider: new AnthropicProvider(),
  entry: { type: "parallel", agents: ["bull", "bear"] },
  agents: { bull: { /* ... */ }, bear: { /* ... */ } },
});

router.run(input) dispatches to every listed agent at once (each with its own session) and returns a merged AgentResult. For per-agent inputs, timeouts, or rollup metrics, call router.runParallel() / router.runParallelWithSummary() directly. A failed agent never blocks the others — it surfaces as a synthetic [error] entry in the result map. Observe with the parallel:start / parallel:complete events. See the Multi-Agent guide.
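
A sketch of the direct call. The option names below are assumptions, not the documented signature; check the Multi-Agent guide:

// Sketch only: "inputs" and "timeout_ms" are assumed option names.
const results = await router.runParallel({
  inputs: {
    bull: "Argue the upside of NVDA",    // per-agent input
    bear: "Argue the downside of NVDA",
  },
  timeout_ms: 30_000,
});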

Permissions

Voices declare what they need. Agents declare what they grant. If there’s a mismatch, the runtime throws before executing anything.

{
  coder: {
    voices: [new FilesystemVoice()],  // requires: ["filesystem"]
    permissions: ["filesystem"],       // granted — OK
  },
  reader: {
    voices: [new FilesystemVoice()],  // requires: ["filesystem"]
    permissions: [],                   // not granted — throws!
  },
}

The four permission types: filesystem, network, shell, browser.

Streaming

Enable token-by-token streaming on any agent:

{
  assistant: {
    name: "Assistant",
    system_prompt: "You are helpful.",
    voices: [],
    streaming: true,
  },
}

When streaming: true, the runtime uses provider.stream() instead of provider.chat(). Each text token emits a token:stream event:

tutti.events.on("token:stream", (e) => {
  process.stdout.write(e.text);
});

The tutti-ai run command enables streaming automatically — tokens print to the terminal as they arrive.

All three providers support streaming: Anthropic (message stream events), OpenAI (delta chunks), and Gemini (content stream).

Logging

Tutti uses structured logging via pino. All runtime events, provider calls, and errors are logged with structured context.

import { createLogger, logger } from "@tuttiai/core";

// Default logger
logger.info({ agent: "assistant" }, "Agent started");

// Custom named logger
const myLogger = createLogger("my-app");

Control the log level with the TUTTI_LOG_LEVEL environment variable:

TUTTI_LOG_LEVEL=debug npx tsx app.ts  # debug, info, warn, error

In development, logs are colorized via pino-pretty. In production (NODE_ENV=production), logs output as raw JSON for log aggregation.

Telemetry

Tracing is always on — the runtime emits spans for every agent run, LLM call, and tool invocation through a built-in in-process tracer (TuttiTracer from @tuttiai/telemetry). No config needed for local inspection:

tutti-ai serve            # in one shell
tutti-ai traces list      # in another — see the last 20 runs
tutti-ai traces show <id> # render every span in a trace as an indented tree
tutti-ai traces tail      # live-tail spans as they are emitted

To export spans to an external backend (Grafana Tempo, Honeycomb, Jaeger, etc.), add an OTLP endpoint to your score:

export default defineScore({
  provider: new AnthropicProvider(),
  telemetry: {
    enabled: true,
    endpoint: "http://localhost:4318",  // OTLP HTTP endpoint
    headers: { Authorization: "Bearer ..." },  // optional
  },
  agents: { /* ... */ },
});

Span tree:

agent.run (agent.name=assistant, session.id=abc-123)
  ├── llm.call (llm.model=claude-sonnet-4-20250514)
  ├── tool.call (tool.name=read_file)
  ├── llm.call (llm.model=claude-sonnet-4-20250514)
  └── ...

Cost estimates are attached to every llm.call span via the built-in MODEL_PRICES table — register custom models with registerModelPrice() from @tuttiai/telemetry.
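
To price a model the built-in table doesn't know, register it at startup. A sketch; the price-object shape is an assumption, so check @tuttiai/telemetry for the exact fields:

import { registerModelPrice } from "@tuttiai/telemetry";

// Sketch: field names below are assumptions, not the documented shape.
registerModelPrice("my-fine-tuned-model", {
  input_usd_per_mtok: 3,     // USD per million input tokens
  output_usd_per_mtok: 15,   // USD per million output tokens
});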

MCP Bridge

The @tuttiai/mcp voice wraps any MCP server as a Tutti voice. Tools are discovered dynamically at runtime:

import { McpVoice } from "@tuttiai/mcp";

const mcp = new McpVoice({ server: "npx @playwright/mcp" });

// The agent gets ALL tools from the MCP server
{
  browser: {
    name: "Browser",
    system_prompt: "You control a browser.",
    voices: [mcp],
    permissions: ["network"],
  },
}

The voice starts the MCP server as a child process, connects via stdio transport, calls listTools() to discover available tools, and proxies execute() calls through callTool().
