Core Concepts

Understand agents, voices, scores, and the Tutti runtime

Score

A score is the top-level configuration file (tutti.score.ts). It defines which LLM provider to use and what agents are available.

import { AnthropicProvider, defineScore } from "@tuttiai/core";

export default defineScore({
  name: "my-project",
  provider: new AnthropicProvider(),
  default_model: "claude-sonnet-4-20250514",
  agents: {
    assistant: { /* ... */ },
    coder: { /* ... */ },
  },
});

The defineScore() function is a typed identity function — it gives you autocomplete and type checking with zero runtime overhead. The score is Zod-validated when loaded.

Agents

An agent is an LLM-powered worker. Each agent has:

| Field | Required | Description |
| --- | --- | --- |
| name | Yes | Display name |
| system_prompt | Yes | Instructions for the LLM |
| voices | Yes | Array of voice instances (can be empty) |
| model | No | Overrides default_model from the score |
| permissions | No | Permissions granted to this agent’s voices |
| max_turns | No | Max agentic loop iterations (default: 10) |
| max_tool_calls | No | Max tool calls per run (default: 20) |
| tool_timeout_ms | No | Per-tool timeout in ms (default: 30000) |
| budget | No | Token + per-run / daily / monthly USD limits for this agent |
| memory | No | Long-term memory ({ semantic?, user_memory? }). See Memory. |
| streaming | No | Enable token-by-token streaming (default: false) |
| delegates | No | Agent IDs this orchestrator can delegate to |
| role | No | "orchestrator" or "specialist" |
| durable | No | Checkpoint between turns to Redis/Postgres so crashed runs can resume |
| schedule | No | Cron / interval / one-shot trigger — see the scheduler guide |
| outputSchema | No | Zod schema; the agent returns a validated typed object |
| allow_human_input | No | Agent can emit hitl:requested events to ask a human |
| requireApproval | No | Gate specific tool calls behind an interrupt that must be approved |
| beforeRun / afterRun | No | Guardrail hooks (validation, PII redaction, topic blocking) |

{
  coder: {
    name: "Coder",
    model: "claude-sonnet-4-20250514",
    system_prompt: "You are a senior TypeScript developer.",
    voices: [new FilesystemVoice()],
    permissions: ["filesystem"],
    max_turns: 15,
    budget: { max_tokens: 50_000, warn_at_percent: 80 },
  },
}

budget accepts:

| Field | Effect |
| --- | --- |
| max_tokens | Soft stop — emits budget:exceeded and returns the partial result. |
| max_cost_usd | Hard stop — throws BudgetExceededError with scope: 'run' once the run’s accumulated cost crosses the cap. |
| max_cost_usd_per_day | Hard daily cap — aggregates across every run that started since 00:00 UTC. Requires a RunCostStore on the runtime. |
| max_cost_usd_per_month | Hard monthly cap — aggregates across every run that started since the 1st of the current UTC month. Requires a RunCostStore. |
| warn_at_percent | Threshold (default: 80) at which budget:warning fires. Applied per scope. |

Wire the store on the runtime to enable daily/monthly enforcement:

import { TuttiRuntime, InMemoryRunCostStore, PostgresRunCostStore } from "@tuttiai/core";

// Single-process / dev:
const devRuntime = new TuttiRuntime(score, {
  runCostStore: new InMemoryRunCostStore(),
});

// Multi-process / prod — every worker shares one daily total:
const prodRuntime = new TuttiRuntime(score, {
  runCostStore: new PostgresRunCostStore({
    connection_string: process.env.DATABASE_URL!,
  }),
});

Without a store, daily/monthly limits log a one-time warning per run and are skipped (the per-run cap still applies). Use Postgres in any deployment with more than one worker — the in-memory backend cannot coordinate across processes.
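
To react to a hard stop in code, catch the error around run(). A minimal sketch, assuming BudgetExceededError is exported from @tuttiai/core and exposes the scope field described above:

import { BudgetExceededError } from "@tuttiai/core";

try {
  await devRuntime.run("coder", "Refactor the billing module");
} catch (err) {
  if (err instanceof BudgetExceededError) {
    // Assumption: err.scope reports which cap tripped ('run' | 'day' | 'month').
    console.error(`Budget cap hit (scope: ${err.scope})`);
  } else {
    throw err;
  }
}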

Voices

A voice is a pluggable package that gives agents tools. Think of it as a capability module — filesystem access, GitHub integration, browser control, or anything you build.

Each voice declares:

  • name — identifier
  • tools — array of tool definitions (name, description, Zod schema, execute function)
  • required_permissions — what permissions the agent must grant

import { FilesystemVoice } from "@tuttiai/filesystem";

// This voice provides: read_file, write_file, list_directory,
// create_directory, delete_file, move_file, search_files
const fs = new FilesystemVoice();
fs.name;                  // "filesystem"
fs.required_permissions;  // ["filesystem"]
fs.tools.length;          // 7

Official voices:

| Voice | Package | Permissions | Tools |
| --- | --- | --- | --- |
| Filesystem | @tuttiai/filesystem | filesystem | 7 tools |
| GitHub | @tuttiai/github | network | 10 tools |
| Playwright | @tuttiai/playwright | network, browser | 12 tools |
| Web | @tuttiai/web | network | 3 tools |
| Sandbox | @tuttiai/sandbox | shell | 4 tools |
| RAG | @tuttiai/rag | filesystem, network | ingest / chunk / embed / search |
| MCP Bridge | @tuttiai/mcp | network | dynamic (any MCP server) |

See the Voices Overview for details on each.

Runtime

The runtime (TuttiRuntime) is the engine. It takes a score, creates the event bus and session store, and runs the agentic loop.

const tutti = new TuttiRuntime(score);

// Run an agent
const result = await tutti.run("coder", "Fix the bug in index.ts");

// Continue a conversation
const result2 = await tutti.run("coder", "Now add tests", result.session_id);

The agentic loop works like this:

  1. Send the conversation to the LLM
  2. If the LLM returns text — done
  3. If the LLM returns tool calls — execute them, append results, go to step 1
  4. Repeat until max_turns or budget is exhausted
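
A simplified sketch of that loop, for intuition only (the real runtime also wires in permissions, caching, budgets, and events):

// Illustrative sketch only; not the runtime's actual source.
async function agenticLoop(
  llm: { chat(messages: unknown[]): Promise<{ text: string; tool_calls: { name: string; input: unknown }[] }> },
  tools: Record<string, (input: unknown) => Promise<string>>,
  messages: unknown[],
  maxTurns = 10,
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await llm.chat(messages);      // 1. send conversation to the LLM
    if (response.tool_calls.length === 0) {
      return response.text;                         // 2. plain text: done
    }
    for (const call of response.tool_calls) {       // 3. execute tools, append results
      const result = await tools[call.name](call.input);
      messages.push({ role: "tool", name: call.name, content: result });
    }
  }                                                 // 4. repeat until max_turns
  throw new Error("max_turns exhausted");
}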

Events

Every step emits a typed event on the event bus:

tutti.events.on("tool:start", (e) => {
  console.log(`Using tool: ${e.tool_name}`);
});

tutti.events.on("budget:warning", (e) => {
  // e.scope is 'run' | 'day' | 'month' (absent on token-only warnings).
  console.log(`Budget warning [${e.scope ?? "run"}]: $${e.cost_usd} of $${e.limit ?? "?"}`);
});

tutti.events.onAny((e) => {
  console.log(`[${e.type}]`, JSON.stringify(e));
});

Available events: agent:start, agent:end, llm:request, llm:response, tool:start, tool:end, tool:error, turn:start, turn:end, delegate:start, delegate:end, parallel:start, parallel:complete, cache:hit, cache:miss, security:injection_detected, budget:warning, budget:exceeded, token:stream, hitl:requested, hitl:answered, hitl:timeout.

Event handlers are isolated — a throwing handler is logged and siblings keep firing, so a bad telemetry subscriber can’t crash an agent run.
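
A quick illustration of that guarantee:

tutti.events.on("agent:end", () => {
  throw new Error("buggy telemetry");   // logged by the runtime, then swallowed
});

tutti.events.on("agent:end", (e) => {
  console.log("still fires:", e.type);  // sibling handlers are unaffected
});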

Sessions

Sessions track conversation history. The runtime creates a session automatically on the first run() call and returns a session_id. Pass it back to continue the conversation.

const r1 = await tutti.run("assistant", "Hello");
const r2 = await tutti.run("assistant", "What did I just say?", r1.session_id);
// r2 has full context of the prior turn

By default, sessions live in memory. Add memory: { provider: "postgres" } to your score to persist them to PostgreSQL.
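
For example, extending the score from earlier:

export default defineScore({
  name: "my-project",
  provider: new AnthropicProvider(),
  memory: { provider: "postgres" },   // sessions now persist to PostgreSQL
  agents: { assistant: { /* ... */ } },
});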

Memory

Tutti has three kinds of memory, each configured separately:

| Type | Scope | Backend | Config |
| --- | --- | --- | --- |
| Session | One conversation | In-memory / Postgres | score.memory.provider |
| Semantic | All sessions for an agent | In-memory / Postgres | agent.memory.semantic |
| User | All sessions for an end-user, across agents | Postgres | agent.memory.user_memory |

Semantic memory lets agents remember facts about their own work — project context, past decisions, recurring preferences. Two surfaces share the same backing store: relevant entries are auto-injected into the system prompt at the start of each turn, and the agent can call remember / recall / forget as tools to curate memory itself. Enable per agent:

{
  coder: {
    memory: {
      semantic: {
        enabled: true,
        max_memories: 5,        // entries injected per turn (default 5)
        max_entries_per_agent: 1000, // LRU cap per agent (default 1000)
        curated_tools: true,    // expose remember/recall/forget tools (default true)
      },
    },
    // ...
  },
}

User memory attaches facts to an end-user identifier and auto-injects them into the system prompt on every run, so agents remember who they’re talking to across sessions. Optionally, memories can be auto-extracted from conversation. The store is backed by Postgres via the TUTTI_PG_URL connection string.

{
  assistant: {
    memory: {
      user_memory: { enabled: true, auto_extract: true, max_memories: 20 },
    },
  },
}

// At runtime — pass the end-user id so memories are scoped correctly:
await tutti.run("assistant", "Hi again", undefined, { user_id: "alice" });

Inspect or edit user memories via the tutti-ai memory CLI.

Tools can explicitly store semantic memories via context.memory.remember(). See the Memory & Sessions guide for full details.
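
A sketch of such a tool. The tool shape (name, description, Zod schema, execute) follows the voice contract above, but the execute context and remember() signature are assumptions; see the Memory & Sessions guide for the real API:

import { z } from "zod";

// Hypothetical tool; the context and remember() shapes below are assumptions.
const recordDecision = {
  name: "record_decision",
  description: "Persist an architectural decision for future runs",
  schema: z.object({ decision: z.string() }),
  async execute(
    input: { decision: string },
    context: { memory: { remember(text: string): Promise<void> } },
  ) {
    await context.memory.remember(input.decision);
    return `Remembered: ${input.decision}`;
  },
};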

Tool result caching

Repeated tool calls — same tool, same input — can be served from an in-memory cache instead of re-executing. Opt in per agent:

{
  researcher: {
    voices: [new FilesystemVoice()],
    cache: {
      enabled: true,
      ttl_ms: 60_000,                  // optional: default 5 min
      excluded_tools: ["run_migration"] // in addition to built-in write-tool exclusions
    },
  },
}

Known write tools (write_file, delete_file, move_file, create_issue, comment_on_issue) and errored results are never cached. Cache keys are scoped per agent, so a poisoned result from one agent can’t be served to another with a different trust model. Observe with the cache:hit / cache:miss events. See the Tool Result Caching guide for details, including custom cache backends.

Parallel execution

Fan one input out to several agents simultaneously by setting entry to a parallel config:

defineScore({
  provider: new AnthropicProvider(),
  entry: { type: "parallel", agents: ["bull", "bear"] },
  agents: { bull: { /* ... */ }, bear: { /* ... */ } },
});

router.run(input) dispatches to every listed agent at once (each with its own session) and returns a merged AgentResult. For per-agent inputs, timeouts, or rollup metrics, call router.runParallel() / router.runParallelWithSummary() directly. A failed agent never blocks the others — it surfaces as a synthetic [error] entry in the result map. Observe with the parallel:start / parallel:complete events. See the Multi-Agent guide.
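
A sketch of the direct call. The option names below are assumptions, not the documented signature; check the Multi-Agent guide:

// Sketch only: "inputs" and "timeout_ms" are assumed option names.
const results = await router.runParallel({
  inputs: {
    bull: "Argue the upside of NVDA",    // per-agent input
    bear: "Argue the downside of NVDA",
  },
  timeout_ms: 30_000,
});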

Permissions

Voices declare what they need. Agents declare what they grant. If there’s a mismatch, the runtime throws before executing anything.

{
  coder: {
    voices: [new FilesystemVoice()],  // requires: ["filesystem"]
    permissions: ["filesystem"],       // granted — OK
  },
  reader: {
    voices: [new FilesystemVoice()],  // requires: ["filesystem"]
    permissions: [],                   // not granted — throws!
  },
}

The four permission types: filesystem, network, shell, browser.

Streaming

Enable token-by-token streaming on any agent:

{
  assistant: {
    name: "Assistant",
    system_prompt: "You are helpful.",
    voices: [],
    streaming: true,
  },
}

When streaming: true, the runtime uses provider.stream() instead of provider.chat(). Each text token emits a token:stream event:

tutti.events.on("token:stream", (e) => {
  process.stdout.write(e.text);
});

The tutti-ai run command enables streaming automatically — tokens print to the terminal as they arrive.

All three providers support streaming: Anthropic (message stream events), OpenAI (delta chunks), and Gemini (content stream).

Logging

Tutti uses structured logging via pino. All runtime events, provider calls, and errors are logged with structured context.

import { createLogger, logger } from "@tuttiai/core";

// Default logger
logger.info({ agent: "assistant" }, "Agent started");

// Custom named logger
const myLogger = createLogger("my-app");

Control the log level with the TUTTI_LOG_LEVEL environment variable:

TUTTI_LOG_LEVEL=debug npx tsx app.ts  # debug, info, warn, error

In development, logs are colorized via pino-pretty. In production (NODE_ENV=production), logs output as raw JSON for log aggregation.

Telemetry

Tracing is always on — the runtime emits spans for every agent run, LLM call, and tool invocation through a built-in in-process tracer (TuttiTracer from @tuttiai/telemetry). No config needed for local inspection:

tutti-ai serve            # in one shell
tutti-ai traces list      # in another — see the last 20 runs
tutti-ai traces show <id> # render every span in a trace as an indented tree
tutti-ai traces tail      # live-tail spans as they are emitted

To export spans to an external backend (Grafana Tempo, Honeycomb, Jaeger, etc.), add an OTLP endpoint to your score:

export default defineScore({
  provider: new AnthropicProvider(),
  telemetry: {
    enabled: true,
    endpoint: "http://localhost:4318",  // OTLP HTTP endpoint
    headers: { Authorization: "Bearer ..." },  // optional
  },
  agents: { /* ... */ },
});

Span tree:

agent.run (agent.name=assistant, session.id=abc-123)
  ├── llm.call (llm.model=claude-sonnet-4-20250514)
  ├── tool.call (tool.name=read_file)
  ├── llm.call (llm.model=claude-sonnet-4-20250514)
  └── ...

Cost estimates are attached to every llm.call span via the built-in MODEL_PRICES table — register custom models with registerModelPrice() from @tuttiai/telemetry.
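
To price a model the built-in table doesn't know, register it at startup. A sketch; the price-object shape is an assumption, so check @tuttiai/telemetry for the exact fields:

import { registerModelPrice } from "@tuttiai/telemetry";

// Sketch: field names below are assumptions, not the documented shape.
registerModelPrice("my-fine-tuned-model", {
  input_usd_per_mtok: 3,     // USD per million input tokens
  output_usd_per_mtok: 15,   // USD per million output tokens
});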

MCP Bridge

The @tuttiai/mcp voice wraps any MCP server as a Tutti voice. Tools are discovered dynamically at runtime:

import { McpVoice } from "@tuttiai/mcp";

const mcp = new McpVoice({ server: "npx @playwright/mcp" });

// The agent gets ALL tools from the MCP server
{
  browser: {
    name: "Browser",
    system_prompt: "You control a browser.",
    voices: [mcp],
    permissions: ["network"],
  },
}

The voice starts the MCP server as a child process, connects via stdio transport, calls listTools() to discover available tools, and proxies execute() calls through callTool().
