Core Concepts
Understand agents, voices, scores, and the Tutti runtime
Score
A score is the top-level configuration file (tutti.score.ts). It defines which LLM provider to use and what agents are available.
```ts
import { AnthropicProvider, defineScore } from "@tuttiai/core";

export default defineScore({
  name: "my-project",
  provider: new AnthropicProvider(),
  default_model: "claude-sonnet-4-20250514",
  agents: {
    assistant: { /* ... */ },
    coder: { /* ... */ },
  },
});
```
The defineScore() function is a typed identity function — it gives you autocomplete and type checking with zero runtime overhead. The score is Zod-validated when loaded.
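A typed identity function is nothing more than a generic function that returns its argument unchanged. A minimal sketch of the idea (the real `ScoreConfig` type in @tuttiai/core has many more fields):

```ts
// Hypothetical, simplified score shape — for illustration only.
interface ScoreConfig {
  name?: string;
  default_model?: string;
  agents: Record<string, object>;
}

// A typed identity function: zero runtime behavior, full type inference.
function defineScore<T extends ScoreConfig>(config: T): T {
  return config;
}

const score = defineScore({
  name: "my-project",
  agents: { assistant: {} },
});
// `score.name` is typed as string; typos in field names fail to compile.
```

Because the function returns its input untouched, the compiled JavaScript is a no-op wrapper; all the value is in the editor experience.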
Agents
An agent is an LLM-powered worker. Each agent has:
| Field | Required | Description |
|---|---|---|
| name | Yes | Display name |
| system_prompt | Yes | Instructions for the LLM |
| voices | Yes | Array of voice instances (can be empty) |
| model | No | Overrides default_model from the score |
| permissions | No | Permissions granted to this agent’s voices |
| max_turns | No | Max agentic loop iterations (default: 10) |
| max_tool_calls | No | Max tool calls per run (default: 20) |
| tool_timeout_ms | No | Per-tool timeout in ms (default: 30000) |
| budget | No | Token + per-run / daily / monthly USD limits for this agent |
| memory | No | Long-term memory ({ semantic?, user_memory? }). See Memory. |
| streaming | No | Enable token-by-token streaming (default: false) |
| delegates | No | Agent IDs this orchestrator can delegate to |
| role | No | "orchestrator" or "specialist" |
| durable | No | Checkpoint between turns to Redis/Postgres so crashed runs can resume. |
| schedule | No | Cron / interval / one-shot trigger — see the scheduler guide. |
| outputSchema | No | Zod schema; the agent returns a validated typed object. |
| allow_human_input | No | Agent can emit hitl:requested events to ask a human. |
| requireApproval | No | Gate specific tool calls behind an interrupt that must be approved. |
| beforeRun / afterRun | No | Guardrail hooks (validation, PII redaction, topic blocking). |
```ts
{
  coder: {
    name: "Coder",
    model: "claude-sonnet-4-20250514",
    system_prompt: "You are a senior TypeScript developer.",
    voices: [new FilesystemVoice()],
    permissions: ["filesystem"],
    max_turns: 15,
    budget: { max_tokens: 50_000, warn_at_percent: 80 },
  },
}
```
budget accepts:
| Field | Effect |
|---|---|
| max_tokens | Soft stop — emits budget:exceeded and returns the partial result. |
| max_cost_usd | Hard stop — throws BudgetExceededError with scope: 'run' once the run’s accumulated cost crosses the cap. |
| max_cost_usd_per_day | Hard daily cap — aggregates across every run that started since 00:00 UTC. Requires a RunCostStore on the runtime. |
| max_cost_usd_per_month | Hard monthly cap — aggregates across every run that started since the 1st of the current UTC month. Requires a RunCostStore. |
| warn_at_percent | Threshold (default 80) at which budget:warning fires. Applied per-scope. |
Wire the store on the runtime to enable daily/monthly enforcement:
```ts
import { TuttiRuntime, InMemoryRunCostStore, PostgresRunCostStore } from "@tuttiai/core";

// Single-process / dev:
const runtime = new TuttiRuntime(score, {
  runCostStore: new InMemoryRunCostStore(),
});
```

```ts
// Multi-process / prod — every worker shares one daily total:
const runtime = new TuttiRuntime(score, {
  runCostStore: new PostgresRunCostStore({
    connection_string: process.env.DATABASE_URL!,
  }),
});
```
Without a store, daily/monthly limits log a one-time warning per run and are skipped (the per-run cap still applies). Use Postgres in any deployment with more than one worker — the in-memory backend cannot coordinate across processes.
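The daily-cap arithmetic itself is simple: sum the cost of every run started since 00:00 UTC and refuse new work once the cap would be crossed. A toy in-memory sketch of that logic (not the real RunCostStore interface):

```ts
// Toy sketch of daily budget aggregation — illustration only; the
// actual RunCostStore contract in @tuttiai/core may differ.
class ToyDailyCostTracker {
  private costs: { startedAt: Date; costUsd: number }[] = [];

  record(costUsd: number, startedAt = new Date()) {
    this.costs.push({ startedAt, costUsd });
  }

  // Sum every run that started since 00:00 UTC today.
  totalToday(now = new Date()): number {
    const midnightUtc = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
    return this.costs
      .filter((c) => c.startedAt.getTime() >= midnightUtc)
      .reduce((sum, c) => sum + c.costUsd, 0);
  }

  wouldExceed(maxCostUsdPerDay: number, nextCostUsd: number): boolean {
    return this.totalToday() + nextCostUsd > maxCostUsdPerDay;
  }
}
```

The Postgres backend exists precisely because this running total must be shared: each worker's local sum would otherwise undercount the fleet-wide spend.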
Voices
A voice is a pluggable package that gives agents tools. Think of it as a capability module — filesystem access, GitHub integration, browser control, or anything you build.
Each voice declares:
- name — identifier
- tools — array of tool definitions (name, description, Zod schema, execute function)
- required_permissions — what permissions the agent must grant
```ts
import { FilesystemVoice } from "@tuttiai/filesystem";

// This voice provides: read_file, write_file, list_directory,
// create_directory, delete_file, move_file, search_files
const fs = new FilesystemVoice();

fs.name;                 // "filesystem"
fs.required_permissions; // ["filesystem"]
fs.tools.length;         // 7
```
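Custom voices follow the same shape. A minimal sketch, assuming a simplified `Voice` interface (the real one in @tuttiai/core carries more, e.g. Zod schemas for tool input):

```ts
// Hypothetical, simplified voice/tool interfaces — for illustration.
interface ToolDef {
  name: string;
  description: string;
  execute(input: Record<string, unknown>): Promise<string>;
}

interface Voice {
  name: string;
  required_permissions: string[];
  tools: ToolDef[];
}

// A tiny custom voice exposing one tool.
const clockVoice: Voice = {
  name: "clock",
  required_permissions: [], // no special permissions needed
  tools: [
    {
      name: "get_time",
      description: "Returns the current time as an ISO string.",
      execute: async () => new Date().toISOString(),
    },
  ],
};
```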
Official voices:
| Voice | Package | Permissions | Tools |
|---|---|---|---|
| Filesystem | @tuttiai/filesystem | filesystem | 7 tools |
| GitHub | @tuttiai/github | network | 10 tools |
| Playwright | @tuttiai/playwright | network, browser | 12 tools |
| Web | @tuttiai/web | network | 3 tools |
| Sandbox | @tuttiai/sandbox | shell | 4 tools |
| RAG | @tuttiai/rag | filesystem, network | ingest / chunk / embed / search |
| MCP Bridge | @tuttiai/mcp | network | dynamic (any MCP server) |
See the Voices Overview for details on each.
Runtime
The runtime (TuttiRuntime) is the engine. It takes a score, creates the event bus and session store, and runs the agentic loop.
```ts
const tutti = new TuttiRuntime(score);

// Run an agent
const result = await tutti.run("coder", "Fix the bug in index.ts");

// Continue a conversation
const result2 = await tutti.run("coder", "Now add tests", result.session_id);
```
The agentic loop works like this:
1. Send the conversation to the LLM.
2. If the LLM returns text — done.
3. If the LLM returns tool calls — execute them, append the results, and go back to step 1.

The loop repeats until max_turns or the budget is exhausted.
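The steps above can be sketched in a few lines with mock provider and message types (not the real @tuttiai/core internals):

```ts
// Simplified sketch of the agentic loop — mock types, illustration only.
type LlmReply =
  | { kind: "text"; text: string }
  | { kind: "tool_calls"; calls: { name: string; input: unknown }[] };

type Provider = (messages: string[]) => LlmReply;
type Tool = (input: unknown) => string;

function runLoop(
  provider: Provider,
  tools: Record<string, Tool>,
  prompt: string,
  maxTurns = 10,
): string {
  const messages = [prompt];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = provider(messages);
    if (reply.kind === "text") return reply.text; // done
    // Execute each tool call and append its result to the conversation.
    for (const call of reply.calls) {
      messages.push(`tool:${call.name} -> ${tools[call.name](call.input)}`);
    }
  }
  throw new Error("max_turns exhausted");
}
```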
Events
Every step emits a typed event on the event bus:
```ts
tutti.events.on("tool:start", (e) => {
  console.log(`Using tool: ${e.tool_name}`);
});

tutti.events.on("budget:warning", (e) => {
  // e.scope is 'run' | 'day' | 'month' (absent on token-only warnings).
  console.log(`Budget warning [${e.scope ?? "run"}]: $${e.cost_usd} of $${e.limit ?? "?"}`);
});

tutti.events.onAny((e) => {
  console.log(`[${e.type}]`, JSON.stringify(e));
});
```
Available events: agent:start, agent:end, llm:request, llm:response, tool:start, tool:end, tool:error, turn:start, turn:end, delegate:start, delegate:end, parallel:start, parallel:complete, cache:hit, cache:miss, security:injection_detected, budget:warning, budget:exceeded, token:stream, hitl:requested, hitl:answered, hitl:timeout.
Event handlers are isolated — a throwing handler is logged and siblings keep firing, so a bad telemetry subscriber can’t crash an agent run.
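Handler isolation boils down to wrapping each subscriber in its own try/catch. A self-contained sketch of the idea (not the actual event bus implementation):

```ts
// Minimal sketch: each handler runs in its own try/catch so one
// throwing subscriber cannot stop the others.
type Handler<E> = (event: E) => void;

class IsolatedEmitter<E> {
  private handlers: Handler<E>[] = [];

  on(handler: Handler<E>) {
    this.handlers.push(handler);
  }

  emit(event: E) {
    for (const handler of this.handlers) {
      try {
        handler(event);
      } catch (err) {
        // Log and keep going — sibling handlers still fire.
        console.error("event handler threw:", err);
      }
    }
  }
}
```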
Sessions
Sessions track conversation history. The runtime creates a session automatically on the first run() call and returns a session_id. Pass it back to continue the conversation.
```ts
const r1 = await tutti.run("assistant", "Hello");
const r2 = await tutti.run("assistant", "What did I just say?", r1.session_id);
// r2 has full context of the prior turn
```
By default, sessions live in memory. Add memory: { provider: "postgres" } to your score to persist them to PostgreSQL.
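In score form, that is a one-line addition (a sketch; as with user memory, the Postgres connection is assumed to come from the environment, e.g. TUTTI_PG_URL):

```ts
export default defineScore({
  provider: new AnthropicProvider(),
  // Persist sessions to PostgreSQL instead of the default in-memory store.
  memory: { provider: "postgres" },
  agents: { /* ... */ },
});
```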
Memory
Tutti has three kinds of memory, each configured separately:
| Type | Scope | Backend | Config |
|---|---|---|---|
| Session | One conversation | In-memory / Postgres | score.memory.provider |
| Semantic | All sessions for an agent | In-memory / Postgres | agent.memory.semantic |
| User | All sessions for an end-user, across agents | Postgres | agent.memory.user_memory |
Semantic memory lets agents remember facts about their own work — project context, past decisions, recurring preferences. Two surfaces share the same backing store: relevant entries are auto-injected into the system prompt at the start of each turn, and the agent can call remember / recall / forget as tools to curate memory itself. Enable per agent:
```ts
{
  coder: {
    memory: {
      semantic: {
        enabled: true,
        max_memories: 5,             // entries injected per turn (default 5)
        max_entries_per_agent: 1000, // LRU cap per agent (default 1000)
        curated_tools: true,         // expose remember/recall/forget tools (default true)
      },
    },
    // ...
  },
}
```
User memory attaches facts to an end-user identifier and auto-injects them into the system prompt on every run, so agents remember who they’re talking to across sessions. Optionally, memories can be auto-extracted from conversation. Backed by TUTTI_PG_URL.
```ts
{
  assistant: {
    memory: {
      user_memory: { enabled: true, auto_extract: true, max_memories: 20 },
    },
  },
}
```

```ts
// At runtime — pass the end-user id so memories are scoped correctly:
await tutti.run("assistant", "Hi again", undefined, { user_id: "alice" });
```
Inspect or edit user memories via the tutti-ai memory CLI.
Tools can explicitly store semantic memories via context.memory.remember(). See the Memory & Sessions guide for full details.
Tool result caching
Repeated tool calls — same tool, same input — can be served from an in-memory cache instead of re-executing. Opt in per agent:
```ts
{
  researcher: {
    voices: [new FilesystemVoice()],
    cache: {
      enabled: true,
      ttl_ms: 60_000, // optional: default 5 min
      excluded_tools: ["run_migration"], // in addition to built-in write-tool exclusions
    },
  },
}
```
Known write tools (write_file, delete_file, move_file, create_issue, comment_on_issue) and errored results are never cached. Cache keys are scoped per agent, so a poisoned result from one agent can’t be served to another with a different trust model. Observe with the cache:hit / cache:miss events. See the Tool Result Caching guide for details, including custom cache backends.
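Per-agent scoping means the agent id participates in the cache key alongside the tool name and input. A toy sketch of such a keyed cache with TTL (the real backend's key derivation may differ, e.g. hashing instead of string concatenation):

```ts
// Toy tool-result cache keyed by (agent, tool, input) — illustration only.
const cache = new Map<string, { value: string; expiresAt: number }>();

function cacheKey(agentId: string, toolName: string, input: unknown): string {
  return `${agentId}:${toolName}:${JSON.stringify(input)}`;
}

function getCached(agentId: string, toolName: string, input: unknown): string | undefined {
  const entry = cache.get(cacheKey(agentId, toolName, input));
  if (!entry || entry.expiresAt < Date.now()) return undefined; // miss or expired
  return entry.value;
}

function putCached(agentId: string, toolName: string, input: unknown, value: string, ttlMs: number) {
  cache.set(cacheKey(agentId, toolName, input), { value, expiresAt: Date.now() + ttlMs });
}
```

Because `agentId` is part of the key, two agents calling the same tool with the same input never share entries.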
Parallel execution
Fan one input out to several agents simultaneously by setting entry to a parallel config:
```ts
defineScore({
  provider: new AnthropicProvider(),
  entry: { type: "parallel", agents: ["bull", "bear"] },
  agents: { bull: { /* ... */ }, bear: { /* ... */ } },
});
```
router.run(input) dispatches to every listed agent at once (each with its own session) and returns a merged AgentResult. For per-agent inputs, timeouts, or rollup metrics, call router.runParallel() / router.runParallelWithSummary() directly. A failed agent never blocks the others — it surfaces as a synthetic [error] entry in the result map. Observe with the parallel:start / parallel:complete events. See the Multi-Agent guide.
Permissions
Voices declare what they need. Agents declare what they grant. If there’s a mismatch, the runtime throws before executing anything.
```ts
{
  coder: {
    voices: [new FilesystemVoice()], // requires: ["filesystem"]
    permissions: ["filesystem"],     // granted — OK
  },
  reader: {
    voices: [new FilesystemVoice()], // requires: ["filesystem"]
    permissions: [],                 // not granted — throws!
  },
}
```
The four permission types: filesystem, network, shell, browser.
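The check itself is a simple set comparison that runs before anything executes. A self-contained sketch (the runtime's actual error type and message will differ):

```ts
// Sketch: fail fast when a voice requires a permission the agent has not granted.
function checkPermissions(required: string[], granted: string[]): void {
  const grantedSet = new Set(granted);
  const missing = required.filter((p) => !grantedSet.has(p));
  if (missing.length > 0) {
    throw new Error(`Missing permissions: ${missing.join(", ")}`);
  }
}
```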
Streaming
Enable token-by-token streaming on any agent:
```ts
{
  assistant: {
    name: "Assistant",
    system_prompt: "You are helpful.",
    voices: [],
    streaming: true,
  },
}
```
When streaming: true, the runtime uses provider.stream() instead of provider.chat(). Each text token emits a token:stream event:
```ts
tutti.events.on("token:stream", (e) => {
  process.stdout.write(e.text);
});
```
The tutti-ai run command enables streaming automatically — tokens print to the terminal as they arrive.
All three providers support streaming: Anthropic (message stream events), OpenAI (delta chunks), and Gemini (content stream).
Logging
Tutti uses structured logging via pino. All runtime events, provider calls, and errors are logged with structured context.
```ts
import { createLogger, logger } from "@tuttiai/core";

// Default logger
logger.info({ agent: "assistant" }, "Agent started");

// Custom named logger
const myLogger = createLogger("my-app");
```
Control the log level with the TUTTI_LOG_LEVEL environment variable:
```sh
TUTTI_LOG_LEVEL=debug npx tsx app.ts   # debug, info, warn, error
```
In development, logs are colorized via pino-pretty. In production (NODE_ENV=production), logs output as raw JSON for log aggregation.
Telemetry
Tracing is always on — the runtime emits spans for every agent run, LLM call, and tool invocation through a built-in in-process tracer (TuttiTracer from @tuttiai/telemetry). No config needed for local inspection:
```sh
tutti-ai serve            # in one shell
tutti-ai traces list      # in another — see the last 20 runs
tutti-ai traces show <id> # render every span in a trace as an indented tree
tutti-ai traces tail      # live-tail spans as they are emitted
```
To export spans to an external backend (Grafana Tempo, Honeycomb, Jaeger, etc.), add an OTLP endpoint to your score:
```ts
export default defineScore({
  provider: new AnthropicProvider(),
  telemetry: {
    enabled: true,
    endpoint: "http://localhost:4318",        // OTLP HTTP endpoint
    headers: { Authorization: "Bearer ..." }, // optional
  },
  agents: { /* ... */ },
});
```
Span tree:
```
agent.run (agent.name=assistant, session.id=abc-123)
├── llm.call (llm.model=claude-sonnet-4-20250514)
├── tool.call (tool.name=read_file)
├── llm.call (llm.model=claude-sonnet-4-20250514)
└── ...
```
Cost estimates are attached to every llm.call span via the built-in MODEL_PRICES table — register custom models with registerModelPrice() from @tuttiai/telemetry.
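The cost attached to each span is just token counts times per-token prices. A sketch of the arithmetic with made-up prices (not Tutti's actual MODEL_PRICES figures):

```ts
// Illustration only — prices below are invented, not from MODEL_PRICES.
interface ModelPrice {
  input_per_mtok_usd: number;  // USD per 1M input tokens
  output_per_mtok_usd: number; // USD per 1M output tokens
}

function estimateCostUsd(price: ModelPrice, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * price.input_per_mtok_usd +
    (outputTokens / 1_000_000) * price.output_per_mtok_usd
  );
}

// e.g. 10k input + 2k output tokens at $3 / $15 per MTok:
const cost = estimateCostUsd(
  { input_per_mtok_usd: 3, output_per_mtok_usd: 15 },
  10_000,
  2_000,
); // 0.03 + 0.03 = 0.06 USD
```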
MCP Bridge
The @tuttiai/mcp voice wraps any MCP server as a Tutti voice. Tools are discovered dynamically at runtime:
```ts
import { McpVoice } from "@tuttiai/mcp";

const mcp = new McpVoice({ server: "npx @playwright/mcp" });

// The agent gets ALL tools from the MCP server
{
  browser: {
    name: "Browser",
    system_prompt: "You control a browser.",
    voices: [mcp],
    permissions: ["network"],
  },
}
```
The voice starts the MCP server as a child process, connects via stdio transport, calls listTools() to discover available tools, and proxies execute() calls through callTool().
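The discover-then-proxy pattern is easy to sketch with a mock client (the real voice uses the MCP SDK's stdio transport behind listTools/callTool; the interface below is a stand-in):

```ts
// Mock MCP client to illustrate discover-then-proxy — not the MCP SDK.
interface McpClient {
  listTools(): Promise<{ name: string; description: string }[]>;
  callTool(name: string, input: unknown): Promise<string>;
}

// Build voice-style tool definitions whose execute() proxies to the client.
async function bridgeTools(client: McpClient) {
  const discovered = await client.listTools();
  return discovered.map((t) => ({
    name: t.name,
    description: t.description,
    execute: (input: unknown) => client.callTool(t.name, input),
  }));
}
```

Because discovery happens at connect time, the agent's tool list always mirrors whatever the MCP server currently exposes.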