OpenTelemetry on day one: every step is a span
If you can't explain what an agent did six hours after it did it, you can't operate it. Tutti emits OTEL spans for every run, LLM call, and tool invocation — and you wire it into whatever you already use.
An agent system without observability is a system you can run, not a system you can operate. The two are not the same.
Running it means watching the demo work. Operating it means: a customer says "your bot deleted my issue at 14:30 yesterday and I want to know why." You need to find the run, find the span, find the prompt, find the tool call, find the model decision — in under a minute, six hours later. That's a different problem.
Tutti is built to be operable from day one. Every run, every LLM call, every tool invocation, every routing decision, every interrupt is an OpenTelemetry span. There's no "enable observability" toggle. Spans are how the runtime works internally; you just point them somewhere.
What gets traced
The `@tuttiai/telemetry` package wraps the agent loop in a parent span and adds child spans for:
- `agent.run` — the top-level call. Has the agent name, the input, the session ID, the model used.
- `llm.call` — every Anthropic / OpenAI / Gemini round-trip. Has the model, the input/output token counts, the cost estimate, whether the call hit cache.
- `tool.call` — every voice tool invocation. Has the voice name, tool name, sanitised input, result size, error flag.
- `router.decision` — when `SmartProvider` is used, every routing decision. Has the input classifier signal, the tier chosen, the model selected, the cost ceiling.
- `interrupt.requested` — every HITL pause. Has the tool, the arguments, who approved, how long it waited.
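To make that concrete, here is a minimal sketch of how one of these spans might be emitted with the standard OpenTelemetry JS API. The span name and the fields it carries come from the list above; the wrapper function, attribute keys, and error handling are illustrative assumptions, not Tutti's actual internals.

```ts
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("@tuttiai/telemetry");

// Hypothetical wrapper: records one tool invocation as a `tool.call` span.
// Attribute keys are assumed; only the span's contents mirror the list above.
async function tracedToolCall<T>(
  voice: string,
  tool: string,
  sanitisedInput: unknown,
  invoke: () => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan("tool.call", async (span) => {
    span.setAttributes({
      "tool.voice": voice,
      "tool.name": tool,
      "tool.input": JSON.stringify(sanitisedInput),
    });
    try {
      const result = await invoke();
      span.setAttribute("tool.result_size", JSON.stringify(result).length);
      span.setAttribute("tool.error", false);
      return result;
    } catch (err) {
      span.setAttribute("tool.error", true);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```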
That's already enough to answer "why did the agent do X?" without asking the agent.
How you wire it up
```bash
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io/v1/traces \
OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_KEY \
tutti-ai run
```

That's the entire setup for Honeycomb. Same shape for Tempo, for Datadog, for Jaeger, for any OTLP collector. Tutti uses the OpenTelemetry SDK directly — no Tutti-specific dashboard, no Tutti-hosted ingestion, no proprietary format. Your existing observability stack gets the data; your existing alerting fires on it.
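Because Tutti rides the stock OTEL SDK, the standard programmatic setup should work just as well as the env vars. A sketch using the OpenTelemetry Node packages (these package names and options come from the OTEL JS distribution, not from Tutti):

```ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

// Equivalent to the env vars above, expressed in code.
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: "https://api.honeycomb.io/v1/traces",
    headers: { "x-honeycomb-team": process.env.HONEYCOMB_KEY ?? "" },
  }),
});
sdk.start();
```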
If you don't have an OTEL collector wired up yet, `@tuttiai/telemetry` ships with a `JsonFileExporter` that writes spans to a local JSON file you can grep. It's not pretty but it's enough for development.
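Wiring that in might look like the following. This is a sketch only: `JsonFileExporter` is the export named above, but the constructor option shown (a single output path) is an assumption; check the package docs for the real signature.

```ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { JsonFileExporter } from "@tuttiai/telemetry";

// Development-only: dump spans to a local file you can grep.
// The { path } option is assumed, not confirmed against the package.
const sdk = new NodeSDK({
  traceExporter: new JsonFileExporter({ path: "./traces.json" }),
});
sdk.start();
```

From there, grep or `jq` against the file is enough to answer span-level questions while developing.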
Why "OTEL or nothing" is right
Depending on a Tutti-hosted observability product would mean vendor lock-in, latency in the data path, an extra bill, and a blast radius tied to Tutti's uptime. We chose to stand on OpenTelemetry instead. It's the standard. Every observability vendor speaks it. Your team probably already runs an OTEL collector for everything else. There's no reason for an agent framework to invent its own.
What you get from this
Three things, immediately:
1. Replay. Use `tutti-ai replay`
You can ship without OTEL. You won't operate without OTEL. Wiring it up before you have an incident is significantly cheaper than wiring it up during one.