Why I'm building Tutti AI

I spent early 2026 trying to ship a real multi-agent workflow on top of every framework that almost worked. Each had a different deal-breaker. So I started writing the one I wished existed.

Chihab

Building Tutti AI · 8 March 2026 · 6 min read

I spent a chunk of early 2026 trying to ship a real multi-agent workflow on top of an existing framework. The system needed to read from GitHub, query Postgres, run a browser, and post to Slack. Crucially, it must never do anything destructive without an operator saying so. I evaluated the four frameworks people usually compare, using real code rather than marketing pages.

LangGraph is powerful, but security is an afterthought. Tools are functions; permissions are convention. You can write the gating yourself, but you'll get it wrong, and the framework won't tell you.

CrewAI moves fast and reads well, but treats tools as a flat list. There's no real plugin model — every team rebuilds the same Stripe wrapper, the same GitHub wrapper, the same Slack wrapper.

AutoGen has an academic pedigree and lots of patterns, but weak primitives where it counts. Eval, human-in-the-loop (HITL) approval, and observability all read like next-quarter features.

Mastra has a clean TypeScript surface and an obvious aesthetic answer, but the eval and HITL primitives I needed were thin.

Each one had something to recommend it. None of them got the defaults right.

I wrote some throwaway code to compare them on the same task. The result was the same every time: I needed to bolt on the security boundaries, the plugin model, the typed config, the eval harness. By the time I'd done that, I'd written most of an agent framework.

So I started writing one on purpose.

What "getting the defaults right" means

I want a framework where you can declare an agent in twenty lines, and the runtime — not the docs — enforces:

- Voices declare which permissions they need (`network`, `filesystem`, `shell`, `browser`). Agents grant them explicitly. The runtime refuses to load otherwise.
- Tool inputs are validated with Zod before they execute. Path traversals, private IPs, and dangerous URL schemes are blocked at the boundary.
- Tool outputs pass through a prompt-injection guard before reaching the model.
- Every tool with real-world side effects (posting, deleting, paying, sending) is marked `destructive: true` and can be gated behind human approval with a single flag.
- Token and cost budgets are hard limits. The agent loop halts when they're exceeded.
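To make three of those defaults concrete (explicit permission grants, approval gating for destructive tools, and hard budgets), here is a minimal sketch. It is not Tutti's actual API: `Voice`, `AgentConfig`, `loadVoice`, `callTool`, and `Budget` are hypothetical names chosen for illustration, and input validation and the prompt-injection guard are left out to keep it short.

```typescript
// Hypothetical types for illustration only; none of these names are Tutti's real API.
type Permission = "network" | "filesystem" | "shell" | "browser";

interface Tool {
  name: string;
  destructive: boolean; // true for posting, deleting, paying, sending
  run: (input: string) => string;
}

interface Voice {
  name: string;
  requires: Permission[]; // what the voice declares it needs
  tools: Tool[];
}

interface AgentConfig {
  grants: Permission[];     // what the agent explicitly grants
  requireApproval: boolean; // the single flag that gates destructive tools
}

// Refuse to load a voice unless every permission it declares was granted.
function loadVoice(voice: Voice, agent: AgentConfig): Voice {
  const missing = voice.requires.filter((p) => !agent.grants.includes(p));
  if (missing.length > 0) {
    throw new Error(
      `voice "${voice.name}" needs ungranted permissions: ${missing.join(", ")}`
    );
  }
  return voice;
}

// Run a tool; when it is destructive and gating is on, ask the operator first.
function callTool(
  tool: Tool,
  input: string,
  agent: AgentConfig,
  approve: (tool: Tool) => boolean
): string {
  if (tool.destructive && agent.requireApproval && !approve(tool)) {
    throw new Error(`operator denied destructive tool "${tool.name}"`);
  }
  return tool.run(input);
}

// Track token spend and throw as soon as the hard limit is exceeded.
class Budget {
  private used = 0;
  private readonly maxTokens: number;

  constructor(maxTokens: number) {
    this.maxTokens = maxTokens;
  }

  spend(tokens: number): void {
    this.used += tokens;
    if (this.used > this.maxTokens) {
      throw new Error(`token budget exceeded: ${this.used} > ${this.maxTokens}`);
    }
  }
}
```

The point of the sketch is where the checks live: each one throws inside the runtime path, so a missing grant or an unapproved destructive call fails loudly instead of depending on the caller remembering a convention.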

None of those are research. They're table stakes. They should be the boring default, not the premium tier.

What Tutti is

Tutti is the framework I wished existed. It treats agents as declarative score files instead of imperative graphs. It treats tools as a typed plugin model with versioned, npm-installable packages — what we call voices. It puts security, observability, evals, and HITL in the runtime on day one, not as quarterly roadmap items.
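As a rough illustration of "declarative score" versus "imperative graph", the sketch below describes an agent as plain typed data and checks it before anything runs. The field names, the `@example/voice-github` package, and `validateScore` are all assumptions made up for this example; Tutti's real score schema may look quite different.

```typescript
// Hypothetical score shape for illustration; not Tutti's actual schema.
interface VoiceRef {
  package: string;  // npm package name of the voice (made up here)
  version: string;  // pinned version, e.g. "1.2.0"
  grants: string[]; // permissions the agent grants to this voice
}

interface AgentScore {
  name: string;
  voices: VoiceRef[];
  budget: { maxTokens: number; maxCostUsd: number };
  approveDestructive: boolean;
}

// The whole agent is data: no control flow, no graph wiring.
const triage: AgentScore = {
  name: "issue-triage",
  voices: [
    { package: "@example/voice-github", version: "1.2.0", grants: ["network"] },
  ],
  budget: { maxTokens: 50000, maxCostUsd: 2 },
  approveDestructive: true,
};

// Because the score is data, it can be checked up front; returns a list of problems.
function validateScore(score: AgentScore): string[] {
  const problems: string[] = [];
  if (score.name.trim() === "") problems.push("name must not be empty");
  if (score.budget.maxTokens <= 0) problems.push("budget.maxTokens must be positive");
  if (score.budget.maxCostUsd <= 0) problems.push("budget.maxCostUsd must be positive");
  for (const voice of score.voices) {
    if (!/^\d+\.\d+\.\d+$/.test(voice.version)) {
      problems.push(`voice ${voice.package} must pin an exact version`);
    }
  }
  return problems;
}
```

A config like this can be diffed, reviewed, and validated in CI without executing any agent code, which is the practical argument for keeping it declarative.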

It's open source. Apache 2.0. TypeScript end-to-end. Twelve official voices today, more coming, and a built-in MCP bridge so anything with an MCP server is reachable with one install.

Where this is going

I'm one person right now. The roadmap is in GitHub Issues. The architectural decisions are written down in CLAUDE.md, which I update as I learn. I'll be writing here weekly — about the framework, the trade-offs I'm making, and the comparisons I want to be honest about rather than slick.

If you're trying to ship a real agent system and feeling the same pain, give Tutti a try and tell me where it breaks. The fastest signal is a GitHub issue. The hardest one is silence.

Tags #manifesto #origin

