Comparing Tutti to LangGraph, CrewAI, AutoGen, and Mastra
An honest comparison. Each framework gets something right that Tutti doesn't, and vice versa. Here's where we differ, and where we'd recommend using something else.
One of the questions I get every week is: how does Tutti compare to LangGraph, CrewAI, AutoGen, or Mastra?
So here's the honest version. I built a real workload on each in early 2026. These are notes from that, updated for the current state of each project as of May 2026.
LangGraph
The most powerful framework on the list. If your use case is genuinely about complex stateful workflows — a graph of agents that pass state through conditional branches, with checkpointing and replay — LangGraph is exceptional at it. The graph abstraction is right for that problem, and the LangSmith integration for tracing is mature.
Where it differs from Tutti: security and the plugin model. Permissions are convention, not enforcement. Tools are functions you register, not typed plugins you install. If your workload is more "wire up five integrations and gate the destructive ones" than "model a complex multi-step state machine," you'll feel the absence of those primitives.
Use LangGraph when: you have a real graph-shaped workflow with branching state, you want the LangSmith ecosystem, you're deep in Python.
Use Tutti when: you want declarative score files, runtime-enforced permissions, and a typed plugin model.
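To make "convention vs. enforcement" concrete, here's a minimal sketch of the two styles. The convention side registers a tool as a plain function that anything can call; the enforcement side attaches a permission to a typed tool and has the runtime refuse ungranted calls. Every name here is an illustrative assumption, not real LangGraph or Tutti API.

```typescript
// Hypothetical sketch — none of these names are real framework API.

type Permission = "read" | "write" | "destructive";

// Convention style: the tool is just a function; nothing stops a caller.
const deleteRecordFn = (id: string) => `deleted ${id}`;
deleteRecordFn("rec_0"); // runs unconditionally

// Enforcement style: the tool carries its permission, and a tiny gate
// (standing in for the runtime) refuses to run it without a grant.
interface TypedTool<I, O> {
  name: string;
  permission: Permission;
  run: (input: I) => O;
}

function invoke<I, O>(tool: TypedTool<I, O>, granted: Permission[], input: I): O {
  if (!granted.includes(tool.permission)) {
    throw new Error(`permission denied: ${tool.name} needs "${tool.permission}"`);
  }
  return tool.run(input);
}

const deleteRecord: TypedTool<string, string> = {
  name: "deleteRecord",
  permission: "destructive",
  run: (id) => `deleted ${id}`,
};

// With only "read" granted, the destructive tool cannot run.
let denied = false;
try {
  invoke(deleteRecord, ["read"], "rec_1");
} catch {
  denied = true;
}
```

The point isn't the ten lines of gate code; it's that the grant check lives in the runtime path rather than in each call site's discipline.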
CrewAI
The fastest to read. CrewAI's "role + goal + backstory" decomposition is genuinely good copy and lowers the barrier to writing a first agent. If your team is mostly product-ish and the agents are mostly conversational, you'll feel productive faster on CrewAI than on anything else.
Where it differs from Tutti: tools are flat strings on a list, not typed plugins. The plugin model isn't really a plugin model — it's a registry. Eval, HITL, and observability are roadmap items, not core primitives. If you start with CrewAI and grow into a workload that needs production-grade safety, you'll need to do that wiring yourself.
Use CrewAI when: you're prototyping fast, your team isn't deep TypeScript, your workload doesn't have destructive tools.
Use Tutti when: any of those reverses — your team is TypeScript-first, destructive tools are in scope, or the plugin ecosystem matters.
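The "flat strings vs. typed plugins" difference is easiest to see in code. Below is a sketch of both shapes in TypeScript; the names and structures are assumptions for illustration, not real CrewAI or Tutti API.

```typescript
// Illustrative only — not real CrewAI or Tutti code.

// Flat-string style: tools are names in a list; inputs are untyped,
// so a typo'd name or wrong argument shape only fails at runtime.
const agentTools: string[] = ["web_search", "send_email"];
const registry: Record<string, (args: any) => string> = {
  web_search: (args) => `results for ${args.query}`,
  send_email: (args) => `sent to ${args.to}`,
};
const out = registry["web_search"]({ query: "tutti" });

// Typed-plugin style: the tool's input shape is part of its type,
// so a wrong call site fails at compile time instead.
interface Plugin<I> {
  name: string;
  call: (input: I) => string;
}
const webSearch: Plugin<{ query: string }> = {
  name: "web_search",
  call: ({ query }) => `results for ${query}`,
};
const typedOut = webSearch.call({ query: "tutti" });
// webSearch.call({ q: "tutti" }) would be a compile error, not a 3am page.
```

Both produce the same result on the happy path; the typed version just moves a class of failures left.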
AutoGen
Microsoft Research's framework. AutoGen has been around longer than any of the others on this list and has the academic depth to show for it — multi-agent conversation patterns, group-chat dynamics, and agent-to-agent negotiation are all expressible in ways most frameworks haven't tried.
Where it differs from Tutti: the eval, HITL, and observability story is thin compared to the agent-design story. AutoGen reads like a research framework that grew a production wing. Tutti reads like a production framework that has agents in it.
Use AutoGen when: you're researching multi-agent dynamics, you want patterns that go beyond simple delegation.
Use Tutti when: you're shipping something that needs to operate at 3am.
Mastra
The TypeScript-first framework closest to Tutti in feel. Clean surface, good ergonomics, declarative API. If you'd asked me three months ago which framework I'd pick to build on top of, Mastra would have been the leading answer.
Where it differs from Tutti: it's a thinner runtime. The eval primitives, the HITL gating, the prompt-injection guard, the permission model — Mastra leaves more of those to you. The plugin ecosystem is younger. Some of this is because Mastra is doing a different scope on purpose; some of it is just the difference between two early-stage projects.
Use Mastra when: you want a clean TypeScript surface and you're comfortable wiring your own safety primitives.
Use Tutti when: you want those primitives in the runtime and you're willing to live with our opinions about how they should work.
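"Wiring your own safety primitives" on a thinner runtime usually means writing small wrappers like the one below: a hand-rolled HITL gate that refuses to run a tool until an approval callback says yes. The shape and names are assumptions for illustration, not Mastra or Tutti API.

```typescript
// Sketch of a do-it-yourself HITL gate — illustrative, not framework API.

type Approver = (action: string) => boolean;

// Wrap a tool so it only runs if the approver signs off on the action.
function gated<I, O>(
  action: string,
  tool: (input: I) => O,
  approve: Approver,
): (input: I) => O {
  return (input) => {
    if (!approve(action)) {
      throw new Error(`blocked: "${action}" was not approved`);
    }
    return tool(input);
  };
}

// Synchronous stand-ins for a real approval channel (Slack, CLI prompt, …).
const alwaysNo: Approver = () => false;
const alwaysYes: Approver = () => true;

const dropTable = (name: string) => `dropped ${name}`;

const safeDrop = gated("drop-table", dropTable, alwaysNo);
const approvedDrop = gated("drop-table", dropTable, alwaysYes);

let blocked = false;
try {
  safeDrop("users");
} catch {
  blocked = true;
}
const result = approvedDrop("users");
```

The wrapper itself is trivial; the work you inherit is everything around it — async approval channels, timeouts, audit logs — which is exactly the surface a runtime with HITL built in takes off your plate.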
Where Tutti is honestly weaker
A comparison post should also tell you where the framework you're considering is weaker. Tutti's current weak spots, in order of how often I get asked:
- Eval ergonomics. The CLI `tutti-ai eval` works, but the dashboard story is "your OTEL backend." LangSmith and the Mastra eval UI are both nicer to look at.
- Community ecosystem. Twelve official voices. That's it today. The MCP bridge fills a lot of the gap, but a thicker community-published voice list would help.
- Production case studies. I have early users, not case studies. Anyone who's running Tutti in real prod and wants to talk about it: please do.
- Python support. None. Tutti is TypeScript-only by choice. If your team is Python-first, this is a real cost.
The honest summary
Tutti's bet is that the right defaults are the production defaults — runtime-enforced permissions, typed voices, prompt-injection at the architecture layer, HITL gating on, OTEL out of the box. We pay for that with a smaller ecosystem and weaker eval ergonomics today. If those trade-offs sound right, give Tutti a try. If they don't, the other four frameworks are good and I'd rather you ship on one of them than have a bad time on Tutti.
If you try Tutti and something is missing or worse than the alternatives, open an issue and tell me. The comparison only stays honest if I keep updating it.