Why We Built Agentkit


Sana Lindqvist

May 18, 2026

8 min read

Every agent framework we evaluated in early 2024 made the same mistake. They optimized for the demo, not for the deployment.

You could spin up a chain in five lines of Python and get a compelling terminal recording. But the moment you needed retries, persistent memory across sessions, multi-model routing, cost controls, or any kind of observability, you were on your own.

The gap we kept hitting

We were building an internal invoice auditing agent at a fintech company. The happy path worked in a weekend. The production path took four months.

We needed the agent to fail gracefully when Stripe rate-limited us. We needed it to remember context from a previous audit session. We needed to route expensive reasoning to GPT-5 and simple extraction to a smaller model. We needed traces so we could debug why the agent hallucinated a tool call at 3am on a Tuesday.
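
The first of those requirements, failing gracefully under rate limiting, usually comes down to retrying with exponential backoff. The following is a minimal sketch in Python, not Agentkit's actual runtime. `RateLimitError`, `call_with_backoff`, and the injectable `sleep` parameter are hypothetical names introduced for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when an upstream API answers with HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Double the delay ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount up to the ceiling so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A production runtime would likely also honor any retry hint the upstream API returns.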

None of the existing frameworks handled this. They gave us the first 20% and left us to build the remaining 80% from scratch.

What we decided to build

Agentkit is the 80% that nobody else ships. It is a framework built around six primitives that every production agent needs:

Runtime — stateful execution with retries, interrupts, and human-in-the-loop validation. Not just a loop that calls an LLM.

Tools — a typed tool system with 40 pre-built integrations and a custom tool API that generates schemas from your TypeScript types.

Memory — persistent context that survives across runs, sessions, and users. Vector search and structured state, handled without a separate database.

Models — multi-model routing that picks the right model per task, fails over automatically, and optimizes cost without code changes.

Trace — full observability for every run. Every thought, tool call, and token logged. Replay any execution and debug failures.

Trust — SOC 2 Type II, GDPR, and on-prem deployment. Granular role-based access and full audit logs.
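
To make the Models primitive concrete, here is a minimal sketch of per-task routing with failover. It is an illustration under stated assumptions, not Agentkit's API: the `Route` type, the `route` function, the task labels, and the model names `large-model` and `small-model` are all hypothetical. The model call is passed in as a plain function so the example runs without any provider SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Route:
    """Ordered model preference for one task type (all names hypothetical)."""
    models: List[str]  # first entry is preferred, the rest are fallbacks


# Hypothetical routing table: expensive reasoning goes to a large model,
# simple extraction to a small one, each with a fallback.
ROUTES: Dict[str, Route] = {
    "reasoning": Route(models=["large-model", "small-model"]),
    "extraction": Route(models=["small-model", "large-model"]),
}


def route(task: str, prompt: str, call_model: Callable[[str, str], str]) -> str:
    """Try each model configured for the task in order; fail over on error."""
    if task not in ROUTES:
        raise KeyError(f"no route configured for task {task!r}")
    errors: List[str] = []
    for model in ROUTES[task].models:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Because the routing table is plain data, changing which model handles a task does not require touching the calling code.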

Why a framework, not a platform

We thought hard about building a managed platform from day one. We decided against it for a simple reason: developers do not trust black boxes with their agent logic.

Agentkit runs in your infrastructure. You own the code. You own the data. You can read every line of the runtime because the core is open source.

The cloud offering exists for teams who want managed infrastructure, but it is optional. The framework stands alone.

Where we are now

Agentkit runs over 1.2 million agent executions per month across 500 teams. Average cost per run is $0.34. P50 latency is 1.4 seconds. Error rate is 0.02%.

We are just getting started. The gap between demo agents and production agents is still enormous, and we intend to close it.
