Every agent framework we evaluated in early 2024 made the same mistake. They optimized for the demo, not for the deployment.
You could spin up a chain in five lines of Python and get a compelling terminal recording. But the moment you needed retries, persistent memory across sessions, multi-model routing, cost controls, or any kind of observability, you were on your own.
The gap we kept hitting
We were building an internal invoice auditing agent at a fintech company. The happy path worked in a weekend. The production path took four months.
We needed the agent to fail gracefully when Stripe rate-limited us. We needed it to remember context from a previous audit session. We needed to route expensive reasoning to GPT-5 and simple extraction to a smaller model. We needed traces so we could debug why the agent hallucinated a tool call at 3am on a Tuesday.
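To make the first of those requirements concrete, here is a minimal sketch of failing gracefully under rate limiting: retry with capped exponential backoff. It is not Agentkit's runtime or any Stripe SDK API. The `RateLimited` error shape, `withRetry`, `backoffDelayMs`, and the default delays are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical error shape: stands in for whatever your API client
// surfaces on an HTTP 429. Not a Stripe or Agentkit type.
interface RateLimited {
  kind: "rate_limited";
  retryAfterMs?: number; // server-provided hint, if any
}

function isRateLimited(err: unknown): err is RateLimited {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { kind?: unknown }).kind === "rate_limited"
  );
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Retries only on rate limits; any other error is rethrown immediately.
// `wait` is injectable so tests can skip real delays.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (!isRateLimited(err) || isLastAttempt) throw err;
      // Prefer the server's hint when present, otherwise back off.
      const delay =
        err.retryAfterMs !== undefined ? err.retryAfterMs : backoffDelayMs(attempt);
      await wait(delay);
    }
  }
}
```

The important design choice is that only rate-limit errors are retried. Retrying every error would hide real bugs behind a delay, which is the opposite of failing gracefully.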
None of the existing frameworks handled this. They gave us the first 20% and left us to build the remaining 80% from scratch.
What we decided to build
Agentkit is the 80% that nobody else ships. It is a framework built around six primitives that every production agent needs:
Runtime — stateful execution with retries, interrupts, and human-in-the-loop validation. Not just a loop that calls an LLM.
Tools — a typed tool system with 40 pre-built integrations and a custom tool API that generates schemas from your TypeScript types.
Memory — persistent context that survives across runs, sessions, and users. Vector search and structured state, handled without a separate database.
Models — multi-model routing that picks the right model per task, fails over automatically, and optimizes cost without code changes.
Trace — full observability for every run. Every thought, tool call, and token logged. Replay any execution and debug failures.
Trust — SOC 2 Type II, GDPR, and on-prem deployment. Granular role-based access and full audit logs.
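As a rough illustration of what the Models primitive describes, the sketch below routes a task to an ordered list of models and fails over to the next one when a call throws. This is not Agentkit's actual API. `TaskKind`, `ModelClient`, `Routes`, and `routeAndComplete` are hypothetical names, and the two task kinds simply mirror the reasoning and extraction split mentioned earlier.

```typescript
type TaskKind = "reasoning" | "extraction";

// Hypothetical minimal client interface; real model SDKs differ.
interface ModelClient {
  name: string;
  complete(prompt: string): Promise<string>;
}

// For each task kind, an ordered list of models: preferred first,
// fallbacks after.
type Routes = Record<TaskKind, ModelClient[]>;

interface RoutedResult {
  model: string; // which model actually answered
  output: string;
}

// Try each candidate in order; move to the next one on any error.
async function routeAndComplete(
  kind: TaskKind,
  prompt: string,
  routes: Routes,
): Promise<RoutedResult> {
  const failures: string[] = [];
  for (const client of routes[kind]) {
    try {
      const output = await client.complete(prompt);
      return { model: client.name, output };
    } catch (err) {
      // Record the failure so the final error explains what was tried.
      failures.push(`${client.name}: ${String(err)}`);
    }
  }
  throw new Error(`All models failed for "${kind}" task: ${failures.join("; ")}`);
}
```

Because the routing table is plain data, changing which model handles a task kind means editing `Routes`, not the calling code. That is one plausible reading of "without code changes", under the assumptions above.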
Why a framework, not a platform
We thought hard about building a managed platform from day one. We decided against it for a simple reason: developers do not trust black boxes with their agent logic.
Agentkit runs in your infrastructure. You own the code. You own the data. You can read every line of the runtime because the core is open source.
The cloud offering exists for teams who want managed infrastructure, but it is optional. The framework stands alone.
Where we are now
Agentkit runs over 1.2 million agent executions per month across 500 teams. Average cost per run is $0.34. P50 latency is 1.4 seconds. Error rate is 0.02%.
We are just getting started. The gap between demo agents and production agents is still enormous, and we intend to close it.
