Multi-Model Routing Explained

Person using a smartphone over a notebook

Priya Anand

5 min read

Multi-Model Routing Explained

Not every task needs your most expensive model. A simple data extraction does not require GPT-5. A complex multi-step reasoning chain should not be sent to a 7B parameter model to save a fraction of a cent.

Agentkit's model router makes this decision automatically, per task, per step, inside a single agent run.

How it works

When an agent executes a step, the router evaluates three signals:

Task complexity. The router classifies each step as simple extraction, moderate reasoning, or complex multi-step. This classification is based on the prompt structure, tool requirements, and output schema.

Cost budget. Every agent run can have a cost ceiling. The router tracks cumulative spend across steps and downshifts to cheaper models as the budget depletes. A $1.00 max-cost run will start with GPT-5 for planning and finish with Haiku for formatting.

Latency target. If a step has a timeout, the router prefers faster models. Streaming a tool-call confirmation to a user in 200ms is more important than using the most capable model.

The routing table

You can configure routing explicitly or let the defaults handle it:

models:
  planning: gpt-5
  extraction: claude-haiku
  reasoning: claude-sonnet
  fallback: gemini-flash
  
routing:
  strategy: cost-aware
  max_cost_per_run: $1.00
  fallback_on_timeout: true
models:
  planning: gpt-5
  extraction: claude-haiku
  reasoning: claude-sonnet
  fallback: gemini-flash
  
routing:
  strategy: cost-aware
  max_cost_per_run: $1.00
  fallback_on_timeout: true
models:
  planning: gpt-5
  extraction: claude-haiku
  reasoning: claude-sonnet
  fallback: gemini-flash
  
routing:
  strategy: cost-aware
  max_cost_per_run: $1.00
  fallback_on_timeout: true

If you do not configure anything, Agentkit defaults to cost-aware routing with GPT-5 as the primary and Claude Haiku as the fallback.

Real-world impact

One of our early users was spending $40,000 per month on GPT-5 for a support automation agent. After enabling multi-model routing, 70% of their steps were routed to smaller models. Their bill dropped to $11,000. They measured no quality degradation on their internal eval suite.

The key insight is that most agent runs are 80% simple steps and 20% hard steps. You only need your best model for the hard parts.

Failover

Model providers go down. Rate limits hit. Agentkit handles this automatically. If GPT-5 returns a 429 or 500, the router retries once, then falls back to the next model in the chain. The trace logs which model handled which step so you can audit routing decisions after the fact.

Bring your own keys

On Pro and Custom plans, you can connect your own API keys for OpenAI, Anthropic, Google, and self-hosted endpoints. Agentkit charges only for framework usage, not for model tokens. You pay your model provider directly.

Related articles

Related articles

Usage-based pricing that scales with you.

Usage-based pricing that scales with you.

Start free, pay for what runs. No seats, no platform fees, no surprise overages — just runs, tokens, and the tools you actually use.

Start free, pay for what runs. No seats, no platform fees, no surprise overages — just runs, tokens, and the tools you actually use.

Start free, pay for what runs. No seats, no platform fees, no surprise overages — just runs, tokens, and the tools you actually use.

Start free, pay for what runs. No seats, no platform fees, no surprise overages — just runs, tokens, and the tools you actually use.

The framework for building, deploying, and observing production AI agents. Made in Berlin, shipped globally.

© AgentKit Inc.
Berlin — 2026

The framework for building, deploying, and observing production AI agents. Made in Berlin, shipped globally.

© AgentKit Inc.
Berlin — 2026

The framework for building, deploying, and observing production AI agents. Made in Berlin, shipped globally.

© AgentKit Inc.
Berlin — 2026

Create a free website with Framer, the website builder loved by startups, designers and agencies.