How to Cut Your Agent Costs by 60%
When we launched Agentkit's cloud offering, our own internal agents were spending $0.89 per run. Three months later, we got that down to $0.34 with no quality degradation. Here is exactly what we did.
Strategy 1: Multi-model routing
This is the single highest-impact change. Most agent runs are 80% simple steps — data extraction, formatting, summarization — and 20% hard steps — planning, reasoning, decision-making.
We configured the router to send simple steps to Claude Haiku ($0.25 per million input tokens) and hard steps to GPT-5 ($30 per million input tokens). The result: 70% of our token spend shifted to cheaper models.
Impact: -52% on model costs.
Strategy 2: Prompt caching
Many agent runs use similar system prompts, tool definitions, and few-shot examples. Anthropic and OpenAI both offer prompt caching that reduces the cost of repeated prompt prefixes.
We restructured our prompts to front-load the static parts (system instructions, tool schemas, guardrails) and append dynamic parts (user input, context) at the end. This maximized cache hit rates.
Impact: -18% on input token costs.
Strategy 3: Prompt compression
Agent prompts accumulate context over multi-step runs. By step seven, the prompt might include the full history of previous steps, tool outputs, and intermediate reasoning.
We implemented progressive summarization: after each step, the full output is compressed into a structured summary. The agent works from summaries, not raw history. Token count per step dropped from an average of 4,200 to 1,800.
Impact: -25% on total token usage.
Strategy 4: Tool output batching
Some tools return large payloads. A Stripe invoice list might return 50KB of JSON. Sending all of that to the model is wasteful when the agent only needs three fields per invoice.
We added output schemas to our tools. The tool extracts only the fields the agent needs before passing the result to the model. A 50KB Stripe response becomes a 2KB structured summary.
Impact: -30% on tool-related token costs.
Combined result
These four strategies are not additive — they compound. Model routing reduces the base cost. Caching reduces the per-call cost. Compression reduces the per-step cost. Batching reduces the per-tool cost.
Combined, our average cost per run dropped from $0.89 to $0.34. On high-volume agents running 100,000 times per month, that is a savings of $55,000 monthly.
Every strategy above is built into Agentkit. Routing and caching are default behaviors. Compression and batching are configuration options. You do not need to build any of this yourself.
