How to Cut Your Agent Costs by 60%


Priya Anand

May 18, 2026



When we launched Agentkit's cloud offering, our own internal agents were spending $0.89 per run. Three months later, we got that down to $0.34 with no quality degradation. Here is exactly what we did.

Strategy 1: Multi-model routing

This is the single highest-impact change. Most agent runs are 80% simple steps — data extraction, formatting, summarization — and 20% hard steps — planning, reasoning, decision-making.

We configured the router to send simple steps to Claude Haiku ($0.25 per million input tokens) and hard steps to GPT-5 ($30 per million input tokens). The result: 70% of our token spend shifted to cheaper models.

Impact: -52% on model costs.
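As a rough illustration of the routing rule described above, here is a minimal sketch. The step labels, the `Step` type, and the `route` function are hypothetical names made up for this example, not Agentkit's API, and the model keys are plain labels rather than real API model identifiers. The prices are the per-million input token figures quoted above.

```python
from dataclasses import dataclass

# Input prices in USD per million tokens, as quoted in the article.
PRICE_PER_MILLION = {"claude-haiku": 0.25, "gpt-5": 30.00}

# Hypothetical labels for the "simple" step types named in the article.
SIMPLE_STEPS = {"extraction", "formatting", "summarization"}


@dataclass
class Step:
    kind: str
    input_tokens: int


def route(step: Step) -> str:
    """Send simple steps to the cheap model; anything else goes to the strong one."""
    return "claude-haiku" if step.kind in SIMPLE_STEPS else "gpt-5"


def input_cost(step: Step) -> float:
    """Input token cost in USD for a step on its routed model."""
    return step.input_tokens / 1_000_000 * PRICE_PER_MILLION[route(step)]


# A made-up three-step run: two simple steps and one hard step.
run = [Step("extraction", 2000), Step("planning", 2000), Step("formatting", 2000)]

routed_cost = sum(input_cost(s) for s in run)
all_strong_cost = sum(s.input_tokens for s in run) / 1_000_000 * PRICE_PER_MILLION["gpt-5"]

print(f"routed: ${routed_cost:.4f}, everything on the strong model: ${all_strong_cost:.4f}")
```

Unknown step kinds fall through to the stronger model in this sketch, which trades some cost for safety when a step has not been classified.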

Strategy 2: Prompt caching

Many agent runs use similar system prompts, tool definitions, and few-shot examples. Anthropic and OpenAI both offer prompt caching that reduces the cost of repeated prompt prefixes.

We restructured our prompts to front-load the static parts (system instructions, tool schemas, guardrails) and append dynamic parts (user input, context) at the end. This maximized cache hit rates.

Impact: -18% on input token costs.
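The prompt ordering described above can be sketched as follows. This only shows the layout (static prefix first, dynamic content last); it does not call any provider, and how caching is actually enabled differs between providers. The constants and function names are hypothetical placeholders.

```python
import json

# Hypothetical static content that is identical on every run.
SYSTEM_INSTRUCTIONS = "You are a billing assistant."
GUARDRAILS = "Never reveal card numbers."
TOOL_SCHEMAS = [{"name": "list_invoices", "parameters": {"customer_id": "string"}}]


def static_prefix() -> str:
    """Build the part of the prompt that never changes between runs.

    sort_keys keeps the serialized tool schemas byte-for-byte stable,
    since any change in the prefix would defeat prefix-based caching.
    """
    return "\n\n".join([
        SYSTEM_INSTRUCTIONS,
        GUARDRAILS,
        json.dumps(TOOL_SCHEMAS, sort_keys=True),
    ])


def build_prompt(user_input: str, context: str) -> str:
    """Static parts first, dynamic parts appended at the end."""
    return static_prefix() + "\n\n" + context + "\n\n" + user_input


a = build_prompt("Summarize unpaid invoices.", "Customer: A")
b = build_prompt("Refund invoice 17.", "Customer: B")

# Both prompts share the same leading bytes, which is what a prefix cache keys on.
print(a.startswith(static_prefix()) and b.startswith(static_prefix()))
```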

Strategy 3: Prompt compression

Agent prompts accumulate context over multi-step runs. By step seven, the prompt might include the full history of previous steps, tool outputs, and intermediate reasoning.

We implemented progressive summarization: after each step, the full output is compressed into a structured summary. The agent works from summaries, not raw history. Token count per step dropped from an average of 4,200 to 1,800.

Impact: -25% on total token usage.
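A minimal sketch of progressive summarization, assuming a pluggable summarizer. `RunContext`, `StepSummary`, and `truncate` are hypothetical names for illustration. In practice the summarizer would likely be a model call or a structured extractor; here a simple truncation stands in so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepSummary:
    step: int
    action: str
    result: str


@dataclass
class RunContext:
    """Keeps one compact summary per step instead of the raw history."""
    summarize: Callable[[str], str]
    summaries: list[StepSummary] = field(default_factory=list)

    def record(self, action: str, raw_output: str) -> None:
        # Compress the raw output immediately; the raw text is not stored.
        self.summaries.append(
            StepSummary(len(self.summaries) + 1, action, self.summarize(raw_output))
        )

    def render(self) -> str:
        """Text that goes into the next step's prompt."""
        return "\n".join(f"{s.step}. {s.action}: {s.result}" for s in self.summaries)


def truncate(text: str, limit: int = 80) -> str:
    """Stand-in summarizer: keeps at most the first `limit` characters."""
    return text if len(text) <= limit else text[:limit].rstrip() + "..."


ctx = RunContext(summarize=truncate)
ctx.record("fetch_invoices", "x" * 5000)          # large raw tool output
ctx.record("pick_overdue", "3 invoices overdue")  # already short
prompt_context = ctx.render()
print(len(prompt_context))
```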

Strategy 4: Tool output batching

Some tools return large payloads. A Stripe invoice list might return 50KB of JSON. Sending all of that to the model is wasteful when the agent only needs three fields per invoice.

We added output schemas to our tools. The tool extracts only the fields the agent needs before passing the result to the model. A 50KB Stripe response becomes a 2KB structured summary.

Impact: -30% on tool-related token costs.
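The field extraction described above can be sketched like this. The payload is made up for illustration and is not a real Stripe response; the three field names and the `project` helper are assumptions, not Agentkit's tool schema format.

```python
import json
from typing import Any

# Hypothetical output schema: the only fields the agent needs per invoice.
INVOICE_FIELDS = ("id", "amount_due", "status")


def project(records: list[dict[str, Any]], fields: tuple[str, ...]) -> list[dict[str, Any]]:
    """Keep only the listed fields from each record; missing fields become None."""
    return [{f: r.get(f) for f in fields} for r in records]


# Made-up payload standing in for a large tool response.
raw = [
    {
        "id": f"in_{i}",
        "amount_due": 1000 + i,
        "status": "open",
        "lines": [{"description": "x" * 200}],
        "metadata": {"note": "y" * 200},
    }
    for i in range(50)
]

slim = project(raw, INVOICE_FIELDS)
raw_size = len(json.dumps(raw))
slim_size = len(json.dumps(slim))
print(f"raw: {raw_size} bytes, projected: {slim_size} bytes")
```

Doing the projection inside the tool wrapper, before the result reaches the model, means the large payload never counts toward input tokens.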

Combined result

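These four strategies are not additive; they compound. Each impact figure above is measured against a different base (model costs, input token costs, total token usage, tool-related token costs), so the percentages cannot simply be summed. Model routing reduces the base cost. Caching reduces the per-call cost. Compression reduces the per-step cost. Batching reduces the per-tool cost.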

Combined, our average cost per run dropped from $0.89 to $0.34, a reduction of about 62%. For a high-volume agent running 100,000 times per month, that is a saving of $55,000 per month.
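The arithmetic behind those figures, worked through with the numbers from this article. `Decimal` is used so the dollar amounts stay exact.

```python
from decimal import Decimal

cost_before = Decimal("0.89")   # USD per run before the changes
cost_after = Decimal("0.34")    # USD per run after the changes
runs_per_month = 100_000

saving_per_run = cost_before - cost_after            # 0.55
monthly_saving = saving_per_run * runs_per_month     # 55000.00
reduction_pct = saving_per_run / cost_before * 100   # about 61.8

print(f"Saving per run: ${saving_per_run}")
print(f"Monthly saving: ${monthly_saving:,.2f}")
print(f"Reduction: {reduction_pct:.1f}%")
```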

Every strategy above is built into Agentkit. Routing and caching are default behaviors. Compression and batching are configuration options. You do not need to build any of this yourself.
