The Production Agent Checklist

Woman working on laptop on bed with snacks and books.

James Porter

May 18, 2026

4 min read

The Production Agent Checklist

You built an agent. It works in development. It demos well. Now you want to ship it to real users. Before you do, answer these fourteen questions.

Reliability

1. What happens when the model returns garbage? Your agent will occasionally receive malformed JSON, hallucinated tool names, or empty responses. Do you have validation at every step? Agentkit's runtime auto-retries with correction prompts, but you need to define what correct looks like.

2. What happens when a tool times out? If your Stripe API call takes 30 seconds instead of 2, does the agent hang forever? Set timeouts on every tool. Define fallback behavior.

3. What happens when the agent enters a loop? LLMs can get stuck repeating the same tool call. Set a max-steps limit. Agentkit defaults to 25 steps per run, but you should tune this to your use case.

Cost

4. What is your cost ceiling per run? An unconstrained agent with access to GPT-5 and ten tools can easily spend $5 on a single run. Set a max-cost guardrail. Agentkit kills the run if the budget is exceeded.

5. Are you routing to the cheapest sufficient model? Not every step needs your most expensive model. Use multi-model routing to send simple extraction to cheaper models.

6. Do you know your cost per user per month? Aggregate your trace data by user. Identify power users and potential abuse patterns before they become billing surprises.

Security

7. Can the agent access data it should not? Tool permissions should be scoped. An agent handling support tickets should not have write access to your billing database.

8. Are you masking PII in logs and traces? Agentkit strips PII automatically in traces, but verify that your custom tools are not logging sensitive data in their own outputs.

9. Have you tested prompt injection? Send adversarial inputs through your agent. Agentkit's Prompt Shield catches common injection patterns, but you should test with your specific domain.

Observability

10. Can you replay any production run? If a user reports a bad result, can you see exactly what the agent did? Enable Trace on every production agent.

11. Do you have alerts on error rate and latency? Set up alerts before you ship, not after. Agentkit integrates with PagerDuty, Slack, and Opsgenie.

12. Are you tracking model drift? Model providers update their models without notice. Compare trace outputs week over week to catch behavioral changes.

User experience

13. Does the user know the agent is working? Long-running agents need progress indicators. Stream step updates to the frontend so users do not stare at a spinner for 30 seconds.

14. Can the user intervene? Some actions should require confirmation. Agentkit's interrupt system lets you pause execution and ask the user before proceeding with high-stakes tool calls.

Print this list. Tape it next to your monitor. Check every box before you deploy.

Building Accessible Components from the Ground Up

Before you ship an agent to real users you need to answer 14 questions. We wrote them down so you don't have to learn them the hard way.

September 28, 2025

Marcus Reid

Why We Built Agentkit

Before you ship an agent to real users you need to answer 14 questions. We wrote them down so you don't have to learn them the hard way.

October 15, 2025

Sana Lindqvist

Multi-Model Routing Explained

Before you ship an agent to real users you need to answer 14 questions. We wrote them down so you don't have to learn them the hard way.

October 22, 2025

Priya Anand

Back home

Usage-based pricing that scales with you.

Start free, pay for what runs. No seats, no platform fees, no surprise overages — just runs, tokens, and the tools you actually use.

book a demo

Start free, pay for what runs. No seats, no platform fees, no surprise overages — just runs, tokens, and the tools you actually use.

book a demo

The Production Agent Checklist

The Production Agent Checklist

Reliability

Cost

Security

Observability

User experience

Related articles

Related articles

Usage-based pricing that scales with you.

Usage-based pricing that scales with you.