The Production Agent Checklist
You built an agent. It works in development. It demos well. Now you want to ship it to real users. Before you do, answer these fourteen questions.
Reliability
1. What happens when the model returns garbage? Your agent will occasionally receive malformed JSON, hallucinated tool names, or empty responses. Do you have validation at every step? Agentkit's runtime auto-retries with correction prompts, but you need to define what correct looks like.
2. What happens when a tool times out? If your Stripe API call takes 30 seconds instead of 2, does the agent hang forever? Set timeouts on every tool. Define fallback behavior.
3. What happens when the agent enters a loop? LLMs can get stuck repeating the same tool call. Set a max-steps limit. Agentkit defaults to 25 steps per run, but you should tune this to your use case.
Cost
4. What is your cost ceiling per run? An unconstrained agent with access to GPT-5 and ten tools can easily spend $5 on a single run. Set a max-cost guardrail. Agentkit kills the run if the budget is exceeded.
5. Are you routing to the cheapest sufficient model? Not every step needs your most expensive model. Use multi-model routing to send simple extraction to cheaper models.
6. Do you know your cost per user per month? Aggregate your trace data by user. Identify power users and potential abuse patterns before they become billing surprises.
Security
7. Can the agent access data it should not? Tool permissions should be scoped. An agent handling support tickets should not have write access to your billing database.
8. Are you masking PII in logs and traces? Agentkit strips PII automatically in traces, but verify that your custom tools are not logging sensitive data in their own outputs.
9. Have you tested prompt injection? Send adversarial inputs through your agent. Agentkit's Prompt Shield catches common injection patterns, but you should test with your specific domain.
Observability
10. Can you replay any production run? If a user reports a bad result, can you see exactly what the agent did? Enable Trace on every production agent.
11. Do you have alerts on error rate and latency? Set up alerts before you ship, not after. Agentkit integrates with PagerDuty, Slack, and Opsgenie.
12. Are you tracking model drift? Model providers update their models without notice. Compare trace outputs week over week to catch behavioral changes.
User experience
13. Does the user know the agent is working? Long-running agents need progress indicators. Stream step updates to the frontend so users do not stare at a spinner for 30 seconds.
14. Can the user intervene? Some actions should require confirmation. Agentkit's interrupt system lets you pause execution and ask the user before proceeding with high-stakes tool calls.
Print this list. Tape it next to your monitor. Check every box before you deploy.
