You built an AI agent. It works. It handles customer tickets, summarizes documents, classifies data, generates reports. Your team is happy. Your users are happy.
But do you know what it actually costs to run?
We analyzed cost data from 50 AI agent deployments across startups and mid-market companies. The findings were consistent: teams waste an average of 28% of their AI spend on suboptimal model selection, idle agents, and duplicate requests.
The anatomy of AI agent costs
An agent's cost isn't just tokens. It's a stack: LLM API calls (prices vary 100x between models), tool calls (search APIs, databases), compute (your LangChain/CrewAI infra), and retries (failed calls that cost the same but produce nothing). GPT-4o costs $2.50/M input tokens. GPT-4o-mini costs $0.15. That's a 16x difference.
Where the 28% waste comes from
Model overprovisioning (15-20%). Teams default to GPT-4o for everything. 73% of those calls could run on a cheaper model with no quality loss.
Idle agents (5-8%). Test agents never decommissioned. We found one team burning $800/month on 4 idle agents.
Duplicate requests (3-5%). Multiple agents making identical calls. No caching, no dedup.
Missing prompt caching (5-10%). Anthropic cached prompts cost 90% less, but most teams haven't enabled it.
What to do right now
1. Audit model usage — test cheaper models on 100 examples. 2. Check for idle agents. 3. Enable prompt caching. 4. Set daily spend alerts.
Or sign up for AgentCostPilot — free up to $1K/month — and see all of this automatically.