AgentCostPilot Docs
Everything you need to set up, integrate, and optimize your AI agent costs.
Quick Start Guide
Get up and running in under 2 minutes.
Sign up at agentcostpilot.com/signup with email or Google OAuth. No credit card required.
Go to Dashboard → Providers → Connect. Paste your OpenAI or Anthropic API key. We use read-only access.
Your dashboard populates within minutes. View cost by agent, model, and task. Set alerts and get optimization recommendations.
Connect a Provider
AgentCostPilot supports 9 LLM providers.
| Provider | Key Format | Scope |
|---|---|---|
| OpenAI | sk-proj-... | usage.read |
| Anthropic | sk-ant-... | admin (usage read) |
| Google AI | AIza... | Vertex AI usage |
| AWS Bedrock | AKIA... | bedrock:GetUsage |
| Azure OpenAI | Endpoint + key | Metrics read |
| Mistral AI | sk-... | Usage API |
| Cohere | co-... | Usage read |
| Groq | gsk_... | Usage read |
| Together AI | tog-... | Usage read |
All API keys are encrypted with AES-256.
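The key formats in the table can be used to pre-validate a pasted key before it is submitted. The sketch below is illustrative only: `guess_provider` is a hypothetical helper, not part of the AgentCostPilot SDK, and it assumes the prefixes in the Key Format column are reliable. Because the Mistral AI prefix (`sk-`) is also the start of the OpenAI and Anthropic prefixes, the longest matching prefix wins.

```python
from typing import Optional

# Prefixes copied from the Key Format column of the provider table.
# Azure OpenAI is omitted because it uses an endpoint plus key, not a prefix.
KEY_PREFIXES = {
    "sk-proj-": "OpenAI",
    "sk-ant-": "Anthropic",
    "AIza": "Google AI",
    "AKIA": "AWS Bedrock",
    "sk-": "Mistral AI",
    "co-": "Cohere",
    "gsk_": "Groq",
    "tog-": "Together AI",
}


def guess_provider(api_key: str) -> Optional[str]:
    """Hypothetical helper: return the provider with the longest matching prefix."""
    matches = [prefix for prefix in KEY_PREFIXES if api_key.startswith(prefix)]
    if not matches:
        return None
    return KEY_PREFIXES[max(matches, key=len)]
```

A `None` result only means the prefix is not in the table; it does not prove the key is invalid.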
Create Your First Agent
Two ways to create agents:
Dashboard → Agents → + Add agent.
Send a new agent_id via MCP or the REST API; the agent is created automatically.
Cost Overview
Main dashboard: total spend (MTD), active agents, potential savings, token usage. Filter by 24h/7d/30d/90d. Budget burn rate, provider breakdown, top agents by cost.
Agent Management
Each agent has: cost history (30-day chart), model usage breakdown, budget utilization, health score (0-100), and personalized recommendations.
Alerts & Budgets
Alert types: threshold-based (daily spend exceeds $X) and smart alerts (anomaly detection using statistical analysis). Budget enforcement with kill switch.
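The two alert types can be illustrated with a short sketch. The documentation does not say which statistical method the smart alerts use, so the z-score test below is an assumption chosen for illustration, and both function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def threshold_alert(daily_spend_usd: float, limit_usd: float) -> bool:
    """Threshold-based alert: fire when daily spend exceeds a fixed limit."""
    return daily_spend_usd > limit_usd


def is_anomalous(history_usd: Sequence[float], today_usd: float,
                 z_limit: float = 3.0) -> bool:
    """Assumed smart alert: flag spend more than z_limit standard deviations
    away from the historical mean (a plain z-score test)."""
    if len(history_usd) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history_usd)
    sigma = stdev(history_usd)
    if sigma == 0:
        return today_usd != mu
    return abs(today_usd - mu) / sigma > z_limit
```

A threshold alert needs a limit chosen up front, while the z-score variant adapts to each agent's own spending history.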
Recommendations
Actionable recommendations with “Apply” button:
- Switch model: e.g., GPT-4o → Claude Haiku for 73% savings
- Cascade routing: start cheap, escalate on uncertainty
- Kill idle agents: no output for 24h+
- Enable prompt caching: repeated prompts detected, save 90%
- Batch requests: reduce individual API calls
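As an illustration of the "kill idle agents" rule, the sketch below flags agents with no output for 24 hours or more. It assumes you track a last-output timestamp per agent yourself; `find_idle_agents` is a hypothetical name and not an SDK function.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def find_idle_agents(last_output_at: Dict[str, datetime], now: datetime,
                     idle_after: timedelta = timedelta(hours=24)) -> List[str]:
    """Return agent IDs whose last output is at least `idle_after` old."""
    return sorted(
        agent_id
        for agent_id, last_seen in last_output_at.items()
        if now - last_seen >= idle_after
    )
```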
MCP Server Setup
Standard MCP protocol. Works with Claude, GPT, Gemini, LangChain, CrewAI, AutoGen.
// claude_desktop_config.json
{
  "mcpServers": {
    "agentcostpilot": {
      "url": "https://mcp.agentcostpilot.com",
      "headers": { "Authorization": "Bearer acp_sk_..." }
    }
  }
}

| Tool | Description |
|---|---|
| acp_check_budget | Check remaining budget before expensive calls |
| acp_report_usage | Report token usage after each LLM call |
| acp_get_recommendation | Get model recommendation for current task |
| acp_check_limits | Check limits and agent status |
| acp_heartbeat | Confirm agent is alive and active |
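MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request body. The envelope follows the MCP specification, but the argument names passed to `acp_report_usage` are an assumption that mirrors the REST usage payload; check the tool's published input schema before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Assumed argument names, modeled on the POST /v1/usage body.
body = tool_call("acp_report_usage", {
    "agent_id": "support-bot",
    "model": "gpt-4o",
    "input_tokens": 1250,
    "output_tokens": 380,
})
```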
REST API Reference
Base URL: https://api.agentcostpilot.com/v1
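A minimal sketch of calling the API with only the standard library, using the budget endpoint from this reference as the example. It assumes REST requests authenticate with the same `Authorization: Bearer acp_sk_...` header shown in the MCP configuration, which this reference does not state explicitly. `budget_request` and `can_afford` are hypothetical helpers.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.agentcostpilot.com/v1"


def budget_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) GET /v1/agents/{agent_id}/budget."""
    return urllib.request.Request(
        f"{BASE_URL}/agents/{quote(agent_id, safe='')}/budget",
        # Assumption: same Bearer scheme as the MCP server config.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def can_afford(budget: dict, estimated_cost_usd: float) -> bool:
    """Decide from a budget response whether an expensive call should proceed."""
    return (budget.get("status") == "active"
            and budget["remaining_usd"] >= estimated_cost_usd)
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body before handing it to `can_afford`.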
Check Budget
GET /v1/agents/{agent_id}/budget
Response:
{ "agent_id": "support-bot", "budget_usd": 500.00,
"spent_usd": 357.50, "remaining_usd": 142.50,
"utilization_pct": 71.5, "status": "active" }
Report Usage
POST /v1/usage
{ "agent_id": "support-bot", "model": "gpt-4o",
"input_tokens": 1250, "output_tokens": 380,
"task": "customer_reply" }
Response:
{ "cost_usd": 0.0284, "budget_remaining_usd": 142.47 }
Cache Analysis
GET /v1/cache/analysis
Response:
{ "total_duplicates": 47, "cacheable_spend_usd": 734.70,
"potential_savings_usd": 661.23,
"top_agents": [
{ "agent_id": "support-bot", "duplicates": 127, "savings_usd": 256.05 }
] }
Python SDK
pip install agentcostpilot
import openai
from openai import OpenAI
from agentcostpilot import ACPClient, wrap_openai, track

acp = ACPClient(api_key="acp_sk_...")

# Option 1: OTEL wrapper
client = wrap_openai(OpenAI(), acp, agent_id="my-agent")

# Option 2: Decorator
@track(acp, agent_id="summarizer", task="summarize")
def summarize(text):
    return openai.chat.completions.create(...)

# Option 3: Manual
acp.report_usage(agent_id="bot", model="gpt-4o",
                 provider="openai", input_tokens=150,
                 output_tokens=20, cost_usd=0.0001)
acp.flush()
Authentication
Supabase Auth with email/password or Google OAuth. JWT tokens, RLS.
API keys (acp_sk_...) hashed with SHA-256. Rate limit: 100 req/min.
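To stay under the 100 req/min limit, a client can throttle itself before sending. The sliding-window limiter below is a sketch, not an SDK feature: the class name is hypothetical, and whether the server counts requests in a sliding or a fixed window is not documented.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` requests in any `period_s`-second window."""

    def __init__(self, max_calls: int = 100, period_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.period_s = period_s
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self) -> bool:
        """Return True and record the call if a request may be sent now."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.period_s:
            self._sent.popleft()
        if len(self._sent) >= self.max_calls:
            return False
        self._sent.append(now)
        return True
```

Call `try_acquire()` before each API request and back off or queue the request when it returns `False`. The injectable `clock` exists only to make the limiter testable.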
OpenAI Integration
Endpoint: GET https://api.openai.com/v1/organization/usage
Auth: Bearer token (Admin API key, usage.read scope)
Polling: Every hour | Rate limit: 60 req/min
Models (input/output price in USD per 1M tokens): gpt-4o ($2.50/$10.00/1M), gpt-4o-mini ($0.15/$0.60/1M),
o1 ($15.00/$60.00/1M), text-embedding-3-small ($0.02/1M, input only)
Anthropic Integration
Endpoint: GET https://api.anthropic.com/v1/organizations/{org}/usage
Auth: x-api-key header (Admin API key)
Polling: Every hour | Rate limit: 60 req/min
Models (input/output price in USD per 1M tokens): claude-opus-4-6 ($15.00/$75.00/1M),
claude-sonnet-4-6 ($3.00/$15.00/1M),
claude-haiku-4-5 ($0.80/$4.00/1M)
Google AI Integration
Endpoint: Vertex AI Monitoring API
Auth: Service Account JSON key | Polling: Every hour
Models (input/output price in USD per 1M tokens): gemini-2.5-pro ($1.25/$5.00/1M),
gemini-2.5-flash ($0.075/$0.30/1M)
LiteLLM Gateway Integration
Using LiteLLM as your AI gateway? Connect ACP with one line in your config.
Option 1: LiteLLM Config (recommended)
# litellm config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
litellm_settings:
  success_callback: ["agentcostpilot"]
environment_variables:
  ACP_API_KEY: "acp_sk_your_key_here"
Option 2: Python callback
import litellm
from agentcostpilot.litellm_callback import ACPCallback

callback = ACPCallback(api_key="acp_sk_...", agent_id="litellm-proxy")
litellm.callbacks = [callback]
Stripe Billing
Plans: Free ($0), Pro ($99/mo), Business ($299/mo), Enterprise ($999+/mo). Checkout via Stripe, self-service portal, webhook sync.
OTEL Integration
Track LLM costs via OpenTelemetry wrapper SDKs.
from agentcostpilot import ACPClient, wrap_openai
from openai import OpenAI
acp = ACPClient(api_key="acp_xxx")
client = wrap_openai(OpenAI(), acp, agent_id="my-agent")
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
# ACP captures: model, tokens, cost, latency
Ingest Endpoint
POST /v1/ingest/otlp
Content-Type: application/json
{ "traces": [...] } // Standard OTLP format
POST /v1/ingest/batch
{ "events": [
{ "agent_id": "bot", "model": "gpt-4o",
"input_tokens": 150, "output_tokens": 20,
"cost_usd": 0.0001, "timestamp": 1711600000 }
]
}
Value Reports
Generate PDF/HTML reports showing AI agent ROI for stakeholders.
POST /v1/reports/value
{ "period_start": "2026-03-01", "period_end": "2026-03-31",
"tasks_completed": 12500, "hours_saved": 480 }
Response:
{ "id": "rpt_xxx", "total_cost_usd": 4872.00,
"total_savings_usd": 1364.16, "roi_percentage": 312,
"agentic_cost_ratio": 4.2, "pdf_url": "..." }
Savings API
GET /v1/savings/summary
Response:
{ "total_savings_usd": 2840.00,
"by_category": {
"model_downgrade": 1680, "caching": 620,
"cascade_routing": 380, "waste_elimination": 160 },
"savings_rate_pct": 28.4 }
Health Scores
GET /v1/agents/{agent_id}/health
Response:
{ "agent_id": "support-bot", "score": 87,
"efficiency_score": 92, "utilization_score": 85,
"waste_score": 16, "trend": "improving",
"recommendations": ["Enable prompt caching", "Try cascade routing"] }
Cache Analysis
ACP scans usage logs every hour to find duplicate requests. Anthropic prompt caching saves up to 90%.
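The sketch below shows one way such a scan could work. How ACP actually defines a duplicate is not documented, so matching on an exact (model, prompt) pair is an assumption, and both function names are hypothetical. The savings helper simply applies the 90% figure to the cacheable spend.

```python
import hashlib
from collections import Counter
from typing import Iterable, Tuple


def count_duplicates(requests: Iterable[Tuple[str, str]]) -> int:
    """Count requests that repeat an earlier (model, prompt) pair exactly."""
    fingerprints = Counter(
        hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        for model, prompt in requests
    )
    # The first occurrence of each fingerprint is not a duplicate.
    return sum(n - 1 for n in fingerprints.values())


def potential_savings(cacheable_spend_usd: float, discount: float = 0.90) -> float:
    """Apply the 'up to 90%' caching discount to the cacheable spend."""
    return round(cacheable_spend_usd * discount, 2)
```

With the sample figures from the cache analysis response, `potential_savings(734.70)` gives 661.23, which matches `potential_savings_usd`.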
GET /v1/cache/analysis
Response:
{ "total_duplicates": 47,
"cacheable_spend_usd": 734.70,
"potential_savings_usd": 661.23,
"agents": [
{ "agent_id": "support-bot", "model": "gpt-4o",
"duplicates": 127, "potential_savings_usd": 256.05 }
] }
Cascade Routing
Start with cheap models, escalate only when needed (Haiku → Sonnet → Opus).
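A minimal sketch of the escalation loop, assuming each model call returns an answer together with a confidence score between 0 and 1. How ACP measures uncertainty is not documented, so the confidence signal, the `cascade` function, and the 0.8 cutoff are all illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple

# A model call takes a prompt and returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, tiers: Sequence[Tuple[str, ModelFn]],
            min_confidence: float = 0.8) -> Tuple[str, str]:
    """Try tiers from cheapest to most expensive; stop at the first confident answer.

    Returns (tier_name, answer). If no tier is confident, the last
    (most capable) tier's answer is returned.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, call in tiers:
        answer, confidence = call(prompt)
        if confidence >= min_confidence:
            break
    return name, answer
```

In practice each `ModelFn` would wrap a real model client; stubs are enough to exercise the routing logic.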
GET /v1/recommendations?type=cascade_routing
Response:
{ "recommendations": [
{ "agent_id": "support-bot", "cascade_from": "gpt-4o",
"cascade_to": ["gpt-4o-mini", "gpt-4o", "o1"],
"estimated_savings_usd": 580.00,
"estimated_savings_pct": 65 }
] }
Webhooks
Events: alert.triggered, budget.exceeded, agent.killed, anomaly.detected, recommendation.available, report.generated, cache.opportunity_found
POST /v1/webhooks
{ "url": "https://hooks.slack.com/services/...",
"events": ["alert.triggered", "budget.exceeded"],
"secret": "whsec_xxx" }
x402 Overview
x402 is an open protocol by Coinbase for internet-native payments. AI agents pay for premium ACP tools via HTTP 402 with USDC stablecoins on Base. No accounts, no subscriptions — just request, pay, receive.
How it works
The agent calls POST /v1/tools/recommendations/premium without payment.
ACP answers with HTTP 402; the response includes the price ($0.05 USDC), network (Base), and currency (USDC).
The agent uses the Coinbase CDP facilitator to complete the USDC payment.
The agent repeats the request with a payment authorization header.
ACP verifies the payment, executes the tool, and logs the transaction in the ledger.
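The request, pay, retry sequence above can be sketched from the agent's side as follows. The transport (`send`) and the payment step (`pay`) are injected as plain callables, so nothing here touches the network or a wallet. The header name `X-PAYMENT` is an assumption, since this page only says "payment authorization header", and `call_paid_tool` is a hypothetical helper.

```python
from typing import Callable, Dict, Tuple

# send(headers) -> (status_code, json_body); pay(quote) -> payment proof string
Send = Callable[[Dict[str, str]], Tuple[int, dict]]
Pay = Callable[[dict], str]

PAYMENT_HEADER = "X-PAYMENT"  # assumed header name, not confirmed by this page


def call_paid_tool(send: Send, pay: Pay, max_price_usdc: float) -> dict:
    """Call a tool, paying once if the server answers HTTP 402."""
    status, body = send({})
    if status != 402:
        return body  # free tool, or already authorized
    quote = body["x402"]
    # Refuse quotes in an unexpected currency or above the caller's limit.
    if quote["currency"] != "USDC" or float(quote["price"]) > max_price_usdc:
        raise PermissionError(f"quote rejected: {quote['price']} {quote['currency']}")
    proof = pay(quote)
    status, body = send({PAYMENT_HEADER: proof})
    if status != 200:
        raise RuntimeError(f"payment not accepted (HTTP {status})")
    return body
```

Checking the quote against `max_price_usdc` before paying keeps an autonomous agent from accepting an unexpectedly high price.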
x402 is production-ready: 75M+ transactions, $24M+ volume. Adopted by Stripe, AWS, Vercel, Cloudflare. Built on an open standard, not a proprietary one.
Paid Tools (x402)
Some ACP tools are free, others are premium and require x402 payment.
| Tool | Type | Price |
|---|---|---|
| acp_check_budget | Free | — |
| acp_report_usage | Free | — |
| acp_heartbeat | Free | — |
| acp_get_recommendation | Free | — |
| acp_premium_recommendation | Paid (x402) | $0.05 |
| acp_anomaly_scan | Paid (x402) | $0.50 |
| acp_model_route | Paid (x402) | $0.05 |
# Agent requests premium recommendation
POST /v1/tools/recommendations/premium
# Without payment:
HTTP 402 Payment Required
{ "x402": {
"price": "0.05",
"currency": "USDC",
"network": "base",
"facilitator": "https://x402.coinbase.com",
"description": "Premium model recommendation"
}
}
# After payment:
HTTP 200 OK
{ "recommendation": {
"current_model": "gpt-4o",
"suggested_model": "claude-haiku",
"estimated_savings_pct": 73,
"quality_impact": "< 2% degradation",
"confidence": 0.94
}
}
Wallet & Policies
Manage USDC balance and spend policies for autonomous agent payments. In the dashboard, use the Wallet and Policies pages.
Wallet API
GET /v1/wallet/balance
{ "balance_usd": 49.35, "currency": "USDC", "network": "base" }
POST /v1/wallet/deposit
{ "amount_usd": 50.00, "currency": "USDC" }
GET /v1/ledger
{ "entries": [
{ "type": "credit", "amount": 50.00, "desc": "USDC deposit" },
{ "type": "debit", "amount": 0.05, "desc": "Premium recommendation",
"agent": "support-bot", "tx_hash": "0x7a3f...e291" }
]
}
Policy API
GET /v1/policies
{ "max_per_call_usd": 1.00,
"max_daily_usd": 25.00,
"approval_mode": "auto",
"allowed_tools": ["acp_premium_recommendation", "acp_anomaly_scan"] }
POST /v1/policies
{ "max_per_call_usd": 0.50,
"approval_mode": "approve_above_threshold" }
Approval modes
- auto: agents pay within policy limits without human approval
- approve_above_threshold: auto below limit, require approval above
- manual_first_vendor: first payment to new vendor needs approval
- simulation_only: agents can request quotes but cannot pay
Wallet available on Business and Enterprise tiers. Free and Pro users can still pay per call without a wallet.
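To make the interaction between policy fields and approval modes concrete, here is a sketch of how a payment decision could be evaluated. It is an interpretation, not ACP's implementation: in particular, treating `max_per_call_usd` as the threshold for `approve_above_threshold` is an assumption, and `decide` is a hypothetical function.

```python
def decide(policy: dict, tool: str, price_usd: float,
           spent_today_usd: float, vendor_known: bool = True) -> str:
    """Return "pay", "needs_approval", or "deny" for one paid tool call."""
    mode = policy.get("approval_mode", "auto")
    if mode == "simulation_only":
        return "deny"  # quotes only, never pay
    if tool not in policy.get("allowed_tools", []):
        return "deny"
    if spent_today_usd + price_usd > policy.get("max_daily_usd", float("inf")):
        return "deny"  # the daily cap is treated as a hard limit in every mode
    over_call_limit = price_usd > policy.get("max_per_call_usd", float("inf"))
    if mode == "approve_above_threshold":
        # Assumption: the per-call limit doubles as the approval threshold.
        return "needs_approval" if over_call_limit else "pay"
    if over_call_limit:
        return "deny"
    if mode == "manual_first_vendor" and not vendor_known:
        return "needs_approval"
    return "pay"
```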
Self-Hosted (Docker)
Run ACP on your own infrastructure. Business ($299/mo) and Enterprise ($999+/mo) plans.
git clone https://github.com/Dmitrze/agentcostpilot.git
cd agentcostpilot
cp .env.example .env
docker-compose up -d
# Dashboard at http://localhost:3000
Circuit Breaker
If ACP is down, agents continue working. SDK buffers data locally and syncs when recovered.
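The buffering behavior described here can be sketched as a small circuit breaker. This is a conceptual illustration, not the SDK's internals: the class is hypothetical, the timeout is assumed to be in seconds, and failures are assumed to surface as `OSError` (which includes `ConnectionError`).

```python
import time
from typing import Callable, List, Optional


class CircuitBreaker:
    """Buffer events locally and stop calling a failing service for a while."""

    def __init__(self, threshold: int = 3, timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.timeout_s = timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.buffer: List[dict] = []

    def _is_open(self) -> bool:
        return (self.opened_at is not None
                and self.clock() - self.opened_at < self.timeout_s)

    def report(self, event: dict, send: Callable[[dict], None]) -> bool:
        """Queue the event and try to flush; never raises on network errors."""
        self.buffer.append(event)
        if self._is_open():
            return False  # skip the network call; the agent keeps working
        try:
            while self.buffer:
                send(self.buffer[0])
                self.buffer.pop(0)  # remove only after a successful send
        except OSError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return False
        self.failures = 0
        self.opened_at = None
        return True
```

After three consecutive failures the breaker stops attempting sends for 60 seconds, then allows a trial flush that delivers the buffered events in their original order.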
from agentcostpilot import ACPClient
acp = ACPClient(
    api_key="acp_sk_...",
    circuit_breaker_threshold=3,
    circuit_breaker_timeout=60
)
GET /v1/mcp/health
{ "status": "ok", "latency_ms": 5, "uptime_seconds": 86400 }
Questions? Contact support@agentcostpilot.com