Documentation

AgentCostPilot Docs

Everything you need to set up, integrate, and optimize your AI agent costs.

Quick Start Guide

Get up and running in under 2 minutes.

1
Create an account

Sign up at agentcostpilot.com/signup with email or Google OAuth. No credit card required.

2
Connect a provider

Go to Dashboard → Providers → Connect. Paste your OpenAI or Anthropic API key. We use read-only access.

3
See your costs

Your dashboard populates within minutes. View cost by agent, model, and task. Set alerts and get optimization recommendations.

Connect a Provider

AgentCostPilot supports 9 LLM providers.

ProviderKey FormatScope
OpenAIsk-proj-...usage.read
Anthropicsk-ant-...admin (usage read)
Google AIAIza...Vertex AI usage
AWS BedrockAKIA...bedrock:GetUsage
Azure OpenAIEndpoint + keyMetrics read
Mistral AIsk-...Usage API
Cohereco-...Usage read
Groqgsk_...Usage read
Together AItog-...Usage read

All API keys are encrypted with AES-256.

Create Your First Agent

Two ways to create agents:

Manual (Dashboard)

Dashboard Agents + Add agent.

Automatic (API/MCP)

New agent_id via MCP or REST API auto-creates the agent.

Cost Overview

Main dashboard: total spend (MTD), active agents, potential savings, token usage. Filter by 24h/7d/30d/90d. Budget burn rate, provider breakdown, top agents by cost.

Agent Management

Each agent has: cost history (30-day chart), model usage breakdown, budget utilization, health score (0-100), and personalized recommendations.

Alerts & Budgets

Alert types: threshold-based (daily spend exceeds $X) and smart alerts (anomaly detection using statistical analysis). Budget enforcement with kill switch.

Recommendations

Actionable recommendations with Apply button:

  • Switch model — e.g., GPT-4o → Claude Haiku for 73% savings
  • Cascade routing — start cheap, escalate on uncertainty
  • Kill idle agents — no output for 24h+
  • Enable prompt caching — repeated prompts detected, save 90%
  • Batch requests — reduce individual API calls

MCP Server Setup

Standard MCP protocol. Works with Claude, GPT, Gemini, LangChain, CrewAI, AutoGen.

// claude_desktop_config.json
{
  "mcpServers": {
    "agentcostpilot": {
      "url": "https://mcp.agentcostpilot.com",
      "headers": { "Authorization": "Bearer acp_sk_..." }
    }
  }
}
ToolDescription
acp_check_budgetCheck remaining budget before expensive calls
acp_report_usageReport token usage after each LLM call
acp_get_recommendationGet model recommendation for current task
acp_check_limitsCheck limits and agent status
acp_heartbeatConfirm agent is alive and active

REST API Reference

Base URL: https://api.agentcostpilot.com/v1

Check Budget

GET /v1/agents/{agent_id}/budget

Response:
{ "agent_id": "support-bot", "budget_usd": 500.00,
  "spent_usd": 357.50, "remaining_usd": 142.50,
  "utilization_pct": 71.5, "status": "active" }

Report Usage

POST /v1/usage
{ "agent_id": "support-bot", "model": "gpt-4o",
  "input_tokens": 1250, "output_tokens": 380,
  "task": "customer_reply" }

Response:
{ "cost_usd": 0.0284, "budget_remaining_usd": 142.47 }

Cache Analysis

GET /v1/cache/analysis

Response:
{ "total_duplicates": 47, "cacheable_spend_usd": 734.70,
  "potential_savings_usd": 661.23,
  "top_agents": [
    { "agent_id": "support-bot", "duplicates": 127, "savings_usd": 256.05 }
  ] }

Python SDK

pip install agentcostpilot
from agentcostpilot import ACPClient, wrap_openai, track

acp = ACPClient(api_key="acp_sk_...")

# Option 1: OTEL wrapper
client = wrap_openai(OpenAI(), acp, agent_id="my-agent")

# Option 2: Decorator
@track(acp, agent_id="summarizer", task="summarize")
def summarize(text):
    return openai.chat.completions.create(...)

# Option 3: Manual
acp.report_usage(agent_id="bot", model="gpt-4o",
    provider="openai", input_tokens=150,
    output_tokens=20, cost_usd=0.0001)
acp.flush()

Authentication

User Auth (Dashboard)

Supabase Auth with email/password or Google OAuth. JWT tokens, RLS.

Agent Auth (API/MCP)

API keys (acp_sk_...) hashed with SHA-256. Rate limit: 100 req/min.

OpenAI Integration

Endpoint: GET https://api.openai.com/v1/organization/usage
Auth: Bearer token (Admin API key, usage.read scope)
Polling: Every hour | Rate limit: 60 req/min

Models: gpt-4o ($2.50/$10.00/1M), gpt-4o-mini ($0.15/$0.60/1M),
        o1 ($15.00/$60.00/1M), text-embedding-3-small ($0.02/1M)

Anthropic Integration

Endpoint: GET https://api.anthropic.com/v1/organizations/{org}/usage
Auth: x-api-key header (Admin API key)
Polling: Every hour | Rate limit: 60 req/min

Models: claude-opus-4-6 ($15.00/$75.00/1M),
        claude-sonnet-4-6 ($3.00/$15.00/1M),
        claude-haiku-4-5 ($0.80/$4.00/1M)

Google AI Integration

Endpoint: Vertex AI Monitoring API
Auth: Service Account JSON key | Polling: Every hour

Models: gemini-2.5-pro ($1.25/$5.00/1M),
        gemini-2.5-flash ($0.075/$0.30/1M)

LiteLLM Gateway Integration

Using LiteLLM as your AI gateway? Connect ACP with one line in your config.

Option 1: LiteLLM Config (recommended)

# litellm config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o

litellm_settings:
  success_callback: ["agentcostpilot"]

environment_variables:
  ACP_API_KEY: "acp_sk_your_key_here"

Option 2: Python callback

from agentcostpilot.litellm_callback import ACPCallback

callback = ACPCallback(api_key="acp_sk_...", agent_id="litellm-proxy")

import litellm
litellm.callbacks = [callback]

Stripe Billing

Plans: Free ($0), Pro ($99/mo), Business ($299/mo), Enterprise ($999+/mo). Checkout via Stripe, self-service portal, webhook sync.

OTEL Integration

Track LLM costs via OpenTelemetry wrapper SDKs.

from agentcostpilot import ACPClient, wrap_openai
from openai import OpenAI

acp = ACPClient(api_key="acp_xxx")
client = wrap_openai(OpenAI(), acp, agent_id="my-agent")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
# ACP captures: model, tokens, cost, latency

Ingest Endpoint

POST /v1/ingest/otlp
Content-Type: application/json
{ "traces": [...] }  // Standard OTLP format

POST /v1/ingest/batch
{ "events": [
    { "agent_id": "bot", "model": "gpt-4o",
      "input_tokens": 150, "output_tokens": 20,
      "cost_usd": 0.0001, "timestamp": 1711600000 }
  ]
}

Value Reports

Generate PDF/HTML reports showing AI agent ROI for stakeholders.

POST /v1/reports/value
{ "period_start": "2026-03-01", "period_end": "2026-03-31",
  "tasks_completed": 12500, "hours_saved": 480 }

Response:
{ "id": "rpt_xxx", "total_cost_usd": 4872.00,
  "total_savings_usd": 1364.16, "roi_percentage": 312,
  "agentic_cost_ratio": 4.2, "pdf_url": "..." }

Savings API

GET /v1/savings/summary
Response:
{ "total_savings_usd": 2840.00,
  "by_category": {
    "model_downgrade": 1680, "caching": 620,
    "cascade_routing": 380, "waste_elimination": 160 },
  "savings_rate_pct": 28.4 }

Health Scores

GET /v1/agents/{agent_id}/health
Response:
{ "agent_id": "support-bot", "score": 87,
  "efficiency_score": 92, "utilization_score": 85,
  "waste_score": 16, "trend": "improving",
  "recommendations": ["Enable prompt caching", "Try cascade routing"] }

Cache Analysis

ACP scans usage logs every hour to find duplicate requests. Anthropic prompt caching saves up to 90%.

GET /v1/cache/analysis
Response:
{ "total_duplicates": 47,
  "cacheable_spend_usd": 734.70,
  "potential_savings_usd": 661.23,
  "agents": [
    { "agent_id": "support-bot", "model": "gpt-4o",
      "duplicates": 127, "potential_savings_usd": 256.05 }
  ] }

Cascade Routing

Start with cheap models, escalate only when needed (Haiku Sonnet Opus).

GET /v1/recommendations?type=cascade_routing
Response:
{ "recommendations": [
    { "agent_id": "support-bot", "cascade_from": "gpt-4o",
      "cascade_to": ["gpt-4o-mini", "gpt-4o", "o1"],
      "estimated_savings_usd": 580.00,
      "estimated_savings_pct": 65 }
  ] }

Webhooks

Events: alert.triggered, budget.exceeded, agent.killed, anomaly.detected, recommendation.available, report.generated, cache.opportunity_found

POST /v1/webhooks
{ "url": "https://hooks.slack.com/services/...",
  "events": ["alert.triggered", "budget.exceeded"],
  "secret": "whsec_xxx" }

x402 Overview

x402 is an open protocol by Coinbase for internet-native payments. AI agents pay for premium ACP tools via HTTP 402 with USDC stablecoins on Base. No accounts, no subscriptions just request, pay, receive.

How it works

1
Agent requests premium tool

POST /v1/tools/recommendations/premium without payment

2
ACP returns 402

Response includes price ($0.05 USDC), network (Base), currency (USDC)

3
Agent pays via x402

Uses Coinbase CDP facilitator to complete USDC payment

4
Agent retries with proof

Repeats request with payment authorization header

5
ACP delivers result

Verifies payment, executes tool, logs in ledger

x402 is production-ready: 75M+ transactions, $24M+ volume. Adopted by Stripe, AWS, Vercel, Cloudflare. Built on open standard not proprietary.

Paid Tools (x402)

Some ACP tools are free, others are premium and require x402 payment.

ToolTypePrice
acp_check_budgetFree
acp_report_usageFree
acp_heartbeatFree
acp_get_recommendationFree
acp_premium_recommendationPaid (x402)$0.05
acp_anomaly_scanPaid (x402)$0.50
acp_model_routePaid (x402)$0.05
# Agent requests premium recommendation
POST /v1/tools/recommendations/premium

# Without payment:
HTTP 402 Payment Required
{ "x402": {
    "price": "0.05",
    "currency": "USDC",
    "network": "base",
    "facilitator": "https://x402.coinbase.com",
    "description": "Premium model recommendation"
  }
}

# After payment:
HTTP 200 OK
{ "recommendation": {
    "current_model": "gpt-4o",
    "suggested_model": "claude-haiku",
    "estimated_savings_pct": 73,
    "quality_impact": "< 2% degradation",
    "confidence": 0.94
  }
}

Wallet & Policies

Manage USDC balance and spend policies for autonomous agent payments. Dashboard: Wallet + Policies pages.

Wallet API

GET /v1/wallet/balance
{ "balance_usd": 49.35, "currency": "USDC", "network": "base" }

POST /v1/wallet/deposit
{ "amount_usd": 50.00, "currency": "USDC" }

GET /v1/ledger
{ "entries": [
    { "type": "credit", "amount": 50.00, "desc": "USDC deposit" },
    { "type": "debit", "amount": 0.05, "desc": "Premium recommendation",
      "agent": "support-bot", "tx_hash": "0x7a3f...e291" }
  ]
}

Policy API

GET /v1/policies
{ "max_per_call_usd": 1.00,
  "max_daily_usd": 25.00,
  "approval_mode": "auto",
  "allowed_tools": ["acp_premium_recommendation", "acp_anomaly_scan"] }

POST /v1/policies
{ "max_per_call_usd": 0.50,
  "approval_mode": "approve_above_threshold" }

Approval modes

  • auto — agents pay within policy limits without human approval
  • approve_above_threshold — auto below limit, require approval above
  • manual_first_vendor — first payment to new vendor needs approval
  • simulation_only — agents can request quotes but cannot pay

Wallet available on Business and Enterprise tiers. Free and Pro users can still pay per call without a wallet.

Self-Hosted (Docker)

Run ACP on your own infrastructure. Business ($299/mo) and Enterprise ($999+/mo) plans.

git clone https://github.com/Dmitrze/agentcostpilot.git
cd agentcostpilot
cp .env.example .env
docker-compose up -d

# Dashboard at http://localhost:3000

Circuit Breaker

If ACP is down, agents continue working. SDK buffers data locally and syncs when recovered.

from agentcostpilot import ACPClient

acp = ACPClient(
    api_key="acp_sk_...",
    circuit_breaker_threshold=3,
    circuit_breaker_timeout=60
)
GET /v1/mcp/health
{ "status": "ok", "latency_ms": 5, "uptime_seconds": 86400 }

Questions? Contact support@agentcostpilot.com