AgentCostPilot Docs

Everything you need to set up, integrate, and optimize your AI agent costs.

Quick Start Guide

Get up and running in under 2 minutes.

Create an account

Connect a provider

Go to Dashboard → Providers → Connect and paste your provider API key (for example, OpenAI or Anthropic). ACP only needs read-only access to usage data.

See your costs

Your dashboard populates within minutes. View cost by agent, model, and task. Set alerts and get optimization recommendations.

Connect a Provider

AgentCostPilot supports 9 LLM providers.

Provider	Key Format	Scope
OpenAI	sk-proj-...	usage.read
Anthropic	sk-ant-...	admin (usage read)
Google AI	AIza...	Vertex AI usage
AWS Bedrock	AKIA...	bedrock:GetUsage
Azure OpenAI	Endpoint + key	Metrics read
Mistral AI	sk-...	Usage API
Cohere	co-...	Usage read
Groq	gsk_...	Usage read
Together AI	tog-...	Usage read

All API keys are encrypted with AES-256.
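
The key formats in the table above can be used to guess which provider a pasted key belongs to. The following is a minimal sketch of that idea, not part of the AgentCostPilot SDK: the `guess_provider` helper and the `KEY_PREFIXES` list are hypothetical names, and the prefixes are copied from the table. Azure OpenAI is left out because the table lists an endpoint plus key rather than a prefix.

```python
from typing import Optional

# Prefixes copied from the provider table; the lookup itself is an
# illustrative helper, not an AgentCostPilot API.
KEY_PREFIXES = [
    ("sk-proj-", "OpenAI"),
    ("sk-ant-", "Anthropic"),
    ("AIza", "Google AI"),
    ("AKIA", "AWS Bedrock"),
    ("gsk_", "Groq"),
    ("tog-", "Together AI"),
    ("co-", "Cohere"),
    ("sk-", "Mistral AI"),  # generic prefix, overlaps with OpenAI and Anthropic
]


def guess_provider(api_key: str) -> Optional[str]:
    """Return the provider whose key prefix matches, or None if unknown."""
    # Check the longest prefixes first so "sk-proj-" and "sk-ant-" win over "sk-".
    for prefix, provider in sorted(KEY_PREFIXES, key=lambda item: -len(item[0])):
        if api_key.startswith(prefix):
            return provider
    return None
```

Because several providers share the `sk-` stem, a prefix match is only a hint; the provider chosen in the dashboard remains the source of truth.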

Create Your First Agent

Two ways to create agents:

Manual (Dashboard)

Dashboard → Agents → + Add agent.

Automatic (API/MCP)

Sending a new agent_id via MCP or the REST API creates the agent automatically.

Cost Overview

The main dashboard shows total spend (month to date), active agents, potential savings, and token usage. Filter by 24h, 7d, 30d, or 90d. It also shows budget burn rate, a provider breakdown, and the top agents by cost.

Agent Management

Each agent has: cost history (30-day chart), model usage breakdown, budget utilization, health score (0-100), and personalized recommendations.

Alerts & Budgets

Alert types: threshold-based (daily spend exceeds $X) and smart alerts (anomaly detection using statistical analysis). Budgets can be enforced with a kill switch.
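
The two alert types can be illustrated with a short sketch. The docs do not say which statistical method the smart alerts use, so the z-score test below is an assumption chosen for illustration, and the function names and the `z_cutoff` default are hypothetical.

```python
from statistics import mean, stdev
from typing import List


def threshold_alert(daily_spend_usd: float, limit_usd: float) -> bool:
    """Threshold alert: fire when daily spend exceeds the configured limit."""
    return daily_spend_usd > limit_usd


def is_anomaly(history_usd: List[float], today_usd: float, z_cutoff: float = 3.0) -> bool:
    """Smart alert (assumed z-score test): flag spend far from the recent mean."""
    if len(history_usd) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history_usd)
    sigma = stdev(history_usd)
    if sigma == 0:
        return today_usd != mu  # flat history: any change is unusual
    return abs(today_usd - mu) / sigma > z_cutoff
```

A threshold alert needs a limit chosen in advance, while the anomaly check adapts to each agent's own spending history.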

Recommendations

Each recommendation comes with an “Apply” button:

▸Switch model: e.g., GPT-4o → Claude Haiku for 73% savings
▸Cascade routing: start cheap, escalate on uncertainty
▸Kill idle agents: no output for 24h+
▸Enable prompt caching: repeated prompts detected, save up to 90%
▸Batch requests: reduce individual API calls

MCP Server Setup

Uses the standard Model Context Protocol (MCP). Works with Claude, GPT, Gemini, LangChain, CrewAI, and AutoGen.

// claude_desktop_config.json
{
  "mcpServers": {
    "agentcostpilot": {
      "url": "https://mcp.agentcostpilot.com",
      "headers": { "Authorization": "Bearer acp_sk_..." }
    }
  }
}

Tool	Description
acp_check_budget	Check remaining budget before expensive calls
acp_report_usage	Report token usage after each LLM call
acp_get_recommendation	Get model recommendation for current task
acp_check_limits	Check limits and agent status
acp_heartbeat	Confirm agent is alive and active

REST API Reference

Base URL: https://api.agentcostpilot.com/v1

Check Budget

GET /v1/agents/{agent_id}/budget

Response:
{ "agent_id": "support-bot", "budget_usd": 500.00,
  "spent_usd": 357.50, "remaining_usd": 142.50,
  "utilization_pct": 71.5, "status": "active" }

Report Usage

POST /v1/usage
{ "agent_id": "support-bot", "model": "gpt-4o",
  "input_tokens": 1250, "output_tokens": 380,
  "task": "customer_reply" }

Response:
{ "cost_usd": 0.0284, "budget_remaining_usd": 142.47 }
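
The Check Budget response can be used to gate a call before it is made. The sketch below works on the sample response shown earlier and makes no network request. The `can_spend` helper and its `reserve_usd` parameter are hypothetical, and treating every status other than "active" as blocked is an assumption, since the docs only show the "active" value.

```python
import json

# Sample body copied from the Check Budget response above.
SAMPLE_BUDGET = json.loads("""
{ "agent_id": "support-bot", "budget_usd": 500.00,
  "spent_usd": 357.50, "remaining_usd": 142.50,
  "utilization_pct": 71.5, "status": "active" }
""")


def can_spend(budget: dict, estimated_cost_usd: float, reserve_usd: float = 0.0) -> bool:
    """Return True if the agent is active and the call fits the remaining budget."""
    if budget.get("status") != "active":  # assumption: only "active" agents may spend
        return False
    return budget["remaining_usd"] - estimated_cost_usd >= reserve_usd


def utilization_pct(budget: dict) -> float:
    """Recompute utilization from spent and total budget, rounded to one decimal."""
    return round(budget["spent_usd"] / budget["budget_usd"] * 100, 1)
```

An agent would typically call the budget endpoint, pass the parsed body to a check like this, and report usage after the LLM call completes.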

Cache Analysis

GET /v1/cache/analysis

Response:
{ "total_duplicates": 47, "cacheable_spend_usd": 734.70,
  "potential_savings_usd": 661.23,
  "top_agents": [
    { "agent_id": "support-bot", "duplicates": 127, "savings_usd": 256.05 }
  ] }

Python SDK

pip install agentcostpilot

from openai import OpenAI

from agentcostpilot import ACPClient, wrap_openai, track

acp = ACPClient(api_key="acp_sk_...")

# Option 1: OTEL wrapper (calls made through `client` are tracked)
client = wrap_openai(OpenAI(), acp, agent_id="my-agent")

# Option 2: Decorator (tracks calls made inside the decorated function)
@track(acp, agent_id="summarizer", task="summarize")
def summarize(text):
    return OpenAI().chat.completions.create(...)

# Option 3: Manual (report token counts and cost yourself)
acp.report_usage(agent_id="bot", model="gpt-4o",
    provider="openai", input_tokens=150,
    output_tokens=20, cost_usd=0.0001)
acp.flush()

Authentication

User Auth (Dashboard)

Supabase Auth with email/password or Google OAuth. Sessions use JWTs, and data access is restricted with row-level security (RLS).

Agent Auth (API/MCP)

API keys (acp_sk_...) are stored as SHA-256 hashes. Rate limit: 100 req/min.
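
Storing only a SHA-256 hash means the server can verify a presented key without keeping the key itself. The following is a generic sketch of that pattern using the Python standard library. The function names are hypothetical and ACP's actual storage format is not documented here.

```python
import hashlib
import hmac


def hash_api_key(api_key: str) -> str:
    """Return the hex SHA-256 digest that would be stored instead of the key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored digest in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)
```

`hmac.compare_digest` avoids leaking how many leading characters matched through timing differences.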

OpenAI Integration

Endpoint: GET https://api.openai.com/v1/organization/usage
Auth: Bearer token (Admin API key, usage.read scope)
Polling: Every hour | Rate limit: 60 req/min

Models: gpt-4o ($2.50/$10.00/1M), gpt-4o-mini ($0.15/$0.60/1M),
        o1 ($15.00/$60.00/1M), text-embedding-3-small ($0.02/1M)
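
The per-1M-token prices above translate into a per-call cost with simple arithmetic. This sketch copies the listed input and output prices. The `estimate_cost_usd` helper is hypothetical, and how ACP itself computes `cost_usd` (for example, whether it accounts for cached or batched tokens) is not specified in this page.

```python
# USD per 1M tokens as (input, output), copied from the model list above.
OPENAI_PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate call cost from token counts and list prices per 1M tokens."""
    input_price, output_price = OPENAI_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, 1,000,000 input tokens and 500,000 output tokens on gpt-4o come to $2.50 + $5.00 = $7.50 at these list prices.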

Anthropic Integration

Endpoint: GET https://api.anthropic.com/v1/organizations/{org}/usage
Auth: x-api-key header (Admin API key)
Polling: Every hour | Rate limit: 60 req/min

Models: claude-opus-4-6 ($15.00/$75.00/1M),
        claude-sonnet-4-6 ($3.00/$15.00/1M),
        claude-haiku-4-5 ($0.80/$4.00/1M)

Google AI Integration

Endpoint: Vertex AI Monitoring API
Auth: Service Account JSON key | Polling: Every hour

Models: gemini-2.5-pro ($1.25/$5.00/1M),
        gemini-2.5-flash ($0.075/$0.30/1M)

LiteLLM Gateway Integration

Using LiteLLM as your AI gateway? Connect ACP with one line in your config.

Option 1: LiteLLM Config (recommended)

# litellm config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o

litellm_settings:
  success_callback: ["agentcostpilot"]

environment_variables:
  ACP_API_KEY: "acp_sk_your_key_here"

Option 2: Python callback

from agentcostpilot.litellm_callback import ACPCallback

callback = ACPCallback(api_key="acp_sk_...", agent_id="litellm-proxy")

import litellm
litellm.callbacks = [callback]

Stripe Billing

Plans: Free ($0), Pro ($99/mo), Business ($299/mo), Enterprise ($999+/mo). Checkout runs through Stripe, with a self-service billing portal and webhook-based sync.

OTEL Integration

Track LLM costs via OpenTelemetry wrapper SDKs.

from agentcostpilot import ACPClient, wrap_openai
from openai import OpenAI

acp = ACPClient(api_key="acp_sk_...")
client = wrap_openai(OpenAI(), acp, agent_id="my-agent")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
# ACP captures: model, tokens, cost, latency

Ingest Endpoint

POST /v1/ingest/otlp
Content-Type: application/json
{ "traces": [...] }  // Standard OTLP format

POST /v1/ingest/batch
{ "events": [
    { "agent_id": "bot", "model": "gpt-4o",
      "input_tokens": 150, "output_tokens": 20,
      "cost_usd": 0.0001, "timestamp": 1711600000 }
  ]
}
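
Request bodies for the batch endpoint can be assembled from individual usage events. The sketch below only builds the JSON and sends nothing. The helper names are hypothetical, and the `max_per_batch` limit is an assumption, since no batch size limit is documented here.

```python
import json
import time
from typing import Iterator, List, Optional


def make_event(agent_id: str, model: str, input_tokens: int, output_tokens: int,
               cost_usd: float, timestamp: Optional[int] = None) -> dict:
    """Build one event with the fields shown in the /v1/ingest/batch example."""
    return {
        "agent_id": agent_id, "model": model,
        "input_tokens": input_tokens, "output_tokens": output_tokens,
        "cost_usd": cost_usd,
        # Unix seconds, matching the example timestamp format
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }


def batch_bodies(events: List[dict], max_per_batch: int = 100) -> Iterator[str]:
    """Yield JSON request bodies, each holding at most max_per_batch events."""
    for start in range(0, len(events), max_per_batch):
        yield json.dumps({"events": events[start:start + max_per_batch]})
```

Each yielded string would be sent as the body of one POST to /v1/ingest/batch.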

Value Reports

Generate PDF/HTML reports showing AI agent ROI for stakeholders.

POST /v1/reports/value
{ "period_start": "2026-03-01", "period_end": "2026-03-31",
  "tasks_completed": 12500, "hours_saved": 480 }

Response:
{ "id": "rpt_xxx", "total_cost_usd": 4872.00,
  "total_savings_usd": 1364.16, "roi_percentage": 312,
  "agentic_cost_ratio": 4.2, "pdf_url": "..." }

Savings API

GET /v1/savings/summary
Response:
{ "total_savings_usd": 2840.00,
  "by_category": {
    "model_downgrade": 1680, "caching": 620,
    "cascade_routing": 380, "waste_elimination": 160 },
  "savings_rate_pct": 28.4 }

Health Scores

GET /v1/agents/{agent_id}/health
Response:
{ "agent_id": "support-bot", "score": 87,
  "efficiency_score": 92, "utilization_score": 85,
  "waste_score": 16, "trend": "improving",
  "recommendations": ["Enable prompt caching", "Try cascade routing"] }

Cache Analysis

ACP scans usage logs every hour to find duplicate requests. With Anthropic prompt caching, repeated prompt content can cost up to 90% less.

GET /v1/cache/analysis
Response:
{ "total_duplicates": 47,
  "cacheable_spend_usd": 734.70,
  "potential_savings_usd": 661.23,
  "agents": [
    { "agent_id": "support-bot", "model": "gpt-4o",
      "duplicates": 127, "potential_savings_usd": 256.05 }
  ] }

Cascade Routing

Start with cheap models, escalate only when needed (Haiku → Sonnet → Opus).

GET /v1/recommendations?type=cascade_routing
Response:
{ "recommendations": [
    { "agent_id": "support-bot", "cascade_from": "gpt-4o",
      "cascade_to": ["gpt-4o-mini", "gpt-4o", "o1"],
      "estimated_savings_usd": 580.00,
      "estimated_savings_pct": 65 }
  ] }
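
The cascade idea can be sketched as a loop over tiers ordered from cheapest to most expensive. This is an illustration under stated assumptions, not ACP's routing logic: each model is represented by a hypothetical callable that returns an answer together with a confidence score, and the `min_confidence` cutoff is an arbitrary example value.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: prompt in, (answer, confidence in [0, 1]) out.
Model = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, tiers: List[Tuple[str, Model]],
            min_confidence: float = 0.8) -> Tuple[Optional[str], Optional[str]]:
    """Try tiers cheapest-first and stop at the first sufficiently confident answer."""
    name, answer = None, None
    for name, model in tiers:
        answer, confidence = model(prompt)
        if confidence >= min_confidence:
            break  # good enough, no need to escalate further
    # If no tier was confident, the most expensive tier's answer is returned.
    return name, answer
```

The savings come from the share of requests that never leave the first tier, so the choice of confidence signal matters more than the loop itself.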

Webhooks

Events: alert.triggered, budget.exceeded, agent.killed, anomaly.detected, recommendation.available, report.generated, cache.opportunity_found

POST /v1/webhooks
{ "url": "https://hooks.slack.com/services/...",
  "events": ["alert.triggered", "budget.exceeded"],
  "secret": "whsec_xxx" }

x402 Overview

x402 is an open protocol by Coinbase for internet-native payments. AI agents pay for premium ACP tools via HTTP 402 with USDC stablecoins on Base. No accounts, no subscriptions — just request, pay, receive.

How it works

Agent requests premium tool

POST /v1/tools/recommendations/premium without payment

ACP returns 402

Response includes price ($0.05 USDC), network (Base), currency (USDC)

Agent pays via x402

Uses Coinbase CDP facilitator to complete USDC payment

Agent retries with proof

Repeats request with payment authorization header

ACP delivers result

Verifies payment, executes tool, logs in ledger

x402 is production-ready: 75M+ transactions, $24M+ volume. Adopted by Stripe, AWS, Vercel, and Cloudflare. It is built on an open standard, not a proprietary one.

Paid Tools (x402)

Some ACP tools are free, others are premium and require x402 payment.

Tool	Type	Price
acp_check_budget	Free	—
acp_report_usage	Free	—
acp_heartbeat	Free	—
acp_get_recommendation	Free	—
acp_premium_recommendation	Paid (x402)	$0.05
acp_anomaly_scan	Paid (x402)	$0.50
acp_model_route	Paid (x402)	$0.05

# Agent requests premium recommendation
POST /v1/tools/recommendations/premium

# Without payment:
HTTP 402 Payment Required
{ "x402": {
    "price": "0.05",
    "currency": "USDC",
    "network": "base",
    "facilitator": "https://x402.coinbase.com",
    "description": "Premium model recommendation"
  }
}

# After payment:
HTTP 200 OK
{ "recommendation": {
    "current_model": "gpt-4o",
    "suggested_model": "claude-haiku",
    "estimated_savings_pct": 73,
    "quality_impact": "< 2% degradation",
    "confidence": 0.94
  }
}
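
The request, pay, retry flow can be sketched without any network access by simulating both sides. Everything below is illustrative: `fake_acp_server` and `fake_pay` are stand-ins, the proof string format is invented for the simulation, and the `X-PAYMENT` header name is an assumption because this page only refers to a "payment authorization header".

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]


def fake_acp_server(headers: Dict[str, str]) -> Response:
    """Stand-in for POST /v1/tools/recommendations/premium (no network involved)."""
    if headers.get("X-PAYMENT") != "paid:0.05:USDC:base":
        return 402, {"x402": {"price": "0.05", "currency": "USDC", "network": "base"}}
    return 200, {"recommendation": {"current_model": "gpt-4o",
                                     "suggested_model": "claude-haiku"}}


def fake_pay(quote: dict) -> str:
    """Stand-in for the facilitator payment; returns a made-up proof string."""
    return f"paid:{quote['price']}:{quote['currency']}:{quote['network']}"


def call_with_x402(send: Callable[[Dict[str, str]], Response],
                   pay: Callable[[dict], str], max_price_usd: float) -> Response:
    """Request once; on 402, pay the quoted price if acceptable and retry with proof."""
    status, body = send({})
    if status != 402:
        return status, body
    quote = body["x402"]
    if float(quote["price"]) > max_price_usd:
        raise ValueError("quoted price exceeds the caller's limit")
    return send({"X-PAYMENT": pay(quote)})
```

Checking the quoted price against a local limit before paying keeps an autonomous agent from accepting an unexpectedly expensive quote.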

Wallet & Policies

Manage your USDC balance and spend policies for autonomous agent payments. In the dashboard, see the Wallet and Policies pages.

Wallet API

GET /v1/wallet/balance
{ "balance_usd": 49.35, "currency": "USDC", "network": "base" }

POST /v1/wallet/deposit
{ "amount_usd": 50.00, "currency": "USDC" }

GET /v1/ledger
{ "entries": [
    { "type": "credit", "amount": 50.00, "desc": "USDC deposit" },
    { "type": "debit", "amount": 0.05, "desc": "Premium recommendation",
      "agent": "support-bot", "tx_hash": "0x7a3f...e291" }
  ]
}

Policy API

GET /v1/policies
{ "max_per_call_usd": 1.00,
  "max_daily_usd": 25.00,
  "approval_mode": "auto",
  "allowed_tools": ["acp_premium_recommendation", "acp_anomaly_scan"] }

POST /v1/policies
{ "max_per_call_usd": 0.50,
  "approval_mode": "approve_above_threshold" }

Approval modes

▸auto — agents pay within policy limits without human approval
▸approve_above_threshold — auto below limit, require approval above
▸manual_first_vendor — first payment to new vendor needs approval
▸simulation_only — agents can request quotes but cannot pay
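
One way to read the policy fields and approval modes together is as a small decision function. The sketch below is an interpretation, not ACP's enforcement logic: it assumes the daily cap is a hard limit, that `max_per_call_usd` is the threshold used by approve_above_threshold, and that tools missing from `allowed_tools` are always denied. The `evaluate_payment` name and its return values are hypothetical.

```python
def evaluate_payment(policy: dict, tool: str, price_usd: float,
                     spent_today_usd: float, vendor_seen_before: bool = True) -> str:
    """Return "pay", "needs_approval", or "deny" for a single x402 payment."""
    if tool not in policy.get("allowed_tools", []):
        return "deny"
    if spent_today_usd + price_usd > policy["max_daily_usd"]:
        return "deny"  # assumption: the daily cap is never overridden
    mode = policy["approval_mode"]
    if mode == "simulation_only":
        return "deny"  # quotes only, no payments
    over_limit = price_usd > policy["max_per_call_usd"]
    if mode == "auto":
        return "deny" if over_limit else "pay"
    if mode == "approve_above_threshold":
        return "needs_approval" if over_limit else "pay"
    if mode == "manual_first_vendor":
        if over_limit:
            return "deny"
        return "pay" if vendor_seen_before else "needs_approval"
    raise ValueError(f"unknown approval_mode: {mode}")


# Policy values copied from the GET /v1/policies example above.
POLICY = {"max_per_call_usd": 1.00, "max_daily_usd": 25.00, "approval_mode": "auto",
          "allowed_tools": ["acp_premium_recommendation", "acp_anomaly_scan"]}
```

With the example policy, a $0.05 premium recommendation is paid automatically, while a call to a tool outside `allowed_tools` is denied regardless of price.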

Wallet available on Business and Enterprise tiers. Free and Pro users can still pay per call without a wallet.

Self-Hosted (Docker)

Run ACP on your own infrastructure. Available on the Business ($299/mo) and Enterprise ($999+/mo) plans.

git clone https://github.com/Dmitrze/agentcostpilot.git
cd agentcostpilot
cp .env.example .env
docker-compose up -d

# Dashboard at http://localhost:3000

Circuit Breaker

If ACP is down, your agents keep working. The SDK buffers usage data locally and syncs it once ACP recovers.

from agentcostpilot import ACPClient

acp = ACPClient(
    api_key="acp_sk_...",
    circuit_breaker_threshold=3,
    circuit_breaker_timeout=60
)

GET /v1/mcp/health
{ "status": "ok", "latency_ms": 5, "uptime_seconds": 86400 }
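
The buffering behavior described above follows the general circuit breaker pattern, which can be sketched in a few lines. This is not the SDK's implementation: the class below is hypothetical, and it assumes `threshold` means consecutive failed sends before the breaker opens and `timeout` means seconds to wait before trying again.

```python
import time
from typing import Any, Callable, List, Optional


class CircuitBreaker:
    """Buffer events while the backend is failing and flush them after recovery."""

    def __init__(self, threshold: int = 3, timeout: float = 60,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold      # assumed: consecutive failures before opening
        self.timeout = timeout          # assumed: seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.buffer: List[Any] = []

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.timeout:
            # Timeout elapsed: allow one trial send; a single failure reopens.
            self.opened_at = None
            self.failures = self.threshold - 1
            return False
        return True

    def report(self, event: Any, send: Callable[[Any], None]) -> bool:
        """Queue the event, then deliver the queue unless the breaker is open."""
        self.buffer.append(event)
        if self.is_open():
            return False
        try:
            while self.buffer:
                send(self.buffer[0])
                self.buffer.pop(0)  # drop only after a successful send
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            return False
        self.failures = 0
        return True
```

While the breaker is open, `report` returns immediately without calling `send`, so a slow or failing backend does not add latency to the agent's own work.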

Questions? Contact support@agentcostpilot.com