
AI API Cost Per Request: How Much Does Each LLM Call Actually Cost?

"How much will this cost me per request?" — the question every developer asks before integrating an AI API. And the answer is never simple, because it depends on your token counts, your model choice, and your provider.

We analyzed 33 models across 10 providers to give you exact cost-per-request breakdowns for real-world scenarios. No estimates. No "it depends." Just numbers.

The Quick Answer: Cost Per Request by Model Tier

Here's what a single request costs for a typical workload: 1,500 input tokens and 400 output tokens (roughly a page of prompt and context in, a few paragraphs out).

Cost Per Request — 1,500 input + 400 output tokens
GPT-4o mini: $0.00047
Claude Haiku 4.5: $0.00060
Gemini 2.0 Flash: $0.00019
DeepSeek V4 Flash: $0.00007
GPT-4o: $0.00470
Claude Sonnet 4: $0.00450
Gemini 2.5 Pro: $0.00325
GPT-5: $0.01250
Claude Opus 4: $0.02250
GPT-5.5: $0.05000
Cheapest to most expensive: roughly a 700x range.

That roughly 700x spread between the cheapest and most expensive model is why choosing the right model matters so much. The same request that costs $0.00007 on DeepSeek V4 Flash costs $0.05 on GPT-5.5.
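If you want to sanity-check these figures against your own workload, the math is just tokens used multiplied by the per-1M-token price. Here is a minimal Python sketch (the function is illustrative, not part of any SDK), using GPT-4o mini's listed rates from the provider table further down:

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request from per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# GPT-4o mini: $0.15 input / $0.60 output per 1M tokens
print(cost_per_request(1_500, 400, 0.15, 0.60))  # about 0.000465, i.e. ~$0.00047
```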

Scenario 1: Chatbot (1,000 requests/day)

A customer support chatbot handling 1,000 conversations per day, with messages averaging 1,500 input tokens and 400 output tokens.

Budget: DeepSeek V4 Flash, about $2/month ($0.07/day)
Mid-Range: GPT-4o mini, $14.10/month
Premium: GPT-4o, $141/month
Flagship: Claude Opus 4, $675/month

For a chatbot, the quality difference between GPT-4o mini and GPT-4o is often negligible. Most users won't notice. But your bank account will notice the 10x cost difference.
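Scaling a per-request figure up to a monthly bill is one more multiplication. A small helper like this sketch (assuming a 30-day month) reproduces the numbers above:

```python
def monthly_cost(cost_per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Project a monthly bill from per-request cost and daily volume."""
    return cost_per_request * requests_per_day * days

# Per-request figures from the 1,500-in / 400-out table above
print(monthly_cost(0.00047, 1_000))  # ~14.1 (GPT-4o mini, ~$14/month)
print(monthly_cost(0.00470, 1_000))  # ~141  (GPT-4o, ~$141/month)
```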

Scenario 2: Code Assistant (10,000 requests/day)

A coding assistant processing 10,000 requests daily with 2,000 input tokens and 800 output tokens (code completions are longer).

Budget: DeepSeek V4 Flash, $4/month
Mid-Range: Claude Sonnet 4, $39/month
Premium: GPT-4o, $125/month
Flagship: Claude Opus 4, $2,250/month

Code assistants are where model routing really pays off. Use a cheap model for simple completions (variable names, boilerplate) and a premium model for complex logic. This can cut costs by 60-70%.
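One way to implement that split is a small router that inspects each request before picking a model. A minimal sketch, where the model names and the is_complex heuristic are placeholders rather than a prescribed API:

```python
def is_complex(prompt: str) -> bool:
    """Crude heuristic: long prompts, or ones mentioning tricky work, get the premium model."""
    hard_keywords = ("refactor", "concurrency", "optimize", "debug")
    return len(prompt) > 2_000 or any(k in prompt.lower() for k in hard_keywords)

def pick_model(prompt: str) -> str:
    # Cheap model for boilerplate and short completions, premium model for complex logic
    return "premium-model" if is_complex(prompt) else "budget-model"

print(pick_model("rename this variable"))             # budget-model
print(pick_model("debug this deadlock in the pool"))  # premium-model
```

In practice the heuristic is the hard part; teams often start with prompt length and task type, then refine based on where the cheap model's answers get rejected.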

Scenario 3: RAG Pipeline (5,000 requests/day)

A retrieval-augmented generation system processing 5,000 queries daily with 3,000 input tokens (prompt + context) and 600 output tokens.

Budget: GPT-4o mini, $8/month
Mid-Range: Claude Sonnet 4, $68/month
Premium: GPT-5, $225/month
Flagship: Claude Opus 4, $675/month

RAG pipelines have a hidden cost: the input tokens are large because you're stuffing context into the prompt. At 3,000 input tokens per request, input costs often exceed output costs. This is where prompt optimization saves real money — trimming 500 tokens from your context window saves 17% on input costs.
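Here is a quick back-of-the-envelope check on that 17% figure, using Claude Sonnet 4's listed input rate (any per-1M input price gives the same percentage, since the saving is proportional):

```python
def input_cost(input_tokens: int, price_per_m: float) -> float:
    """Input-side cost of one request at a given per-1M-token price."""
    return input_tokens * price_per_m / 1_000_000

full = input_cost(3_000, 3.00)     # full RAG prompt plus retrieved context
trimmed = input_cost(2_500, 3.00)  # same request with 500 tokens of context removed
print(f"{(full - trimmed) / full:.0%} less spent on input tokens")  # 17%
```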

Hidden Costs Most People Forget

The "Per Request" Trap

Focusing on cost-per-request alone is misleading. The real metric is cost per outcome: what it costs to actually resolve the user's problem, not what a single API call costs.

The goal isn't to minimize cost per request. It's to maximize value per dollar spent. Sometimes that means using an expensive model. Usually it means using the cheapest model that's good enough.
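One way to make that concrete is to divide per-request cost by how often a single request actually solves the problem. A sketch with made-up success rates (purely illustrative, not benchmark numbers):

```python
def cost_per_outcome(cost_per_request: float, success_rate: float) -> float:
    """Expected spend per successful outcome, assuming failed requests are retried."""
    return cost_per_request / success_rate

budget = cost_per_outcome(0.00047, success_rate=0.70)   # hypothetical 70% first-try success
premium = cost_per_outcome(0.00470, success_rate=0.95)  # hypothetical 95% first-try success
print(f"budget: ${budget:.5f}/outcome, premium: ${premium:.5f}/outcome")
```

With these particular numbers the budget model still wins; the ranking only flips once the cheap model fails often enough that retries and escalations eat up the savings.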

Calculate your exact cost per request.

Enter your token counts. Get instant cost-per-request for all 33 models.

Try the APIpulse Calculator

Or browse the per-request breakdowns for every model and the real-world scenarios.

Provider Pricing Comparison (Per 1M Tokens)

For reference, here are the per-1M-token prices across providers, from budget to flagship tiers:

Pricing Per 1M Tokens (input / output)
DeepSeek V4 Flash: $0.07 / $0.27
Gemini 2.0 Flash: $0.10 / $0.40
GPT-4o mini: $0.15 / $0.60
Claude Haiku 4.5: $1.00 / $5.00
Mistral Small: $0.10 / $0.30
GPT-4o: $2.50 / $10.00
Claude Sonnet 4: $3.00 / $15.00
Gemini 2.5 Pro: $1.25 / $10.00
GPT-5: $5.00 / $15.00
Claude Opus 4: $15.00 / $75.00

How to Cut Your Per-Request Cost

  1. Measure first: Use APIpulse to calculate your actual per-request cost before optimizing
  2. Route smartly: Use cheap models for simple tasks, expensive models for complex reasoning. Multi-model routing can cut costs 40-60%.
  3. Shorten prompts: Remove unnecessary context. Every 100 tokens saved = 100 fewer tokens billed on every request.
  4. Cache aggressively: If you send the same prompts repeatedly, cache the responses (a minimal sketch follows this list). For non-urgent workloads, provider batch APIs typically cost about 50% less.
  5. Compare providers: The same quality tier varies wildly in price. Compare side by side before committing.
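To illustrate point 4, here is a minimal in-memory cache sketch. The helper names and the fake_model stand-in are hypothetical; a production setup would more likely use Redis or a similar store with a TTL:

```python
import hashlib

_cache: dict[str, str] = {}  # response cache keyed by a hash of the full prompt

def cached_completion(prompt: str, call_model) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]          # cache hit: zero tokens billed
    response = call_model(prompt)   # cache miss: pay for the request once
    _cache[key] = response
    return response

# Repeated identical prompts trigger only one billed API call.
fake_model = lambda p: f"answer to: {p}"
print(cached_completion("What is your refund policy?", fake_model))
print(cached_completion("What is your refund policy?", fake_model))  # served from cache
```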

The bottom line: your cost per request is determined by your model choice, token counts, and request volume. Get these three right, and you'll spend a fraction of what most teams pay.
