โ† Back to blog

AI API Cost Per Request: How Much Does Each LLM Call Actually Cost?

โš ๏ธ Deprecation alert: Claude 4 Opus and Claude Sonnet 4 retired on June 15, 2026. If you're using these models, see our migration guide for step-by-step instructions.

๐Ÿšจ Claude 4 retired June 15: See all 42 alternatives, calculate your savings, and get migration code on our Claude 4 Migration Hub.

"How much will this cost me per request?" โ€” the question every developer asks before integrating an AI API. And the answer is never simple, because it depends on your token counts, your model choice, and your provider.

We analyzed 42 models across 10 providers to give you exact cost-per-request breakdowns for real-world scenarios. No estimates. No "it depends." Just numbers.

The Quick Answer: Cost Per Request by Model Tier

Here's what a single request costs for a typical workload: 1,500 input tokens, 400 output tokens (roughly a paragraph in, a paragraph out).

Cost Per Request: 1,500 input + 400 output tokens

Model                Cost per request
GPT-4o mini          $0.00047
Claude Haiku 4.5     $0.00060
Gemini 2.0 Flash     $0.00019
DeepSeek V4 Flash    $0.00007
GPT-4o               $0.00470
Claude Sonnet 4      $0.00450
Gemini 2.5 Pro       $0.00325
GPT-5                $0.01250
Claude Opus 4        $0.02250
GPT-5.5              $0.05000

Cheapest to most expensive: roughly a 700x range

That roughly 700x difference between the cheapest and most expensive model is why choosing the right model matters so much. The same request that costs $0.00007 on DeepSeek V4 Flash costs $0.05 on GPT-5.5.
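As a rough sketch of the arithmetic behind a per-request figure, the function below multiplies token counts by per-1M-token prices. The function name and parameters are illustrative, and the example uses the GPT-4o mini prices listed in the pricing table later in this post ($0.15 input, $0.60 output per 1M tokens).

```python
def cost_per_request(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Return the USD cost of one request, given per-1M-token prices."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# GPT-4o mini: $0.15 input / $0.60 output per 1M tokens
cost = cost_per_request(1_500, 400, 0.15, 0.60)
print(f"${cost:.6f}")  # $0.000465, i.e. about $0.00047 per request
```

The same function works for any model once you plug in its two prices, so it is a quick way to sanity-check a quoted per-request number against a provider's price sheet.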

Scenario 1: Chatbot (1,000 requests/day)

A customer support chatbot handling 1,000 conversations per day with average messages of 1,500 input tokens and 400 output tokens.

Tier         Model                Monthly cost
Budget       DeepSeek V4 Flash    $2.10
Mid-Range    GPT-4o mini          $14.10
Premium      GPT-4o               $141
Flagship     Claude Opus 4        $675

For a chatbot, the quality difference between GPT-4o mini and GPT-4o is often negligible. Most users won't notice. But your bank account will notice the 10x cost difference.
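The monthly figures follow from the per-request costs by simple multiplication. A minimal sketch, assuming a 30-day month (the constant and function name are illustrative):

```python
DAYS_PER_MONTH = 30  # assumption: billing month of 30 days


def monthly_cost(cost_per_request, requests_per_day, days=DAYS_PER_MONTH):
    """Project a monthly bill from a per-request cost and daily volume."""
    return cost_per_request * requests_per_day * days


gpt4o_mini = monthly_cost(0.00047, 1_000)
gpt4o = monthly_cost(0.00470, 1_000)

print(f"GPT-4o mini: ${gpt4o_mini:.2f}/month")  # $14.10/month
print(f"GPT-4o:      ${gpt4o:.2f}/month")       # $141.00/month
print(f"Ratio:       {gpt4o / gpt4o_mini:.0f}x")  # 10x
```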

Scenario 2: Code Assistant (10,000 requests/day)

A coding assistant processing 10,000 requests daily with 2,000 input tokens and 800 output tokens (code completions are longer).

Tier         Model                Cost
Budget       DeepSeek V4 Flash    $4
Mid-Range    Claude Sonnet 4      $39
Premium      GPT-4o               $125
Flagship     Claude Opus 4        $2,250

Code assistants are where model routing really pays off. Use a cheap model for simple completions (variable names, boilerplate) and a premium model for complex logic. This can cut costs by 60-70%.
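To see how routing produces savings in that range, here is a hedged sketch. The `pick_model` heuristic (prompt length) and the per-request costs are hypothetical placeholders, not measurements; a real router would use whatever signal of task complexity you trust.

```python
def pick_model(prompt, max_simple_chars=200):
    """Hypothetical heuristic: short prompts go to the cheap model."""
    return "cheap" if len(prompt) <= max_simple_chars else "premium"


def blended_cost(cheap_cost, premium_cost, cheap_share):
    """Average cost per request when cheap_share of traffic is routed cheap."""
    return cheap_share * cheap_cost + (1 - cheap_share) * premium_cost


# Placeholder per-request costs, for illustration only
CHEAP, PREMIUM = 0.001, 0.010

avg = blended_cost(CHEAP, PREMIUM, cheap_share=0.7)
savings = 1 - avg / PREMIUM
print(f"Blended cost: ${avg:.4f}, savings vs all-premium: {savings:.0%}")
```

With these placeholder numbers, sending 70% of requests to the cheap model saves 63% compared with sending everything to the premium model. The savings depend mostly on what share of traffic the cheap model can handle.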

Scenario 3: RAG Pipeline (5,000 requests/day)

A retrieval-augmented generation system processing 5,000 queries daily with 3,000 input tokens (prompt + context) and 600 output tokens.

Tier         Model              Cost
Budget       GPT-4o mini        $8
Mid-Range    Claude Sonnet 4    $68
Premium      GPT-5              $225
Flagship     Claude Opus 4      $675

RAG pipelines have a hidden cost: the input tokens are large because you're stuffing context into the prompt. At 3,000 input tokens per request, input costs often exceed output costs. This is where prompt optimization saves real money: trimming 500 tokens from a 3,000-token prompt cuts input costs by about 17%.
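A small sketch of both claims, using the GPT-4o mini prices from the pricing table in this post ($0.15 input, $0.60 output per 1M tokens). The function names are illustrative.

```python
def cost_split(input_tokens, output_tokens,
               input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost) in USD for one request."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost, output_cost


def input_savings_from_trim(input_tokens, trimmed_tokens):
    """Fraction of input cost saved by removing trimmed_tokens from the prompt."""
    return trimmed_tokens / input_tokens


input_cost, output_cost = cost_split(3_000, 600, 0.15, 0.60)
print(input_cost > output_cost)  # True: input is the larger share here

saved = input_savings_from_trim(3_000, 500)
print(f"{saved:.0%}")  # 17%
```

Because input is billed linearly per token, the saving from trimming is simply the fraction of tokens removed, regardless of the model's price.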

Hidden Costs Most People Forget

The "Per Request" Trap

Focusing on cost per request alone is misleading. The real metric is cost per outcome.

The goal isn't to minimize cost per request. It's to maximize value per dollar spent. Sometimes that means using an expensive model. Usually it means using the cheapest model that's good enough.
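One way to put a number on cost per outcome is sketched below. It rests on a simplifying assumption that is not from this post: every failed request is retried until it succeeds, with independent attempts, so the expected number of attempts is 1 / success rate. The success rates are hypothetical.

```python
def cost_per_outcome(cost_per_request, success_rate):
    """Expected cost of one successful result, assuming retry-until-success."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_request / success_rate


# Hypothetical success rates, for illustration only
premium = cost_per_outcome(0.00470, success_rate=0.95)
cheap_good_enough = cost_per_outcome(0.00047, success_rate=0.80)
cheap_struggling = cost_per_outcome(0.00047, success_rate=0.05)

print(cheap_good_enough < premium)  # True: cheap model wins when good enough
print(cheap_struggling > premium)   # True: premium wins when cheap rarely succeeds
```

The model with the lower per-request price only loses here when its success rate collapses, which matches the rule of thumb of picking the cheapest model that is good enough.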

Provider Pricing Comparison (Per 1M Tokens)

For reference, here are the per-1M-token prices for a range of models across providers, from budget to flagship tiers:

Model Pricing, Per 1M Tokens

Model                Input     Output
DeepSeek V4 Flash    $0.07     $0.27
Gemini 2.0 Flash     $0.10     $0.40
GPT-4o mini          $0.15     $0.60
Claude Haiku 4.5     $1.00     $5.00
Mistral Small        $0.10     $0.30
GPT-4o               $2.50     $10.00
Claude Sonnet 4      $3.00     $15.00
Gemini 2.5 Pro       $1.25     $10.00
GPT-5                $5.00     $15.00
Claude Opus 4        $15.00    $75.00
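If you want to compare these models for your own workload, a sketch like the following ranks them by cost for any token mix. The prices are copied from the table above; the dictionary layout and function name are illustrative, and results computed this way follow only from these listed prices.

```python
# (input, output) USD per 1M tokens, copied from the table above
PRICES = {
    "DeepSeek V4 Flash": (0.07, 0.27),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Mistral Small": (0.10, 0.30),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "GPT-5": (5.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
}


def rank_by_cost(input_tokens, output_tokens, prices=PRICES):
    """Return (model, cost_per_request) pairs sorted cheapest first."""
    costs = {
        model: (input_tokens * inp + output_tokens * out) / 1_000_000
        for model, (inp, out) in prices.items()
    }
    return sorted(costs.items(), key=lambda pair: pair[1])


ranked = rank_by_cost(1_500, 400)
for model, cost in ranked:
    print(f"{model:<20} ${cost:.6f}")
```

Changing the two token counts is enough to see how the ranking shifts for input-heavy workloads such as RAG versus output-heavy ones such as code generation.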

How to Cut Your Per-Request Cost

  1. Measure first: Use APIpulse to calculate your actual per-request cost before optimizing.
  2. Route smartly: Use cheap models for simple tasks and expensive models for complex reasoning. Multi-model routing can cut costs 40-60%.
  3. Shorten prompts: Remove unnecessary context. Every 100 tokens saved is 100 fewer tokens billed on every request.
  4. Cache aggressively: If you're sending the same prompt repeatedly, cache the response. For work that isn't time-sensitive, batch processing can also cut costs by around 50% where a provider offers discounted batch pricing.
  5. Compare providers: The same quality tier varies wildly in price. Compare side by side before committing.

The bottom line: your cost per request is determined by your model choice and token counts, and your monthly bill by request volume on top of that. Get these three right, and you'll spend a fraction of what most teams pay.