
AI API Cost Per Request: How Much Does Each LLM Call Actually Cost?

"How much will this cost me per request?" — the question every developer asks before integrating an AI API. And the answer is never simple, because it depends on your token counts, your model choice, and your provider.

We analyzed 33 models across 10 providers to give you exact cost-per-request breakdowns for real-world scenarios. No estimates. No "it depends." Just numbers.

The Quick Answer: Cost Per Request by Model Tier

Here's what a single request costs for a typical workload: 1,500 input tokens and 400 output tokens (roughly a page of prompt and context in, a few paragraphs out).

Cost Per Request — 1,500 input + 400 output tokens
GPT-4o mini: $0.00047
Claude Haiku 4.5: $0.00060
Gemini 2.0 Flash: $0.00019
DeepSeek V4 Flash: $0.00007
GPT-4o: $0.00470
Claude Sonnet 4: $0.00450
Gemini 2.5 Pro: $0.00325
GPT-5: $0.01250
Claude Opus 4: $0.02250
GPT-5.5: $0.05000
Cheapest to most expensive: roughly a 700x range.

That roughly 700x spread between the cheapest and most expensive model is why choosing the right model matters so much. The same request that costs $0.00007 on DeepSeek V4 Flash costs $0.05 on GPT-5.5.
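If you want to sanity-check these figures against your own workload, the math is just tokens used multiplied by the per-1M-token price. Here is a minimal Python sketch (the function is illustrative, not part of any SDK), using GPT-4o mini's listed rates from the provider table further down:

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request from per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# GPT-4o mini: $0.15 input / $0.60 output per 1M tokens
print(cost_per_request(1_500, 400, 0.15, 0.60))  # about 0.000465, i.e. ~$0.00047
```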

Scenario 1: Chatbot (1,000 requests/day)

A customer support chatbot handling 1,000 conversations per day, with messages averaging 1,500 input tokens and 400 output tokens.

Budget: DeepSeek V4 Flash, about $2/month ($0.07/day)
Mid-Range: GPT-4o mini, $14.10/month
Premium: GPT-4o, $141/month
Flagship: Claude Opus 4, $675/month

For a chatbot, the quality difference between GPT-4o mini and GPT-4o is often negligible. Most users won't notice. But your bank account will notice the 10x cost difference.
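Scaling a per-request figure up to a monthly bill is one more multiplication. A small helper like this sketch (assuming a 30-day month) reproduces the numbers above:

```python
def monthly_cost(cost_per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Project a monthly bill from per-request cost and daily volume."""
    return cost_per_request * requests_per_day * days

# Per-request figures from the 1,500-in / 400-out table above
print(monthly_cost(0.00047, 1_000))  # ~14.1 (GPT-4o mini, ~$14/month)
print(monthly_cost(0.00470, 1_000))  # ~141  (GPT-4o, ~$141/month)
```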

Scenario 2: Code Assistant (10,000 requests/day)

A coding assistant processing 10,000 requests daily with 2,000 input tokens and 800 output tokens (code completions are longer).

Budget: DeepSeek V4 Flash, $4/month
Mid-Range: Claude Sonnet 4, $39/month
Premium: GPT-4o, $125/month
Flagship: Claude Opus 4, $2,250/month

Code assistants are where model routing really pays off. Use a cheap model for simple completions (variable names, boilerplate) and a premium model for complex logic. This can cut costs by 60-70%.
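One way to implement that split is a small router that inspects each request before picking a model. A minimal sketch, where the model names and the is_complex heuristic are placeholders rather than a prescribed API:

```python
def is_complex(prompt: str) -> bool:
    """Crude heuristic: long prompts, or ones mentioning tricky work, get the premium model."""
    hard_keywords = ("refactor", "concurrency", "optimize", "debug")
    return len(prompt) > 2_000 or any(k in prompt.lower() for k in hard_keywords)

def pick_model(prompt: str) -> str:
    # Cheap model for boilerplate and short completions, premium model for complex logic
    return "premium-model" if is_complex(prompt) else "budget-model"

print(pick_model("rename this variable"))             # budget-model
print(pick_model("debug this deadlock in the pool"))  # premium-model
```

In practice the heuristic is the hard part; teams often start with prompt length and task type, then refine based on where the cheap model's answers get rejected.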

Scenario 3: RAG Pipeline (5,000 requests/day)

A retrieval-augmented generation system processing 5,000 queries daily with 3,000 input tokens (prompt + context) and 600 output tokens.

Budget: GPT-4o mini, $8/month
Mid-Range: Claude Sonnet 4, $68/month
Premium: GPT-5, $225/month
Flagship: Claude Opus 4, $675/month

RAG pipelines have a hidden cost: the input tokens are large because you're stuffing context into the prompt. At 3,000 input tokens per request, input costs often exceed output costs. This is where prompt optimization saves real money — trimming 500 tokens from your context window saves 17% on input costs.
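Here is a quick back-of-the-envelope check on that 17% figure, using Claude Sonnet 4's listed input rate (any per-1M input price gives the same percentage, since the saving is proportional):

```python
def input_cost(input_tokens: int, price_per_m: float) -> float:
    """Input-side cost of one request at a given per-1M-token price."""
    return input_tokens * price_per_m / 1_000_000

full = input_cost(3_000, 3.00)     # full RAG prompt plus retrieved context
trimmed = input_cost(2_500, 3.00)  # same request with 500 tokens of context removed
print(f"{(full - trimmed) / full:.0%} less spent on input tokens")  # 17%
```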

Hidden Costs Most People Forget

The "Per Request" Trap

Focusing on cost-per-request alone is misleading. The real metric is cost per outcome: what it costs to actually resolve the user's problem, not what a single API call costs.

The goal isn't to minimize cost per request. It's to maximize value per dollar spent. Sometimes that means using an expensive model. Usually it means using the cheapest model that's good enough.
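One way to make that concrete is to divide per-request cost by how often a single request actually solves the problem. A sketch with made-up success rates (purely illustrative, not benchmark numbers):

```python
def cost_per_outcome(cost_per_request: float, success_rate: float) -> float:
    """Expected spend per successful outcome, assuming failed requests are retried."""
    return cost_per_request / success_rate

budget = cost_per_outcome(0.00047, success_rate=0.70)   # hypothetical 70% first-try success
premium = cost_per_outcome(0.00470, success_rate=0.95)  # hypothetical 95% first-try success
print(f"budget: ${budget:.5f}/outcome, premium: ${premium:.5f}/outcome")
```

With these particular numbers the budget model still wins; the ranking only flips once the cheap model fails often enough that retries and escalations eat up the savings.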

Calculate your exact cost per request.

Enter your token counts. Get instant cost-per-request for all 33 models.

Try the APIpulse Calculator

Or browse the per-request breakdowns for every model and the real-world scenarios.

Provider Pricing Comparison (Per 1M Tokens)

For reference, here are the per-1M-token prices across providers, from budget to flagship tiers:

Pricing Per 1M Tokens (input / output)
DeepSeek V4 Flash: $0.07 / $0.27
Gemini 2.0 Flash: $0.10 / $0.40
GPT-4o mini: $0.15 / $0.60
Claude Haiku 4.5: $1.00 / $5.00
Mistral Small: $0.10 / $0.30
GPT-4o: $2.50 / $10.00
Claude Sonnet 4: $3.00 / $15.00
Gemini 2.5 Pro: $1.25 / $10.00
GPT-5: $5.00 / $15.00
Claude Opus 4: $15.00 / $75.00

How to Cut Your Per-Request Cost

  1. Measure first: Use APIpulse to calculate your actual per-request cost before optimizing
  2. Route smartly: Use cheap models for simple tasks, expensive models for complex reasoning. Multi-model routing can cut costs 40-60%.
  3. Shorten prompts: Remove unnecessary context. Every 100 tokens saved = 100 fewer tokens billed on every request.
  4. Cache aggressively: If you send the same prompts repeatedly, cache the responses (a minimal sketch follows this list). For non-urgent workloads, provider batch APIs typically cost about 50% less.
  5. Compare providers: The same quality tier varies wildly in price. Compare side by side before committing.
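To illustrate point 4, here is a minimal in-memory cache sketch. The helper names and the fake_model stand-in are hypothetical; a production setup would more likely use Redis or a similar store with a TTL:

```python
import hashlib

_cache: dict[str, str] = {}  # response cache keyed by a hash of the full prompt

def cached_completion(prompt: str, call_model) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]          # cache hit: zero tokens billed
    response = call_model(prompt)   # cache miss: pay for the request once
    _cache[key] = response
    return response

# Repeated identical prompts trigger only one billed API call.
fake_model = lambda p: f"answer to: {p}"
print(cached_completion("What is your refund policy?", fake_model))
print(cached_completion("What is your refund policy?", fake_model))  # served from cache
```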

The bottom line: your cost per request is determined by your model choice, token counts, and request volume. Get these three right, and you'll spend a fraction of what most teams pay.
