AI API Cost Per Request: How Much Does Each LLM Call Actually Cost?
"How much will this cost me per request?" — the question every developer asks before integrating an AI API. And the answer is never simple, because it depends on your token counts, your model choice, and your provider.
We analyzed 33 models across 10 providers to give you exact cost-per-request breakdowns for real-world scenarios. No estimates. No "it depends." Just numbers.
The Quick Answer: Cost Per Request by Model Tier
Here's what a single request costs for a typical workload: 1,500 input tokens, 400 output tokens (roughly a paragraph in, a paragraph out).
That roughly 700x gap between the cheapest and most expensive model is why choosing the right model matters so much. The same request that costs $0.00007 on DeepSeek V4 Flash costs $0.05 on GPT-5.5.
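Every number in this post comes from the same two-term formula: input tokens times the input rate, plus output tokens times the output rate. A minimal sketch (the $0.15/$0.60 per-1M-token prices below are illustrative, not any provider's actual rate card):

```python
def cost_per_request(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """USD cost of one request, given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# The typical workload above: 1,500 tokens in, 400 tokens out.
# Prices here are illustrative stand-ins.
cost = cost_per_request(1_500, 400, 0.15, 0.60)
print(f"${cost:.6f}")  # $0.000465
```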
Scenario 1: Chatbot (1,000 requests/day)
A customer support chatbot handling 1,000 conversations per day with average messages of 1,500 input tokens and 400 output tokens.
[Per-request cost table: Budget, Mid-Range, Premium, and Flagship tiers]
For a chatbot, the quality difference between GPT-4o mini and GPT-4o is often negligible. Most users won't notice. But your bank account will notice the 10x cost difference.
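To see that 10x gap in dollars, scale the per-request formula by volume. A sketch with hypothetical budget and flagship rates (swap in your provider's real prices):

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m, days=30):
    """Monthly USD spend for a fixed daily request volume."""
    per_request = (input_tokens * input_price_per_m
                   + output_tokens * output_price_per_m) / 1_000_000
    return per_request * requests_per_day * days

# Chatbot scenario: 1,000 requests/day, 1,500 in / 400 out.
# Both tier prices are hypothetical stand-ins, not real rate cards.
budget = monthly_cost(1_000, 1_500, 400, 0.15, 0.60)
flagship = monthly_cost(1_000, 1_500, 400, 2.50, 10.00)
print(f"budget ~${budget:.0f}/mo, flagship ~${flagship:.0f}/mo")
```

Under these stand-in prices the budget tier runs about $14/month and the flagship tier over $230/month for the exact same traffic.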
Scenario 2: Code Assistant (10,000 requests/day)
A coding assistant processing 10,000 requests daily with 2,000 input tokens and 800 output tokens (code completions are longer).
[Per-request cost table: Budget, Mid-Range, Premium, and Flagship tiers]
Code assistants are where model routing really pays off. Use a cheap model for simple completions (variable names, boilerplate) and a premium model for complex logic. This can cut costs by 60-70%.
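A router doesn't need to be fancy to capture most of those savings. A toy heuristic sketch, where the model names, keyword list, and length threshold are all made up for illustration:

```python
def route_model(prompt: str) -> str:
    """Toy router: cheap model for short/simple completions,
    premium model for anything long or complex-looking.
    Thresholds and keywords are invented for this example."""
    COMPLEX_HINTS = ("refactor", "debug", "architecture", "explain")
    if len(prompt) > 1_000 or any(h in prompt.lower() for h in COMPLEX_HINTS):
        return "premium-model"   # placeholder name
    return "cheap-model"         # placeholder name

print(route_model("complete this variable name: user_cnt"))        # cheap-model
print(route_model("debug this race condition in my async queue"))  # premium-model
```

In production you'd route on something sturdier than keywords (a classifier, or the caller's declared task type), but the cost mechanics are the same: most traffic lands on the cheap model.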
Scenario 3: RAG Pipeline (5,000 requests/day)
A retrieval-augmented generation system processing 5,000 queries daily with 3,000 input tokens (prompt + context) and 600 output tokens.
[Per-request cost table: Budget, Mid-Range, Premium, and Flagship tiers]
RAG pipelines have a hidden cost: the input tokens are large because you're stuffing context into the prompt. At 3,000 input tokens per request, input costs often exceed output costs. This is where prompt optimization saves real money — trimming 500 tokens from your context window saves 17% on input costs.
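The savings claim is simple proportional arithmetic, and it compounds with volume. A quick check (the $1.00/1M input price is hypothetical):

```python
# Trimming 500 tokens from a 3,000-token context: input cost is linear in tokens.
tokens_before, tokens_after = 3_000, 2_500
savings_rate = 1 - tokens_after / tokens_before
print(f"{savings_rate:.1%} saved on input costs")  # 16.7% saved on input costs

# At this scenario's 5,000 requests/day, with a hypothetical $1.00/1M input price:
input_price_per_m = 1.00
monthly_saved = 500 * 5_000 * 30 / 1_000_000 * input_price_per_m
print(f"${monthly_saved:.2f}/month saved")  # $75.00/month saved
```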
Hidden Costs Most People Forget
- Retries and failures: Budget 10-15% extra for failed requests. Rate limits, timeouts, and errors all cost money without producing output.
- System prompts: Your system prompt (instructions, rules, personality) adds 200-500 tokens to every single request. For 10,000 requests/day, that's 2-5M extra input tokens per day, or 60-150M per month.
- Long context trap: Using GPT-5's 128K context window? It's an invitation to over-stuff. You pay for every token you actually send, not for the window itself, so context you don't need directly inflates your input bill. Shorter prompts = lower costs.
- Embedding costs: If you're building RAG, add $10-50/month for embedding models on top of generation costs.
- Batch vs. real-time: OpenAI and Anthropic offer batch APIs at 50% discount. If you can wait a few hours, halve your costs.
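These overheads can be folded into a single budgeting function. A sketch, assuming a 12% retry rate, a 300-token system prompt, and a made-up $1.00/1M input price; every default here is illustrative:

```python
def adjusted_monthly_cost(base_cost, retry_rate=0.12, system_prompt_tokens=300,
                          requests_per_month=300_000, input_price_per_m=1.00,
                          batch_discount=0.0):
    """Fold the hidden costs above into a monthly budget.
    All defaults are illustrative, not benchmarks."""
    # System prompt overhead: extra input tokens billed on every request.
    overhead = (system_prompt_tokens * requests_per_month
                / 1_000_000 * input_price_per_m)
    # Retries re-bill both the base request and its system prompt.
    total = (base_cost + overhead) * (1 + retry_rate)
    return total * (1 - batch_discount)

realtime = adjusted_monthly_cost(100.00)
batched = adjusted_monthly_cost(100.00, batch_discount=0.5)
print(f"real-time ~${realtime:.2f}, batched ~${batched:.2f}")
```

Under these assumptions a nominal $100/month workload actually costs about $213 in real time, or about $106 if you can batch it.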
The "Per Request" Trap
Focusing on cost-per-request alone is misleading. The real metric is cost-per-outcome:
- A $0.05 request that converts a customer is worth more than a $0.001 request that doesn't
- A $0.02 request that produces correct code saves more than a $0.001 request that produces bugs
- Cheaper models often need more retries and post-processing, which adds hidden costs
The goal isn't to minimize cost per request. It's to maximize value per dollar spent. Sometimes that means using an expensive model. Usually it means using the cheapest model that's good enough.
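One way to make cost-per-outcome concrete is to charge each failed response with the downstream cost of handling it (a retry, a bug fix, a human escalation). All the numbers below are invented to illustrate the point:

```python
def cost_per_correct(request_cost, accuracy, failure_cost):
    """Expected spend per correct result: API spend to reach one success,
    plus the downstream cost of the failures along the way."""
    failures_per_success = (1 - accuracy) / accuracy
    return request_cost / accuracy + failure_cost * failures_per_success

# Invented numbers: a $0.001 model that's right 70% of the time vs. a
# $0.02 model that's right 95%, where each failure costs $0.50 to handle.
cheap = cost_per_correct(0.001, 0.70, 0.50)
premium = cost_per_correct(0.02, 0.95, 0.50)
print(f"cheap: ${cheap:.3f}/correct, premium: ${premium:.3f}/correct")
```

Under these made-up numbers the premium model comes out more than 4x cheaper per correct answer, which is the whole argument for measuring outcomes rather than requests.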
Calculate your exact cost per request.
Enter your token counts. Get instant cost-per-request for all 33 models.
Try the APIpulse Calculator. Or see per-request breakdowns for every model, or real-world scenarios.
Provider Pricing Comparison (Per 1M Tokens)
For reference, here are the per-1M-token prices across providers for their flagship models:
How to Cut Your Per-Request Cost
- Measure first: Use APIpulse to calculate your actual per-request cost before optimizing
- Route smartly: Use cheap models for simple tasks, expensive models for complex reasoning. Multi-model routing can cut costs 40-60%.
- Shorten prompts: Remove unnecessary context. Every 100 tokens saved = 100 fewer tokens billed on every request.
- Cache aggressively: If you're sending the same prompt repeatedly, cache the response and pay for it once. For deferrable work, batch processing cuts costs 50%.
- Compare providers: The same quality tier varies wildly in price. Compare side by side before committing.
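Caching identical prompts takes only a few lines. A minimal in-memory sketch; `call_api` here is a stand-in for whatever provider SDK call you actually use:

```python
import hashlib

_cache: dict = {}

def cached_completion(prompt: str, call_api) -> str:
    """Memoize identical prompts so repeated requests cost nothing.
    `call_api` stands in for your provider's SDK call (not a real API)."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(prompt)
    return _cache[key]

# Usage: the second identical prompt never hits the (fake) API.
calls = []

def fake_api(prompt):
    calls.append(prompt)          # record real API hits
    return f"reply to: {prompt}"

cached_completion("summarize this ticket", fake_api)
cached_completion("summarize this ticket", fake_api)
print(len(calls))  # 1
```

For production traffic you'd use a shared store like Redis with a TTL instead of a process-local dict, but the billing effect is the same: duplicate prompts stop costing tokens.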
The bottom line: your cost per request is determined by your model choice, token counts, and request volume. Get these three right, and you'll spend a fraction of what most teams pay.
Get notified when API prices change
No spam. Only pricing updates and new features. Unsubscribe anytime.