AI API Cost per Request: Quick Reference Table
How much does a single API call actually cost? We calculated it for all 32 models across 10 providers at four common request sizes. Bookmark this page.
Assumption: Each request sends 3x more input tokens than output tokens (typical for chat, RAG, and code assistant workloads). Costs are per single request. All prices verified May 2026.
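The per-request figures below can be reproduced from a model's per-million-token prices. A minimal sketch in Python, using hypothetical prices of $0.10 per million tokens for both input and output (the prices are illustrative, not any specific model's):

```python
def cost_per_request(input_price_per_m, output_price_per_m,
                     total_tokens, input_output_ratio=3.0):
    """Blended cost of one request in dollars.

    Splits total_tokens into input/output using the given ratio
    (3:1 by default, matching this table's assumption), then prices
    each side at its per-million-token rate.
    """
    input_tokens = total_tokens * input_output_ratio / (input_output_ratio + 1)
    output_tokens = total_tokens / (input_output_ratio + 1)
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Illustrative: $0.10/M input, $0.10/M output, 1K-token request
print(f"${cost_per_request(0.10, 0.10, 1000):.6f}")  # → $0.000100
```

At 3:1, a 1,000-token request is 750 input tokens and 250 output tokens; every cost in the table is this weighted sum.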
All 32 Models — Cost per Request
Sorted cheapest to most expensive. At 1K tokens, costs range from $0.000100 (Llama 3.1 8B) to $0.067500 (GPT-5.5 Pro) — a 675x gap.
| Model | Tier | Provider | 100 tok | 500 tok | 1K tok | 5K tok |
|---|---|---|---|---|---|---|
| Llama 3.1 8B | Budget | Meta (Together.ai) | $0.000010 | $0.000050 | $0.000100 | $0.000500 |
| GPT-oss 20B | Budget | OpenAI | $0.000015 | $0.000074 | $0.000148 | $0.000737 |
| Llama 4 Scout | Budget | Meta (Together.ai) | $0.000017 | $0.000084 | $0.000168 | $0.000838 |
| Gemini 2.0 Flash | Budget | Google | $0.000017 | $0.000087 | $0.000175 | $0.000875 |
| DeepSeek V4 Flash | Budget | DeepSeek | $0.000018 | $0.000087 | $0.000175 | $0.000875 |
| Mistral Small 4 | Budget | Mistral | $0.000026 | $0.000131 | $0.000262 | $0.001313 |
| GPT-4o mini | Budget | OpenAI | $0.000026 | $0.000131 | $0.000262 | $0.001313 |
| GPT-oss 120B | Budget | OpenAI | $0.000026 | $0.000131 | $0.000262 | $0.001313 |
| Llama 4 Maverick | Budget | Meta (Together.ai) | $0.000030 | $0.000150 | $0.000300 | $0.001500 |
| DeepSeek V3 | Budget | DeepSeek | $0.000048 | $0.000239 | $0.000478 | $0.002387 |
| DeepSeek V4 Pro | Budget | DeepSeek | $0.000055 | $0.000274 | $0.000548 | $0.002737 |
| GPT-5 mini | Budget | OpenAI | $0.000070 | $0.000350 | $0.000700 | $0.003500 |
| Command R | Budget | Cohere | $0.000075 | $0.000375 | $0.000750 | $0.003750 |
| Mistral Large 3 | Budget | Mistral | $0.000075 | $0.000375 | $0.000750 | $0.003750 |
| Llama 3.1 70B | Mid | Meta (Together.ai) | $0.000088 | $0.000440 | $0.000880 | $0.004400 |
| Claude Haiku 4.5 | Budget | Anthropic | $0.000160 | $0.000800 | $0.001600 | $0.008000 |
| Kimi K2.6 | Budget | Moonshot | $0.000161 | $0.000806 | $0.001613 | $0.008063 |
| Gemini 2.5 Pro | Mid | Google | $0.000344 | $0.001719 | $0.003438 | $0.017188 |
| Grok 3 Mini | Mid | xAI | $0.000350 | $0.001750 | $0.003500 | $0.017500 |
| Jamba 1.5 Large | Mid | AI21 | $0.000350 | $0.001750 | $0.003500 | $0.017500 |
| Command R+ | Mid | Cohere | $0.000438 | $0.002188 | $0.004375 | $0.021875 |
| GPT-4o | Mid | OpenAI | $0.000438 | $0.002188 | $0.004375 | $0.021875 |
| Gemini 3.1 Pro | Mid | Google | $0.000450 | $0.002250 | $0.004500 | $0.022500 |
| GPT-5.3 Codex | Mid | OpenAI | $0.000481 | $0.002406 | $0.004812 | $0.024063 |
| Claude Sonnet 4 | Mid | Anthropic | $0.000600 | $0.003000 | $0.006000 | $0.030000 |
| Claude Sonnet 4.6 | Mid | Anthropic | $0.000600 | $0.003000 | $0.006000 | $0.030000 |
| Claude Opus 4.7 | Premium | Anthropic | $0.001000 | $0.005000 | $0.010000 | $0.050000 |
| GPT-5.5 | Premium | OpenAI | $0.001125 | $0.005625 | $0.011250 | $0.056250 |
| GPT-5 | Premium | OpenAI | $0.001500 | $0.007500 | $0.015000 | $0.075000 |
| Claude 4 Opus | Premium | Anthropic | $0.003000 | $0.015000 | $0.030000 | $0.150000 |
| Grok 3 | Premium | xAI | $0.006000 | $0.030000 | $0.060000 | $0.300000 |
| GPT-5.5 Pro | Premium | OpenAI | $0.006750 | $0.033750 | $0.067500 | $0.337500 |
Key Takeaways
The 675x Gap
The cheapest model (Llama 3.1 8B at $0.000100/request) costs 675x less than the most expensive (GPT-5.5 Pro at $0.067500/request) for a 1K-token request. The gap holds at 675x at 5K tokens too, since per-request cost scales linearly with token count.
- For high-volume chatbots: Llama 3.1 8B, GPT-oss 20B, or DeepSeek V4 Flash — all under $0.0002 per 1K-token request
- For production code assistants: GPT-4o or Claude Sonnet 4 — $0.004–$0.006 per request with strong reasoning
- For complex research: Claude 4 Opus or GPT-5 — $0.015–$0.03 per request, but best-in-class quality
- Best value in mid-tier: Llama 3.1 70B at $0.00088/request — 5x cheaper than GPT-4o with comparable quality
- Hidden winner: DeepSeek V4 Pro at $0.00055/request — mid-tier quality at budget prices (75% discount through May 2026)
Calculate your exact monthly costs across all 32 models
Open the Calculator — Free
How to Use This Table
These costs assume a 3:1 input-to-output token ratio. Your actual costs depend on your specific workload:
- Chatbots: Usually 2:1 to 4:1 ratio — this table is accurate
- Code generation: Often 1:3 or more output-heavy — since output tokens typically cost more than input tokens, expect higher per-request costs than shown
- Document analysis: Often 10:1 or higher — input-heavy, so actual per-request costs are lower than shown
- RAG pipelines: Usually 5:1 — input-heavy with retrieved context
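To see how the ratio moves the blended cost, here is a sketch re-pricing a 1K-token request at each of the ratios above. The $2.50/$10.00 per-million input/output prices are an assumption chosen to be consistent with the GPT-4o row in the table:

```python
def blended_cost(input_price_per_m, output_price_per_m, total_tokens, ratio):
    """Cost of one request, where ratio = input tokens per output token."""
    input_tokens = total_tokens * ratio / (ratio + 1)
    output_tokens = total_tokens / (ratio + 1)
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Assumed prices: $2.50/M input, $10.00/M output (matches the GPT-4o row at 3:1)
for label, ratio in [("chat (3:1)", 3), ("code gen (1:3)", 1 / 3),
                     ("doc analysis (10:1)", 10), ("RAG (5:1)", 5)]:
    print(f"{label:20s} ${blended_cost(2.50, 10.00, 1000, ratio):.6f}")
```

With these assumed prices, the same 1K-token request costs $0.004375 at 3:1 but $0.008125 at 1:3 — nearly double — while a 10:1 document-analysis workload drops to about $0.003182. The ratio matters almost as much as the model tier.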
For exact calculations with your token ratios, use our interactive calculator or token estimator.