AI API Cost per Token Explained: The Complete Pricing Guide 2026
Every AI API charges per token. But what is a token? How do you calculate costs? And why do output tokens cost 3-6x more than input tokens? This guide breaks it all down with real numbers from 34 models across 10 providers.
What Is a Token?
Try It Live — Instant Cost Calculator
See exactly what this model costs for your workload. No signup needed.
A token is a chunk of text that the AI model processes. Roughly:
- 1 token ≈ 4 characters in English
- 1 token ≈ ¾ of a word
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words (about 1.5 pages)
When you send a request to an AI API, the model counts your input tokens (what you send) and generates output tokens (what it returns). You pay for both — but at different rates.
Input vs Output Tokens: Why the Price Difference?
Every AI API has two prices:
- Input price: Cost per 1M tokens you send to the model (your prompt + context)
- Output price: Cost per 1M tokens the model generates (its response)
Output tokens always cost more — typically 3-6x the input price. Here's why:
- Compute intensity: Generating each output token requires running the full model forward pass. Input tokens can be processed in parallel (batched), but output tokens must be generated one at a time.
- Memory requirements: The model must maintain attention over all previous tokens while generating each new one.
- Latency: Output generation is the bottleneck — users wait for it, so providers charge more.
Pro Tip: Control Output Length
Since output tokens cost 3-6x more, setting a max_tokens limit is the single easiest way to reduce costs. Most responses don't need 4,096 tokens — set it to 500-1000 and save 50-75% on output costs.
The Cost Formula
Example: 1,000 input tokens + 500 output tokens on GPT-4o mini ($0.15/$0.60 per 1M):
Input cost: 1,000 ÷ 1,000,000 × $0.15 = $0.00015
Output cost: 500 ÷ 1,000,000 × $0.60 = $0.00030
Total per request: $0.00045
At 1,000 requests/day × 30 days = $13.50/month
Pricing Across 34 Models (Per 1M Tokens)
| Model | Provider | Input | Output | Output/Input Ratio | Context |
|---|---|---|---|---|---|
| Gemini 2.0 Flash Lite | $0.075 | $0.30 | 4.0x | 1M | |
| Llama 3.1 8B | Meta | $0.10 | $0.10 | 1.0x | 128K |
| Gemini 2.0 Flash | $0.10 | $0.40 | 4.0x | 1M | |
| Llama 4 Scout | Meta | $0.11 | $0.34 | 3.1x | 10M |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 2.0x | 1M |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 4.0x | 128K |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 8.0x | 272K |
| Gemini 2.5 Pro | $1.25 | $10.00 | 8.0x | 1M | |
| GPT-5 | OpenAI | $1.25 | $10.00 | 8.0x | 272K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 5.0x | 200K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 5.0x | 1M |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 6.0x | 1M |
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | 5.0x | 1M |
| Grok 3 | xAI | $30.00 | $150.00 | 5.0x | 128K |
| GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | 6.0x | 1M |
Key observation: The cheapest input tokens (Gemini Flash Lite at $0.075) are 400x cheaper than the most expensive (GPT-5.5 Pro at $30.00). The output spread is even wider at 600x. Model choice is the single biggest lever for controlling costs.
How Tokens Add Up in Real Applications
Chatbot (Simple Q&A)
System prompt: 200 tokens (fixed instructions)
User message: 100 tokens (the question)
Model response: 300 tokens (the answer)
Total: 600 tokens per request
Cost on GPT-4o mini: $0.00027/request
Cost on GPT-5: $0.00375/request (14x more)
RAG Pipeline (Search + Generate)
System prompt: 300 tokens
Retrieved context: 2,000 tokens (5 documents)
User question: 100 tokens
Model response: 500 tokens
Total: 2,900 tokens per request
Cost on GPT-4o mini: $0.00174/request
Cost on GPT-5: $0.02415/request (14x more)
Coding Assistant
System prompt: 500 tokens (code instructions)
Code context: 3,000 tokens (file contents)
User instruction: 200 tokens
Model response: 1,500 tokens (code generation)
Total: 5,200 tokens per request
Cost on Claude Sonnet 4.6: $0.039/request
Cost on GPT-5.5: $0.0725/request (1.9x more)
5 Ways to Reduce Your Token Costs
- Shorter prompts: Remove unnecessary instructions, use concise system prompts. Every token in your prompt costs money.
- Conversation pruning: Don't send 50 messages of history. Keep the last 5-10 and summarize the rest.
- Output limits: Set
max_tokensto what you actually need. Most chat responses don't need 4,096 tokens. - Model routing: Use cheap models for simple tasks, expensive ones for complex reasoning.
- Prompt caching: OpenAI and Anthropic offer prompt caching — identical prefixes cost 50-90% less.
Calculate Your Costs
Don't guess — calculate. Enter your exact usage into our calculator to see what every model costs you per month.
See your exact costs across all 34 models
Enter your daily requests and token counts. Get instant cost comparisons sorted cheapest-first.
Try the Monthly Spend EstimatorTry it free: APIpulse Cost Calculator — estimate your monthly spend across 34 models and 10 providers in 30 seconds.