State of AI API Pricing — Q2 2026 Report
The AI API market in Q2 2026 is the most competitive it's ever been. With 10 providers, 33 models, and prices ranging from $0.10 to $75 per million tokens, the choices — and the savings — are enormous.
This report covers every major provider, every pricing change since Q1, and the trends that will shape your costs for the rest of 2026.
Key Findings
Full Pricing Table — All 33 Models
Prices are per 1 million tokens. Sorted by input cost (cheapest first).
| Model | Provider | Input | Output | Context | Q1→Q2 |
|---|---|---|---|---|---|
| Mistral Small 4 | Mistral | $0.10 | $0.30 | 32K | — |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | -50% | |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 128K | NEW |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | — |
| Gemini 2.5 Flash | $0.15 | $0.60 | 1M | NEW | |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | — |
| Llama 4 Scout | Together | $0.10 | $0.30 | 128K | NEW |
| Kimi K2.6 | Moonshot | $0.60 | $2.50 | 128K | NEW |
| Jamba 1.5 Mini | AI21 | $0.20 | $0.40 | 256K | — |
| Command R | Cohere | $0.15 | $0.60 | 128K | — |
| Grok 3 Mini | xAI | $0.30 | $0.50 | 128K | NEW |
| Mistral Large 3 | Mistral | $2.00 | $6.00 | 128K | — |
| DeepSeek V4 Pro | DeepSeek | $0.55 | $2.19 | 128K | NEW |
| Command R+ | Cohere | $2.50 | $10.00 | 128K | — |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K | — |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | -67% | |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K | NEW |
| Grok 3 | xAI | $3.00 | $15.00 | 128K | — |
| GPT-5 | OpenAI | $1.25 | $10.00 | 272K | NEW |
| Llama 4 Maverick | Together | $0.30 | $0.90 | 1M | NEW |
| Claude 4 Opus | Anthropic | $15.00 | $75.00 | 200K | NEW |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 256K | NEW |
Full pricing for all 33 models available on our Pricing Index Model Matrix page.
Q1 → Q2 Price Changes
The biggest story of Q2 2026 is price compression at the bottom. Budget models are getting cheaper and better, while premium models are holding steady or dropping slightly.
Biggest Price Drops
- Gemini 2.0 Flash: -50% — Google's aggressive pricing on their fast model
- Gemini 2.5 Pro: -67% — From $3.75 to $1.25 input, making it competitive with mid-tier models
- DeepSeek V4 Flash: New entry at $0.14/$0.28 — cheapest model with 128K context
New Entries
- GPT-5.5: OpenAI's latest flagship at $5/$30
- Claude 4 Opus: Anthropic's premium at $15/$75 (the most expensive model)
- Llama 4 Scout & Maverick: Meta's open-source models on Together.ai
- Kimi K2.6: Moonshot's budget contender at $0.60/$2.50
- Grok 3 & Mini: xAI's models at $3/$15 and $0.30/$0.50
Provider Market Share
Based on our usage data and community surveys:
OpenAI still leads, but Anthropic and Google are gaining ground fast. DeepSeek and Together.ai are the fastest-growing providers, driven by aggressive pricing.
Cost per Use Case
Here's what you'll actually pay for common use cases, based on 100K requests/month:
| Use Case | Cheapest Option | Best Value | Premium Pick |
|---|---|---|---|
| Chatbot | Mistral Small 4: $8/mo | GPT-4o mini: $15/mo | Claude Sonnet 4: $45/mo |
| Code Generation | GPT-4o mini: $15/mo | Claude Sonnet 4: $45/mo | GPT-5: $200/mo |
| Content Writing | Gemini Flash: $6/mo | GPT-4o: $75/mo | Claude Opus: $300/mo |
| Data Extraction | Mistral Small 4: $8/mo | GPT-4o mini: $15/mo | Gemini 2.5 Pro: $55/mo |
| Customer Support | DeepSeek Flash: $7/mo | GPT-4o mini: $15/mo | Claude Sonnet 4: $45/mo |
Optimization Strategies That Work
- Model routing: Route 80% of traffic to budget models, 20% to premium. Average savings: 45%.
- Prompt compression: Reduce input tokens by 40% through better prompts. At scale, this saves $500+/month.
- Response caching: Cache 20-30% of identical requests. Near-zero marginal cost.
- Batch processing: Non-urgent tasks use batch APIs at 50% discount.
- Multi-provider strategy: Use 2-3 providers and route based on price + quality.
The most expensive model is not always the best choice. A $0.15 model that works 90% of the time is cheaper than a $10 model that works 100% of the time — if you handle the 10% edge cases separately.
Predictions for H2 2026
- Budget models will reach $0.05/input — DeepSeek and Llama 4 are driving prices down fast.
- Premium models will stabilize — GPT-5, Claude 4 Opus, and GPT-5.5 won't drop much further.
- Open-source will grow — Llama 4, Mistral, and new entrants will capture 30%+ of the market.
- Model routing will become standard — Every serious AI app will use 2+ models.
- Embedding costs will drop 50% — Competition in the embedding space is just starting.
Track these prices in real-time.
APIpulse tracks all 33 models across 10 providers. Get notified when prices change.
Try the Free CalculatorWant to save scenarios and export cost reports from this data?
APIpulse Pro lets you save up to 10 configurations, export professional HTML reports, and get personalized optimization recommendations.
Get Pro — $29 one-timeGet notified when API prices change
No spam. Only pricing updates and new features. Unsubscribe anytime.