State of LLM Pricing: Q2 2026
33 models. 10 providers. The definitive quarterly report on what's cheap, what's expensive, and what's changed since last quarter.
The LLM API market in Q2 2026 looks nothing like it did six months ago. Prices have cratered for some models and skyrocketed for others. Budget-tier models now match 2024 flagship quality. And the smartest teams have stopped picking one model; they're routing dynamically based on task complexity.
We track every price across every provider. Here's the full picture.
Q2 2026 at a Glance
- Biggest price drops: Mistral Large 3 and DeepSeek V4 Pro, both down 75%
- Biggest price hike: Grok 3, up 10x
- Cheapest model: Gemini 2.0 Flash Lite at $0.075/1M input
- 33 models tracked across 10 providers
The Biggest Price Moves
If you haven't re-evaluated your AI provider in the last quarter, you're almost certainly overpaying. Here are the moves that matter:
The Drops
| Model | Old Price (input) | New Price (input) | Change |
|---|---|---|---|
| Mistral Large 3 | $2.00/1M | $0.50/1M | -75% |
| DeepSeek V4 Pro | $1.75/1M | $0.44/1M | -75% |
| GPT-4o | $10.00/1M | $2.50/1M | -67% |
| Claude Opus 4.7 | $15.00/1M | $5.00/1M | -67% |
The Hike
| Model | Old Price (input) | New Price (input) | Change |
|---|---|---|---|
| Grok 3 | $3.00/1M | $30.00/1M | +10x |
Grok 3's 10x price increase is the largest single-model price hike we've ever tracked. At $30/$150 per 1M tokens, it now matches GPT-5.5 Pro's input price at the very top of our database, and costs more than GPT-5.5 and Claude Opus 4.7 combined. If you were using Grok 3 for production workloads, it's time to switch.
Full Pricing Matrix: Every Model, Every Provider
Here's the complete current pricing as of May 2026, organized by tier.
Premium Tier
Maximum capability. Best for complex reasoning, multimodal tasks, and high-stakes outputs.
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | 1M |
| Grok 3 | xAI | $30.00 | $150.00 | 128K |
| Claude 4 Opus | Anthropic | $15.00 | $75.00 | 200K |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 1M |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M |
Mid-Tier
Strong performance at reasonable cost. The sweet spot for most production workloads.
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| Command R+ | Cohere | $2.50 | $10.00 | 128K |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M |
| Jamba 1.5 Large | AI21 | $2.00 | $8.00 | 256K |
| GPT-5.3 Codex | OpenAI | $1.75 | $14.00 | 400K |
| GPT-5 | OpenAI | $1.25 | $10.00 | 272K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K |
| Llama 3.1 70B | Meta (Together.ai) | $0.88 | $0.88 | 128K |
Budget Tier
Production-viable quality at rock-bottom prices. The biggest story of Q2 2026.
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| Kimi K2.6 | Moonshot | $0.90 | $3.75 | 256K |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 128K |
| Command R | Cohere | $0.50 | $1.50 | 128K |
| DeepSeek V4 Pro | DeepSeek | $0.44 | $0.87 | 1M |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 272K |
| Llama 4 Maverick | Meta (Together.ai) | $0.20 | $0.60 | 10M |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K |
| GPT-oss 120B | OpenAI | $0.15 | $0.60 | 128K |
| Mistral Small 4 | Mistral | $0.15 | $0.60 | 128K |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M |
| Llama 4 Scout | Meta (Together.ai) | $0.11 | $0.34 | 10M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M |
| Llama 3.1 8B | Meta (Together.ai) | $0.10 | $0.10 | 128K |
| GPT-oss 20B | OpenAI | $0.08 | $0.35 | 128K |
| Gemini 2.0 Flash Lite | Google | $0.075 | $0.30 | 1M |
The spread between the most expensive model (GPT-5.5 Pro at $30/$180) and the cheapest (Gemini 2.0 Flash Lite at $0.075/$0.30) now stands at 400x on input and 600x on output.
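That spread is simple arithmetic to verify from the prices in the tables above:

```python
# Spread between the priciest and cheapest models in this report's tables.
top_in, top_out = 30.00, 180.00   # GPT-5.5 Pro ($/1M tokens)
low_in, low_out = 0.075, 0.30     # Gemini 2.0 Flash Lite ($/1M tokens)

print(round(top_in / low_in))     # 400  (input spread)
print(round(top_out / low_out))   # 600  (output spread)
```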
Five Trends Defining Q2 2026
1. Budget Models Are Now Production-Viable
This is the biggest story of the quarter. Models like Gemini 2.0 Flash Lite ($0.075/1M), Llama 3.1 8B ($0.10/1M), and DeepSeek V4 Flash ($0.14/1M) deliver quality that matches or exceeds 2024's GPT-4, at 1/20th the price. For classification, summarization, and simple chat, there's no reason to pay premium prices anymore.
2. Context Windows Exploded
1M token context windows are now standard across premium and mid-tier models. Budget models are catching up: Llama 4 Scout offers 10M tokens of context via Together.ai. DeepSeek V4 Pro and V4 Flash both support 1M. If you're still chunking documents to fit 128K windows, you're leaving money on the table.
3. Price Volatility Is the New Normal
Grok 3 went up 10x. DeepSeek V4 Pro went down 75%. GPT-4o dropped 67%. These aren't incremental adjustments; they're seismic shifts. The era of stable, predictable AI pricing is over. If you're not checking prices quarterly, you're flying blind.
4. Multi-Model Routing Is the Optimal Strategy
The best teams in 2026 don't pick one model. They route dynamically:
- Simple tasks (classification, extraction) → Gemini Flash Lite ($0.075/1M) or Llama 3.1 8B ($0.10/1M)
- Standard workloads (chat, summarization) → DeepSeek V4 Pro ($0.44/1M) or GPT-4o mini ($0.15/1M)
- Complex reasoning (code generation, analysis) → GPT-5 ($1.25/1M) or Claude Sonnet 4.6 ($3.00/1M)
- Critical outputs (customer-facing, high-stakes) → Claude Opus 4.7 ($5.00/1M) or GPT-5.5 ($5.00/1M)
A blended cost of under $2/1M tokens is achievable for most workloads.
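A minimal sketch of this routing strategy in Python. The prices are this report's Q2 2026 figures; the tier names and the traffic mix are illustrative assumptions, and how requests get classified into tiers is left abstract (real routers typically use a small classifier model or heuristics):

```python
# Complexity-based router: each task tier maps to a
# (model, input $/1M, output $/1M) tuple from this report's tables.
ROUTES = {
    "simple":   ("gemini-2.0-flash-lite", 0.075, 0.30),
    "standard": ("deepseek-v4-pro",       0.44,  0.87),
    "complex":  ("gpt-5",                 1.25,  10.00),
    "critical": ("claude-opus-4.7",       5.00,  25.00),
}

def route(tier: str) -> str:
    """Pick the model for a given task tier."""
    return ROUTES[tier][0]

def blended_input_cost(mix: dict) -> float:
    """Blended input $/1M tokens for a traffic mix {tier: share-of-traffic}."""
    return sum(share * ROUTES[tier][1] for tier, share in mix.items())

# Example mix: mostly cheap traffic, a sliver of premium.
mix = {"simple": 0.5, "standard": 0.3, "complex": 0.15, "critical": 0.05}
print(route("complex"))                   # gpt-5
print(round(blended_input_cost(mix), 3))  # 0.607
```

With that example mix, the blended input cost lands around $0.61/1M tokens, comfortably under the $2/1M figure.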
5. Batch APIs Changed the Math
OpenAI's Batch API offers a 50% discount. Anthropic and Google offer similar batch pricing. If your workload isn't time-sensitive (data labeling, content generation, document processing), batch everything. The savings are massive at scale.
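The batch math is a one-liner. A sketch, assuming the flat 50% discount applies to both input and output tokens (the model and volumes below are illustrative):

```python
# Batch-vs-realtime cost under an assumed flat 50% batch discount.
def job_cost(in_m: float, out_m: float, in_price: float,
             out_price: float, batch: bool = False) -> float:
    """Dollars for a job; token counts in millions, prices per 1M tokens."""
    cost = in_m * in_price + out_m * out_price
    return cost * 0.5 if batch else cost

# Example: 500M input / 100M output tokens on GPT-5 mini ($0.25/$2.00).
print(job_cost(500, 100, 0.25, 2.00))              # 325.0
print(job_cost(500, 100, 0.25, 2.00, batch=True))  # 162.5
```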
Cost Comparison: Real Workloads
Let's see what these prices actually mean for production systems. Here are four common workloads compared across price tiers:
- AI Coding Assistant
- RAG Pipeline
- Customer Support Chatbot
- Content Generation
Counting input tokens alone, switching from GPT-5.5 to DeepSeek V4 Pro at 100M tokens/day saves $456 per day, roughly $166,000 per year; once output tokens are factored in, the savings grow substantially.
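The exact annual figure for a switch like this depends on the input/output split, which a headline token count doesn't pin down. A small calculator, with an 80/20 split as an explicit assumption:

```python
# Annual savings for a model switch. The 80/20 input/output split used
# below is an assumption; only total daily volume is given.
def annual_savings(daily_m: float, input_share: float,
                   old_in: float, old_out: float,
                   new_in: float, new_out: float) -> float:
    """Dollars saved per year; daily_m is total daily tokens in millions."""
    in_m, out_m = daily_m * input_share, daily_m * (1 - input_share)
    old = in_m * old_in + out_m * old_out
    new = in_m * new_in + out_m * new_out
    return (old - new) * 365

# GPT-5.5 ($5/$30) -> DeepSeek V4 Pro ($0.44/$0.87), 100M tokens/day.
print(round(annual_savings(100, 0.8, 5.00, 30.00, 0.44, 0.87)))  # 345801
```

Shifting the split toward output tokens raises the number further, since the output-price gap ($30.00 vs $0.87) is much wider than the input-price gap.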
Provider-by-Provider Breakdown
OpenAI
The broadest model lineup (9 models). GPT-4o's 67% price drop made it a mid-tier option. GPT-5.5 remains the reasoning king at $5/$30. The new GPT-oss models (20B and 120B) are OpenAI's first budget-tier open-weight offerings. Best for: Complex reasoning, multimodal tasks, code generation.
Anthropic
Claude Opus 4.7 dropped from $15 to $5 input โ a 67% reduction. Claude Sonnet 4.6 at $3/$15 is the best mid-tier value for long-context work (1M tokens). Haiku 4.5 at $1/$5 fills the budget gap. Best for: Long-form writing, analysis, extended context tasks.
Google
Gemini 2.0 Flash Lite ($0.075/$0.30) is the cheapest model in our database. Gemini 3.1 Pro ($2/$12) offers flagship quality at mid-tier pricing. All models support 1M context. Best for: High-volume budget workloads, long-context analysis.
DeepSeek
The price-to-performance champion. V4 Pro at $0.44/$0.87 with 1M context is absurdly cheap for its capability level. V4 Flash at $0.14/$0.28 is even cheaper. Best for: Cost-sensitive production workloads, high-volume processing.
Mistral
Mistral Large 3's 75% price drop (from $2 to $0.50) repositioned it as a budget model. Mistral Small 4 at $0.15/$0.60 competes directly with GPT-4o mini. Best for: European compliance needs, budget workloads.
xAI
Grok 3's 10x price hike ($3 → $30) puts it at the very top of our database, tied with GPT-5.5 Pro on input price. Grok 3 Mini at $3/$5 is more reasonable. Verdict: Hard to recommend at current pricing unless you need Grok-specific capabilities.
Meta (via Together.ai)
Llama 4 Scout ($0.11/$0.34) with 10M context is excellent for long-document workloads. Llama 3.1 8B ($0.10/$0.10) remains the cheapest model for simple tasks. Best for: Self-hosted flexibility, massive context windows.
Decision Framework
Which Model Should You Use?
- Tightest budget, simple tasks: Gemini 2.0 Flash Lite ($0.075/1M), the cheapest option, with 1M context
- Best value for general use: DeepSeek V4 Pro ($0.44/1M), 91% cheaper than GPT-5.5, with 1M context
- Best mid-tier quality: Claude Sonnet 4.6 ($3/1M) or GPT-5 ($1.25/1M), strong reasoning at reasonable cost
- Maximum capability: GPT-5.5 ($5/1M) or Claude Opus 4.7 ($5/1M), top-tier for complex tasks
- Longest context: Llama 4 Scout ($0.11/1M), 10M context via Together.ai
- Code-heavy workloads: DeepSeek V4 Pro ($0.44/1M) or GPT-5.3 Codex ($1.75/1M)
- Batch processing: any model via a Batch API at 50% off; route to the cheapest that meets quality needs
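The framework above boils down to a cheapest-first ranking over whatever candidate set meets your quality bar. A sketch using a subset of this report's prices (the candidate list and token volumes are illustrative):

```python
# Rank candidate models by estimated monthly cost for a token profile.
# PRICES uses this report's Q2 2026 figures ($/1M tokens).
PRICES = {  # model: (input, output)
    "gemini-2.0-flash-lite": (0.075, 0.30),
    "deepseek-v4-pro":       (0.44,  0.87),
    "gpt-5":                 (1.25,  10.00),
    "claude-sonnet-4.6":     (3.00,  15.00),
    "gpt-5.5":               (5.00,  30.00),
}

def monthly_cost(model: str, in_m: float, out_m: float,
                 batch: bool = False) -> float:
    """Dollars per month; token volumes in millions of tokens."""
    in_p, out_p = PRICES[model]
    cost = in_m * in_p + out_m * out_p
    return cost * 0.5 if batch else cost  # assumed 50% batch discount

def rank(in_m: float, out_m: float) -> list:
    """Candidates sorted cheapest-first for this volume."""
    return sorted(PRICES, key=lambda m: monthly_cost(m, in_m, out_m))

# 1B input / 200M output tokens per month.
print(rank(1000, 200)[0])   # gemini-2.0-flash-lite
```

Filter the candidate set down to models that pass your quality evaluation first; the ranking only settles price, not capability.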
What to Watch in Q3 2026
- OpenAI GPT-6: Rumored for late Q2/Q3. Expect premium pricing at launch, with GPT-5.5 likely dropping to mid-tier.
- Anthropic Claude 5: Next-generation model expected to push context windows further.
- Google Gemini 4: Could reset the budget tier entirely if Flash pricing holds.
- DeepSeek V5: If the V4 trend continues, expect another 50%+ price cut.
- More batch APIs: Every provider will likely offer batch discounts by Q3.
The trend is clear: prices will keep falling for budget and mid-tier models, while premium models hold steady or increase. The gap between "cheapest viable" and "best available" will keep widening.
Methodology
All pricing data is sourced directly from provider documentation and verified against API responses. Prices shown are per 1M tokens unless otherwise noted. Context windows reflect the maximum supported by each model. Data was last verified on May 14, 2026.
We track 33 models across 10 providers: OpenAI, Anthropic, Google, DeepSeek, Mistral, Cohere, Meta (via Together.ai), Moonshot, xAI, and AI21. Prices are checked monthly and updated in our pricing changelog.
Calculate your exact costs across all 33 models
Interactive calculators, savings comparisons, and model recommendations. Free, no signup.
Try the Calculator (Free)
Related Articles
- 2026 Flagship LLM Cost Comparison: GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro vs DeepSeek V4 Pro
- Cheapest LLM APIs in 2026: Full ranking of every model by price
- The Complete Guide to LLM Cost Optimization: 10 strategies to cut your API spend
- Multi-Model Routing: How to save 60% by routing requests intelligently
- Best Budget LLM APIs: If you need the cheapest option, start here