State of LLM Pricing: Q2 2026
33 models. 10 providers. The definitive quarterly report on what's cheap, what's expensive, and what's changed since last quarter.
The LLM API market in Q2 2026 looks nothing like it did six months ago. Prices have cratered for some models and skyrocketed for others. Budget-tier models now match 2024 flagship quality. And the smartest teams have stopped picking one model; they're routing dynamically based on task complexity.
We track every price across every provider. Here's the full picture.
Q2 2026 at a Glance
- Biggest price drops: Mistral Large 3 and DeepSeek V4 Pro, both down 75%
- Biggest price hike: Grok 3, up 10x
- Cheapest model: Gemini 2.0 Flash Lite at $0.075/1M input
- 33 models tracked across 10 providers
The Biggest Price Moves
If you haven't re-evaluated your AI provider in the last quarter, you're almost certainly overpaying. Here are the moves that matter:
The Drops
| Model | Old Price (input) | New Price (input) | Change |
|---|---|---|---|
| Mistral Large 3 | $2.00/1M | $0.50/1M | -75% |
| DeepSeek V4 Pro | $1.75/1M | $0.44/1M | -75% |
| GPT-4o | $10.00/1M | $2.50/1M | -67% |
| Claude Opus 4.7 | $15.00/1M | $5.00/1M | -67% |
The Hike
| Model | Old Price (input) | New Price (input) | Change |
|---|---|---|---|
| Grok 3 | $3.00/1M | $30.00/1M | +10x |
Grok 3's 10x price increase is the largest single-model price hike we've ever tracked. At $30/$150 per 1M tokens, it now matches GPT-5.5 Pro's input price at the very top of our database, and costs more than GPT-5.5 and Claude Opus 4.7 combined. If you were using Grok 3 for production workloads, it's time to switch.
Full Pricing Matrix: Every Model, Every Provider
Here's the complete current pricing as of May 2026, organized by tier.
Premium Tier
Maximum capability. Best for complex reasoning, multimodal tasks, and high-stakes outputs.
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | 1M |
| Grok 3 | xAI | $30.00 | $150.00 | 128K |
| Claude 4 Opus | Anthropic | $15.00 | $75.00 | 200K |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 1M |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M |
Mid-Tier
Strong performance at reasonable cost. The sweet spot for most production workloads.
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| Command R+ | Cohere | $2.50 | $10.00 | 128K |
| Gemini 3.1 Pro | Google | $2.00 | $12.00 | 1M |
| Jamba 1.5 Large | AI21 | $2.00 | $8.00 | 256K |
| GPT-5.3 Codex | OpenAI | $1.75 | $14.00 | 400K |
| GPT-5 | OpenAI | $1.25 | $10.00 | 272K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K |
| Llama 3.1 70B | Meta (Together.ai) | $0.88 | $0.88 | 128K |
Budget Tier
Production-viable quality at rock-bottom prices. The biggest story of Q2 2026.
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| Kimi K2.6 | Moonshot | $0.90 | $3.75 | 256K |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 128K |
| Command R | Cohere | $0.50 | $1.50 | 128K |
| DeepSeek V4 Pro | DeepSeek | $0.44 | $0.87 | 1M |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 272K |
| Llama 4 Maverick | Meta (Together.ai) | $0.20 | $0.60 | 10M |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K |
| GPT-oss 120B | OpenAI | $0.15 | $0.60 | 128K |
| Mistral Small 4 | Mistral | $0.15 | $0.60 | 128K |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M |
| Llama 4 Scout | Meta (Together.ai) | $0.11 | $0.34 | 10M |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M |
| Llama 3.1 8B | Meta (Together.ai) | $0.10 | $0.10 | 128K |
| GPT-oss 20B | OpenAI | $0.08 | $0.35 | 128K |
| Gemini 2.0 Flash Lite | Google | $0.075 | $0.30 | 1M |
The spread between the most expensive model (GPT-5.5 Pro at $30/$180) and the cheapest (Gemini 2.0 Flash Lite at $0.075/$0.30) now stands at 400x on input and 600x on output.
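That spread is simple arithmetic to verify from the prices in the tables above:

```python
# Spread between the priciest and cheapest models in this report's tables.
top_in, top_out = 30.00, 180.00   # GPT-5.5 Pro ($/1M tokens)
low_in, low_out = 0.075, 0.30     # Gemini 2.0 Flash Lite ($/1M tokens)

print(round(top_in / low_in))     # 400  (input spread)
print(round(top_out / low_out))   # 600  (output spread)
```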
Five Trends Defining Q2 2026
1. Budget Models Are Now Production-Viable
This is the biggest story of the quarter. Models like Gemini 2.0 Flash Lite ($0.075/1M), Llama 3.1 8B ($0.10/1M), and DeepSeek V4 Flash ($0.14/1M) deliver quality that matches or exceeds 2024's GPT-4, at 1/20th the price. For classification, summarization, and simple chat, there's no reason to pay premium prices anymore.
2. Context Windows Exploded
1M token context windows are now standard across premium and mid-tier models. Budget models are catching up: Llama 4 Scout offers 10M tokens of context via Together.ai. DeepSeek V4 Pro and V4 Flash both support 1M. If you're still chunking documents to fit 128K windows, you're leaving money on the table.
3. Price Volatility Is the New Normal
Grok 3 went up 10x. DeepSeek V4 Pro went down 75%. GPT-4o dropped 67%. These aren't incremental adjustments; they're seismic shifts. The era of stable, predictable AI pricing is over. If you're not checking prices quarterly, you're flying blind.
4. Multi-Model Routing Is the Optimal Strategy
The best teams in 2026 don't pick one model. They route dynamically:
- Simple tasks (classification, extraction) → Gemini Flash Lite ($0.075/1M) or Llama 3.1 8B ($0.10/1M)
- Standard workloads (chat, summarization) → DeepSeek V4 Pro ($0.44/1M) or GPT-4o mini ($0.15/1M)
- Complex reasoning (code generation, analysis) → GPT-5 ($1.25/1M) or Claude Sonnet 4.6 ($3.00/1M)
- Critical outputs (customer-facing, high-stakes) → Claude Opus 4.7 ($5.00/1M) or GPT-5.5 ($5.00/1M)
A blended cost of under $2/1M tokens is achievable for most workloads.
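A minimal sketch of this routing strategy in Python. The prices are this report's Q2 2026 figures; the tier names and the traffic mix are illustrative assumptions, and how requests get classified into tiers is left abstract (real routers typically use a small classifier model or heuristics):

```python
# Complexity-based router: each task tier maps to a
# (model, input $/1M, output $/1M) tuple from this report's tables.
ROUTES = {
    "simple":   ("gemini-2.0-flash-lite", 0.075, 0.30),
    "standard": ("deepseek-v4-pro",       0.44,  0.87),
    "complex":  ("gpt-5",                 1.25,  10.00),
    "critical": ("claude-opus-4.7",       5.00,  25.00),
}

def route(tier: str) -> str:
    """Pick the model for a given task tier."""
    return ROUTES[tier][0]

def blended_input_cost(mix: dict) -> float:
    """Blended input $/1M tokens for a traffic mix {tier: share-of-traffic}."""
    return sum(share * ROUTES[tier][1] for tier, share in mix.items())

# Example mix: mostly cheap traffic, a sliver of premium.
mix = {"simple": 0.5, "standard": 0.3, "complex": 0.15, "critical": 0.05}
print(route("complex"))                   # gpt-5
print(round(blended_input_cost(mix), 3))  # 0.607
```

With that example mix, the blended input cost lands around $0.61/1M tokens, comfortably under the $2/1M figure.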
5. Batch APIs Changed the Math
OpenAI's Batch API offers a 50% discount. Anthropic and Google offer similar batch pricing. If your workload isn't time-sensitive (data labeling, content generation, document processing), batch everything. The savings are massive at scale.
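The batch math is a one-liner. A sketch, assuming the flat 50% discount applies to both input and output tokens (the model and volumes below are illustrative):

```python
# Batch-vs-realtime cost under an assumed flat 50% batch discount.
def job_cost(in_m: float, out_m: float, in_price: float,
             out_price: float, batch: bool = False) -> float:
    """Dollars for a job; token counts in millions, prices per 1M tokens."""
    cost = in_m * in_price + out_m * out_price
    return cost * 0.5 if batch else cost

# Example: 500M input / 100M output tokens on GPT-5 mini ($0.25/$2.00).
print(job_cost(500, 100, 0.25, 2.00))              # 325.0
print(job_cost(500, 100, 0.25, 2.00, batch=True))  # 162.5
```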
Cost Comparison: Real Workloads
Let's see what these prices actually mean for production systems. Here are four common workloads compared across price tiers:
- AI Coding Assistant
- RAG Pipeline
- Customer Support Chatbot
- Content Generation
Counting input tokens alone, switching from GPT-5.5 to DeepSeek V4 Pro at 100M tokens/day saves $456 per day, roughly $166,000 per year; once output tokens are factored in, the savings grow substantially.
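The exact annual figure for a switch like this depends on the input/output split, which a headline token count doesn't pin down. A small calculator, with an 80/20 split as an explicit assumption:

```python
# Annual savings for a model switch. The 80/20 input/output split used
# below is an assumption; only total daily volume is given.
def annual_savings(daily_m: float, input_share: float,
                   old_in: float, old_out: float,
                   new_in: float, new_out: float) -> float:
    """Dollars saved per year; daily_m is total daily tokens in millions."""
    in_m, out_m = daily_m * input_share, daily_m * (1 - input_share)
    old = in_m * old_in + out_m * old_out
    new = in_m * new_in + out_m * new_out
    return (old - new) * 365

# GPT-5.5 ($5/$30) -> DeepSeek V4 Pro ($0.44/$0.87), 100M tokens/day.
print(round(annual_savings(100, 0.8, 5.00, 30.00, 0.44, 0.87)))  # 345801
```

Shifting the split toward output tokens raises the number further, since the output-price gap ($30.00 vs $0.87) is much wider than the input-price gap.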
Provider-by-Provider Breakdown
OpenAI
The broadest model lineup (9 models). GPT-4o's 67% price drop made it a mid-tier option. GPT-5.5 remains the reasoning king at $5/$30. The new GPT-oss models (20B and 120B) are OpenAI's first budget-tier open-weight offerings. Best for: Complex reasoning, multimodal tasks, code generation.
Anthropic
Claude Opus 4.7 dropped from $15 to $5 input โ a 67% reduction. Claude Sonnet 4.6 at $3/$15 is the best mid-tier value for long-context work (1M tokens). Haiku 4.5 at $1/$5 fills the budget gap. Best for: Long-form writing, analysis, extended context tasks.
Google
Gemini 2.0 Flash Lite ($0.075/$0.30) is the cheapest model in our database. Gemini 3.1 Pro ($2/$12) offers flagship quality at mid-tier pricing. All models support 1M context. Best for: High-volume budget workloads, long-context analysis.
DeepSeek
The price-to-performance champion. V4 Pro at $0.44/$0.87 with 1M context is absurdly cheap for its capability level. V4 Flash at $0.14/$0.28 is even cheaper. Best for: Cost-sensitive production workloads, high-volume processing.
Mistral
Mistral Large 3's 75% price drop (from $2 to $0.50) repositioned it as a budget model. Mistral Small 4 at $0.15/$0.60 competes directly with GPT-4o mini. Best for: European compliance needs, budget workloads.
xAI
Grok 3's 10x price hike ($3 → $30) puts it at the very top of our database, tied with GPT-5.5 Pro on input price. Grok 3 Mini at $3/$5 is more reasonable. Verdict: Hard to recommend at current pricing unless you need Grok-specific capabilities.
Meta (via Together.ai)
Llama 4 Scout ($0.11/$0.34) with 10M context is excellent for long-document workloads. Llama 3.1 8B ($0.10/$0.10) remains the cheapest model for simple tasks. Best for: Self-hosted flexibility, massive context windows.
Decision Framework
Which Model Should You Use?
- Tightest budget, simple tasks: Gemini 2.0 Flash Lite ($0.075/1M), the cheapest option, with 1M context
- Best value for general use: DeepSeek V4 Pro ($0.44/1M), 91% cheaper than GPT-5.5, with 1M context
- Best mid-tier quality: Claude Sonnet 4.6 ($3/1M) or GPT-5 ($1.25/1M), strong reasoning at reasonable cost
- Maximum capability: GPT-5.5 ($5/1M) or Claude Opus 4.7 ($5/1M), top-tier for complex tasks
- Longest context: Llama 4 Scout ($0.11/1M), 10M context via Together.ai
- Code-heavy workloads: DeepSeek V4 Pro ($0.44/1M) or GPT-5.3 Codex ($1.75/1M)
- Batch processing: any model via a Batch API at 50% off; route to the cheapest that meets quality needs
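The framework above boils down to a cheapest-first ranking over whatever candidate set meets your quality bar. A sketch using a subset of this report's prices (the candidate list and token volumes are illustrative):

```python
# Rank candidate models by estimated monthly cost for a token profile.
# PRICES uses this report's Q2 2026 figures ($/1M tokens).
PRICES = {  # model: (input, output)
    "gemini-2.0-flash-lite": (0.075, 0.30),
    "deepseek-v4-pro":       (0.44,  0.87),
    "gpt-5":                 (1.25,  10.00),
    "claude-sonnet-4.6":     (3.00,  15.00),
    "gpt-5.5":               (5.00,  30.00),
}

def monthly_cost(model: str, in_m: float, out_m: float,
                 batch: bool = False) -> float:
    """Dollars per month; token volumes in millions of tokens."""
    in_p, out_p = PRICES[model]
    cost = in_m * in_p + out_m * out_p
    return cost * 0.5 if batch else cost  # assumed 50% batch discount

def rank(in_m: float, out_m: float) -> list:
    """Candidates sorted cheapest-first for this volume."""
    return sorted(PRICES, key=lambda m: monthly_cost(m, in_m, out_m))

# 1B input / 200M output tokens per month.
print(rank(1000, 200)[0])   # gemini-2.0-flash-lite
```

Filter the candidate set down to models that pass your quality evaluation first; the ranking only settles price, not capability.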
What to Watch in Q3 2026
- OpenAI GPT-6: Rumored for late Q2/Q3. Expect premium pricing at launch, with GPT-5.5 likely dropping to mid-tier.
- Anthropic Claude 5: Next-generation model expected to push context windows further.
- Google Gemini 4: Could reset the budget tier entirely if Flash pricing holds.
- DeepSeek V5: If the V4 trend continues, expect another 50%+ price cut.
- More batch APIs: Every provider will likely offer batch discounts by Q3.
The trend is clear: prices will keep falling for budget and mid-tier models, while premium models hold steady or increase. The gap between "cheapest viable" and "best available" will keep widening.
Methodology
All pricing data is sourced directly from provider documentation and verified against API responses. Prices shown are per 1M tokens unless otherwise noted. Context windows reflect the maximum supported by each model. Data was last verified on May 14, 2026.
We track 33 models across 10 providers: OpenAI, Anthropic, Google, DeepSeek, Mistral, Cohere, Meta (via Together.ai), Moonshot, xAI, and AI21. Prices are checked monthly and updated in our pricing changelog.
Calculate your exact costs across all 33 models
Interactive calculators, savings comparisons, and model recommendations. Free, no signup.
Try the Calculator (Free)
Related Articles
- 2026 Flagship LLM Cost Comparison: GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro vs DeepSeek V4 Pro
- Cheapest LLM APIs in 2026: Full ranking of every model by price
- The Complete Guide to LLM Cost Optimization: 10 strategies to cut your API spend
- Multi-Model Routing: How to save 60% by routing requests intelligently
- Best Budget LLM APIs: If you need the cheapest option, start here