State of LLM Pricing: Q2 2026

33 models. 10 providers. The definitive quarterly report on what's cheap, what's expensive, and what's changed since last quarter.

The LLM API market in Q2 2026 looks nothing like it did six months ago. Prices have cratered for some models and skyrocketed for others. Budget-tier models now match 2024 flagship quality. And the smartest teams have stopped picking one model โ€” they're routing dynamically based on task complexity.

We track every price across every provider. Here's the full picture.

Q2 2026 at a Glance

-75%
Biggest price drop
(Mistral Large, DeepSeek V4 Pro)
+10x
Biggest price hike
(Grok 3)
$0.075
Cheapest input
(Gemini Flash Lite)
33
Models tracked
across 10 providers

The Biggest Price Moves

If you haven't re-evaluated your AI provider in the last quarter, you're almost certainly overpaying. Here are the moves that matter:

The Drops

Model Old Price (input) New Price (input) Change
Mistral Large 3 $2.00/1M $0.50/1M -75%
DeepSeek V4 Pro $1.75/1M $0.44/1M -75%
GPT-4o $10.00/1M $2.50/1M -67%
Claude Opus 4.7 $15.00/1M $5.00/1M -67%

The Hike

Model Old Price (input) New Price (input) Change
Grok 3 $3.00/1M $30.00/1M +10x

Grok 3's 10x price increase is the largest single-model price hike we've ever tracked. At $30/$150 per 1M tokens, it's now the most expensive model in our database โ€” pricier than GPT-5.5 and Claude Opus 4.7 combined. If you were using Grok 3 for production workloads, it's time to switch.

Full Pricing Matrix: Every Model, Every Provider

Here's the complete current pricing as of May 2026, organized by tier.

Premium Tier

Maximum capability. Best for complex reasoning, multimodal tasks, and high-stakes outputs.

Model Provider Input Output Context
GPT-5.5 Pro OpenAI $30.00 $180.00 1M
Grok 3 xAI $30.00 $150.00 128K
Claude 4 Opus Anthropic $15.00 $75.00 200K
GPT-5.5 OpenAI $5.00 $30.00 1M
Claude Opus 4.7 Anthropic $5.00 $25.00 1M
Mid Tier

Strong performance at reasonable cost. The sweet spot for most production workloads.

Model Provider Input Output Context
Claude Sonnet 4.6 Anthropic $3.00 $15.00 1M
GPT-4o OpenAI $2.50 $10.00 128K
Command R+ Cohere $2.50 $10.00 128K
Gemini 3.1 Pro Google $2.00 $12.00 1M
Jamba 1.5 Large AI21 $2.00 $8.00 256K
GPT-5.3 Codex OpenAI $1.75 $14.00 400K
GPT-5 OpenAI $1.25 $10.00 272K
Gemini 2.5 Pro Google $1.25 $10.00 1M
Claude Haiku 4.5 Anthropic $1.00 $5.00 200K
Llama 3.1 70B Meta (Together.ai) $0.88 $0.88 128K
Budget Tier

Production-viable quality at rock-bottom prices. The biggest story of Q2 2026.

Model Provider Input Output Context
Kimi K2.6 Moonshot $0.90 $3.75 256K
Mistral Large 3 Mistral $0.50 $1.50 128K
DeepSeek V4 Pro DeepSeek $0.44 $0.87 1M
Command R Cohere $0.50 $1.50 128K
Gemini 2.0 Flash Google $0.10 $0.40 1M
DeepSeek V4 Flash DeepSeek $0.14 $0.28 1M
GPT-5 mini OpenAI $0.25 $2.00 272K
GPT-4o mini OpenAI $0.15 $0.60 128K
GPT-oss 120B OpenAI $0.15 $0.60 128K
Mistral Small 4 Mistral $0.15 $0.60 128K
Llama 4 Scout Meta (Together.ai) $0.11 $0.34 10M
Llama 4 Maverick Meta (Together.ai) $0.20 $0.60 10M
Llama 3.1 8B Meta (Together.ai) $0.10 $0.10 128K
GPT-oss 20B OpenAI $0.08 $0.35 128K
Gemini 2.0 Flash Lite Google $0.075 $0.30 1M
400x

Price difference between the most expensive model (GPT-5.5 Pro at $30/$180) and the cheapest (Gemini Flash Lite at $0.075/$0.30)

Five Trends Defining Q2 2026

1. Budget Models Are Now Production-Viable

This is the biggest story of the quarter. Models like Gemini 2.0 Flash Lite ($0.075/1M), Llama 3.1 8B ($0.10/1M), and DeepSeek V4 Flash ($0.14/1M) deliver quality that matches or exceeds 2024's GPT-4 โ€” at 1/20th the price. For classification, summarization, and simple chat, there's no reason to pay premium prices anymore.

2. Context Windows Exploded

1M token context windows are now standard across premium and mid-tier models. Budget models are catching up: Llama 4 Scout offers 10M tokens of context via Together.ai. DeepSeek V4 Pro and V4 Flash both support 1M. If you're still chunking documents to fit 128K windows, you're leaving money on the table.

3. Price Volatility Is the New Normal

Grok 3 went up 10x. DeepSeek V4 Pro went down 75%. GPT-4o dropped 67%. These aren't incremental adjustments โ€” they're seismic shifts. The era of stable, predictable AI pricing is over. If you're not checking prices quarterly, you're flying blind.

4. Multi-Model Routing Is the Optimal Strategy

The best teams in 2026 don't pick one model. They route dynamically:

A blended cost of under $2/1M tokens is achievable for most workloads.

5. Batch APIs Changed the Math

OpenAI's Batch API offers a 50% discount. Anthropic and Google offer similar batch pricing. If your workload isn't time-sensitive โ€” data labeling, content generation, document processing โ€” batch everything. The savings are massive at scale.

Cost Comparison: Real Workloads

Let's see what these prices actually mean for production systems. Here are four common workloads compared across price tiers:

AI Coding Assistant

2K input + 1.5K output tokens, 500 requests/day
Premium (GPT-5.5)$247.50/mo
Mid (Claude Sonnet 4.6)$142.50/mo
Budget (DeepSeek V4 Pro)$7.88/mo

RAG Pipeline

5K input + 800 output tokens, 1K requests/day
Premium (GPT-5.5)$750.00/mo
Mid (Gemini 3.1 Pro)$264.00/mo
Budget (DeepSeek V4 Pro)$21.33/mo

Customer Support Chatbot

1.5K input + 500 output tokens, 2K requests/day
Premium (Claude Opus 4.7)$420.00/mo
Mid (GPT-4o)$195.00/mo
Budget (Gemini Flash)$13.20/mo

Content Generation

1K input + 3K output tokens, 200 requests/day
Premium (GPT-5.5)$570.00/mo
Mid (Claude Sonnet 4.6)$288.00/mo
Budget (DeepSeek V4 Pro)$16.27/mo
$564K

Annual savings switching from GPT-5.5 to DeepSeek V4 Pro at 100M tokens/day

Provider-by-Provider Breakdown

OpenAI

The broadest model lineup (8 models). GPT-4o's 67% price drop made it a mid-tier option. GPT-5.5 remains the reasoning king at $5/$30. The new GPT-oss models (20B and 120B) are OpenAI's first budget-tier open-weight offerings. Best for: Complex reasoning, multimodal tasks, code generation.

Anthropic

Claude Opus 4.7 dropped from $15 to $5 input โ€” a 67% reduction. Claude Sonnet 4.6 at $3/$15 is the best mid-tier value for long-context work (1M tokens). Haiku 4.5 at $1/$5 fills the budget gap. Best for: Long-form writing, analysis, extended context tasks.

Google

Gemini 2.0 Flash Lite ($0.075/$0.30) is the cheapest model in our database. Gemini 3.1 Pro ($2/$12) offers flagship quality at mid-tier pricing. All models support 1M context. Best for: High-volume budget workloads, long-context analysis.

DeepSeek

The price-to-performance champion. V4 Pro at $0.44/$0.87 with 1M context is absurdly cheap for its capability level. V4 Flash at $0.14/$0.28 is even cheaper. Best for: Cost-sensitive production workloads, high-volume processing.

Mistral

Mistral Large 3's 75% price drop (from $2 to $0.50) repositioned it as a budget model. Mistral Small 4 at $0.15/$0.60 competes directly with GPT-4o mini. Best for: European compliance needs, budget workloads.

xAI

Grok 3's 10x price hike ($3 โ†’ $30) makes it the most expensive model in our database. Grok 3 Mini at $3/$5 is more reasonable. Verdict: Hard to recommend at current pricing unless you need Grok-specific capabilities.

Meta (via Together.ai)

Llama 4 Scout ($0.11/$0.34) with 10M context is excellent for long-document workloads. Llama 3.1 8B ($0.10/$0.10) remains the cheapest model for simple tasks. Best for: Self-hosted flexibility, massive context windows.

Decision Framework

Which Model Should You Use?

  • Tightest budget, simple tasks: Gemini 2.0 Flash Lite ($0.075/1M) โ€” cheapest option, 1M context
  • Best value for general use: DeepSeek V4 Pro ($0.44/1M) โ€” 91% cheaper than GPT-5.5 with 1M context
  • Best mid-tier quality: Claude Sonnet 4.6 ($3/1M) or GPT-5 ($1.25/1M) โ€” strong reasoning at reasonable cost
  • Maximum capability: GPT-5.5 ($5/1M) or Claude Opus 4.7 ($5/1M) โ€” top-tier for complex tasks
  • Longest context: Llama 4 Scout ($0.11/1M) โ€” 10M context via Together.ai
  • Code-heavy workloads: DeepSeek V4 Pro ($0.44/1M) or GPT-5.3 Codex ($1.75/1M)
  • Batch processing: Any model via Batch API for 50% off โ€” route to cheapest that meets quality needs

What to Watch in Q3 2026

The trend is clear: prices will keep falling for budget and mid-tier models, while premium models hold steady or increase. The gap between "cheapest viable" and "best available" will keep widening.

Methodology

All pricing data is sourced directly from provider documentation and verified against API responses. Prices shown are per 1M tokens unless otherwise noted. Context windows reflect the maximum supported by each model. Data was last verified on May 14, 2026.

We track 33 models across 10 providers: OpenAI, Anthropic, Google, DeepSeek, Mistral, Cohere, Meta (via Together.ai), Moonshot, xAI, and AI21. Prices are checked monthly and updated in our pricing changelog.

Calculate your exact costs across all 33 models

Interactive calculators, savings comparisons, and model recommendations โ€” free, no signup.

Try the Calculator โ€” Free

Related Articles