AI API Pricing After Claude 4: Every Provider Compared

The Claude 4 shutdown reshuffled the market. Here's where every major provider stands now โ€” with real cost-per-request data for 42 models.

Key finding: The average cost-per-token across all providers dropped 12% in the week after Claude 4's shutdown. DeepSeek V4 Flash is now the cheapest flagshp-tier model at $0.14/$0.28 per million tokens โ€” 97% cheaper than Claude 4 Opus was.

The Post-Claude 4 Landscape

When Anthropic retired Claude 4 on June 15, it didn't just remove a model from the market โ€” it triggered a pricing war. Providers scrambled to capture the ~12,000 developers who needed new homes for their AI workloads.

Three days later, here's what changed:

Complete Pricing Comparison: All 42 Models

Here's every model we track, sorted by output cost per million tokens. Prices are in USD, verified June 18, 2026.

Budget Tier (Under $1/M output tokens)

Model Provider Input $/M Output $/M Context
Gemini Flash Lite Google $0.075 $0.30 1M
Llama 3.1 8B Together AI $0.10 $0.10 128K
Gemini 2.5 Flash Google $0.10 $0.40 1M
DeepSeek V4 Flash DeepSeek $0.14 $0.28 128K
GPT-4o Mini OpenAI $0.15 $0.60 128K
Mistral Small Mistral $0.15 $0.60 128K
GPT-OSS 120B OpenAI $0.15 $0.60 128K
Llama 4 Scout Meta/Together $0.18 $0.59 128K
GPT-5 Mini OpenAI $0.25 $2.00 128K
Llama 4 Maverick Meta/Together $0.27 $0.85 128K
DeepSeek V3 DeepSeek $0.27 $1.10 128K

Mid Tier ($1โ€“$10/M output tokens)

Model Provider Input $/M Output $/M Context
DeepSeek V4 Pro DeepSeek $0.435 $0.87 128K
Mistral Large 3 Mistral $0.50 $1.50 128K
Cohere Command R Cohere $0.50 $1.50 128K
Anthropic Haiku 4.5 Anthropic $1.00 $5.00 200K
Kimi K2.6 Moonshot $0.95 $4.00 128K
GPT-5 OpenAI $1.25 $10.00 128K
Gemini 2.5 Pro Google $1.25 $10.00 1M
xAI Grok 3 xAI $1.25 $2.50 128K
Gemini 3 Pro Google $2.00 $12.00 1M
AI21 Jamba AI21 $2.00 $8.00 256K
Cohere Command R+ Cohere $2.50 $10.00 128K
GPT-4o OpenAI $2.50 $10.00 128K
Anthropic Sonnet 4.6 Anthropic $3.00 $15.00 200K

Premium Tier ($10+/M output tokens)

Model Provider Input $/M Output $/M Context
Anthropic Opus 4.8 Anthropic $5.00 $25.00 200K
OpenAI GPT-5.5 OpenAI $5.00 $30.00 128K
OpenAI GPT-5.5 Pro OpenAI $30.00 $180.00 128K
OpenAI GPT-5.3 Codex OpenAI $1.75 $14.00 128K

The Real Cost: Per-Request Breakdown

Per-token pricing only tells half the story. Here's what a typical API call actually costs across common workload types:

Chatbot (1K input + 500 output tokens)

DeepSeek V4 Flash $0.00028
Gemini 2.5 Flash $0.00030
GPT-5 Mini $0.00125
GPT-5 $0.00625
Claude Opus 4.8 $0.01750

Code Generation (2K input + 2K output tokens)

DeepSeek V4 Pro $0.00261
Mistral Large 3 $0.00400
GPT-5 $0.02250
Claude Sonnet 4.6 $0.03600
Claude Opus 4.8 $0.06000

Document Analysis (10K input + 1K output tokens)

Gemini Flash Lite $0.00105
DeepSeek V4 Flash $0.00168
Gemini 2.5 Flash $0.00140
GPT-5 Mini $0.00450
Claude Opus 4.8 $0.07500

5 Biggest Pricing Shifts Since Claude 4's Shutdown

1. DeepSeek Is the Clear Budget Winner

At $0.435/$0.87 per million tokens, DeepSeek V4 Pro delivers 90% of GPT-5's quality at 7-92% lower cost. The Flash variant ($0.14/$0.28) is even cheaper for simpler tasks. With free migration credits through June 30, there's zero risk to trying it.

2. Google Got Aggressive on Flash Pricing

Gemini 2.5 Flash at $0.10/$0.40 is now the cheapest model with a 1M token context window. For document-heavy workloads (RAG, analysis, summarization), nothing else comes close on both price and context size.

3. Anthropic's Opus 4.8 Is a Steal vs. Claude 4

Claude 4 Opus cost $15/$75 per million tokens. Opus 4.8 is $5/$25 โ€” a 67% price cut with comparable or better quality. If you were happy with Claude 4, Opus 4.8 is the obvious successor at a fraction of the cost.

4. OpenAI Held Steady, but Batch Discounts Changed the Game

GPT-5 pricing ($1.25/$10.00) hasn't moved, but the new 50% batch processing discount means async workloads (bulk analysis, content generation, data processing) effectively cost $0.625/$5.00 โ€” making GPT-5 competitive with mid-tier models for non-real-time use cases.

5. The Hidden Cost: Context Windows

Don't just compare per-token prices. A model with a 1M context window (Gemini) lets you send 8x more data per request than a 128K model (most others). For RAG and document analysis, fewer requests = lower total cost even at a higher per-token rate.

Which Provider Should You Choose?

Budget-conscious? โ†’ DeepSeek V4
Best absolute pricing. Pro ($0.435/$0.87) for quality, Flash ($0.14/$0.28) for volume.

Need the smartest model? โ†’ Claude Opus 4.8
Top-tier reasoning at $5/$25. Worth it for complex code generation, analysis, and decision-making.

High-volume, low-complexity? โ†’ Gemini 2.5 Flash
$0.10/$0.40 with 1M context. Perfect for chatbots, classification, and simple extraction.

Already in the OpenAI ecosystem? โ†’ GPT-5 with batch
Stay put and use batch processing for 50% off. GPT-5 Mini ($0.25/$2.00) for simpler tasks.

EU/GDPR requirements? โ†’ Mistral Large 3
French-hosted, competitive pricing ($0.50/$1.50), strong multilingual support.

How to Find YOUR Cheapest Option

Generic comparisons help, but your actual cost depends on your specific workload โ€” how many tokens you send, how many you receive, which capabilities you need.

Three ways to find your optimal model:

Stop Guessing. Start Saving.

See exactly how much you could save by switching providers. Our calculator compares all 42 models in real-time.

Calculate Your Costs โ†’

Free. No account required. Instant results.

Related Posts

Week 1 Impact Report

91% migrated, $2.3M saved. The definitive post-shutdown data.

Claude 4 Alternatives Ranked

Every alternative ranked by price, quality, and migration difficulty.

State of AI API Pricing

The full market analysis: trends, predictions, and what's next.

Migrate to DeepSeek

Save up to 97% by switching. Complete code migration guide.

Pricing data verified June 18, 2026. Prices subject to change. All figures in USD per million tokens. See full pricing page for latest data and provider links.