📅 May 31, 2026 ⏱️ 9 min read 🏷️ Pricing Report

AI API Pricing July 2026: Complete Guide to All 32 Models

32 models. 10 providers. Prices from $0.075/M to $180/M. Two legacy models gone. Here's the post-deprecation pricing landscape.

July 2026 is the first month without Claude 4 Opus and Claude Sonnet 4. Anthropic retired both on June 15, leaving the market with 32 models across 10 providers. The budget tier continues to compress — Gemini Flash Lite still leads at $0.075/M — while the mid-tier has never been more competitive.

This guide covers every AI API model's current pricing, the post-deprecation landscape, mid-year pricing trends, and where the best deals are right now.

32 Models Available
10 Providers
$0.075 Cheapest / 1M tokens
0 Active Deprecations

📋 Post-Deprecation: What Changed on June 15

Claude 4 Opus ($15/$75 per 1M tokens, 200K context) and Claude Sonnet 4 ($3/$15 per 1M tokens, 200K context) were retired on June 15, 2026. The official replacements:

  • Claude 4 Opus → Claude Opus 4.7 or 4.8 ($5/$25) — 67% cheaper, 5x context
  • Claude Sonnet 4 → Claude Sonnet 4.6 ($3/$15) — same price, 5x context

Both replacements are strictly better: same or lower price, better quality, 1M context windows. If you haven't migrated yet, do it now.

Complete Pricing: All 32 Models

Every major AI API model ranked by input price. All prices per 1 million tokens.

Budget Tier — Under $0.60/1M Input

Rock-bottom prices for everyday tasks. Chatbots, classifiers, content tools — start here.

Model Provider Input / 1M Output / 1M Context
Gemini 2.0 Flash Lite Google $0.075 $0.30 1M
GPT-oss 20B OpenAI $0.08 $0.35 128K
Llama 3.1 8B Meta (Together.ai) $0.10 $0.10 128K
Gemini 2.0 Flash Google $0.10 $0.40 1M
Llama 4 Scout Meta (Together.ai) $0.11 $0.34 10M
DeepSeek V4 Flash DeepSeek $0.14 $0.28 1M
GPT-4o mini OpenAI $0.15 $0.60 128K
GPT-oss 120B OpenAI $0.15 $0.60 128K
Mistral Small 4 Mistral $0.15 $0.60 128K
Llama 4 Maverick Meta (Together.ai) $0.20 $0.60 10M
GPT-5 mini OpenAI $0.25 $2.00 272K
DeepSeek V3 DeepSeek $0.27 $1.10 128K
DeepSeek V4 Pro DeepSeek $0.44 $0.87 1M
Mistral Large 3 Mistral $0.50 $1.50 128K
Command R Cohere $0.50 $1.50 128K
Grok Build 0.1 xAI $0.30 $0.50 256K

Mid Tier — $0.50–$3.00/1M Input

The sweet spot for production workloads. Strong reasoning at reasonable prices.

Model Provider Input / 1M Output / 1M Context
Llama 3.1 70B Meta (Together.ai) $0.88 $0.88 128K
Kimi K2.6 Moonshot $0.90 $3.75 256K
Claude Haiku 4.5 Anthropic $1.00 $5.00 200K
Gemini 2.5 Pro Google $1.25 $10.00 1M
GPT-5 OpenAI $1.25 $10.00 272K
GPT-5.3 Codex OpenAI $1.75 $14.00 400K
Gemini 3.1 Pro Google $2.00 $12.00 1M
Jamba 1.5 Large AI21 $2.00 $8.00 256K
GPT-4o OpenAI $2.50 $10.00 128K
Command R+ Cohere $2.50 $10.00 128K
Claude Sonnet 4.6 Anthropic $3.00 $15.00 1M
Grok 4.3 xAI $1.25 $2.50 1M

Premium Tier — $5.00+/1M Input

For complex reasoning, code generation, and high-stakes tasks where quality is non-negotiable.

Model Provider Input / 1M Output / 1M Context
Claude Opus 4.8 Anthropic $5.00 $25.00 1M
Claude Opus 4.7 Anthropic $5.00 $25.00 1M
GPT-5.5 OpenAI $5.00 $30.00 1M
GPT-5.5 Pro OpenAI $30.00 $180.00 1M

Mid-Year Pricing Trends

1. The Post-Deprecation Market

With Claude 4 Opus gone, the most expensive remaining model is GPT-5.5 Pro at $30/$180 per 1M tokens. But nobody should be paying $15/M for a 200K-context model anymore — Claude Opus 4.8 at $5/M with 1M context is strictly superior to the old Opus in every dimension.

2. Budget Tier Compression

The gap between the cheapest and most capable budget models continues to shrink. Gemini 2.0 Flash Lite at $0.075/M is now within 2x of Llama 3.1 8B ($0.10/M), and both support 1M+ context. A year ago, $0.15/M was the floor. The 133,000-tokens-per-penny mark has been broken.

3. The Mid-Tier Sweet Spot

Models in the $1-3/M range now dominate production use cases. Claude Haiku 4.5 ($1/M) and GPT-5 ($1.25/M) deliver near-premium quality at 4-10x lower cost. For most teams, there's no reason to go premium unless you need the absolute best reasoning.

4. Open Source Catches Up

Meta's Llama 4 Scout ($0.11/M, 10M context) is the only model offering 10M token context at any price. Llama 4 Maverick ($0.20/M) handles general workloads well. Together.ai's managed hosting makes these zero-ops options for teams that want open-source economics without infrastructure burden.

Best Deals by Use Case

Use Case Best Model Why
High-volume chatbot Gemini 2.0 Flash Lite Cheapest at $0.075/M, handles most chat tasks
Quality chatbot Claude Haiku 4.5 $1/M with Anthropic's quality
Code generation Claude Sonnet 4.6 Best code quality at $3/M, 1M context
Document analysis Gemini 2.5 Pro 1M context at $1.25/M
Classification / extraction GPT-4o mini $0.15/M, fast, reliable structured output
RAG / retrieval DeepSeek V4 Flash $0.14/M with 1M context
Content writing GPT-5 mini $0.25/M, strong writing at budget price
Complex reasoning Claude Opus 4.7 Best reasoning quality at $5/M
Agent / multi-step GPT-5 $1.25/M, strong tool use, 272K context
Long documents (10M+) Llama 4 Scout Only model with 10M context at $0.11/M
Budget all-around DeepSeek V4 Pro $0.44/M with 1M context — best value pick

What $100/Month Gets You in July 2026

Assuming 1,000 tokens per request with a 50/50 input/output split:

Tier Model Requests for $100 Daily Average
Budget Gemini 2.0 Flash Lite ~571,000 ~19,000/day
Budget DeepSeek V4 Flash ~476,000 ~15,900/day
Budget Llama 4 Scout ~434,000 ~14,500/day
Mid Claude Haiku 4.5 ~62,500 ~2,100/day
Mid GPT-5 ~30,800 ~1,030/day
Mid Claude Sonnet 4.6 ~22,200 ~740/day
Premium Claude Opus 4.7 ~8,000 ~267/day
Premium GPT-5.5 ~7,700 ~257/day

The range: 19,000 requests/day to 257 requests/day for the same $100 budget. Model selection is the single biggest cost lever you have.

Provider Comparison at a Glance

Provider Models Cheapest Most Expensive Best For
OpenAI 9 $0.08/M $180/M Widest range, agents
Anthropic 4 $1.00/M $25/M Code, reasoning
Google 4 $0.075/M $12/M Budget, long context
DeepSeek 3 $0.14/M $1.10/M Budget all-around
Meta (Together.ai) 4 $0.10/M $0.88/M Open source, 10M context
Mistral 2 $0.15/M $1.50/M European compliance
Cohere 2 $0.50/M $10/M RAG, enterprise search
Moonshot 1 $0.90/M $3.75/M Long context (256K)
xAI 2 $3.00/M $150/M Real-time data
AI21 1 $2.00/M $8/M Long context (256K)

What to Watch in August 2026

Methodology

All pricing data comes from official provider pricing pages, verified as of May 29, 2026. We track 32 active models across 10 providers: OpenAI, Anthropic, Google, DeepSeek, Mistral, Cohere, Meta (via Together.ai), Moonshot, xAI, and AI21.

Prices are per 1 million tokens unless otherwise noted. Context window sizes reflect the maximum supported. Some providers offer batch pricing or committed-use discounts not reflected here.

Calculate your exact costs

Use our free tools to see what these prices mean for your workload. No signup required.

Open Cost Calculator →

Related Tools

← June 2026 Pricing Guide August 2026 Pricing Guide →