What is the cheapest AI API in July 2026?

The cheapest active AI API in July 2026 is GPT-oss 20B at $0.08 per 1M input tokens and $0.35 per 1M output tokens. Other budget options include Gemini 2.5 Flash-Lite ($0.10/$0.40), Llama 3.1 8B ($0.10/$0.10), and DeepSeek V4 Flash ($0.14/$0.28).

How many AI API models are available in July 2026?

As of July 2026, there are 36 active AI API models available across 10 providers: OpenAI (9 models), Anthropic (4), Google (6), DeepSeek (3), Mistral (3), Cohere (3), Meta/Together.ai (4), Moonshot (1), xAI (2), and AI21 (1). Including 6 deprecated models still accessible via API, the total is 42. Prices range from $0.08 per 1M input tokens to $180 per 1M output tokens. Claude 4 Opus and Claude Sonnet 4 were retired on June 15, 2026.

Which AI models were deprecated in July 2026?

Two Anthropic models were retired on June 15, 2026: Claude 4 Opus ($15/$75 per 1M tokens) and Claude Sonnet 4 ($3/$15 per 1M tokens). Both have been replaced by newer models at the same or lower price: Claude Opus 4.7/4.8 ($5/$25) and Claude Sonnet 4.6 ($3/$15).

Are AI API prices still dropping in 2026?

Yes. Budget model prices have dropped ~50% since mid-2025. Gemini Flash Lite went from $0.15/M to $0.075/M. The retirement of legacy models like Claude 4 Opus ($15/M) in favor of cheaper successors like Claude Opus 4.7 ($5/M) continues the downward trend. Open-source models from Meta and DeepSeek are also driving prices lower.

AI API Pricing July 2026: Complete Guide to All 36 Active Models

July 2026 is the first month without Claude 4 Opus and Claude Sonnet 4. Anthropic retired both on June 15, leaving the market with 36 active models across 10 providers. The budget tier continues to compress, with GPT-oss 20B now the cheapest active model at $0.08/M, while the mid-tier has never been more competitive.

This guide covers every AI API model's current pricing, the post-deprecation landscape, mid-year pricing trends, and where the best deals are right now.

36 Models Available

10 Providers

$0.08 Cheapest / 1M tokens

6 Deprecated Models

📋 Post-Deprecation: What Changed on June 15

Claude 4 Opus ($15/$75 per 1M tokens, 200K context) and Claude Sonnet 4 ($3/$15 per 1M tokens, 200K context) were retired on June 15, 2026. The official replacements:

Claude 4 Opus → Claude Opus 4.7 or 4.8 ($5/$25): 67% cheaper, 5x context
Claude Sonnet 4 → Claude Sonnet 4.6 ($3/$15): same price, 5x context

Both replacements are strictly better: same or lower price, better quality, 1M context windows. If you haven't migrated yet, do it now.
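
The "67% cheaper, 5x context" figures follow directly from the quoted prices and context windows. A minimal sketch of that arithmetic, where the function names are illustrative rather than part of any provider SDK:

```python
def percent_cheaper(old_price: float, new_price: float) -> float:
    """Percentage reduction when moving from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def context_multiple(old_tokens: int, new_tokens: int) -> float:
    """How many times larger the new context window is."""
    return new_tokens / old_tokens


# Figures quoted above: Claude 4 Opus ($15/$75, 200K) -> Claude Opus 4.7/4.8 ($5/$25, 1M)
print(f"Input:   {percent_cheaper(15.00, 5.00):.0f}% cheaper")
print(f"Output:  {percent_cheaper(75.00, 25.00):.0f}% cheaper")
print(f"Context: {context_multiple(200_000, 1_000_000):.0f}x")
```

The same helpers applied to the Sonnet migration ($3 to $3) give a 0% price change with the same 5x context increase.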

Complete Pricing: All 36 Active Models

Every major AI API model ranked by input price within each tier. All prices are in USD per 1 million tokens; deprecated models are marked in the Context column.

Budget Tier — Under $0.60/1M Input

Rock-bottom prices for everyday tasks. Chatbots, classifiers, content tools — start here.

Model	Provider	Input / 1M	Output / 1M	Context
GPT-oss 20B	OpenAI	$0.08	$0.35	128K
Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	1M
Llama 3.1 8B	Meta (Together.ai)	$0.10	$0.10	128K
Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	1M (deprecated)
Mistral Small 4	Mistral	$0.10	$0.30	128K
DeepSeek V4 Flash	DeepSeek	$0.14	$0.28	1M
GPT-4o mini	OpenAI	$0.15	$0.60	128K
GPT-oss 120B	OpenAI	$0.15	$0.60	128K
Llama 4 Scout	Meta (Together.ai)	$0.18	$0.59	1M
DeepSeek V3.2	DeepSeek	$0.23	$0.34	128K
Gemini 3.1 Flash-Lite	Google	$0.25	$1.50	1M
GPT-5 mini	OpenAI	$0.25	$2.00	272K
Llama 4 Maverick	Meta (Together.ai)	$0.27	$0.85	1M
DeepSeek V3	DeepSeek	$0.27	$1.10	128K (deprecated)
Grok Build 0.1	xAI	$0.30	$0.50	256K
DeepSeek V4 Pro	DeepSeek	$0.44	$0.87	1M
Mistral Large 3	Mistral	$0.50	$1.50	262K
Gemini 3 Flash	Google	$0.50	$3.00	1M
Command R	Cohere	$0.50	$1.50	128K

Mid Tier — $0.50–$3.00/1M Input

The sweet spot for production workloads. Strong reasoning at reasonable prices.

Model	Provider	Input / 1M	Output / 1M	Context
Llama 3.1 70B	Meta (Together.ai)	$0.88	$0.88	128K
Kimi K2.6	Moonshot	$0.95	$4.00	256K
Claude Haiku 4.5	Anthropic	$1.00	$5.00	200K
Gemini 2.5 Pro	Google	$1.25	$10.00	1M
GPT-5	OpenAI	$1.25	$10.00	272K
Grok 4.3	xAI	$1.25	$2.50	1M
Gemini 3.5 Flash	Google	$1.50	$9.00	1M
Mistral Medium 3.5	Mistral	$1.50	$7.50	128K
GPT-5.3 Codex	OpenAI	$1.75	$14.00	400K
Gemini 3.1 Pro	Google	$2.00	$12.00	1M
Jamba 1.5 Large	AI21	$2.00	$8.00	256K (deprecated)
Jamba 1.7 Large	AI21	$2.00	$8.00	256K
GPT-4o	OpenAI	$2.50	$10.00	128K
Command R+	Cohere	$2.50	$10.00	128K
Command A	Cohere	$2.50	$10.00	128K
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	1M

Premium Tier — $5.00+/1M Input

For complex reasoning, code generation, and high-stakes tasks where quality is non-negotiable.

Model	Provider	Input / 1M	Output / 1M	Context
Claude Opus 4.8	Anthropic	$5.00	$25.00	1M
Claude Opus 4.7	Anthropic	$5.00	$25.00	1M
GPT-5.5	OpenAI	$5.00	$30.00	1M
GPT-5.5 Pro	OpenAI	$30.00	$180.00	1M

Mid-Year Pricing Trends

1. The Post-Deprecation Market

With Claude 4 Opus gone, the most expensive remaining model is GPT-5.5 Pro at $30/$180 per 1M tokens. But nobody should be paying $15/M for a 200K-context model anymore — Claude Opus 4.8 at $5/M with 1M context is strictly superior to the old Opus in every dimension.

2. Budget Tier Compression

The gap between the cheapest and most capable budget models continues to shrink. GPT-oss 20B ($0.08/M), Gemini 2.5 Flash-Lite ($0.10/M), and Llama 3.1 8B ($0.10/M) now sit within a few cents of each other, and Gemini 2.5 Flash-Lite adds a 1M context window where the other two offer 128K. A year ago, $0.15/M was the floor. At $0.08/M, one penny now buys 125,000 input tokens.
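
The tokens-per-penny framing is just a unit conversion from the per-1M-token price. A small sketch, assuming input tokens only and prices in US dollars (the helper name is made up for illustration):

```python
def tokens_per_cent(price_per_million: float) -> float:
    """Input tokens that one US cent buys at a given $ per 1M tokens price."""
    dollars_per_token = price_per_million / 1_000_000
    return 0.01 / dollars_per_token


for label, price in [
    ("$0.15/M (last year's floor)", 0.15),
    ("$0.10/M (Llama 3.1 8B)", 0.10),
]:
    print(f"{label}: {tokens_per_cent(price):,.0f} tokens per cent")
```

Halving the price doubles the tokens per cent, so small absolute price moves at the bottom of the market translate into large changes in volume per dollar.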

3. The Mid-Tier Sweet Spot

Models in the $1-3/M range now dominate production use cases. Claude Haiku 4.5 ($1/M) and GPT-5 ($1.25/M) deliver near-premium quality at 4-5x lower input cost than the $5/M premium models. For most teams, there's no reason to go premium unless you need the absolute best reasoning.

4. Open Source Catches Up

Meta's Llama 4 Scout ($0.18/M) offers a 1M-token context window at a budget price, and Llama 4 Maverick ($0.27/M) handles general workloads well. Together.ai's managed hosting makes these zero-ops options for teams that want open-source economics without the infrastructure burden.

Best Deals by Use Case

Use Case	Best Model	Why
High-volume chatbot	Gemini 2.5 Flash-Lite	$0.10/M with 1M context, handles most chat tasks
Quality chatbot	Claude Haiku 4.5	$1/M with Anthropic's quality
Code generation	Claude Sonnet 4.6	Best code quality at $3/M, 1M context
Document analysis	Gemini 2.5 Pro	1M context at $1.25/M
Classification / extraction	GPT-4o mini	$0.15/M, fast, reliable structured output
RAG / retrieval	DeepSeek V4 Flash	$0.14/M with 1M context
Content writing	GPT-5 mini	$0.25/M, strong writing at budget price
Complex reasoning	Claude Opus 4.7	Best reasoning quality at $5/M
Agent / multi-step	GPT-5	$1.25/M, strong tool use, 272K context
Long documents (up to 1M tokens)	Llama 4 Scout	Open-source model with 1M context at $0.18/M
Budget all-around	DeepSeek V4 Pro	$0.44/M with 1M context — best value pick
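
Picks like these depend on two things a table cannot capture for every workload: the context window you need and your mix of input versus output tokens. The sketch below is one hypothetical way to shortlist a model from a handful of rows copied from the pricing tables; the `Model` class, `blended_price`, and `cheapest` are illustrative names, and price is the only criterion it considers (quality is not modeled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float   # $ per 1M input tokens
    output_price: float  # $ per 1M output tokens
    context: int         # maximum context window, in tokens


# A few rows copied from the pricing tables in this guide.
MODELS = [
    Model("GPT-oss 20B", 0.08, 0.35, 128_000),
    Model("Gemini 2.5 Flash-Lite", 0.10, 0.40, 1_000_000),
    Model("DeepSeek V4 Flash", 0.14, 0.28, 1_000_000),
    Model("GPT-4o mini", 0.15, 0.60, 128_000),
    Model("Llama 4 Scout", 0.18, 0.59, 1_000_000),
]


def blended_price(m: Model, input_share: float) -> float:
    """$ per 1M tokens for a workload with the given fraction of input tokens."""
    return m.input_price * input_share + m.output_price * (1 - input_share)


def cheapest(models: list[Model], min_context: int, input_share: float) -> Model:
    """Lowest blended price among models whose context window is large enough."""
    eligible = [m for m in models if m.context >= min_context]
    if not eligible:
        raise ValueError("no model meets the context requirement")
    return min(eligible, key=lambda m: blended_price(m, input_share))


# Balanced chat traffic vs. input-heavy retrieval, both needing 1M context.
print(cheapest(MODELS, 1_000_000, input_share=0.5).name)
print(cheapest(MODELS, 1_000_000, input_share=0.9).name)
```

With these rows, the balanced workload selects DeepSeek V4 Flash and the input-heavy one selects Gemini 2.5 Flash-Lite, which shows why ranking by input price alone can mislead when output tokens make up a large share of the bill.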

What $100/Month Gets You in July 2026

Assuming 1,000 tokens per request with a 50/50 input/output split (500 input tokens, 500 output tokens) and a 30-day month:

Tier	Model	Requests for $100	Daily Average
Budget	DeepSeek V4 Flash	~476,000	~15,900/day
Budget	Gemini 2.5 Flash-Lite	~400,000	~13,300/day
Budget	Llama 4 Scout	~260,000	~8,700/day
Mid	Claude Haiku 4.5	~33,300	~1,100/day
Mid	GPT-5	~17,800	~590/day
Mid	Claude Sonnet 4.6	~11,100	~370/day
Premium	Claude Opus 4.7	~6,700	~220/day
Premium	GPT-5.5	~5,700	~190/day

The range: roughly 15,900 requests/day down to 190 requests/day for the same $100 budget. Model selection is the single biggest cost lever you have.
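
Request counts like the ones in this table come from the per-token prices and the assumed request shape. A minimal sketch of that calculation, assuming 500 input and 500 output tokens per request and a 30-day month (the function name and defaults are illustrative):

```python
def requests_per_budget(
    budget: float,
    input_price: float,
    output_price: float,
    input_tokens: int = 500,
    output_tokens: int = 500,
) -> float:
    """Requests a budget covers, given $ per 1M token prices and tokens per request."""
    cost_per_request = (
        input_tokens * input_price + output_tokens * output_price
    ) / 1_000_000
    return budget / cost_per_request


# DeepSeek V4 Flash: $0.14 input / $0.28 output per 1M tokens
monthly = requests_per_budget(100, 0.14, 0.28)
print(f"{monthly:,.0f} requests per month, {monthly / 30:,.0f} per day")
# prints "476,190 requests per month, 15,873 per day"
```

Changing `input_tokens` and `output_tokens` to match your own traffic matters more than it might seem, because output tokens cost several times more than input tokens on most models listed here.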

Provider Comparison at a Glance

Provider	Models	Cheapest	Most Expensive	Best For
OpenAI	9	$0.08/M	$180/M	Widest range, agents
Anthropic	4	$1.00/M	$25/M	Code, reasoning
Google	6	$0.10/M	$12/M	Budget, long context
DeepSeek	3	$0.14/M	$1.10/M	Budget all-around
Meta (Together.ai)	4	$0.10/M	$0.88/M	Open source, 1M context
Mistral	3	$0.10/M	$7.50/M	European compliance
Cohere	3	$0.50/M	$10/M	RAG, enterprise search
Moonshot	1	$0.95/M	$4.00/M	Long context (256K)
xAI	2	$0.30/M	$2.50/M	Real-time data
AI21	1	$2.00/M	$8/M	Long context (256K)
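
A provider summary of this kind can be derived from the per-model rows. The sketch below assumes "Cheapest" means the lowest price in either column and "Most Expensive" the highest; the guide does not state how these columns are computed, so treat that as an assumption. Only the Anthropic and Meta rows are included to keep the example short.

```python
from collections import defaultdict

# (model, provider, input $ per 1M, output $ per 1M), copied from the pricing tables
ROWS = [
    ("Claude Haiku 4.5", "Anthropic", 1.00, 5.00),
    ("Claude Sonnet 4.6", "Anthropic", 3.00, 15.00),
    ("Claude Opus 4.7", "Anthropic", 5.00, 25.00),
    ("Claude Opus 4.8", "Anthropic", 5.00, 25.00),
    ("Llama 3.1 8B", "Meta (Together.ai)", 0.10, 0.10),
    ("Llama 4 Scout", "Meta (Together.ai)", 0.18, 0.59),
    ("Llama 4 Maverick", "Meta (Together.ai)", 0.27, 0.85),
    ("Llama 3.1 70B", "Meta (Together.ai)", 0.88, 0.88),
]


def summarize(rows):
    """Group rows by provider and report model count plus the price extremes."""
    by_provider = defaultdict(list)
    for _name, provider, input_price, output_price in rows:
        by_provider[provider].append((input_price, output_price))
    return {
        provider: {
            "models": len(prices),
            # Assumption: extremes are taken across both price columns.
            "cheapest": min(min(pair) for pair in prices),
            "most_expensive": max(max(pair) for pair in prices),
        }
        for provider, prices in by_provider.items()
    }


for provider, stats in summarize(ROWS).items():
    print(provider, stats)
```

For these two providers the result matches the table: 4 models each, $1.00 to $25 for Anthropic and $0.10 to $0.88 for Meta (Together.ai).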

What to Watch in August 2026

Anthropic post-deprecation pricing: will Opus 4.8 pricing drop now that the old Opus is gone and competition intensifies?
DeepSeek V5: V4 Pro at $0.44/M is aggressive; V5 could disrupt the budget tier further
xAI pricing rebrand: Grok 4.3 at $1.25/$2.50 is now competitive; Grok Build 0.1 at $0.30/$0.50 joins the budget tier
OpenAI's open-source response: GPT-oss models may see deeper cuts to compete with Llama 4
New model launches: Q3 typically sees major releases from OpenAI and Google

Methodology

All pricing data comes from official provider pricing pages, verified as of Jun 25, 2026. We track 36 active models across 10 providers: OpenAI, Anthropic, Google, DeepSeek, Mistral, Cohere, Meta (via Together.ai), Moonshot, xAI, and AI21.

Prices are per 1 million tokens unless otherwise noted. Context window sizes reflect the maximum supported. Some providers offer batch pricing or committed-use discounts not reflected here.
