AI API Pricing September 2026: Complete Guide to All 32 Models
32 models. 10 providers. Prices from $0.075/M to $180/M. Q3 is solidifying — here's where every model stands before Q4 launches.
September 2026 brings the third full month of the post-deprecation era. The 32-model market continues to stabilize, Q3 trends are now firmly established, and the big question is what Q4 will bring. OpenAI, Google, and Anthropic historically make their biggest announcements in Q4 — and the pricing landscape could shift dramatically.
This guide covers every AI API model's current pricing, the established Q3 landscape, and what to watch as we head into the final quarter of 2026.
📊 Market Stability: Three Months Post-Deprecation
Three months after Claude 4 Opus and Claude Sonnet 4 were retired, the market is firmly settled. Claude Opus 4.7/4.8 ($5/$25) and Claude Sonnet 4.6 ($3/$15) are established as the replacements. No major deprecations are expected through Q3. But Q4 is the traditional launch window — start planning your migration strategy now.
Complete Pricing: All 32 Models
Every major AI API model ranked by input price. All prices per 1 million tokens.
Budget Tier — Under $0.60/1M Input
Rock-bottom prices for everyday tasks. Chatbots, classifiers, content tools — start here.
| Model | Provider | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Gemini 2.0 Flash Lite | $0.075 | $0.30 | 1M | |
| GPT-oss 20B | OpenAI | $0.08 | $0.35 | 128K |
| Llama 3.1 8B | Meta (Together.ai) | $0.10 | $0.10 | 128K |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | |
| Llama 4 Scout | Meta (Together.ai) | $0.11 | $0.34 | 10M |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 1M |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K |
| GPT-oss 120B | OpenAI | $0.15 | $0.60 | 128K |
| Mistral Small 4 | Mistral | $0.15 | $0.60 | 128K |
| Llama 4 Maverick | Meta (Together.ai) | $0.20 | $0.60 | 10M |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 272K |
| DeepSeek V3 | DeepSeek | $0.27 | $1.10 | 128K |
| DeepSeek V4 Pro | DeepSeek | $0.44 | $0.87 | 1M |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 128K |
| Command R | Cohere | $0.50 | $1.50 | 128K |
| Grok Build 0.1 | xAI | $0.30 | $0.50 | 256K |
Mid Tier — $0.50–$3.00/1M Input
The sweet spot for production workloads. Strong reasoning at reasonable prices.
| Model | Provider | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Llama 3.1 70B | Meta (Together.ai) | $0.88 | $0.88 | 128K |
| Kimi K2.6 | Moonshot | $0.90 | $3.75 | 256K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | |
| GPT-5 | OpenAI | $1.25 | $10.00 | 272K |
| GPT-5.3 Codex | OpenAI | $1.75 | $14.00 | 400K |
| Gemini 3.1 Pro | $2.00 | $12.00 | 1M | |
| Jamba 1.5 Large | AI21 | $2.00 | $8.00 | 256K |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| Command R+ | Cohere | $2.50 | $10.00 | 128K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 1M |
| Grok 4.3 | xAI | $1.25 | $2.50 | 1M |
Premium Tier — $5.00+/1M Input
For complex reasoning, code generation, and high-stakes tasks where quality is non-negotiable.
| Model | Provider | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | $5.00 | $25.00 | 1M |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 1M |
| GPT-5.5 | OpenAI | $5.00 | $30.00 | 1M |
| GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | 1M |
Q3 2026 Pricing Trends
1. The Market Has Fully Stabilized
Three months after the Claude 4 Opus and Claude Sonnet 4 deprecation, September shows a calm, predictable market. No major model retirements are expected through the rest of Q3. The 32-model landscape is the established order — but Q4 is historically when providers reset the curve.
2. Budget Tier Is Packed
The sub-$0.15/M tier remains crowded with 6 models competing for the budget crown. Gemini 2.0 Flash Lite ($0.075/M) still holds the cheapest spot. The 133,000-tokens-per-penny threshold hasn't been broken since July — it may take a new generation model to push below $0.05/M.
3. Mid-Tier Continues to Deliver
The $1–3/M tier remains the sweet spot for production workloads. Claude Haiku 4.5 ($1/M) continues to handle 90% of tasks that used to require premium models. Claude Sonnet 4.6 ($3/M) with 1M context remains the top code generation model. Going premium ($5+/M) is only justified for the most complex reasoning tasks.
4. Open Source Steady, Awaiting Llama 5
Meta's Llama 4 Scout ($0.11/M, 10M context) still holds the longest-context crown. No new Llama releases in Q3 — all eyes are on Q4/early 2027 for Llama 5. Together.ai's managed hosting continues to make open-source models zero-ops.
5. xAI: The Outlier
Grok 4.3 at $1.25/$2.50 per 1M tokens is now competitive after xAI's rebrand and repricing. Grok Build 0.1 ($0.30/$0.50) joins the budget tier. The old Grok 3 ($30/$150) has been retired.
Best Deals by Use Case
| Use Case | Best Model | Why |
|---|---|---|
| High-volume chatbot | Gemini 2.0 Flash Lite | Cheapest at $0.075/M, handles most chat tasks |
| Quality chatbot | Claude Haiku 4.5 | $1/M with Anthropic's quality |
| Code generation | Claude Sonnet 4.6 | Best code quality at $3/M, 1M context |
| Document analysis | Gemini 2.5 Pro | 1M context at $1.25/M |
| Classification / extraction | GPT-4o mini | $0.15/M, fast, reliable structured output |
| RAG / retrieval | DeepSeek V4 Flash | $0.14/M with 1M context |
| Content writing | GPT-5 mini | $0.25/M, strong writing at budget price |
| Complex reasoning | Claude Opus 4.7 | Best reasoning quality at $5/M |
| Agent / multi-step | GPT-5 | $1.25/M, strong tool use, 272K context |
| Long documents (10M+) | Llama 4 Scout | Only model with 10M context at $0.11/M |
| Budget all-around | DeepSeek V4 Pro | $0.44/M with 1M context — best value pick |
What $100/Month Gets You in September 2026
Assuming 1,000 tokens per request with a 50/50 input/output split:
| Tier | Model | Requests for $100 | Daily Average |
|---|---|---|---|
| Budget | Gemini 2.0 Flash Lite | ~571,000 | ~19,000/day |
| Budget | DeepSeek V4 Flash | ~476,000 | ~15,900/day |
| Budget | Llama 4 Scout | ~434,000 | ~14,500/day |
| Mid | Claude Haiku 4.5 | ~62,500 | ~2,100/day |
| Mid | GPT-5 | ~30,800 | ~1,030/day |
| Mid | Claude Sonnet 4.6 | ~22,200 | ~740/day |
| Premium | Claude Opus 4.7 | ~8,000 | ~267/day |
| Premium | GPT-5.5 | ~7,700 | ~257/day |
The range: 19,000 requests/day to 257 requests/day for the same $100 budget. Model selection is the single biggest cost lever you have.
Provider Comparison at a Glance
| Provider | Models | Cheapest | Most Expensive | Best For |
|---|---|---|---|---|
| OpenAI | 9 | $0.08/M | $180/M | Widest range, agents |
| Anthropic | 4 | $1.00/M | $25/M | Code, reasoning |
| 4 | $0.075/M | $12/M | Budget, long context | |
| DeepSeek | 3 | $0.14/M | $1.10/M | Budget all-around |
| Meta (Together.ai) | 4 | $0.10/M | $0.88/M | Open source, 10M context |
| Mistral | 2 | $0.15/M | $1.50/M | European compliance |
| Cohere | 2 | $0.50/M | $10/M | RAG, enterprise search |
| Moonshot | 1 | $0.90/M | $3.75/M | Long context (256K) |
| xAI | 2 | $3.00/M | $150/M | Real-time data |
| AI21 | 1 | $2.00/M | $8/M | Long context (256K) |
What to Watch in Q4 2026
⚠️ Q4 Is the Big One
Historically, Q4 is when OpenAI, Google, and Anthropic make their biggest announcements. If you're planning a major AI integration, consider locking in current pricing with committed-use discounts before potential disruption.
- OpenAI GPT-6 — The traditional Q4 launch window. A new generation could reset the entire pricing curve. If GPT-6 arrives, expect GPT-5 prices to drop.
- Google Gemini 3.0 Flash — Gemini 2.0 Flash Lite has held the budget crown for months. A successor could push below $0.05/M and break the 133K-tokens-per-penny threshold.
- DeepSeek V5 — V4 Pro at $0.44/M is already aggressive. V5 could disrupt the mid-tier and pressure Anthropic/OpenAI to respond.
- xAI pricing rebrand — Grok 4.3 at $1.25/$2.50 is now competitive; Grok Build 0.1 at $0.30/$0.50 joins the budget tier.
- Meta Llama 5 — Llama 4 Scout's 10M context was a breakthrough. Llama 5 could push context further or improve quality to compete with mid-tier models.
- Anthropic Claude 5 — The Claude 4 family is mature. A new generation could arrive in late 2026, potentially reshuffling the premium tier.
💡 Cost Optimization Tip
If you're still using GPT-4o ($2.50/M) for general tasks, switching to Claude Haiku 4.5 ($1/M) saves 60% with comparable quality. For classification/extraction, GPT-4o mini ($0.15/M) is 17x cheaper than GPT-4o. Use our Cost Optimizer to find savings in your current setup.
Methodology
All pricing data comes from official provider pricing pages, verified as of May 29, 2026. We track 32 active models across 10 providers: OpenAI, Anthropic, Google, DeepSeek, Mistral, Cohere, Meta (via Together.ai), Moonshot, xAI, and AI21.
Prices are per 1 million tokens unless otherwise noted. Context window sizes reflect the maximum supported. Some providers offer batch pricing or committed-use discounts not reflected here.
Calculate your exact costs
Use our free tools to see what these prices mean for your workload. No signup required.
Open Cost Calculator →Related Tools
- AI API Cost Calculator — estimate costs for any model
- Cost Optimizer — find savings in your current setup
- Cost Explorer — see all models ranked by cost
- Model Compare — side-by-side model comparison
- Pricing Index — complete sortable pricing database
- Cheapest AI API Finder — find the lowest-cost option
- AI API Pricing August 2026 — previous month's guide
- State of LLM Pricing Report — interactive June report