LLM API Pricing Report Q2 2026: Every Model, Every Provider
It's Q2 2026, and the LLM API market has never been more competitive. With 10 providers offering 33 models, prices have continued their downward trend while context windows have expanded dramatically. This report covers every model, every provider, and every price point — so you can make the right choice for your application.
The Complete Pricing Landscape
Here's every model available as of April 2026, organized by tier:
Premium Tier — Maximum Quality
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| Claude 4 Opus | Anthropic | $15.00 | $75.00 | 200K |
| GPT-5 | OpenAI | $1.25 | $10.00 | 272K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| Mistral Large 3 | Mistral | $2.00 | $6.00 | 128K |
| Cohere Command R+ | Cohere | $2.50 | $10.00 | 128K |
| AI21 Jamba 1.5 Large | AI21 | $2.00 | $8.00 | 256K |
Budget Tier — Maximum Value
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K |
| Mistral Small 4 | Mistral | $0.10 | $0.30 | 32K |
| Cohere Command R | Cohere | $0.15 | $0.60 | 128K |
| Llama 3.1 70B | Together.ai | $0.88 | $0.88 | 128K |
| Llama 3.1 8B | Together.ai | $0.18 | $0.18 | 128K |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 272K |
Key Changes Since Q1 2026
What's New This Quarter
- GPT-5 launched — $1.25/$10 per 1M tokens, 272K context. Premium quality at a competitive price point.
- GPT-5 mini launched — $0.25/$2.00, 272K context. Cheaper than GPT-4o with more context.
- Claude 4 Opus — Anthropic's flagship at $15/$75. Best-in-class for complex reasoning.
- Gemini 2.5 Pro — Google's premium model at $1.25/$10 with 1M context. Best value premium model.
- Mistral Small 4 — $0.10/$0.30, cheapest output tokens on the market.
Cheapest Model by Use Case
Chatbot (1K requests/day, 500 input + 200 output tokens each)
Monthly Cost Comparison
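The chatbot comparison comes down to simple arithmetic: requests per month, times tokens per request, times the per-token price. The sketch below is a minimal example, not the calculator's actual implementation. It assumes a 30-day month and flat list pricing with no caching, batch, or volume discounts. Prices come from this report's budget-tier table, with Claude Haiku 4.5 at the $1.00/$5.00 shown in the Provider Scorecard and GPT-5 mini at the $0.25/$2.00 listed under What's New. `Price`, `monthly_cost`, and `rank_by_cost` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per 1M tokens."""
    input_per_m: float
    output_per_m: float


# Budget-tier list prices from this report (USD per 1M tokens).
BUDGET_TIER = {
    "Gemini 2.0 Flash": Price(0.10, 0.40),
    "GPT-4o mini": Price(0.15, 0.60),
    "Claude Haiku 4.5": Price(1.00, 5.00),  # as in the Provider Scorecard
    "Mistral Small 4": Price(0.10, 0.30),
    "Cohere Command R": Price(0.15, 0.60),
    "Llama 3.1 70B": Price(0.88, 0.88),
    "Llama 3.1 8B": Price(0.18, 0.18),
    "GPT-5 mini": Price(0.25, 2.00),  # as listed under What's New
}


def monthly_cost(price: Price, requests_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Monthly USD cost at flat list prices (assumes a 30-day month)."""
    requests = requests_per_day * days
    input_cost = requests * input_tokens * price.input_per_m / 1_000_000
    output_cost = requests * output_tokens * price.output_per_m / 1_000_000
    return input_cost + output_cost


def rank_by_cost(prices: dict[str, Price], requests_per_day: int,
                 input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, monthly cost) pairs, cheapest first."""
    costs = [(name, monthly_cost(p, requests_per_day, input_tokens, output_tokens))
             for name, p in prices.items()]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Chatbot: 1K requests/day, 500 input + 200 output tokens each.
    for name, cost in rank_by_cost(BUDGET_TIER, 1_000, 500, 200):
        print(f"{name:<18} ${cost:>7.2f}/month")
```

Under these assumptions Mistral Small 4 comes out cheapest for this workload (about $3.30/month), with Llama 3.1 8B (about $3.78) and Gemini 2.0 Flash (about $3.90) close behind. Each request uses only 700 tokens, so Mistral Small 4's 32K window is not a constraint here.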
Code Generation (100 requests/day, 1K input + 500 output tokens each)
Monthly Cost Comparison
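For code generation it helps to split the bill into input and output, because every premium model charges several times more per output token than per input token. This sketch makes the same flat-pricing, 30-day-month assumptions and uses the premium-tier prices from the table above. The helper `cost_split` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Premium-tier list prices from this report (USD per 1M tokens).
PREMIUM_TIER = {
    "Claude 4 Opus": Price(15.00, 75.00),
    "GPT-5": Price(1.25, 10.00),
    "Gemini 2.5 Pro": Price(1.25, 10.00),
    "Claude Sonnet 4": Price(3.00, 15.00),
    "GPT-4o": Price(2.50, 10.00),
    "Mistral Large 3": Price(2.00, 6.00),
    "Cohere Command R+": Price(2.50, 10.00),
    "AI21 Jamba 1.5 Large": Price(2.00, 8.00),
}


def cost_split(price: Price, requests_per_month: int, input_tokens: int,
               output_tokens: int) -> tuple[float, float]:
    """Return (input cost, output cost) in USD for one month of traffic."""
    input_cost = requests_per_month * input_tokens * price.input_per_m / 1_000_000
    output_cost = requests_per_month * output_tokens * price.output_per_m / 1_000_000
    return input_cost, output_cost


if __name__ == "__main__":
    requests = 100 * 30  # code generation: 100 requests/day for 30 days
    rows = []
    for name, price in PREMIUM_TIER.items():
        inp, out = cost_split(price, requests, 1_000, 500)
        rows.append((name, inp + out, out / (inp + out)))
    for name, total, share in sorted(rows, key=lambda row: row[1]):
        print(f"{name:<22} ${total:>7.2f}/month  output share {share:.0%}")
```

Under these assumptions Mistral Large 3 is the cheapest premium option (about $15.00/month) and Claude 4 Opus the most expensive (about $157.50/month). Even though each request sends twice as many input tokens as it receives, output accounts for 60-80% of the bill on every model listed, so output price is the figure to watch for generation-heavy workloads.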
Document Analysis (50 requests/day, 10K input + 2K output tokens each)
Monthly Cost Comparison
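Document analysis sends large inputs on every request, which pushes premium models toward the budget limits discussed in the Recommendations section. The sketch below checks which models keep this workload at or under $50/month, again assuming a 30-day month and flat list prices. The model list is a subset of this report's tables, and `PRICES`, `per_request_cost`, and `within_budget` are illustrative names.

```python
# Illustrative subset of this report's list prices: (input, output) in USD per 1M tokens.
PRICES = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Mistral Small 4": (0.10, 0.30),
    "GPT-4o mini": (0.15, 0.60),
    "GPT-5": (1.25, 10.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "AI21 Jamba 1.5 Large": (2.00, 8.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude 4 Opus": (15.00, 75.00),
}


def per_request_cost(prices: tuple[float, float], input_tokens: int,
                     output_tokens: int) -> float:
    """USD cost of one request at flat list prices."""
    input_per_m, output_per_m = prices
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def within_budget(budget: float, requests_per_month: int, input_tokens: int,
                  output_tokens: int) -> list[tuple[str, float]]:
    """Models whose monthly cost is at or under `budget`, cheapest first."""
    monthly = {
        name: per_request_cost(p, input_tokens, output_tokens) * requests_per_month
        for name, p in PRICES.items()
    }
    affordable = [(name, cost) for name, cost in monthly.items() if cost <= budget]
    return sorted(affordable, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Document analysis: 50 requests/day x 30 days, 10K input + 2K output tokens.
    for name, cost in within_budget(50.0, 50 * 30, 10_000, 2_000):
        print(f"{name:<18} ${cost:>6.2f}/month")
```

With these assumptions five models fit: Mistral Small 4 (about $2.40), Gemini 2.0 Flash ($2.70), GPT-4o mini ($4.05), and GPT-5 and Gemini 2.5 Pro (both about $48.75). AI21 Jamba 1.5 Large ($54), Claude Sonnet 4 ($90), and Claude 4 Opus ($450) do not.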
Provider Scorecard
| Provider | Cheapest Model | Best Premium | Max Context | Best For |
|---|---|---|---|---|
| Google | Gemini 2.0 Flash ($0.10/$0.40) | Gemini 2.5 Pro ($1.25/$10) | 1M tokens | Best value, longest context |
| OpenAI | GPT-4o mini ($0.15/$0.60) | GPT-5 ($1.25/$10) | 272K tokens | Ecosystem, tool use, vision |
| Anthropic | Claude Haiku 4.5 ($1.00/$5) | Claude 4 Opus ($15/$75) | 200K tokens | Code generation, reasoning |
| Mistral | Mistral Small 4 ($0.10/$0.30) | Mistral Large 3 ($2/$6) | 128K tokens | Cheapest output, European |
| Cohere | Command R ($0.15/$0.60) | Command R+ ($2.50/$10) | 128K tokens | RAG, enterprise |
| Together.ai | Llama 3.1 8B ($0.18/$0.18) | Llama 3.1 70B ($0.88/$0.88) | 128K tokens | Open source, symmetric pricing |
| AI21 | — | Jamba 1.5 Large ($2/$8) | 256K tokens | Long context, hybrid architecture |
Context Window Comparison
| Context Window | Models | Best For |
|---|---|---|
| 1M tokens | Gemini 2.5 Pro, Gemini 2.0 Flash | Full codebase analysis, book-length documents |
| 272K tokens | GPT-5, GPT-5 mini | Large document analysis, long conversations |
| 256K tokens | AI21 Jamba 1.5 Large | Large document analysis, long conversations |
| 200K tokens | Claude 4 Opus, Claude Sonnet 4, Claude Haiku 4.5 | Complex multi-file tasks, extended reasoning |
| 128K tokens | GPT-4o, GPT-4o mini, Mistral Large 3, Cohere, Llama | Most applications, chatbots, code generation |
| 32K tokens | Mistral Small 4 | Short-form tasks, classification, simple Q&A |
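A quick way to use this table is to filter models by whether a request fits. The sketch below assumes the listed windows are in thousands of tokens (1K = 1,000), treats each window as a single budget for prompt plus expected output, and adds a hypothetical 10% headroom for tokenizer differences. Providers count limits differently (some cap output tokens separately), so treat it as a first-pass filter only. `CONTEXT_WINDOWS` and `models_that_fit` are illustrative names; AI21 Jamba 1.5 Large uses the 256K listed in the premium table.

```python
# Context windows from this report, in tokens (assumes 1K = 1,000).
CONTEXT_WINDOWS = {
    "Gemini 2.5 Pro": 1_000_000,
    "Gemini 2.0 Flash": 1_000_000,
    "GPT-5": 272_000,
    "GPT-5 mini": 272_000,
    "AI21 Jamba 1.5 Large": 256_000,
    "Claude 4 Opus": 200_000,
    "Claude Sonnet 4": 200_000,
    "Claude Haiku 4.5": 200_000,
    "GPT-4o": 128_000,
    "GPT-4o mini": 128_000,
    "Mistral Large 3": 128_000,
    "Cohere Command R+": 128_000,
    "Cohere Command R": 128_000,
    "Llama 3.1 70B": 128_000,
    "Llama 3.1 8B": 128_000,
    "Mistral Small 4": 32_000,
}

# Hypothetical safety margin for tokenizer differences between providers.
DEFAULT_HEADROOM = 0.10


def models_that_fit(prompt_tokens: int, max_output_tokens: int,
                    headroom: float = DEFAULT_HEADROOM) -> list[str]:
    """Models whose window holds prompt + output plus headroom, smallest window first."""
    needed = (prompt_tokens + max_output_tokens) * (1 + headroom)
    fitting = [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]
    return sorted(fitting, key=lambda name: CONTEXT_WINDOWS[name])


if __name__ == "__main__":
    # Example: a ~150K-token codebase dump with up to 8K tokens of output.
    print(models_that_fit(150_000, 8_000))
```

For a 150K-token prompt with 8K tokens of output, the 128K and 32K models drop out, leaving the Claude, Jamba, GPT-5, and Gemini models.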
Recommendations
For Startups (Under $50/month budget)
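Start with Gemini 2.0 Flash for most tasks. It pairs some of the lowest prices in this report ($0.10/$0.40) with the largest context window (1M tokens). Use GPT-4o mini as a fallback for tasks that need OpenAI's ecosystem (function calling, vision). Total cost: $2-10/month for typical startup usage; the chatbot workload described above, for example, comes to about $3.90/month on Gemini 2.0 Flash.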
For Growing Companies ($50-500/month budget)
Use a tiered approach: Gemini 2.0 Flash for high-volume simple tasks, Claude Sonnet 4 or GPT-4o for complex reasoning, and Claude 4 Opus only for the most demanding tasks. Depending on how much of your traffic the cheap tier can absorb, this hybrid strategy can save 60-80% compared to using a single premium model.
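How much a tiered setup saves depends mostly on the traffic split. The sketch below computes a blended per-request cost for a hypothetical 80/15/5 split across Gemini 2.0 Flash, Claude Sonnet 4, and Claude 4 Opus, reusing the chatbot request shape (500 input + 200 output tokens) and this report's list prices. The split and the helper names are assumptions for illustration.

```python
import math

# List prices from this report: (input, output) in USD per 1M tokens.
PRICES = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude 4 Opus": (15.00, 75.00),
}


def per_request(model: str, input_tokens: int = 500, output_tokens: int = 200) -> float:
    """USD cost of one request (defaults to the chatbot shape: 500 in, 200 out)."""
    input_per_m, output_per_m = PRICES[model]
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def blended_per_request(split: dict[str, float]) -> float:
    """Average cost per request for a traffic split {model: fraction of requests}."""
    if not math.isclose(sum(split.values()), 1.0):
        raise ValueError("traffic fractions must sum to 1")
    return sum(fraction * per_request(model) for model, fraction in split.items())


# Hypothetical split: 80% simple, 15% complex, 5% most demanding.
SPLIT = {"Gemini 2.0 Flash": 0.80, "Claude Sonnet 4": 0.15, "Claude 4 Opus": 0.05}

if __name__ == "__main__":
    blended = blended_per_request(SPLIT)
    for baseline in ("Claude Sonnet 4", "Claude 4 Opus"):
        saving = 1 - blended / per_request(baseline)
        print(f"vs. all traffic on {baseline}: {saving:.0%} cheaper")
```

With this split the blend comes out about 58% cheaper than sending everything to Claude Sonnet 4 and about 92% cheaper than sending everything to Claude 4 Opus. Shifting more traffic to the cheap tier raises both figures.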
For Enterprise (500+ developers)
Negotiate volume discounts with multiple providers. Use Gemini 2.5 Pro for document-heavy workloads (1M context), Claude 4 Opus for code review and complex reasoning, and GPT-5 for tool-use-heavy workflows. Implement model routing to automatically select the cheapest model for each task.
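Model routing can start as a simple rule: among models that are capable enough and whose context window fits the request, pick the one with the lowest estimated cost. The sketch below is one such rule, not any specific product's router. The `tier` values are placeholder capability ranks you would set from your own evaluations, not a ranking this report makes, and `Model`, `CATALOG`, `estimate_cost`, and `route` are hypothetical names. Prices and context windows come from this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    tier: int            # placeholder capability rank: 1 = budget ... 3 = flagship
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens
    context: int         # context window in tokens (1K = 1,000)


# Illustrative catalog; tier assignments are assumptions, not benchmark results.
CATALOG = [
    Model("Gemini 2.0 Flash", 1, 0.10, 0.40, 1_000_000),
    Model("GPT-4o mini", 1, 0.15, 0.60, 128_000),
    Model("Claude Sonnet 4", 2, 3.00, 15.00, 200_000),
    Model("GPT-5", 2, 1.25, 10.00, 272_000),
    Model("Gemini 2.5 Pro", 2, 1.25, 10.00, 1_000_000),
    Model("Claude 4 Opus", 3, 15.00, 75.00, 200_000),
]


def estimate_cost(m: Model, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at flat list prices."""
    return (input_tokens * m.input_per_m + output_tokens * m.output_per_m) / 1_000_000


def route(min_tier: int, input_tokens: int, output_tokens: int) -> Optional[Model]:
    """Cheapest model at or above `min_tier` whose window fits the request, else None."""
    candidates = [m for m in CATALOG
                  if m.tier >= min_tier and m.context >= input_tokens + output_tokens]
    if not candidates:
        return None
    # min() keeps the first of any equally priced models, i.e. catalog order breaks ties.
    return min(candidates, key=lambda m: estimate_cost(m, input_tokens, output_tokens))


if __name__ == "__main__":
    print(route(1, 500, 200).name)        # simple chat turn
    print(route(2, 400_000, 4_000).name)  # very long document at tier 2
```

For example, a simple chat turn at tier 1 routes to Gemini 2.0 Flash, while a 400K-token request at tier 2 can only go to Gemini 2.5 Pro, the one tier-2-or-higher model in this catalog with a window that large. Ties (GPT-5 and Gemini 2.5 Pro share a price) go to whichever comes first in the catalog.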
Calculate your exact costs: enter your usage patterns into the APIpulse Calculator to see which model and provider saves you the most, or view the full pricing index.