
LLM API Pricing Report Q2 2026: Every Model, Every Provider

It's Q2 2026, and the LLM API market has never been more competitive. With 10 providers offering 33 models, prices have continued their downward trend while context windows have expanded dramatically. This report covers every model, every provider, and every price point — so you can make the right choice for your application.

Updated May 2, 2026: Prices have changed since this report was published. Grok 3 increased 10x, DeepSeek V4 Pro dropped 75%, Mistral Large 3 dropped 75%. See the May 2026 Pricing Shakeup and Pricing Changelog for the latest data.
At a glance: 35 models available, 10 providers, 90% average price drop since 2023.

The Complete Pricing Landscape

Here's every model available as of April 2026, organized by tier:

Premium Tier — Maximum Quality

| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| Claude 4 Opus | Anthropic | $15.00 | $75.00 | 200K |
| GPT-5 | OpenAI | $1.25 | $10.00 | 272K |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K |
| Mistral Large 3 | Mistral | $2.00 | $6.00 | 128K |
| Cohere Command R+ | Cohere | $2.50 | $10.00 | 128K |
| AI21 Jamba 1.5 Large | AI21 | $2.00 | $8.00 | 256K |

Budget Tier — Maximum Value

| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K |
| Mistral Small 4 | Mistral | $0.10 | $0.30 | 32K |
| Cohere Command R | Cohere | $0.15 | $0.60 | 128K |
| Llama 3.1 70B | Together.ai | $0.88 | $0.88 | 128K |
| Llama 3.1 8B | Together.ai | $0.18 | $0.18 | 128K |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 272K |

Key Changes Since Q1 2026

What's New This Quarter

  • GPT-5 launched: $1.25/$10 per 1M tokens, 272K context. Premium quality at a competitive price point.
  • GPT-5 mini launched: $0.25/$2.00, 272K context. Cheaper than GPT-4o with more context.
  • Claude 4 Opus: Anthropic's flagship at $15/$75. Best-in-class for complex reasoning.
  • Gemini 2.5 Pro: Google's premium model at $1.25/$10 with 1M context. Best value premium model.
  • Mistral Small 4: $0.10/$0.30, the second-lowest output price in this report, behind only Llama 3.1 8B ($0.18).

Cheapest Model by Use Case

Chatbot (1K requests/day, 500 input + 200 output tokens each)

Monthly Cost Comparison (30-day month, list prices)

Mistral Small 4: $3.30/mo
Llama 3.1 8B: $3.78/mo
Gemini 2.0 Flash: $3.90/mo
GPT-4o mini: $5.85/mo
Cohere Command R: $5.85/mo
Claude Haiku 4.5: $45.00/mo
GPT-4o: $97.50/mo
Claude Sonnet 4: $135.00/mo

Code Generation (100 requests/day, 1K input + 500 output tokens each)

Monthly Cost Comparison (30-day month, list prices)

Gemini 2.0 Flash: $0.90/mo
GPT-4o mini: $1.35/mo
Claude Haiku 4.5: $10.50/mo
GPT-5: $18.75/mo
GPT-4o: $22.50/mo
Claude Sonnet 4: $31.50/mo
Claude 4 Opus: $157.50/mo
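
Monthly cost is request volume times token counts times the per-million price, summed over input and output. The sketch below assumes a 30-day month and uses the list prices from the tables above; `monthly_cost` is an illustrative helper, not part of any provider SDK. For this code-generation workload it reproduces the GPT-4o and GPT-5 figures.

```python
def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 input_price, output_price, days=30):
    """Estimate monthly spend in USD.

    Prices are USD per 1M tokens. days=30 is an assumption, not a
    billing rule of any provider.
    """
    input_millions = requests_per_day * input_tokens * days / 1_000_000
    output_millions = requests_per_day * output_tokens * days / 1_000_000
    return round(input_millions * input_price + output_millions * output_price, 2)


# Code generation: 100 requests/day, 1K input + 500 output tokens each.
gpt_4o = monthly_cost(100, 1_000, 500, input_price=2.50, output_price=10.00)
gpt_5 = monthly_cost(100, 1_000, 500, input_price=1.25, output_price=10.00)
print(gpt_4o, gpt_5)  # 22.5 18.75
```

Changing the first three arguments adapts the estimate to the chatbot or document-analysis workloads.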

Document Analysis (50 requests/day, 10K input + 2K output tokens each)

Monthly Cost Comparison (30-day month, list prices)

Gemini 2.0 Flash: $2.70/mo
GPT-4o mini: $4.05/mo
Gemini 2.5 Pro: $48.75/mo
GPT-5: $48.75/mo
GPT-4o: $67.50/mo
Claude Sonnet 4: $90.00/mo
Claude 4 Opus: $450.00/mo

Provider Scorecard

| Provider | Cheapest Model | Best Premium | Max Context | Best For |
|---|---|---|---|---|
| Google | Gemini 2.0 Flash ($0.10/$0.40) | Gemini 2.5 Pro ($1.25/$10) | 1M tokens | Best value, longest context |
| OpenAI | GPT-4o mini ($0.15/$0.60) | GPT-5 ($1.25/$10) | 272K tokens | Ecosystem, tool use, vision |
| Anthropic | Claude Haiku 4.5 ($1.00/$5) | Claude 4 Opus ($15/$75) | 200K tokens | Code generation, reasoning |
| Mistral | Mistral Small 4 ($0.10/$0.30) | Mistral Large 3 ($2/$6) | 128K tokens | Low output price, European |
| Cohere | Command R ($0.15/$0.60) | Command R+ ($2.50/$10) | 128K tokens | RAG, enterprise |
| Together.ai | Llama 3.1 8B ($0.18/$0.18) | Llama 3.1 70B ($0.88/$0.88) | 128K tokens | Open source, symmetric pricing |
| AI21 | Jamba 1.5 Large ($2/$8) | - | 256K tokens | Long context, hybrid architecture |

Context Window Comparison

| Context Window | Models | Best For |
|---|---|---|
| 1M tokens | Gemini 2.5 Pro, Gemini 2.0 Flash | Full codebase analysis, book-length documents |
| 256K-272K tokens | GPT-5 and GPT-5 mini (272K), AI21 Jamba 1.5 Large (256K) | Large document analysis, long conversations |
| 200K tokens | Claude 4 Opus, Claude Sonnet 4, Claude Haiku 4.5 | Complex multi-file tasks, extended reasoning |
| 128K tokens | GPT-4o, GPT-4o mini, Mistral Large 3, Cohere, Llama | Most applications, chatbots, code generation |
| 32K tokens | Mistral Small 4 | Short-form tasks, classification, simple Q&A |

Recommendations

For Startups (Under $50/month budget)

Start with Gemini 2.0 Flash for most tasks. It ties for the lowest input price in this report ($0.10 per 1M tokens) and has the largest context window (1M tokens). Use GPT-4o mini as a fallback for tasks that need OpenAI's ecosystem (function calling, vision). Total cost: $2-10/month for typical startup usage.

For Growing Companies ($50-500/month budget)

Use a tiered approach: Gemini 2.0 Flash for high-volume simple tasks, Claude Sonnet 4 or GPT-4o for complex reasoning, and Claude 4 Opus only for the most demanding tasks. This hybrid strategy can save 60-80% compared to using a single premium model.
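
To see how a tiered split compares with a single premium model, the sketch below prices one workload both ways. The workload size (60M input and 20M output tokens per month) and the 85/12/3 traffic split are hypothetical values chosen only for illustration; the prices come from the tables above, and `blended_cost` is an illustrative helper name.

```python
# USD per 1M tokens (input, output), taken from the tables in this report.
PRICES = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude 4 Opus": (15.00, 75.00),
}


def blended_cost(input_millions, output_millions, traffic_split):
    """Monthly cost in USD when traffic is divided across models.

    traffic_split maps model name -> fraction of total tokens;
    the fractions are expected to sum to 1.
    """
    total = 0.0
    for model, share in traffic_split.items():
        input_price, output_price = PRICES[model]
        total += share * (input_millions * input_price
                          + output_millions * output_price)
    return total


# Hypothetical workload and split, for illustration only.
tiered = blended_cost(60, 20, {
    "Gemini 2.0 Flash": 0.85,
    "Claude Sonnet 4": 0.12,
    "Claude 4 Opus": 0.03,
})
single = blended_cost(60, 20, {"Claude Sonnet 4": 1.0})
savings = 1 - tiered / single
print(f"tiered ${tiered:.2f} vs single ${single:.2f}: {savings:.0%} saved")
# tiered $141.50 vs single $480.00: 71% saved
```

The result depends heavily on the split: the more traffic that truly needs a premium model, the smaller the saving.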

For Enterprise (500+ developers)

Negotiate volume discounts with multiple providers. Use Gemini 2.5 Pro for document-heavy workloads (1M context), Claude 4 Opus for code review and complex reasoning, and GPT-5 for tool-use-heavy workflows. Implement model routing to automatically select the cheapest model for each task.
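
Model routing can be sketched as a filter followed by a cost comparison. The example below is a minimal illustration, not a production router: the catalog is a small subset of the models in this report, `Model` and `route` are hypothetical names, the tier labels follow this report's tables, and the context check assumes the window must hold input plus output tokens. A real router would also weigh quality, latency, and negotiated prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    context: int         # context window in tokens
    tier: str            # "premium" or "budget", as labeled in this report


# Illustrative subset of the models listed in this report.
CATALOG = [
    Model("Gemini 2.0 Flash", 0.10, 0.40, 1_000_000, "budget"),
    Model("GPT-4o mini", 0.15, 0.60, 128_000, "budget"),
    Model("GPT-5", 1.25, 10.00, 272_000, "premium"),
    Model("Gemini 2.5 Pro", 1.25, 10.00, 1_000_000, "premium"),
    Model("Claude 4 Opus", 15.00, 75.00, 200_000, "premium"),
]


def route(input_tokens, output_tokens, needs_premium=False):
    """Return the cheapest catalog model that fits the request."""
    candidates = [
        m for m in CATALOG
        if m.context >= input_tokens + output_tokens
        and (m.tier == "premium" or not needs_premium)
    ]
    if not candidates:
        raise ValueError("no model in the catalog fits this request")
    # Relative cost is enough for ranking, so the 1M divisor is omitted.
    return min(
        candidates,
        key=lambda m: input_tokens * m.input_price
        + output_tokens * m.output_price,
    )


print(route(2_000, 500).name)                          # Gemini 2.0 Flash
print(route(500_000, 4_000, needs_premium=True).name)  # Gemini 2.5 Pro
```

The second call skips GPT-5 and Claude 4 Opus because the request does not fit in their context windows.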

Calculate your exact costs. Enter your usage patterns into our calculator to see which model and provider saves you the most.
