What is the cheapest LLM API?

DeepSeek V4 Flash ($0.14/$0.28) is the cheapest. Gemini 2.5 Flash ($0.075/$0.30) is also very affordable. Both handle most workloads well.

What is the best premium LLM API?

GPT-5 ($1.25/$10) and Claude Opus 4.7 ($5/$25) are the top premium options. GPT-5 offers better value, while Claude excels at complex reasoning.

How do I compare LLM API prices?

Use APIpulse's free cost calculator to compare 42 models from 10 providers based on your specific usage pattern.

Reference Guide April 23, 2026

LLM API Pricing Cheat Sheet: Every Model, Every Provider (April 2026)

⚠️ Deprecation alert: Claude 4 Opus and Claude Sonnet 4 retired on June 15, 2026. If you're using these models, see our migration guide for step-by-step instructions.

Stop jumping between pricing pages. Here's every major LLM API priced side by side — input costs, output costs, context windows, and real cost-per-use examples. Bookmark this page and check back when providers update their rates.

Complete Pricing Table

All prices are per 1M tokens. Data verified.

Provider	Model	Input	Output	Context	Tier
OpenAI	GPT-4o	$2.50	$10.00	128K	Premium
OpenAI	GPT-4o mini	$0.15	$0.60	128K	Budget
Anthropic	Claude Sonnet 4	$3.00	$15.00	200K	Premium
Anthropic	Claude Haiku 4.5	$1.00	$5.00	200K	Budget
Google	Gemini 2.5 Pro	$1.25	$10.00	1M	Premium
Google	Gemini 2.0 Flash	$0.10	$0.40	1M	Budget
Mistral	Large 3	$2.00	$6.00	128K	Premium
Mistral	Small 4	$0.10	$0.30	32K	Budget
Cohere	Command R+	$2.50	$10.00	128K	Premium
Cohere	Command R	$0.15	$0.60	128K	Budget
Meta (Together.ai)	Llama 3.1 70B	$0.88	$0.88	128K	Budget
Meta (Together.ai)	Llama 3.1 8B	$0.18	$0.18	128K	Budget
AI21	Jamba 1.5 Large	$2.00	$8.00	256K	Premium

Cheapest Models by Tier

Budget Tier ($1/M input or less)

Ranked by total cost per 1M tokens (input + output), cheapest first

1. Llama 3.1 8B (Together) $0.36 total ($0.18 in / $0.18 out)

2. Mistral Small 4 $0.40 total ($0.10 in / $0.30 out)

3. Gemini 2.0 Flash $0.50 total ($0.10 in / $0.40 out)

4. GPT-4o mini $0.75 total ($0.15 in / $0.60 out)

5. Cohere Command R $0.75 total ($0.15 in / $0.60 out)

6. Llama 3.1 70B (Together) $1.76 total ($0.88 in / $0.88 out)

7. Claude Haiku 4.5 $6.00 total ($1.00 in / $5.00 out)

Premium Tier ($1+/M input)

Ranked by total cost per 1M tokens (input + output), cheapest first

1. Mistral Large 3 $8.00 total ($2.00 in / $6.00 out)

2. AI21 Jamba 1.5 Large $10.00 total ($2.00 in / $8.00 out)

3. Gemini 2.5 Pro $11.25 total ($1.25 in / $10.00 out)

4. GPT-4o $12.50 total ($2.50 in / $10.00 out)

5. Cohere Command R+ $12.50 total ($2.50 in / $10.00 out)

6. Claude Sonnet 4 $18.00 total ($3.00 in / $15.00 out)
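
To reproduce a ranking like the ones above, sum each model's input and output price and sort. The sketch below is a minimal Python illustration, not APIpulse's actual tooling: the `PRICES` dictionary and the `rank_by_total` helper are hypothetical names, and the figures are copied from the pricing table in this guide.

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
# PRICES and rank_by_total are illustrative names, not part of any real API.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Mistral Large 3": (2.00, 6.00),
    "Mistral Small 4": (0.10, 0.30),
    "Cohere Command R+": (2.50, 10.00),
    "Cohere Command R": (0.15, 0.60),
    "Llama 3.1 70B": (0.88, 0.88),
    "Llama 3.1 8B": (0.18, 0.18),
    "Jamba 1.5 Large": (2.00, 8.00),
}


def rank_by_total(prices):
    """Return (model, input + output price) pairs, cheapest first."""
    totals = [(model, round(inp + out, 2)) for model, (inp, out) in prices.items()]
    # sorted() is stable, so models with equal totals keep their table order.
    return sorted(totals, key=lambda pair: pair[1])


if __name__ == "__main__":
    for position, (model, total) in enumerate(rank_by_total(PRICES), start=1):
        print(f"{position}. {model}: ${total:.2f} total")
```

Treat input + output as a rough yardstick only: it weights both sides equally, which few real workloads do.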

Real-World Cost Examples

Here's what you'd actually pay for common workloads. Each scenario below lists its own request volume and tokens per request.

Chatbot (1K requests/day)

Monthly cost at 500 input + 200 output tokens per request, assuming a 30-day month at the table prices above

Gemini 2.0 Flash $3.90/mo

GPT-4o mini $5.85/mo

Claude Haiku 4.5 $45.00/mo

GPT-4o $97.50/mo

Claude Sonnet 4 $135.00/mo

Budget pick: Gemini 2.0 Flash $3.90/mo

Code Generation (1K requests/day)

Monthly cost at 1,000 input + 500 output tokens per request, assuming a 30-day month at the table prices above

Gemini 2.0 Flash $9.00/mo

Llama 3.1 70B $39.60/mo

GPT-4o $225.00/mo

Claude Sonnet 4 $315.00/mo

Budget pick: Gemini 2.0 Flash $9.00/mo

Document Analysis (100 requests/day)

Monthly cost at 10,000 input + 2,000 output tokens per request, assuming a 30-day month at the table prices above

Gemini 2.0 Flash $5.40/mo

Gemini 2.5 Pro $97.50/mo

GPT-4o $135.00/mo

Claude Sonnet 4 $180.00/mo

Best value for long docs: Gemini 2.5 Pro $97.50/mo (1M context)
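
If you want to run this kind of estimate yourself, the arithmetic is short. The sketch below is a hedged illustration rather than the APIpulse Calculator's actual logic: `monthly_cost` is a hypothetical helper. It assumes a 30-day month, that input and output tokens are billed separately at per-1M-token rates, and it ignores caching, batch discounts, and free tiers.

```python
def monthly_cost(input_price, output_price, requests_per_day,
                 input_tokens, output_tokens, days=30):
    """Estimate monthly spend in USD.

    input_price and output_price are USD per 1M tokens; input_tokens and
    output_tokens are per request. The 30-day default is an assumption.
    """
    monthly_requests = requests_per_day * days
    input_cost = monthly_requests * input_tokens / 1_000_000 * input_price
    output_cost = monthly_requests * output_tokens / 1_000_000 * output_price
    return round(input_cost + output_cost, 2)


if __name__ == "__main__":
    # Chatbot workload on Gemini 2.0 Flash ($0.10 in / $0.40 out):
    # 1,000 requests/day, 500 input + 200 output tokens per request.
    print(monthly_cost(0.10, 0.40, 1_000, 500, 200))
```

Swap in your own request volume and token counts. Output tokens usually dominate the bill for premium models, so measure real response lengths before trusting any estimate.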

Context Window Comparison

Context Window	Models	Best For
32K	Mistral Small 4	Short prompts, classification, simple Q&A
128K	GPT-4o, GPT-4o mini, Mistral Large 3, Cohere Command R/R+, Llama 3.1	Most use cases, multi-turn chat, code generation
200K	Claude Sonnet 4, Claude Haiku 4.5	Long documents, large codebases, book-length analysis
256K	AI21 Jamba 1.5 Large	Very long documents, legal contracts, research papers
1M	Gemini 2.5 Pro, Gemini 2.0 Flash	Entire codebases, video analysis, massive datasets
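
Context window works as a hard filter: a model that cannot fit your prompt is out, whatever it costs. As a rough sketch, you could filter on context first and only then compare price. The `MODELS` list and `cheapest_for_context` helper below are hypothetical, the prices come from the table in this guide, and the nominal sizes (128K read as 128,000 tokens, and so on) are an approximation.

```python
# (name, input USD/1M, output USD/1M, context window in tokens).
# Illustrative subset of the pricing table; window sizes are nominal.
MODELS = [
    ("Mistral Small 4", 0.10, 0.30, 32_000),
    ("Gemini 2.0 Flash", 0.10, 0.40, 1_000_000),
    ("GPT-4o mini", 0.15, 0.60, 128_000),
    ("Claude Haiku 4.5", 1.00, 5.00, 200_000),
    ("Jamba 1.5 Large", 2.00, 8.00, 256_000),
    ("Gemini 2.5 Pro", 1.25, 10.00, 1_000_000),
]


def cheapest_for_context(models, required_tokens, input_tokens, output_tokens):
    """Return the cheapest model per request whose window fits required_tokens.

    Returns None when no model in the list has a large enough window.
    """
    candidates = [m for m in models if m[3] >= required_tokens]
    if not candidates:
        return None

    def per_request_cost(model):
        _, input_price, output_price, _ = model
        return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

    return min(candidates, key=per_request_cost)[0]


if __name__ == "__main__":
    # A 150K-token document rules out every 32K and 128K model.
    print(cheapest_for_context(MODELS, 150_000, 150_000, 2_000))
```

Leave headroom when choosing `required_tokens`: the window has to hold the prompt, any system instructions, and the response.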

Quick Decision Guide

Cheapest input price: Mistral Small 4 ($0.10/$0.30), but only 32K context. Llama 3.1 8B ($0.18/$0.18) has the lowest combined input + output rate.
Cheapest with decent context: Gemini 2.0 Flash ($0.10/$0.40) — 1M context at budget price
Best quality per dollar (premium): Gemini 2.5 Pro ($1.25/$10.00) — cheapest premium with 1M context
Best for code: Claude Sonnet 4 ($3.00/$15.00) — strongest coding benchmarks
Best for chat: GPT-4o ($2.50/$10.00) — most natural conversation
Best open-source option: Llama 3.1 70B via Together.ai ($0.88/$0.88) — symmetric pricing
Best for long documents: Gemini 2.5 Pro — 1M context window eliminates chunking

How to Use This Data

Don't just pick the cheapest model. Use the APIpulse Calculator to model your specific usage pattern. The right model depends on your input/output ratio, request volume, and quality requirements.
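
To see why the input/output ratio matters, compare two budget models from the table at different output shares. This is a minimal sketch under stated assumptions: `blended_price` is a hypothetical helper, and it treats cost as a simple weighted average of the input and output rates.

```python
def blended_price(input_price, output_price, output_share):
    """Average USD per 1M tokens when output_share of all tokens are output."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return input_price * (1 - output_share) + output_price * output_share


if __name__ == "__main__":
    mistral_small = (0.10, 0.30)  # asymmetric pricing, from the table
    llama_8b = (0.18, 0.18)       # symmetric pricing, from the table

    for share in (0.2, 0.4, 0.6):
        a = blended_price(*mistral_small, share)
        b = blended_price(*llama_8b, share)
        print(f"output share {share:.0%}: Mistral Small 4 ${a:.3f}, Llama 3.1 8B ${b:.3f}")
```

At these table prices the two models break even when 40% of tokens are output. Input-heavy workloads such as classification favor Mistral Small 4, while output-heavy workloads favor the symmetric Llama pricing.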

A model that costs 5x more but produces results that need no editing can actually be cheaper than a budget model that requires human review.
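
A quick back-of-the-envelope check makes this concrete. Every number in the sketch below is a hypothetical placeholder (API cost per request, review rate, review time, hourly wage), and `effective_cost_per_request` is an illustrative helper, so substitute your own measurements.

```python
def effective_cost_per_request(api_cost, review_rate, review_minutes, hourly_wage):
    """API cost plus the expected human review cost for one request, in USD.

    review_rate is the fraction of outputs that need a human pass (0 to 1).
    """
    review_cost = review_rate * (review_minutes / 60) * hourly_wage
    return api_cost + review_cost


if __name__ == "__main__":
    # Hypothetical budget model: $0.0002 per request, 10% of outputs
    # need a 2-minute review by someone paid $30/hour.
    budget = effective_cost_per_request(0.0002, 0.10, 2, 30)

    # Hypothetical premium model: 5x the API cost, no review needed.
    premium = effective_cost_per_request(0.0010, 0.0, 2, 30)

    print(f"budget:  ${budget:.4f} per request")
    print(f"premium: ${premium:.4f} per request")
```

Under these made-up numbers the review time dwarfs the API bill, which is the point: even a small review rate can outweigh a 5x price gap.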

Calculate your exact monthly cost with your real usage numbers.
