AI API Cost per Token Explained: The Complete Pricing Guide 2026

Published May 30, 2026 · Updated May 30, 2026 · By APIpulse

Every AI API charges per token. But what is a token? How do you calculate costs? And why do output tokens cost 3-6x more than input tokens? This guide breaks it all down with real numbers from 34 models across 10 providers.

What Is a Token?

Try It Live — Instant Cost Calculator

See exactly what this model costs for your workload. No signup needed.

A token is a chunk of text that the AI model processes. Roughly:

When you send a request to an AI API, the model counts your input tokens (what you send) and generates output tokens (what it returns). You pay for both — but at different rates.

Input vs Output Tokens: Why the Price Difference?

Every AI API has two prices:

Output tokens always cost more — typically 3-6x the input price. Here's why:

  1. Compute intensity: Generating each output token requires running the full model forward pass. Input tokens can be processed in parallel (batched), but output tokens must be generated one at a time.
  2. Memory requirements: The model must maintain attention over all previous tokens while generating each new one.
  3. Latency: Output generation is the bottleneck — users wait for it, so providers charge more.

Pro Tip: Control Output Length

Since output tokens cost 3-6x more, setting a max_tokens limit is the single easiest way to reduce costs. Most responses don't need 4,096 tokens — set it to 500-1000 and save 50-75% on output costs.

The Cost Formula

Cost per request =
(input_tokens ÷ 1,000,000 × input_price) + (output_tokens ÷ 1,000,000 × output_price)

Example: 1,000 input tokens + 500 output tokens on GPT-4o mini ($0.15/$0.60 per 1M):

Input cost:  1,000 ÷ 1,000,000 × $0.15 = $0.00015
Output cost:   500 ÷ 1,000,000 × $0.60 = $0.00030
Total per request:                        $0.00045

At 1,000 requests/day × 30 days = $13.50/month

Pricing Across 34 Models (Per 1M Tokens)

Model Provider Input Output Output/Input Ratio Context
Gemini 2.0 Flash Lite Google $0.075 $0.30 4.0x 1M
Llama 3.1 8B Meta $0.10 $0.10 1.0x 128K
Gemini 2.0 Flash Google $0.10 $0.40 4.0x 1M
Llama 4 Scout Meta $0.11 $0.34 3.1x 10M
DeepSeek V4 Flash DeepSeek $0.14 $0.28 2.0x 1M
GPT-4o mini OpenAI $0.15 $0.60 4.0x 128K
GPT-5 mini OpenAI $0.25 $2.00 8.0x 272K
Gemini 2.5 Pro Google $1.25 $10.00 8.0x 1M
GPT-5 OpenAI $1.25 $10.00 8.0x 272K
Claude Haiku 4.5 Anthropic $1.00 $5.00 5.0x 200K
Claude Sonnet 4.6 Anthropic $3.00 $15.00 5.0x 1M
GPT-5.5 OpenAI $5.00 $30.00 6.0x 1M
Claude Opus 4.8 Anthropic $5.00 $25.00 5.0x 1M
Grok 3 xAI $30.00 $150.00 5.0x 128K
GPT-5.5 Pro OpenAI $30.00 $180.00 6.0x 1M

Key observation: The cheapest input tokens (Gemini Flash Lite at $0.075) are 400x cheaper than the most expensive (GPT-5.5 Pro at $30.00). The output spread is even wider at 600x. Model choice is the single biggest lever for controlling costs.

How Tokens Add Up in Real Applications

Chatbot (Simple Q&A)

System prompt:     200 tokens (fixed instructions)
User message:      100 tokens (the question)
Model response:    300 tokens (the answer)
Total:             600 tokens per request

Cost on GPT-4o mini: $0.00027/request
Cost on GPT-5:       $0.00375/request (14x more)

RAG Pipeline (Search + Generate)

System prompt:      300 tokens
Retrieved context: 2,000 tokens (5 documents)
User question:      100 tokens
Model response:     500 tokens
Total:             2,900 tokens per request

Cost on GPT-4o mini: $0.00174/request
Cost on GPT-5:       $0.02415/request (14x more)

Coding Assistant

System prompt:      500 tokens (code instructions)
Code context:      3,000 tokens (file contents)
User instruction:   200 tokens
Model response:   1,500 tokens (code generation)
Total:            5,200 tokens per request

Cost on Claude Sonnet 4.6: $0.039/request
Cost on GPT-5.5:           $0.0725/request (1.9x more)

5 Ways to Reduce Your Token Costs

  1. Shorter prompts: Remove unnecessary instructions, use concise system prompts. Every token in your prompt costs money.
  2. Conversation pruning: Don't send 50 messages of history. Keep the last 5-10 and summarize the rest.
  3. Output limits: Set max_tokens to what you actually need. Most chat responses don't need 4,096 tokens.
  4. Model routing: Use cheap models for simple tasks, expensive ones for complex reasoning.
  5. Prompt caching: OpenAI and Anthropic offer prompt caching — identical prefixes cost 50-90% less.

Calculate Your Costs

Don't guess — calculate. Enter your exact usage into our calculator to see what every model costs you per month.

See your exact costs across all 34 models

Enter your daily requests and token counts. Get instant cost comparisons sorted cheapest-first.

Try the Monthly Spend Estimator

Try it free: APIpulse Cost Calculator — estimate your monthly spend across 34 models and 10 providers in 30 seconds.