What is the cost per token for AI APIs?

AI API pricing ranges from $0.075 to $30 per 1M input tokens, and $0.10 to $180 per 1M output tokens. Budget models like Gemini Flash Lite cost $0.075/$0.30 per 1M tokens, while premium models like GPT-5.5 Pro cost $30/$180 per 1M tokens. Output tokens are always more expensive than input tokens.

How do I calculate my AI API cost?

Formula: (input_tokens / 1,000,000 × input_price) + (output_tokens / 1,000,000 × output_price) = cost per request. Multiply by number of requests for total cost. For example, 1,000 input tokens and 500 output tokens on GPT-4o mini: (1000/1M × $0.15) + (500/1M × $0.60) = $0.00015 + $0.00030 = $0.00045 per request.

Why are output tokens more expensive than input tokens?

Output tokens cost 3-6x more because generating text requires more compute than processing it. The model must run inference for each output token sequentially, while input tokens can be processed in parallel. This is why keeping responses short and using max_tokens limits can significantly reduce costs.

How many tokens is a typical API request?

A typical request includes: system prompt (100-500 tokens), conversation history (500-5000 tokens), user message (50-500 tokens), and model response (100-2000 tokens). Total: 750-8000 tokens per request. A simple chatbot uses ~1,000 tokens total; a RAG pipeline with context can use 5,000-10,000.

AI API Cost per Token Explained: The Complete Pricing Guide 2026

A token is a chunk of text that the AI model processes. Roughly:

1 token ≈ 4 characters in English
1 token ≈ ¾ of a word
100 tokens ≈ 75 words
1,000 tokens ≈ 750 words (about 1.5 pages)

When you send a request to an AI API, the model counts your input tokens (what you send) and generates output tokens (what it returns). You pay for both — but at different rates.

Input vs Output Tokens: Why the Price Difference?

Every AI API has two prices:

Input price: Cost per 1M tokens you send to the model (your prompt + context)
Output price: Cost per 1M tokens the model generates (its response)

Output tokens always cost more — typically 3-6x the input price. Here's why:

Compute intensity: Generating each output token requires running the full model forward pass. Input tokens can be processed in parallel (batched), but output tokens must be generated one at a time.
Memory requirements: The model must maintain attention over all previous tokens while generating each new one.
Latency: Output generation is the bottleneck — users wait for it, so providers charge more.

Pro Tip: Control Output Length

Since output tokens cost 3-6x more, setting a max_tokens limit is the single easiest way to reduce costs. Most responses don't need 4,096 tokens — set it to 500-1000 and save 50-75% on output costs.

The Cost Formula

Cost per request =

(input_tokens ÷ 1,000,000 × input_price) + (output_tokens ÷ 1,000,000 × output_price)

Example: 1,000 input tokens + 500 output tokens on GPT-4o mini ($0.15/$0.60 per 1M):

Input cost:  1,000 ÷ 1,000,000 × $0.15 = $0.00015
Output cost:   500 ÷ 1,000,000 × $0.60 = $0.00030
Total per request:                        $0.00045

At 1,000 requests/day × 30 days = $13.50/month

Pricing Across 59 Models (Per 1M Tokens)

Model	Provider	Input	Output	Output/Input Ratio	Context
Gemini 2.5 Flash-Lite	Google	$0.075	$0.30	4.0x	1M
Llama 3.1 8B	Meta	$0.10	$0.10	1.0x	128K
Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	4.0x	1M
Llama 4 Scout	Meta	$0.11	$0.34	3.1x	10M
DeepSeek V4 Flash	DeepSeek	$0.14	$0.28	2.0x	1M
GPT-4o mini	OpenAI	$0.15	$0.60	4.0x	128K
GPT-5 mini	OpenAI	$0.25	$2.00	8.0x	272K
Gemini 2.5 Pro	Google	$1.25	$10.00	8.0x	1M
GPT-5	OpenAI	$1.25	$10.00	8.0x	272K
Claude Haiku 4.5	Anthropic	$1.00	$5.00	5.0x	200K
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	5.0x	1M
GPT-5.5	OpenAI	$5.00	$30.00	6.0x	1M
Claude Opus 4.8	Anthropic	$5.00	$25.00	5.0x	1M
Grok 4.3	xAI	$1.25	$2.50	2.0x	1M
GPT-5.5 Pro	OpenAI	$30.00	$180.00	6.0x	1M

Key observation: The cheapest input tokens (Gemini Flash Lite at $0.075) are 400x cheaper than the most expensive (GPT-5.5 Pro at $30.00). The output spread is even wider at 600x. Model choice is the single biggest lever for controlling costs.

How Tokens Add Up in Real Applications

Chatbot (Simple Q&A)

System prompt:     200 tokens (fixed instructions)
User message:      100 tokens (the question)
Model response:    300 tokens (the answer)
Total:             600 tokens per request

Cost on GPT-4o mini: $0.00027/request
Cost on GPT-5:       $0.00375/request (14x more)

RAG Pipeline (Search + Generate)

System prompt:      300 tokens
Retrieved context: 2,000 tokens (5 documents)
User question:      100 tokens
Model response:     500 tokens
Total:             2,900 tokens per request

Cost on GPT-4o mini: $0.00174/request
Cost on GPT-5:       $0.02415/request (14x more)

Coding Assistant

System prompt:      500 tokens (code instructions)
Code context:      3,000 tokens (file contents)
User instruction:   200 tokens
Model response:   1,500 tokens (code generation)
Total:            5,200 tokens per request

Cost on Claude Sonnet 4.6: $0.039/request
Cost on GPT-5.5:           $0.0725/request (1.9x more)

5 Ways to Reduce Your Token Costs

Shorter prompts: Remove unnecessary instructions, use concise system prompts. Every token in your prompt costs money.
Conversation pruning: Don't send 50 messages of history. Keep the last 5-10 and summarize the rest.
Output limits: Set max_tokens to what you actually need. Most chat responses don't need 4,096 tokens.
Model routing: Use cheap models for simple tasks, expensive ones for complex reasoning.
Prompt caching: OpenAI and Anthropic offer prompt caching — identical prefixes cost 50-90% less.

Calculate Your Costs

Don't guess — calculate. Enter your exact usage into our calculator to see what every model costs you per month.

See your exact costs across all 67 models

Enter your daily requests and token counts. Get instant cost comparisons sorted cheapest-first.

Try the Monthly Spend Estimator

🎯 Rate Your API Setup in 30 Seconds

Get an A+ to F grade on your AI API costs. See how you compare and find cheaper alternatives instantly.

Get Your Cost Score →

📊 Generate Your Personalized API Cost Report

Select your model, enter your monthly spend, and get a custom savings report with cheaper alternatives — free, in 60 seconds.

Try it free: APIpulse Cost Calculator — estimate your monthly spend across 67 models and 10 providers in 30 seconds.

Want to optimize your AI API costs?

APIpulse includes free cost comparisons, exports, and recommendations that can save you up to 40%.

Free Cost Audit →

This was a snapshot. What about next month?

Prices change. New models launch. Our tools catch what a one-time calculation can't — and saves you money every month.

Free Tools → 🔍 Free audit first