What is a token in LLM API pricing?

A token is a chunk of text, often a piece of a word, that an AI model processes. As a rule of thumb, 1 token ≈ 0.75 English words, or about 4 characters. For example, 'hello world' is 2 tokens. API providers quote prices per 1 million tokens, with separate rates for input (text you send) and output (text the model generates), so understanding tokens is essential for estimating your API costs.

What is the difference between input and output token pricing?

Input tokens are the text you send to the model (your prompt). Output tokens are the text the model generates (its response). Most providers charge more for output tokens, typically 3-10x the input price. For example, GPT-5 costs $1.25/M input but $10/M output. Output costs more because the model generates it one token at a time, while the input tokens can be processed in parallel.

What is a context window?

A context window is the maximum number of tokens a model can process in a single API call, counting both your input and the model's output. For example, with a 200K context window, a ~150K-token prompt leaves room for at most ~50K tokens of output (less if the model's own output limit is lower). Larger context windows allow you to process longer documents, larger codebases, and more extensive conversation histories.

What does 'per 1M tokens' mean in LLM pricing?

LLM API pricing is typically quoted as cost per 1 million tokens. For example, '$3/M input' means $3 for every 1 million input tokens processed. To calculate your cost: (tokens used ÷ 1,000,000) × price per 1M. If you use 500K input tokens at $3/M, your cost is (500,000 ÷ 1,000,000) × $3 = $1.50.

LLM API Pricing Glossary

Every term you need to understand LLM API pricing — from tokens to context windows to rate limits. Know what you're paying for.

Last updated: Jun 9, 2026

Token

A token is the fundamental unit of text that an AI model processes. It's a piece of a word — roughly 1 token ≈ 0.75 words in English, or about 4 characters. Common words like "the" and "and" are 1 token. Longer or uncommon words may be split into 2-3 tokens.

Example: "Hello, world!" = 4 tokens. "The quick brown fox jumps over the lazy dog" = 9 tokens. A 500-page book ≈ 250,000 tokens. A typical email ≈ 300-500 tokens. A code file ≈ 5,000-20,000 tokens.
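The rules of thumb above can be turned into a quick estimator. This is a minimal sketch, assuming English text; the function names are made up for illustration, and a real tokenizer will give different counts (the character rule yields 11 for the "quick brown fox" sentence, versus the 9 tokens quoted above).

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb: about 4 characters per token
WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token is about 0.75 words


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate from the character count of English text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from a word count of English text."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_words(500))  # 667
print(estimate_tokens_from_chars("The quick brown fox jumps over the lazy dog"))  # 11
```

Use estimates like these for budgeting only; for billing-accurate counts, rely on the token usage your provider reports for each request.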

Input vs Output Pricing

LLM APIs charge separately for input tokens (text you send) and output tokens (text the model generates). Output tokens almost always cost more, typically 3-10x the input price and occasionally as little as 2x, because the model generates output one token at a time while it can process the input tokens in parallel.

Example: GPT-5 costs $1.25/M input but $10/M output (8x the input price). Claude Opus 4.8 costs $5/M input but $25/M output (5x). DeepSeek V4 Flash costs $0.14/M input but $0.28/M output (2x). The ratio varies by provider and by model.

Per 1M Tokens (per Million Tokens)

The standard pricing unit for LLM APIs. Prices are quoted as cost per 1 million tokens. To calculate your cost: (tokens used ÷ 1,000,000) × price per 1M. This makes it easy to compare models — just look at the price per 1M tokens.

Example: If a model costs $3/M input and you use 500K input tokens, your cost is (500,000 ÷ 1,000,000) × $3 = $1.50. If you use 2M input tokens at $3/M, your cost is $6.00.
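The per-1M formula above is a one-liner in code. A minimal sketch; the function name `token_cost` is chosen here for illustration.

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars: (tokens used / 1,000,000) x price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million


print(token_cost(500_000, 3.0))    # 1.5
print(token_cost(2_000_000, 3.0))  # 6.0
```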

Context Window

The maximum number of tokens a model can process in a single API call, counting both your input (prompt) and the model's output (response). With a 200K context window, input and output combined cannot exceed ~200K tokens, so a longer prompt leaves less room for the response. Larger context windows let you process longer documents, bigger codebases, and more conversation history without splitting content.

Example: Claude Opus 4.8 has a 200K context window (~150K words). DeepSeek V4 Flash has a 1M context window (~750K words). A 1M context can hold an entire novel, a large codebase, or hours of conversation history in a single API call.
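Because input and output share the same window, it helps to check the budget before sending a request. A minimal sketch with illustrative function names; it assumes you already know (or have estimated) your prompt's token count.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


def max_input_tokens(context_window: int, reserved_output_tokens: int) -> int:
    """Largest prompt that still leaves the reserved room for the response."""
    return max(context_window - reserved_output_tokens, 0)


print(max_input_tokens(200_000, 50_000))          # 150000
print(fits_in_context(180_000, 50_000, 200_000))  # False
```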

TPS (Tokens per Second)

The speed at which a model generates output tokens. Higher TPS means faster responses. TPS is affected by the model's size, the provider's infrastructure, and the current load. Some providers offer "turbo" or "fast" modes that increase TPS at a higher price.

Example: A model generating 100 TPS will produce a 500-token response in about 5 seconds. A model at 50 TPS takes about 10 seconds for the same response. Speed matters for real-time applications like chatbots and code assistants.
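The arithmetic in this example is simply output tokens divided by TPS. A minimal sketch (the function name is illustrative); it covers generation time only and ignores network latency and the wait before the first token arrives.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to generate a response at a steady TPS."""
    return output_tokens / tokens_per_second


print(generation_seconds(500, 100))  # 5.0
print(generation_seconds(500, 50))   # 10.0
```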

RPM (Requests per Minute)

The maximum number of API calls you can make per minute. This is a rate limit imposed by the provider to prevent abuse and ensure fair usage. If you exceed RPM, you'll get a 429 (Too Many Requests) error. Higher-tier accounts or paid plans typically have higher RPM limits.

Example: If your RPM limit is 500, you can make up to 500 API calls per minute. For a chatbot handling 100 users with 1 request each per minute, you'd need RPM ≥ 100. For batch processing, you might need RPM ≥ 1000.

Rate Limits

Restrictions imposed by API providers on how many requests or tokens you can use within a given time period. Rate limits protect the provider's infrastructure and ensure fair usage across all customers. Common rate limit types include RPM (requests per minute), TPM (tokens per minute), and concurrent requests.

Example: OpenAI's GPT-5 has rate limits of 10,000 RPM and 2M TPM for Tier 5 accounts. Anthropic's Claude has limits based on your spending tier. If you hit a rate limit, you'll receive a 429 error and should implement exponential backoff in your code.
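Exponential backoff can be sketched as follows. This is an illustration, not any provider's SDK: `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on a 429, and `request_fn` is any zero-argument function that performs the API call. The delays and retry count are assumed defaults.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_backoff(request_fn, max_retries=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call request_fn, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Double the wait on each attempt, cap it, and add up to 10% jitter
            # so that many clients do not retry at the same instant.
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults, the waits are roughly 1, 2, 4, 8, and 16 seconds before the error is re-raised. If the provider's response tells you how long to wait, prefer that value over a computed delay.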

TPM (Tokens per Minute)

The maximum number of tokens you can process per minute across all your API calls. This includes both input and output tokens. TPM is often the more relevant limit for high-throughput applications because it accounts for the actual computational load.

Example: If your TPM limit is 2,000,000, you can process up to 2M tokens per minute. If each request uses 1,000 tokens, you can make 2,000 requests per minute. If each request uses 10,000 tokens, you can make 200 requests per minute.
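In practice both TPM and RPM apply, and the tighter one sets your real throughput. A minimal sketch of that calculation, assuming every request uses about the same number of tokens; the function name is illustrative.

```python
def max_requests_per_minute(tpm_limit, tokens_per_request, rpm_limit=None):
    """Requests per minute allowed by TPM, further capped by RPM if given."""
    by_tokens = tpm_limit // tokens_per_request
    return by_tokens if rpm_limit is None else min(by_tokens, rpm_limit)


print(max_requests_per_minute(2_000_000, 1_000))                 # 2000
print(max_requests_per_minute(2_000_000, 10_000))                # 200
print(max_requests_per_minute(2_000_000, 1_000, rpm_limit=500))  # 500
```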

Max Output Tokens

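The maximum number of tokens a model can generate in a single response. This is a separate cap from the context window, and usually a much smaller one: it limits only the output portion, and input plus output must still fit within the context window. If your max output is 8,192 tokens, the model can generate up to ~6,000 words in one response. Streaming delivers tokens as they are generated but does not raise this cap, so longer responses require multiple API calls (for example, a follow-up request asking the model to continue).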

Example: GPT-5 has max output of 16,384 tokens (~12,000 words). Claude Opus 4.8 has max output of 32,768 tokens (~24,000 words). For a 500-word response, you need max output ≥ ~670 tokens.

Pricing Tiers

Most providers offer different pricing tiers based on your usage volume or account type. Higher tiers typically offer lower per-token prices, higher rate limits, and access to premium features. Some providers also have free tiers with limited usage for testing and development.

Example: OpenAI has a Free tier plus Tiers 1-5, with increasing rate limits and decreasing prices. Anthropic has Usage Tiers 1-4 based on cumulative spending. Google Cloud has on-demand, committed use, and provisioned pricing tiers.

Cost per Request

The total cost of a single API call, calculated as: (input tokens ÷ 1,000,000 × input price per 1M) + (output tokens ÷ 1,000,000 × output price per 1M). This is the most practical metric for estimating your monthly costs: multiply cost per request by your expected number of requests.

Example: A request with 1,000 input tokens and 500 output tokens on GPT-5 ($1.25/M input, $10/M output) costs: (1,000 × $1.25/1M) + (500 × $10/1M) = $0.00125 + $0.005 = $0.00625 per request. At 10,000 requests/day, that's $62.50/day, or $1,875 over a 30-day month.
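The same calculation in code, as a minimal sketch. The function name is illustrative, prices are per 1M tokens, and the monthly figure assumes a 30-day month with identical requests every day.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one API call, given prices quoted per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


per_request = request_cost(1_000, 500, 1.25, 10.0)
per_day = per_request * 10_000  # 10,000 requests per day
per_month = per_day * 30        # assumes a 30-day month

print(f"${per_request:.5f} per request")  # $0.00625 per request
print(f"${per_day:.2f} per day, ${per_month:.2f} per month")  # $62.50 per day, $1875.00 per month
```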

Batch API

A discounted API tier for processing large volumes of requests asynchronously. Batch APIs typically offer 50% lower prices but process requests in the background (within hours, not seconds). Ideal for non-time-sensitive workloads like data processing, content generation, and analytics.

Example: OpenAI's Batch API processes requests within 24 hours at 50% off standard pricing. If you need to process 10M tokens of content generation overnight, the Batch API saves 50% compared to real-time API calls. Not suitable for chatbots or real-time applications.
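To put a dollar figure on the discount, apply it to the standard cost. A minimal sketch: the 50% discount is the typical figure quoted above, and the scenario (10M tokens, all billed as output at the $10/M GPT-5 output price used elsewhere on this page) is an assumption for illustration.

```python
def batch_cost(standard_cost: float, discount: float = 0.50) -> float:
    """Cost after a batch discount (0.50 means 50% off)."""
    return standard_cost * (1 - discount)


# Assumed scenario: 10M output tokens at $10 per 1M tokens.
standard = 10_000_000 / 1_000_000 * 10.0
print(standard)              # 100.0
print(batch_cost(standard))  # 50.0
```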

Current Pricing at a Glance

Compare input and output pricing across popular models (per 1M tokens)

Model	Provider	Input	Output	Context

Calculate Your Exact Costs

Use our free calculator to estimate your monthly API spend across all 39 models from 10 providers.
