LLM API Pricing Glossary

Every term you need to understand LLM API pricing — from tokens to context windows to rate limits. Know what you're paying for.

Last updated: Jun 9, 2026

Token

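A token is the basic unit of text an AI model processes: usually a whole word, part of a word, or a punctuation mark. In English, 1 token is roughly 0.75 words, or about 4 characters. Common words like "the" and "and" are usually a single token, while longer or uncommon words may be split into 2-3 tokens. Exact counts depend on each model's tokenizer, so the same text can produce different token counts on different models.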

Example: "Hello, world!" = 4 tokens. "The quick brown fox jumps over the lazy dog" = 9 tokens. A 500-page book ≈ 250,000 tokens. A typical email ≈ 300-500 tokens. A code file ≈ 5,000-20,000 tokens.
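The rules of thumb above (about 4 characters or 0.75 words per token) can be turned into a quick estimator for budgeting. This is a minimal sketch assuming English prose; `estimate_tokens` is a hypothetical helper, not a real tokenizer, so use a model's own tokenizer when you need exact counts.

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token count for English text using two rules of thumb (heuristic only)."""
    if not text.strip():
        return 0
    by_chars = len(text) / 4             # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~0.75 words per token
    # Average the two estimates; never report fewer than 1 token for non-empty text.
    return max(1, round((by_chars + by_words) / 2))

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))
```

For the sentence above this returns 11, versus the 9 tokens quoted earlier, which is about the accuracy to expect from a character- and word-based heuristic.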

See also: Per 1M Tokens, Input vs Output Pricing

Input vs Output Pricing

LLM APIs charge separately for input tokens (the text you send) and output tokens (the text the model generates). Output tokens almost always cost more, typically 2-10x the input price, because the model must generate them one at a time, whereas it can process all of your input tokens in parallel.

Example: GPT-5 costs $1.25/M input but $10/M output (8x more). Claude Opus 4.8 costs $5/M input but $25/M output (5x more). DeepSeek V4 Flash costs $0.14/M input but $0.28/M output (2x more). The ratio varies by provider.
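To see how the ratio affects a real bill, the sketch below computes each model's output-to-input price ratio from the prices quoted above and the share of a request's cost that comes from output. The helper names `price_ratio` and `output_share` are hypothetical, chosen for illustration.

```python
# Prices in USD per 1M tokens (input, output), as quoted in the example above.
PRICES = {
    "GPT-5": (1.25, 10.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
}

def price_ratio(input_price: float, output_price: float) -> float:
    """How many times more one output token costs than one input token."""
    return output_price / input_price

def output_share(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Fraction of a request's total cost that comes from output tokens."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)

for model, (inp, out) in PRICES.items():
    print(f"{model}: output costs {price_ratio(inp, out):.1f}x input")
```

On GPT-5, `output_share(1000, 500, 1.25, 10.00)` is 0.8: even with half as many output tokens as input tokens, 80% of the request's cost comes from output.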

See also: Tokens, Cost per Request

Per 1M Tokens (per Million Tokens)

The standard pricing unit for LLM APIs: prices are quoted as the cost per 1 million tokens. To calculate your cost: (tokens used ÷ 1,000,000) × price per 1M. Because providers use the same unit, you can compare models side by side, as long as you compare input prices with input prices and output prices with output prices.

Example: If a model costs $3/M input and you use 500K input tokens, your cost is (500,000 ÷ 1,000,000) × $3 = $1.50. If you use 2M input tokens at $3/M, your cost is $6.00.
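The formula above is a one-liner in code. A minimal sketch; `token_cost` is a hypothetical helper name.

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for `tokens` tokens at a price quoted per 1M tokens."""
    return tokens / 1_000_000 * price_per_million

print(token_cost(500_000, 3.00))    # 1.5
print(token_cost(2_000_000, 3.00))  # 6.0
```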

See also: Tokens, Cost per Request

Context Window

The maximum number of tokens a model can handle in a single API call, counting both your input (prompt) and the model's output (response). With a 200K context window, the prompt and the response together can total up to about 200K tokens. Larger context windows let you process longer documents, bigger codebases, and more conversation history without splitting content.

Example: Claude Opus 4.8 has a 200K context window (~150K words). DeepSeek V4 Flash has a 1M context window (~750K words). A 1M context can hold an entire novel, a large codebase, or hours of conversation history in a single API call.
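Because input and output share the window, a practical check is whether your prompt plus the output you want to allow still fits. This sketch assumes you reserve the full output budget up front; `fits_in_context` is a hypothetical helper name.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window

# With a 200K window, a 190K-token prompt leaves too little room for a 16K-token reply.
print(fits_in_context(190_000, 16_000, 200_000))  # False
print(fits_in_context(150_000, 16_000, 200_000))  # True
```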

See also: Max Output Tokens, Tokens

TPS (Tokens per Second)

The speed at which a model generates output tokens. Higher TPS means faster responses. TPS is affected by the model's size, the provider's infrastructure, and the current load. Some providers offer "turbo" or "fast" modes that increase TPS at a higher price.

Example: A model generating 100 TPS will produce a 500-token response in about 5 seconds. A model at 50 TPS takes about 10 seconds for the same response. Speed matters for real-time applications like chatbots and code assistants.
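The timing in the example is output length divided by TPS. In this sketch, `response_time_seconds` is a hypothetical helper, and its optional `time_to_first_token` parameter is an assumption added for illustration: in practice there is usually a delay before the first token arrives, which a TPS figure alone does not capture.

```python
def response_time_seconds(output_tokens: int, tokens_per_second: float,
                          time_to_first_token: float = 0.0) -> float:
    """Approximate wall-clock time to receive a complete response."""
    return time_to_first_token + output_tokens / tokens_per_second

print(response_time_seconds(500, 100))  # 5.0
print(response_time_seconds(500, 50))   # 10.0
```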

See also: RPM, Rate Limits

RPM (Requests per Minute)

The maximum number of API calls you can make per minute. This is a rate limit imposed by the provider to prevent abuse and ensure fair usage. If you exceed RPM, you'll get a 429 (Too Many Requests) error. Higher-tier accounts or paid plans typically have higher RPM limits.

Example: If your RPM limit is 500, you can make up to 500 API calls per minute. For a chatbot handling 100 users with 1 request each per minute, you'd need RPM ≥ 100. For batch processing, you might need RPM ≥ 1000.
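One way to stay under an RPM limit is to track your own recent requests on the client side. The class below is a minimal sliding-window sketch; `RequestsPerMinuteLimiter` and `try_acquire` are hypothetical names, and a provider's own accounting may differ from this simple 60-second window, so treat it as a safety margin rather than a guarantee.

```python
import time
from collections import deque

class RequestsPerMinuteLimiter:
    """Client-side sliding-window limiter that keeps calls under an RPM cap."""

    def __init__(self, rpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock   # injectable clock, which also makes testing easy
        self.sent = deque()  # timestamps of requests made in the last 60 seconds

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits under the limit."""
        now = self.clock()
        # Forget requests that are 60 seconds old or older.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False  # caller should wait before sending
```

A caller would create `RequestsPerMinuteLimiter(rpm=500)` and check `try_acquire()` before each API call, waiting briefly whenever it returns `False`.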

See also: TPM, Rate Limits

Rate Limits

Restrictions imposed by API providers on how many requests or tokens you can use within a given time period. Rate limits protect the provider's infrastructure and ensure fair usage across all customers. Common rate limit types include RPM (requests per minute), TPM (tokens per minute), and concurrent requests.

Example: OpenAI's GPT-5 has rate limits of 10,000 RPM and 2M TPM for Tier 5 accounts. Anthropic's Claude has limits based on your spending tier. If you hit a rate limit, you'll receive a 429 error and should implement exponential backoff in your code.
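Exponential backoff means waiting longer after each consecutive 429 before retrying. The sketch below uses the "full jitter" variant, which sleeps a random time up to a doubling cap. `RateLimitError` and `call_with_backoff` are hypothetical stand-ins: substitute whatever exception your client library raises on a 429, and if your provider returns a `Retry-After` header, prefer that value.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""

def call_with_backoff(send_request, max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep=time.sleep):
    """Call send_request(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The backoff cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # sleeping a random amount up to that cap keeps clients out of lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```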

See also: RPM, TPM

TPM (Tokens per Minute)

The maximum number of tokens you can process per minute across all your API calls. Depending on the provider, input and output tokens may count toward a single TPM limit or have separate limits. TPM is often the more relevant limit for high-throughput applications because it reflects how much text you actually process, not just how many calls you make.

Example: If your TPM limit is 2,000,000, you can process up to 2M tokens per minute. If each request uses 1,000 tokens, you can make 2,000 requests per minute. If each request uses 10,000 tokens, you can make 200 requests per minute.
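In practice your throughput is capped by whichever limit you hit first, RPM or TPM. This sketch assumes a single combined TPM pool and a fixed size per request; `max_requests_per_minute` is a hypothetical helper name.

```python
def max_requests_per_minute(tpm_limit: int, rpm_limit: int,
                            tokens_per_request: int) -> int:
    """Requests per minute allowed by whichever limit binds first."""
    by_tokens = tpm_limit // tokens_per_request
    return min(rpm_limit, by_tokens)

print(max_requests_per_minute(2_000_000, 10_000, 1_000))   # 2000 (TPM-bound)
print(max_requests_per_minute(2_000_000, 10_000, 10_000))  # 200 (TPM-bound)
print(max_requests_per_minute(2_000_000, 1_000, 1_000))    # 1000 (RPM-bound)
```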

See also: RPM, Rate Limits

Max Output Tokens

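The maximum number of tokens a model can generate in a single response. This is a separate cap from the context window: the output still counts toward the context window, but a single response can never exceed the max output limit. If your max output is 8,192 tokens, the model can generate up to ~6,000 words in one response. Longer outputs require multiple API calls, for example asking the model to continue from where it stopped; streaming delivers tokens as they are generated but does not raise the limit.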

Example: GPT-5 has max output of 16,384 tokens (~12,000 words). Claude Opus 4.8 has max output of 32,768 tokens (~24,000 words). For a 500-word response, you need max output ≥ ~670 tokens.
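The word-to-token conversion above, plus a count of how many responses a long output would need, can be sketched as follows. It assumes the rough 0.75 words-per-token ratio and ignores the extra prompt tokens each continuation call would use; `tokens_for_words` and `calls_needed` are hypothetical names.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough English-language ratio

def tokens_for_words(words: int) -> int:
    """Approximate max-output setting needed for a response of `words` words."""
    return math.ceil(words / WORDS_PER_TOKEN)

def calls_needed(words: int, max_output_tokens: int) -> int:
    """How many responses it would take to generate `words` words."""
    return math.ceil(tokens_for_words(words) / max_output_tokens)

print(tokens_for_words(500))      # 667
print(calls_needed(20_000, 8_192))  # 4
```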

See also: Context Window, Tokens

Pricing Tiers

Most providers offer different tiers based on your usage volume or account type. Moving up a tier usually raises your rate limits and can unlock premium features; lower per-token prices, where available, typically come from separate options such as batch processing or negotiated volume commitments. Some providers also have free tiers with limited usage for testing and development.

Example: OpenAI has Free and Tier 1-5 usage tiers with increasing rate limits; standard per-token prices are the same across these tiers. Anthropic has Usage Tiers 1-4 based on cumulative spending. Google Cloud has on-demand, committed use, and provisioned pricing tiers.

See also: Rate Limits, Batch API

Cost per Request

The total cost of a single API call, calculated as: (input tokens × input price per 1M + output tokens × output price per 1M) ÷ 1,000,000. This is the most practical metric for estimating monthly costs: multiply the cost per request by your expected number of requests.

Example: A request with 1,000 input tokens and 500 output tokens on GPT-5 ($1.25/M input, $10/M output) costs: (1,000 × $1.25/1M) + (500 × $10/1M) = $0.00125 + $0.005 = $0.00625 per request. At 10,000 requests/day, that's $62.50/day, or $1,875 over a 30-day month.
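The same calculation as code, with a monthly projection. This is a minimal sketch assuming steady daily traffic and a 30-day month; `cost_per_request` and `monthly_cost` are hypothetical helper names.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """USD cost of one call; prices are quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

def monthly_cost(per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Projected spend for a month of steady traffic."""
    return per_request * requests_per_day * days

per_call = cost_per_request(1_000, 500, 1.25, 10.00)
print(f"${per_call:.5f} per request")                       # $0.00625 per request
print(f"${monthly_cost(per_call, 10_000):,.2f} per month")  # $1,875.00 per month
```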

See also: Per 1M Tokens, Input vs Output Pricing

Batch API

A discounted API tier for processing large volumes of requests asynchronously. Batch APIs typically offer 50% lower prices but process requests in the background (within hours, not seconds). Ideal for non-time-sensitive workloads like data processing, content generation, and analytics.

Example: OpenAI's Batch API processes requests within 24 hours at 50% off standard pricing. If you need to process 10M tokens of content generation overnight, the Batch API saves 50% compared to real-time API calls. Not suitable for chatbots or real-time applications.
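The savings are easy to estimate up front. This sketch assumes a flat 50% discount and uses $10/M (the GPT-5 output price quoted earlier) purely as an illustrative rate for 10M tokens of generated content; `batch_savings` is a hypothetical helper name.

```python
BATCH_DISCOUNT = 0.50  # assumed flat 50% discount, as in the example above

def batch_savings(tokens: int, price_per_million: float,
                  discount: float = BATCH_DISCOUNT) -> tuple[float, float]:
    """Return (real-time cost, batch cost) in USD for the same workload."""
    realtime = tokens / 1_000_000 * price_per_million
    return realtime, realtime * (1 - discount)

realtime, batch = batch_savings(10_000_000, 10.00)
print(f"real-time ${realtime:.2f}, batch ${batch:.2f}")  # real-time $100.00, batch $50.00
```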

See also: Pricing Tiers, Rate Limits

Current Pricing at a Glance

Compare input and output pricing across popular models (per 1M tokens)

Model | Provider | Input | Output | Context

Calculate Your Exact Costs

Use our free calculator to estimate your monthly API spend across all 39 models from 10 providers.