How much does GPT-oss cost?

GPT-oss 120B costs $0.15/$0.60 (input/output) per 1M tokens via API providers, and GPT-oss 20B costs $0.08/$0.35. Both models' weights are free to download and self-host; you pay only for the hardware you run them on.

How does GPT-oss compare to Llama 4?

GPT-oss and Llama 4 are both open-source options. GPT-oss benefits from OpenAI's training methodology. Llama 4 offers a larger community and more fine-tuned variants.

Is GPT-oss cheaper than commercial APIs?

Yes. At $0.15/$0.60 per 1M tokens, GPT-oss 120B is roughly 8x cheaper on input and 17x cheaper on output than GPT-5 ($1.25/$10), and 20x/25x cheaper than Claude Sonnet ($3/$15). Self-hosting replaces per-token fees with the fixed cost of your own hardware.


May 10, 2026

OpenAI GPT-oss Pricing: Open-Source Models at $0.08/1M Tokens

OpenAI enters the open-source API market with two models priced to compete with Llama and DeepSeek. Here's what they cost and when to use them.



Pricing at a Glance

GPT-oss 120B: $0.15 input / $0.60 output per 1M tokens, 128K context window

GPT-oss 20B: $0.08 input / $0.35 output per 1M tokens, 128K context window

OpenAI's GPT-oss models are a departure from the company's typical closed-source approach. These are open-weight models offered by third-party API providers at budget-tier pricing and designed to compete directly with Meta's Llama and DeepSeek's V4 lineup.

The 120B model is priced identically to GPT-4o mini ($0.15/$0.60). The 20B model has the lowest input price of any model in the comparison below ($0.08), though its $0.35 output rate is higher than several budget rivals.

How GPT-oss Compares to Competitors

Model	Input (per 1M)	Output (per 1M)	Context	Type
GPT-oss 20B	$0.08	$0.35	128K	Open-weight
GPT-oss 120B	$0.15	$0.60	128K	Open-weight
Llama 3.1 8B (Together.ai)	$0.10	$0.10	128K	Open-weight
Llama 3.1 70B (Together.ai)	$0.88	$0.88	128K	Open-weight
DeepSeek V4 Flash	$0.14	$0.28	1M	Closed
DeepSeek V4 Pro	$0.44	$0.87	1M	Closed
GPT-4o mini	$0.15	$0.60	128K	Closed
Mistral Small 4	$0.10	$0.30	128K	Closed

Key Takeaway

GPT-oss 120B is priced identically to GPT-4o mini. The 20B model is among the cheapest OpenAI models available and undercuts even Llama 3.1 8B on input pricing. However, Llama 3.1 8B ($0.10), DeepSeek V4 Flash ($0.28), and Mistral Small 4 ($0.30) all charge less than GPT-oss 20B ($0.35) for output tokens, which matters for generation-heavy workloads.
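To see where GPT-oss 20B's higher output price starts to outweigh its cheaper input, here is a minimal sketch that solves for the break-even output-to-input token ratio against Llama 3.1 8B. It uses only the list prices from the comparison table above; the function name is our own, and the comparison covers price only, not output quality.

```python
# (input, output) list prices in USD per 1M tokens, from the comparison table.
PRICES = {
    "GPT-oss 20B": (0.08, 0.35),
    "Llama 3.1 8B (Together.ai)": (0.10, 0.10),
}


def break_even_output_ratio(model_a: str, model_b: str) -> float:
    """Output-to-input token ratio r at which both models cost the same.

    Per-request cost is in_tokens * in_price + out_tokens * out_price.
    Dividing by in_tokens and setting the two costs equal gives
    in_a + r * out_a = in_b + r * out_b, so r = (in_b - in_a) / (out_a - out_b).
    """
    in_a, out_a = PRICES[model_a]
    in_b, out_b = PRICES[model_b]
    if out_a == out_b:
        raise ValueError("equal output prices: no break-even ratio")
    return (in_b - in_a) / (out_a - out_b)


r = break_even_output_ratio("GPT-oss 20B", "Llama 3.1 8B (Together.ai)")
print(f"Break-even at {r:.2f} output tokens per input token")
```

At these prices the ratio is 0.02 / 0.25 = 0.08: once a workload generates more than about 8 output tokens per 100 input tokens, Llama 3.1 8B is the cheaper of the two. The monthly scenarios in this article use ratios of 0.2 to 0.25.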

Monthly Cost Scenarios

Here's what you'd pay for common usage patterns, assuming a 30-day month and the list prices above (token counts are per request):

Scenario	GPT-oss 120B	GPT-oss 20B	GPT-4o mini	DeepSeek V4 Flash
100K req/day, 2K in / 500 out	$1,800/mo	$1,005/mo	$1,800/mo	$1,260/mo
1M req/day, 1K in / 200 out	$8,100/mo	$4,500/mo	$8,100/mo	$5,880/mo
10M req/day, 500 in / 100 out	$40,500/mo	$22,500/mo	$40,500/mo	$29,400/mo

At high volume, GPT-oss 20B saves $18,000/mo over GPT-4o mini for the same workload. That's real money for startups burning through API budgets.
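Monthly figures like these come from one formula: requests per day × tokens per request × price per 1M tokens × days per month. For example, 100,000 requests × 2,000 input tokens × 30 days is 6 billion input tokens, which costs $900 at GPT-oss 120B's $0.15 per 1M. The sketch below applies that formula to each scenario; the 30-day month and the helper names are our assumptions.

```python
# (input, output) list prices in USD per 1M tokens, from the comparison table.
PRICES = {
    "GPT-oss 120B": (0.15, 0.60),
    "GPT-oss 20B": (0.08, 0.35),
    "GPT-4o mini": (0.15, 0.60),
    "DeepSeek V4 Flash": (0.14, 0.28),
}

DAYS_PER_MONTH = 30  # assumption: a flat 30-day billing month


def monthly_cost(model: str, requests_per_day: int,
                 in_tokens: int, out_tokens: int) -> float:
    """Monthly API spend in USD for a fixed per-request token mix."""
    in_price, out_price = PRICES[model]
    per_request = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_request * requests_per_day * DAYS_PER_MONTH


# (requests/day, input tokens/request, output tokens/request)
SCENARIOS = [
    (100_000, 2_000, 500),
    (1_000_000, 1_000, 200),
    (10_000_000, 500, 100),
]

for reqs, tok_in, tok_out in SCENARIOS:
    costs = {m: round(monthly_cost(m, reqs, tok_in, tok_out)) for m in PRICES}
    print(f"{reqs:,} req/day, {tok_in} in / {tok_out} out: {costs}")
```

Swapping in your own request volume and token counts gives a first-order estimate; it ignores discounts such as cached-input or batch pricing, which some providers offer.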

When to Use GPT-oss

High-volume, low-complexity tasks: Classification, routing, simple Q&A, content moderation
Batch processing: When you need to process millions of documents cheaply
Prototyping: Test ideas without burning through expensive API credits
Self-hosting option: Because the weights are open, you can also self-host, which can lower costs further at high enough volume
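Whether self-hosting is actually cheaper depends on volume. A rough crossover point is the fixed monthly hosting bill divided by the blended per-token API price. The sketch below computes it; the function name and the $2,000/month figure are hypothetical placeholders, not a quote for any real GPU, and the estimate assumes your hardware can actually serve the resulting token volume.

```python
def break_even_million_tokens(fixed_monthly_cost: float,
                              in_price: float,
                              out_price: float,
                              output_share: float) -> float:
    """Monthly volume (in millions of tokens) at which a fixed self-hosting
    bill equals pay-per-token API spend.

    in_price / out_price: API list prices in USD per 1M tokens.
    output_share: fraction of all tokens that are output tokens (0..1).
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    # Weighted average API price per 1M tokens for this input/output mix.
    blended_price = (1 - output_share) * in_price + output_share * out_price
    return fixed_monthly_cost / blended_price


# Hypothetical example: $2,000/month all-in hosting cost (placeholder figure),
# GPT-oss 120B API prices, and 500 output tokens per 2,000 input (20% output).
volume = break_even_million_tokens(2_000, 0.15, 0.60, output_share=0.2)
print(f"Self-hosting breaks even above ~{volume:,.0f}M tokens/month")
```

With these placeholder inputs the blended price is $0.24 per 1M tokens, so the crossover is about 8.3 billion tokens a month; below that volume, the API stays cheaper.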

When to Avoid GPT-oss

Complex reasoning: The 20B model especially may underperform on multi-step logic
Long context: 128K is adequate but falls short of the 1M-token windows offered by Gemini and DeepSeek V4
Code generation: GPT-5.3 Codex or Claude Sonnet 4 are likely to produce better code
Critical applications: For production systems where quality matters more than cost, stick with proven models
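The use/avoid guidelines above can be encoded as a simple request router. This is an illustrative sketch only: the task labels, the default to GPT-oss 120B for tasks neither list mentions, and the generic "premium" and "long-context" targets are our assumptions, not part of any provider's API.

```python
from dataclasses import dataclass

# Hypothetical task labels mirroring the "When to Use" / "When to Avoid" lists.
SIMPLE_TASKS = {"classification", "routing", "simple_qa", "moderation"}
HARD_TASKS = {"complex_reasoning", "code_generation"}
CONTEXT_LIMIT = 128_000  # GPT-oss window as listed (128K), treated as 128,000 tokens


@dataclass
class Request:
    task: str
    prompt_tokens: int
    max_output_tokens: int = 1_000
    critical: bool = False


def pick_model(req: Request) -> str:
    """Return a model tier for the request (illustrative policy only)."""
    # The context window must hold both the prompt and the generated output.
    if req.prompt_tokens + req.max_output_tokens > CONTEXT_LIMIT:
        return "long-context model"
    # Quality-sensitive or hard tasks go to a stronger (pricier) model.
    if req.critical or req.task in HARD_TASKS:
        return "premium model"
    # Cheap, high-volume work goes to the smallest model.
    if req.task in SIMPLE_TASKS:
        return "GPT-oss 20B"
    # Everything else: assumed default of the larger open-weight model.
    return "GPT-oss 120B"


print(pick_model(Request(task="moderation", prompt_tokens=800)))
```

Checking context length first keeps oversized prompts from failing outright, and checking criticality before task type ensures an important request is never downgraded just because its task looks simple.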


The Bigger Picture

OpenAI entering the open-weight market signals that the budget LLM tier is getting crowded. With GPT-oss, Llama 4, DeepSeek V4 Flash, Mistral Small, and Gemini Flash all competing under $0.20/1M input tokens, developers have more affordable options than ever.

The real winner isn't any single model — it's the downward pressure on pricing across the board. Use APIpulse to find the cheapest option for your specific workload.