Together.ai API Pricing
Open-source Meta Llama models with managed API hosting — Llama 4 Scout, Maverick, and Llama 3.1
Why Together.ai?
Together.ai provides managed inference for Meta's open-source Llama models, eliminating the need for your own GPU infrastructure.
Open-Source Freedom
Meta publishes the Llama model weights, so you can inspect them or run them wherever you like. Together.ai hosts the same models behind a managed API.
Massive Context
Llama 4 models support context windows of up to 10M tokens, far larger than most commercial alternatives. That is enough to process entire codebases or document libraries.
Lowest Prices
Starting at $0.11 input / $0.34 output per 1M tokens for Llama 4 Scout, among the lowest prices listed for a model of this quality.
Self-Host Option
You are not locked in: deploy the same models on your own GPUs if you outgrow managed inference.
All Together.ai Models
Pricing per 1M tokens, as of May 2026.
| Model | Tier | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|---|
| Llama 4 Scout | Budget | $0.11 | $0.34 | 10M |
| Llama 4 Maverick | Budget | $0.20 | $0.60 | 10M |
| Llama 3.1 70B | Mid | $0.88 | $0.88 | 128K |
| Llama 3.1 8B | Budget | $0.18 | $0.18 | 128K |
Which Together.ai Model Should You Use?
Match your use case to the right model and tier.
High-Throughput Tasks
Llama 4 Scout: $0.11/$0.34 per 1M tokens (input/output) with 10M context. The lowest input price in the lineup. Best for high-throughput, classification, and summarization workloads.
Code Generation & Analysis
Llama 4 Maverick: $0.20/$0.60 per 1M tokens (input/output) with 10M context. Balanced performance for code generation, analysis, and complex reasoning tasks.
General-Purpose & RAG
Llama 3.1 70B: $0.88/$0.88 per 1M tokens (input/output) with 128K context. Proven and stable. Best for general-purpose, RAG, and chatbot applications.
Edge Deployment
Llama 3.1 8B: $0.18/$0.18 per 1M tokens (input/output) with 128K context. The smallest model in the lineup. Best for edge deployment and quick inference scenarios.
Together.ai Cost Calculator
Estimate your monthly spend for any Together.ai model.
Monthly Cost Estimate
Based on published Together.ai API pricing. Actual costs may vary.
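The estimate is simple arithmetic: tokens (in millions) times the per-1M price, summed over input and output. Below is a minimal sketch of that calculation, using only the prices from the model table above. The names `Price`, `PRICES`, and `monthly_cost` are hypothetical helpers for illustration, not part of any Together.ai SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Price in USD per 1M tokens, copied from the pricing table above."""
    input_per_m: float
    output_per_m: float


# Assumption: list prices as shown on this page; they may change over time.
PRICES = {
    "Llama 4 Scout": Price(0.11, 0.34),
    "Llama 4 Maverick": Price(0.20, 0.60),
    "Llama 3.1 70B": Price(0.88, 0.88),
    "Llama 3.1 8B": Price(0.18, 0.18),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated monthly spend in USD for the given token volumes."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    price = PRICES[model]  # raises KeyError for models not in the table
    return (
        input_tokens / 1_000_000 * price.input_per_m
        + output_tokens / 1_000_000 * price.output_per_m
    )


if __name__ == "__main__":
    # Example workload: 500M input tokens and 100M output tokens per month.
    for name in PRICES:
        cost = monthly_cost(name, 500_000_000, 100_000_000)
        print(f"{name}: ${cost:.2f}")
```

For example, 500M input and 100M output tokens per month on Llama 4 Scout works out to 500 × $0.11 + 100 × $0.34 = $89.00. Actual bills may differ from this estimate.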
Llama 4 Scout vs Budget Tier
How does Together.ai's cheapest model compare to other budget options?
| Provider | Model | Input / Output |
|---|---|---|
| Together.ai | Llama 4 Scout | $0.11 / $0.34 |
| DeepSeek | V4 Flash | $0.14 / $0.28 |
| Google | Gemini 2.0 Flash | $0.10 / $0.40 |
| OpenAI | GPT-oss 20B | $0.08 / $0.35 |
| OpenAI | GPT-4o mini | $0.15 / $0.60 |
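Prices per 1M tokens. Llama 4 Scout is not the cheapest on either column alone (GPT-oss 20B and Gemini 2.0 Flash list lower input prices, DeepSeek V4 Flash a lower output price), but it pairs near-lowest pricing with a 10M context window.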
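Which budget option comes out cheapest depends on your input/output mix. The sketch below ranks the models in the table above for a given workload, using only the prices listed there. `BUDGET_PRICES` and `rank_by_cost` are hypothetical names chosen for illustration.

```python
# (input, output) price in USD per 1M tokens, copied from the table above.
BUDGET_PRICES = {
    "Llama 4 Scout": (0.11, 0.34),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-oss 20B": (0.08, 0.35),
    "GPT-4o mini": (0.15, 0.60),
}


def rank_by_cost(input_millions: float, output_millions: float) -> list[tuple[str, float]]:
    """Return (model, cost in USD) pairs sorted from cheapest to most expensive.

    Token volumes are given in millions of tokens.
    """
    costs = {
        name: input_millions * in_price + output_millions * out_price
        for name, (in_price, out_price) in BUDGET_PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Input-heavy workload (e.g. summarization): 10M tokens in, 1M tokens out.
    for name, cost in rank_by_cost(10, 1):
        print(f"{name}: ${cost:.2f}")
```

On an input-heavy mix (10M in, 1M out) GPT-oss 20B ranks first and Llama 4 Scout third; on an output-heavy mix (1M in, 10M out) DeepSeek V4 Flash ranks first and Scout second. Price is only one axis: this comparison ignores context window and output quality.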
Llama 4 Maverick vs Mid Tier
How does Together.ai's balanced model compare to mid-range alternatives?
| Provider | Model | Input / Output |
|---|---|---|
| Together.ai | Llama 4 Maverick | $0.20 / $0.60 |
| OpenAI | GPT-4o | $2.50 / $10.00 |
| Anthropic | Claude Sonnet 4 | $3.00 / $15.00 |
| DeepSeek | V4 Pro | $2.18 / $8.72 |
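Prices per 1M tokens. At these list prices, Llama 4 Maverick is 12.5x cheaper than GPT-4o on input and about 16.7x cheaper on output; against Claude Sonnet 4 the gap is 15x on input and 25x on output. It also offers a 10M context window.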
Related Reading
Deep dives on open-source LLM pricing, startup costs, and RAG pipelines.
Open Source vs Commercial LLMs
Compare open-source models like Llama against commercial APIs from OpenAI and Anthropic. When does managed inference make sense?
The Cheapest LLM APIs in 2026
Ranking every LLM API by cost per token. See which providers offer the best value for budget-conscious teams.
Best LLM APIs for Startups
Which LLM providers give startups the best combination of price, performance, and scalability?
The True Cost of RAG
Break down the costs of RAG pipelines: embedding, vector search, and generation. Compare Llama models against alternatives.