Blog · Jun 7, 2026

5 New AI Models Added in June 2026: Pricing, Context Windows & When to Use Each

The LLM landscape just got 5 new options — from DeepSeek V3.2 at $0.23/M input to Cohere Command A at $2.50/M. Here's what each one costs and when to use it.

June 2026 brought 5 new models to the APIpulse pricing database, bringing our total to 39 models across 10 providers. Whether you're building a chatbot, running RAG pipelines, or optimizing costs at scale, there's likely a new option that fits your budget better than what you're using today.

The 5 New Models at a Glance

Model Provider Input Output Context
DeepSeek V3.2 DeepSeek $0.23 $0.34 128K
Gemini 3.5 Flash Google $1.50 $9.00 1M
Mistral Medium 3.5 Mistral $1.50 $7.50 128K
AI21 Jamba 1.7 Large AI21 $2.00 $8.00 256K
Cohere Command A Cohere $2.50 $10.00 128K

All prices per 1M tokens. Verified Jun 7, 2026.

1. DeepSeek V3.2 — The Budget King

At $0.23/$0.34 per 1M tokens, DeepSeek V3.2 is the cheapest new model by a wide margin. It's 6x cheaper on input than the next cheapest new model (Gemini 3.5 Flash at $1.50) and 26x cheaper on output.

Best for: High-volume classification, content moderation, chatbot backends, code autocomplete — anything where you're processing millions of tokens and cost is the primary concern.

Trade-off: 128K context (not 1M). If you need to process entire codebases or long documents in a single prompt, you'll need a model with a larger window.

2. Gemini 3.5 Flash — Google's Speed Demon

Google's latest Flash model at $1.50/$9.00 per 1M tokens with a massive 1M token context window. It's designed for speed — fast inference, low latency, and native multimodal support.

Best for: Google Cloud workloads, enterprise applications needing SOC 2/HIPAA compliance, multimodal tasks (image + text), and long-document processing.

Trade-off: Output pricing at $9.00/M is steep. If your workload is output-heavy (long responses, content generation), costs add up fast.

3. Mistral Medium 3.5 — Europe's Mid-Tier Contender

Mistral's Medium 3.5 at $1.50/$7.50 per 1M tokens is priced identically to Gemini 3.5 Flash on input but 17% cheaper on output. The key differentiator: European data sovereignty.

Best for: EU-based companies with GDPR requirements, applications needing European data residency, and teams that want a mid-tier model without US cloud lock-in.

Trade-off: 128K context vs Gemini's 1M. If context window size matters, Gemini wins.

4. AI21 Jamba 1.7 Large — The Hybrid Architecture

AI21's Jamba 1.7 Large at $2.00/$8.00 per 1M tokens uses a hybrid Transformer-Mamba architecture, which means faster inference on long sequences. The 256K context is a solid middle ground.

Best for: Long-document summarization, enterprise RAG pipelines, and applications where inference speed on long contexts matters more than raw cost.

Trade-off: More expensive than DeepSeek and Mistral. The hybrid architecture shines on long sequences but may not be cost-competitive for short prompts.

5. Cohere Command A — Enterprise RAG Specialist

Cohere's Command A at $2.50/$10.00 per 1M tokens is the most expensive new model, but it's optimized for enterprise RAG and retrieval-heavy workloads. Cohere's embedding + generation ecosystem is tightly integrated.

Best for: Enterprise RAG pipelines using Cohere Embed, applications needing strong grounding and citation, and teams already in the Cohere ecosystem.

Trade-off: Highest cost of the 5 new models. Only makes sense if you're leveraging Cohere's full stack (embed + rerank + generate).

Monthly Cost at 1,000 Requests/Day

Assuming 1,000 input tokens and 500 output tokens per request:

DeepSeek V3.2

$12.00

Mistral Medium

$157.50

Gemini 3.5 Flash

$180.00

Jamba 1.7

$180.00

Command A

$225.00

How These Compare to Existing Models

Here's how the new models stack up against the popular incumbents:

  • DeepSeek V3.2 vs GPT-5: 97% cheaper on input ($0.23 vs $7.50). GPT-5 has better reasoning, but for classification/summarization, DeepSeek is a no-brainer.
  • Gemini 3.5 Flash vs Claude Sonnet 4.6: 50% cheaper on input ($1.50 vs $3.00), 40% cheaper on output ($9.00 vs $15.00). Gemini has 1M context vs Claude's 1M — similar specs, lower price.
  • Mistral Medium 3.5 vs Claude Sonnet 4.6: Same input price ($1.50 vs $3.00 — actually 50% cheaper), 50% cheaper output. European alternative with comparable capabilities.
  • AI21 Jamba 1.7 vs Gemini 2.5 Pro: Similar pricing ($2.00 vs $1.25 input), but Jamba's hybrid architecture may be faster on long contexts.
  • Cohere Command A vs GPT-5: 67% cheaper on input ($2.50 vs $7.50). Best for teams already using Cohere's embedding/reranking stack.

Which One Should You Use?

Cost is #1 priority

DeepSeek V3.2 — no contest at $0.23/$0.34

Need 1M context

Gemini 3.5 Flash — only new model with 1M window

EU data sovereignty

Mistral Medium 3.5 — European provider, GDPR-native

Enterprise RAG

Cohere Command A — built for retrieval + generation

The Bigger Picture

These 5 models push the APIpulse database to 39 models across 10 providers. The trend is clear: prices keep falling, context windows keep growing, and developers have more choices than ever.

The best strategy in 2026 isn't picking one model — it's using a multi-model approach. Use DeepSeek V3.2 for high-volume cheap tasks, Gemini 3.5 Flash for long-context work, and GPT-5 or Claude Opus 4.8 for complex reasoning. Our decision tree can help you pick.

Compare all 39 models side by side

Open Cost Calculator

Pricing data verified Jun 7, 2026. Prices per 1M tokens. See our full pricing table for all 39 models.