Gemini 3.1 Flash-Lite pricing, Gemini 3 Flash API cost, Gemini 2.5 Flash-Lite, new Gemini models June 2026, cheapest Gemini API, Google AI pricing 2026">

Blog · Jun 12, 2026

3 New Gemini Models: Flash-Lite, 3 Flash & 2.5 Flash-Lite — Pricing & Comparison

⚠️ Claude 4 Deprecation Alert: Claude 4 models retire on June 15, 2026 (). If you use Claude 4, see our last-chance migration guide or use the deprecation calculator.

Google dropped 3 new Gemini models this week — all with 1M context windows, all cheaper than you'd expect. Here's what each one costs and when to use it.

Google quietly released three new Gemini API models in June 2026, bringing the total Gemini lineup to 8 models. The big news: all three have 1M token context windows and aggressive budget pricing. If you're still using Gemini 2.0 Flash (now deprecated), it's time to upgrade.

The 3 New Models at a Glance

Model Tier Input Output Context Replaces
Gemini 2.5 Flash-Lite Budget $0.10 $0.40 1M Gemini 2.0 Flash-Lite
Gemini 3.1 Flash-Lite Budget $0.25 $1.50 1M
Gemini 3 Flash Budget $0.50 $3.00 1M Gemini 2.0 Flash

All prices per 1M tokens. Verified Jun 12, 2026. Gemini 2.0 Flash and Gemini 2.0 Flash Lite are now deprecated by Google.

1. Gemini 2.5 Flash-Lite — The Cheapest Option

At $0.10/$0.40 per 1M tokens, this is the cheapest Gemini model ever — and one of the cheapest LLM APIs on the market. It matches the deprecated Gemini 2.0 Flash pricing but adds a massive 1M token context window (up from 1M on 2.0 Flash, same size but newer architecture).

Best for: High-volume classification, content moderation, chatbot backends, embeddings preprocessing — anything where you're processing millions of tokens and cost is the primary concern.

Trade-off: As a "Lite" model, it may not handle complex reasoning as well as Gemini 3 Flash or Gemini 2.5 Pro. Use it for simple, high-volume tasks.

2. Gemini 3.1 Flash-Lite — The Production Budget Pick

At $0.25/$1.50 per 1M tokens, Gemini 3.1 Flash-Lite sits in a sweet spot: 2.5x more expensive than 2.5 Flash-Lite on input, but with better quality output (1.50 vs 0.40 — that's 3.75x more, reflecting better generation quality). The 1M context window handles long documents and codebases with ease.

Best for: Production workloads that need good quality at low cost — chatbots, content generation, summarization, RAG pipelines. The go-to replacement for deprecated Gemini 2.0 Flash Lite.

Trade-off: Output at $1.50/M is 3.75x the input cost. For output-heavy workloads, Gemini 2.5 Flash-Lite ($0.40 output) is cheaper — but lower quality.

3. Gemini 3 Flash — The Speed Demon

At $0.50/$3.00 per 1M tokens, Gemini 3 Flash is the replacement for deprecated Gemini 2.0 Flash. It's 5x more expensive on input ($0.50 vs $0.10) but brings the Gemini 3 architecture: faster inference, better multimodal support, and the full 1M context window.

Best for: Applications that need speed + quality at a reasonable price. Great for real-time chat, code generation, multimodal tasks (image + text), and Google Cloud workloads.

Trade-off: 5x the cost of Gemini 2.5 Flash-Lite. If you don't need the Gemini 3 architecture improvements, the cheaper models are fine.

Monthly Cost at 1,000 Requests/Day

Assuming 1,000 input tokens and 500 output tokens per request:

2.5 Flash-Lite

$4.00

3.1 Flash-Lite

$22.50

3 Flash

$45.00

Claude Haiku 4.5

$35.00

GPT-4o mini

$9.00

How These Compare to the Competition

Here's how the new Gemini budget models stack up against other cheap options:

  • Gemini 2.5 Flash-Lite vs DeepSeek V4 Flash: Gemini is cheaper on input ($0.10 vs $0.14) but more expensive on output ($0.40 vs $0.28). DeepSeek wins on output-heavy tasks.
  • Gemini 3.1 Flash-Lite vs Claude Haiku 4.5: Gemini is 75% cheaper on input ($0.25 vs $1.00) and 70% cheaper on output ($1.50 vs $5.00). Haiku has better reasoning, but for most tasks Gemini is the better value.
  • Gemini 3 Flash vs GPT-4o mini: GPT-4o mini is cheaper at $0.15/$0.60 vs Gemini 3 Flash at $0.50/$3.00. But Gemini 3 Flash has 1M context vs GPT-4o mini's 128K — 8x more context for 3x the price.
  • All 3 vs Mistral Small 4: Mistral Small 4 at $0.10/$0.30 is cheaper on input than Gemini 3.1 Flash-Lite ($0.25/$1.50) but has only 128K context vs 1M. For long documents, the new Gemini models win.

Which One Should You Use?

Absolute cheapest

Gemini 2.5 Flash-Lite — $0.10/$0.40, can't beat it

Best quality per dollar

Gemini 3.1 Flash-Lite — $0.25/$1.50 with 1M context

Speed + quality balance

Gemini 3 Flash — $0.50/$3.00, fastest inference

Need 1M context on a budget

Any of these — all have 1M windows

The Bigger Picture

These 3 new models push the APIpulse database to 42 models across 10 providers. The Gemini lineup now covers every price point from $0.10 to $12.50 per 1M input tokens — the widest range of any single provider.

The trend is clear: 1M context windows are becoming the standard, even at budget price points. If you're still using models with 128K context, you're paying for less capability than you could get.

With Claude 4 shutting down June 15, now is the perfect time to evaluate these new Gemini options. Use our deprecation calculator to see how much you'd save by switching.

Compare all 42 models side by side

Open Cost Calculator

Pricing data verified Jun 12, 2026. Prices per 1M tokens. See our full pricing table for all 42 models.

Want to optimize your AI API costs?

APIpulse Pro ($29 one-time) includes saved scenarios, cost report exports, and personalized recommendations that can save you up to 40%.

Get Pro — $29