What is the cheapest AI API for building a chatbot?

For chatbots, DeepSeek V4 Flash ($0.14/$0.28 per million input/output tokens) is the cheapest option that maintains good conversation quality. A typical 10-turn conversation (5,000 input and 2,000 output tokens in total) costs about $0.0013 with DeepSeek V4 Flash vs about $0.026 with GPT-5, a 95% saving. For basic FAQ bots where quality is less critical, Llama 3.1 8B ($0.10/$0.10) is even cheaper.

Which AI API is cheapest for code generation?

DeepSeek V4 Pro ($0.435/$0.87) is the best value for code generation: it delivers code quality comparable to GPT-5.3 Codex at about 93% lower cost. A typical code generation request (2K input, 4K output) costs $0.0044 with DeepSeek V4 Pro vs $0.0595 with GPT-5.3 Codex. For simple completions, Gemini 3 Flash ($0.50/$3.00) is also competitive.

What is the cheapest AI API for RAG (Retrieval-Augmented Generation)?

RAG workloads are input-heavy, so cheap input tokens matter most. DeepSeek V4 Pro ($0.435/$0.87) is the cheapest quality option: a typical RAG query (10K context + 1K output) costs $0.0052. For high-volume RAG where you can tolerate slightly lower quality, Gemini 2.5 Flash-Lite ($0.10/$0.40) costs just $0.0014 per query, about 73% cheaper than DeepSeek V4 Pro. The key is matching model quality to your RAG accuracy requirements.

Which AI API is cheapest for text summarization?

Summarization is input-heavy (long documents in, short summary out). Gemini 2.5 Flash-Lite ($0.10/$0.40) is the cheapest at $0.0006 per 5,000-token document with a 300-token summary. DeepSeek V4 Flash ($0.14/$0.28) is close at $0.0008. For high-quality summaries of complex documents, Claude Haiku 4.5 ($1.00/$5.00) costs about $0.0065 but delivers significantly better coherence.

What is the cheapest AI API for embeddings?

For dedicated embeddings, OpenAI's text-embedding-3-small ($0.02/1M tokens) and Cohere's embed-english-v3.0 ($0.10/1M tokens) are the standard choices. However, several providers now offer free or near-free embedding endpoints. For vector search at scale, Mistral Small 4 ($0.10/$0.30) can handle embedding + generation in a single call, eliminating a separate embedding API cost.

How do I choose the cheapest AI API for my specific use case?

The cheapest API depends on your input/output token ratio, quality requirements, and volume. Use the APIpulse Model Recommendation Engine — answer 4 questions about your use case, and it recommends the top 3 models with projected monthly costs. For exact calculations, the Cost Calculator lets you model your token usage across all 42 models.

Cheapest AI API by Use Case: Chatbots, Code Gen, RAG & More

42 models compared across 7 real-world workloads. Stop guessing — find the cheapest AI API for your exact use case with per-request cost breakdowns.

Published June 18, 2026 · 12 min read · Updated weekly

"What's the cheapest AI API?" is the wrong question. The right question is: "What's the cheapest AI API for what I'm building?"

A chatbot that sends short messages and gets short replies has completely different cost drivers than a RAG system that processes 10,000 tokens of context per query. The cheapest model for one workload can be 10× more expensive for another.

This guide breaks down the cheapest AI API for 7 common use cases — with real per-request costs calculated from current pricing data across all 42 models.

💡 Key insight: Output tokens cost 2-8× more than input tokens at most providers in this comparison. The cheapest API for your use case depends on your input/output ratio, not just the per-token price.

🤖 Chatbots & Conversational AI

🏆 Winner: DeepSeek V4 Flash

Typical workload: 500 input tokens (system prompt + history) → 200 output tokens (reply) per turn. 10 turns per conversation.

Model	Input	Output	Cost/Conversation	Savings vs GPT-5
DeepSeek V4 Flash	$0.14/M	$0.28/M	$0.0013	↓ 95%
Llama 3.1 8B	$0.10/M	$0.10/M	$0.0007	↓ 97%
Gemini 2.5 Flash-Lite	$0.10/M	$0.40/M	$0.0013	↓ 95%
GPT-5 mini	$0.25/M	$2.00/M	$0.0053	↓ 80%
Claude Haiku 4.5	$1.00/M	$5.00/M	$0.0150	↓ 43%
GPT-5	$1.25/M	$10.00/M	$0.0263	baseline

Verdict: For chatbots, DeepSeek V4 Flash delivers solid conversation quality at about $0.0013 per conversation (5,000 input + 2,000 output tokens over 10 turns), 95% cheaper than GPT-5. Llama 3.1 8B is even cheaper but with noticeably lower response quality for complex conversations. If you need GPT-5-level quality, DeepSeek V4 Pro ($0.435/$0.87) is still 85% cheaper.

When to spend more: Customer-facing chatbots handling sensitive topics (healthcare, finance) benefit from Claude Haiku 4.5 or GPT-5 mini's better instruction-following and safety guardrails.

💻 Code Generation & Completion

🏆 Winner: DeepSeek V4 Pro

Typical workload: 2,000 input tokens (prompt + context) → 4,000 output tokens (generated code). Code generation is output-heavy — output tokens dominate cost.

Model	Input	Output	Cost/Request	Savings vs Codex
DeepSeek V4 Pro	$0.435/M	$0.87/M	$0.0044	↓ 93%
DeepSeek V4 Flash	$0.14/M	$0.28/M	$0.0014	↓ 98%
Mistral Large 3	$0.50/M	$1.50/M	$0.0070	↓ 88%
Gemini 3 Flash	$0.50/M	$3.00/M	$0.013	↓ 78%
Claude Sonnet 4.6	$3.00/M	$15.00/M	$0.066	—
GPT-5.3 Codex	$1.75/M	$14.00/M	$0.0595	baseline

Verdict: DeepSeek V4 Pro is the clear winner for code generation: 93% cheaper than GPT-5.3 Codex with comparable code quality for most languages. Its output token pricing ($0.87/M) is absurdly cheap for code workloads where you're generating thousands of tokens per request.

When to spend more: Complex multi-file refactoring or code requiring deep reasoning about large codebases benefits from Claude Sonnet 4.6's superior context handling. For simple completions and boilerplate, DeepSeek V4 Flash at $0.0014/request is unbeatable.

📚 RAG (Retrieval-Augmented Generation)

🏆 Winner: DeepSeek V4 Pro (quality) / Gemini 2.5 Flash-Lite (volume)

Typical workload: 10,000 input tokens (retrieved context) → 1,000 output tokens (answer). RAG is input-heavy — large context windows, shorter outputs.

Model	Input	Output	Cost/Query	Savings vs GPT-5
Gemini 2.5 Flash-Lite	$0.10/M	$0.40/M	$0.0014	↓ 94%
DeepSeek V4 Flash	$0.14/M	$0.28/M	$0.0017	↓ 93%
DeepSeek V4 Pro	$0.435/M	$0.87/M	$0.0052	↓ 77%
Gemini 3 Flash	$0.50/M	$3.00/M	$0.008	↓ 64%
Claude Haiku 4.5	$1.00/M	$5.00/M	$0.015	↓ 33%
GPT-5	$1.25/M	$10.00/M	$0.0225	baseline

Verdict: RAG's input-heavy nature makes cheap input tokens critical. Gemini 2.5 Flash-Lite ($0.10/M input) is 94% cheaper than GPT-5 for RAG queries. If you need higher answer quality, DeepSeek V4 Pro at $0.0052/query is still 77% cheaper than GPT-5 with better reasoning on complex retrieved context.

Pro tip: For high-volume RAG (10K+ queries/day), consider a tiered approach: route simple factual queries to Flash-Lite and complex analytical queries to DeepSeek V4 Pro. Provided most of your traffic is simple, this hybrid approach can cut costs by roughly 90% relative to sending every query to GPT-5 while maintaining quality where it matters.
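The blended cost of a tiered setup like this is easy to estimate. The sketch below uses the per-million-token prices and the 10K-in / 1K-out workload from the table above; the 80/20 split between simple and complex queries is an illustrative assumption rather than a measured figure, and `request_cost` and `blended_cost` are hypothetical helper names.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD of one request; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# (input, output) prices in USD per 1M tokens, taken from the RAG table.
PRICES = {
    "flash_lite": (0.10, 0.40),
    "deepseek_v4_pro": (0.435, 0.87),
    "gpt_5": (1.25, 10.00),
}

# Typical RAG query from this section: 10K tokens in, 1K tokens out.
per_query = {name: request_cost(10_000, 1_000, *p) for name, p in PRICES.items()}


def blended_cost(simple_share: float) -> float:
    """Average cost per query when `simple_share` of traffic goes to the
    cheap tier and the remainder goes to the quality tier."""
    return (simple_share * per_query["flash_lite"]
            + (1 - simple_share) * per_query["deepseek_v4_pro"])


# Assumption: 80% of queries are simple factual lookups.
tiered = blended_cost(0.80)
savings_vs_gpt5 = 1 - tiered / per_query["gpt_5"]
print(f"tiered: ${tiered:.6f}/query, {savings_vs_gpt5:.1%} cheaper than GPT-5")
```

Under these assumptions the blend comes to about $0.0022 per query, roughly 90% below GPT-5's $0.0225. The saving shrinks as the share of complex queries grows, so it is worth measuring your own traffic mix before relying on the figure.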

📝 Text Summarization

🏆 Winner: Gemini 2.5 Flash-Lite

Typical workload: 5,000 input tokens (document) → 300 output tokens (summary). Summarization is the most input-heavy common workload.

Model	Input	Output	Cost/Document	Savings vs GPT-5
Gemini 2.5 Flash-Lite	$0.10/M	$0.40/M	$0.0006	↓ 93%
DeepSeek V4 Flash	$0.14/M	$0.28/M	$0.0008	↓ 92%
Llama 3.1 8B	$0.10/M	$0.10/M	$0.0005	↓ 94%
Mistral Small 4	$0.10/M	$0.30/M	$0.0006	↓ 94%
Claude Haiku 4.5	$1.00/M	$5.00/M	$0.0065	↓ 30%
GPT-5	$1.25/M	$10.00/M	$0.0093	baseline

Verdict: Summarization is input-dominated, making cheap input tokens everything. Gemini 2.5 Flash-Lite at $0.10/M input is 93% cheaper than GPT-5. For summarizing 1,000 documents/day, you're looking at about $0.60/day vs $9.25/day, saving roughly $260/month on a single workload.

Quality note: For simple extractive summaries (pull key points), Flash-Lite is excellent. For abstractive summaries requiring deep understanding (rephrase, synthesize, analyze), Claude Haiku 4.5 produces noticeably better results at roughly 10× the cost, while still coming in about 30% cheaper than GPT-5.

🔢 Embeddings & Vector Search

🏆 Winner: OpenAI text-embedding-3-small

Typical workload: 500 input tokens per document, no output tokens. Pure embedding generation for vector databases, semantic search, and classification.

Model	Price	Cost/1M Docs	Notes
OpenAI text-embedding-3-small	$0.02/M tokens	$10	1536 dimensions, great quality
Cohere embed-english-v3.0	$0.10/M tokens	$50	1024 dimensions, excellent for search
Mistral Small 4 (as embedder)	$0.10/M input	$50	Can do embed + generate in one call
Voyage AI voyage-3	$0.06/M tokens	$30	1024 dimensions, strong retrieval

Verdict: OpenAI's embedding model is the cheapest dedicated option at $0.02/M tokens. For a database of 1M documents (500 tokens each), embedding costs just $10. The real cost of embeddings is usually the vector database hosting, not the embedding API.
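The $10 figure follows directly from token volume. A minimal sketch, assuming every document is exactly 500 tokens (real corpora vary) and using the prices from the table above; `embedding_cost` is a hypothetical helper name:

```python
def embedding_cost(num_docs: int, tokens_per_doc: int,
                   price_per_million: float) -> float:
    """One-time cost in USD to embed a corpus.

    Embedding requests bill input tokens only, so there is no output term.
    """
    return num_docs * tokens_per_doc * price_per_million / 1_000_000


# USD per 1M tokens, from the table above.
EMBEDDING_PRICES = {
    "text-embedding-3-small": 0.02,
    "voyage-3": 0.06,
    "embed-english-v3.0": 0.10,
}

for model, price in EMBEDDING_PRICES.items():
    cost = embedding_cost(1_000_000, 500, price)
    print(f"{model}: ${cost:,.2f} per 1M documents")
```

Re-embedding after a model change or a chunking change repeats this cost in full, which is one more reason to settle on chunk size before embedding at scale.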

Pro tip: If you're already using a chat/completion model for RAG, some providers (Mistral, Cohere) let you use the same model for both embedding and generation — simplifying your stack and potentially reducing API calls.

✍️ Content Generation (Marketing, Copywriting)

🏆 Winner: DeepSeek V4 Pro

Typical workload: 500 input tokens (brief) → 2,000 output tokens (article/copy). Content generation is output-heavy with high creative requirements.

Model	Input	Output	Cost/Piece	Quality Rating
DeepSeek V4 Pro	$0.435/M	$0.87/M	$0.002	⭐⭐⭐⭐
Gemini 3 Flash	$0.50/M	$3.00/M	$0.0063	⭐⭐⭐⭐
Claude Sonnet 4.6	$3.00/M	$15.00/M	$0.0315	⭐⭐⭐⭐⭐
GPT-5	$1.25/M	$10.00/M	$0.0206	⭐⭐⭐⭐½
Claude Opus 4.8	$5.00/M	$25.00/M	$0.0525	⭐⭐⭐⭐⭐

Verdict: DeepSeek V4 Pro at $0.002 per piece of content is 90% cheaper than GPT-5 and produces surprisingly good marketing copy. For brand-sensitive content where tone and voice matter most, Claude Sonnet 4.6 is worth the 15× premium: its writing quality is noticeably more natural and engaging.

Volume math: If you're generating 100 pieces of content/month, DeepSeek V4 Pro costs $0.20. Claude Sonnet 4.6 costs $3.15. GPT-5 costs $2.06. The quality difference between DeepSeek and GPT-5 is much smaller than the price difference.

🔍 Data Extraction & Structured Output

🏆 Winner: Gemini 3 Flash

Typical workload: 3,000 input tokens (document) → 500 output tokens (extracted JSON/data). Balanced input/output, but requires reliable structured output formatting.

Model	Input	Output	Cost/Extraction	JSON Reliability
Gemini 3 Flash	$0.50/M	$3.00/M	$0.003	⭐⭐⭐⭐⭐
DeepSeek V4 Pro	$0.435/M	$0.87/M	$0.0017	⭐⭐⭐⭐
GPT-5 mini	$0.25/M	$2.00/M	$0.0018	⭐⭐⭐⭐⭐
Claude Haiku 4.5	$1.00/M	$5.00/M	$0.0055	⭐⭐⭐⭐⭐
GPT-5	$1.25/M	$10.00/M	$0.0088	⭐⭐⭐⭐⭐

Verdict: DeepSeek V4 Pro is cheapest per-extraction ($0.0017) but Gemini 3 Flash ($0.003) has better structured output reliability — critical when you're parsing extracted JSON in production. GPT-5 mini ($0.0018) offers excellent JSON reliability at near-DeepSeek prices.

Reliability tip: For data extraction, JSON validity matters more than raw cost. A model that produces invalid JSON 5% of the time costs you in retry logic, error handling, and downstream failures. Pay the small premium for models with proven structured output (Gemini 3 Flash, GPT-5 mini, Claude Haiku 4.5).
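In practice, handling unreliable output usually means validating each response and retrying on failure. The sketch below shows one way to do that with the Python standard library. `call_model` is a hypothetical callable standing in for whatever API client you use, and the required-keys check is a deliberately simple stand-in for a real schema validator.

```python
import json
from typing import Callable


def extract_json(call_model: Callable[[str], str], prompt: str,
                 required_keys: set, max_attempts: int = 3) -> tuple:
    """Call the model until it returns a JSON object containing `required_keys`.

    Returns (parsed_dict_or_None, attempts_used). Every attempt is a billed
    API call, so attempts_used is what drives the hidden retry cost.
    """
    for attempt in range(1, max_attempts + 1):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON: pay for another call
        if isinstance(data, dict) and required_keys <= data.keys():
            return data, attempt
    return None, max_attempts


# Usage with a stub that fails once, then succeeds (for illustration only).
replies = iter(['{"name": "Acme", "total": ', '{"name": "Acme", "total": 42}'])
result, attempts = extract_json(lambda _prompt: next(replies),
                                "Extract name and total as JSON.",
                                {"name", "total"})
print(result, attempts)
```

Logging `attempts_used` per model gives you a measured retry rate, which is the number you need to compare models on cost per valid result rather than cost per call.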

The Input/Output Ratio Rule

The biggest mistake developers make when choosing an AI API: comparing models by per-token price without considering their workload's input/output ratio.

Here's why it matters:

Input-heavy workloads (RAG, summarization, document analysis): Prioritize cheap input tokens. Gemini Flash-Lite ($0.10/M) and DeepSeek V4 Flash ($0.14/M) win.
Output-heavy workloads (code generation, content creation, chatbot replies): Prioritize cheap output tokens. DeepSeek V4 Pro ($0.87/M output) dominates.
Balanced workloads (data extraction, classification, simple Q&A): Look at the blended cost. Gemini 3 Flash and Mistral Small 4 are strong here.

🧮 Quick formula: Monthly cost = (monthly input tokens × input price) + (monthly output tokens × output price). A model with $0.10 input / $10.00 output is NOT cheaper than $1.00 input / $1.00 output if your workload is 50/50. Do the math.
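The formula translates directly into a few lines of code. In this sketch the two price points are the hypothetical ones from the paragraph above, and the 100M-in / 100M-out monthly volume is an assumed 50/50 workload chosen only to make the comparison concrete.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Monthly bill in USD; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed 50/50 workload: 100M input and 100M output tokens per month.
IN_TOKENS = OUT_TOKENS = 100_000_000

cheap_input = monthly_cost(IN_TOKENS, OUT_TOKENS, 0.10, 10.00)  # $0.10 in / $10.00 out
flat_rate = monthly_cost(IN_TOKENS, OUT_TOKENS, 1.00, 1.00)     # $1.00 in / $1.00 out

print(f"$0.10/$10.00 model: ${cheap_input:,.2f}")
print(f"$1.00/$1.00 model:  ${flat_rate:,.2f}")
```

Under these assumptions the model with the cheap input price costs $1,010 a month against $200 for the flat-rate model, about five times more.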

Cost Comparison by Monthly Volume

Here's what your monthly bill looks like across 3 common workload profiles, at different volumes:

🤖 Chatbot (500 in / 200 out per turn, 10 turns per conversation)

Volume	DeepSeek V4 Flash	GPT-5 mini	GPT-5	Savings (V4 Flash vs GPT-5)
1K conversations/mo	$1.26	$5.25	$26.25	$24.99/mo
10K conversations/mo	$12.60	$52.50	$262.50	$249.90/mo
100K conversations/mo	$126.00	$525.00	$2,625.00	$2,499/mo

💻 Code Gen (2K in / 4K out per request)

Volume	DeepSeek V4 Pro	Claude Sonnet 4.6	GPT-5.3 Codex	Savings (V4 Pro vs Codex)
1K requests/mo	$4.35	$66.00	$59.50	$55.15/mo
10K requests/mo	$43.50	$660.00	$595.00	$551.50/mo
50K requests/mo	$217.50	$3,300.00	$2,975.00	$2,757.50/mo

📚 RAG (10K in / 1K out per query)

Volume	Flash-Lite	DeepSeek V4 Pro	GPT-5	Savings (Flash-Lite vs GPT-5)
1K queries/mo	$1.40	$5.20	$22.50	$21.10/mo
10K queries/mo	$14.00	$52.00	$225.00	$211/mo
100K queries/mo	$140.00	$520.00	$2,250.00	$2,110/mo

Find the cheapest AI API for your exact use case

Don't guess — calculate. The APIpulse Recommendation Engine analyzes your use case, quality needs, and volume to recommend the top 3 models with projected monthly costs.


The Hidden Costs Nobody Talks About

Per-token pricing is only part of the equation. These hidden costs can dwarf your API bill:

1. Latency vs Throughput Tradeoff

Cheaper models often have higher latency. If your chatbot takes 5 seconds to respond instead of 1 second, you lose users. Factor in the cost of lost conversions when choosing "the cheapest" option.

2. Retry Costs

Models with lower structured output reliability (some open-weight models) require JSON retry logic. A 5% retry rate on 100K requests/month = 5,000 extra API calls. That's a hidden 5% cost increase.
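To see how retries change the ranking, divide the per-request cost by the success rate. This is a minimal sketch that assumes every failed call is retried until it succeeds and that failures are independent. The per-call costs echo the data extraction table, while the failure rates are hypothetical placeholders, not measurements of any model.

```python
def effective_cost(cost_per_call: float, failure_rate: float) -> float:
    """Expected cost per successful result when failures are retried until
    success. Expected calls per success = 1 / (1 - failure_rate)."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return cost_per_call / (1 - failure_rate)


def extra_calls(requests: int, failure_rate: float) -> float:
    """Expected number of additional billed calls caused by retries."""
    return requests * failure_rate / (1 - failure_rate)


# Hypothetical failure rates, for illustration only.
model_a = effective_cost(0.0017, failure_rate=0.05)  # cheaper, less reliable
model_b = effective_cost(0.0018, failure_rate=0.00)  # pricier, always valid

print(f"model A: ${model_a:.5f} per valid result")
print(f"model B: ${model_b:.5f} per valid result")
print(f"extra calls at 100K requests/month: {extra_calls(100_000, 0.05):,.0f}")
```

Once retries of retries are counted, a 5% failure rate adds about 5,263 calls rather than 5,000, and at these assumed rates the $0.0001 price gap between the two models nearly disappears. Engineering time spent on error handling comes on top of that.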

3. Context Window Waste

Providers bill for the tokens you actually send, not for the size of the context window, so an unused 1M window costs nothing by itself. The waste comes from paying a large-context model's rates for 10K-token prompts when a cheaper model with a smaller window would suffice.

4. Prompt Engineering Overhead

Cheaper models often need more detailed prompts to match quality. If your engineers spend 10 extra hours/month tweaking prompts to save $50 on API costs, you're losing money.

How to Switch Models (Without Breaking Everything)

Found a cheaper model? Here's how to switch safely:

A/B test first — Route 10% of traffic to the new model, compare quality metrics (user ratings, task completion, error rates).
Use the same prompt — Most modern models handle similar prompt formats. Test with your existing prompts before rewriting.
Monitor output distribution — If the new model produces longer/shorter outputs, your downstream systems might break.
Keep a fallback — Route failed requests to your original model. The cost of a failed request far exceeds the savings from a cheaper model.
Track per-model costs separately — Use APIpulse's calculator to model costs before switching, then verify against real usage.
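The first and fourth steps (a 10% traffic split and a fallback) can be sketched in a few lines. This is an illustrative outline, not a production router: `original` and `candidate` are hypothetical callables that wrap your two model clients, and hashing the user ID is just one common way to get a stable split.

```python
import hashlib
from typing import Callable, Tuple

ModelFn = Callable[[str], str]  # hypothetical signature: prompt in, reply out


def in_candidate_bucket(user_id: str, fraction: float = 0.10) -> bool:
    """Deterministically assign a user to the candidate model.

    Hashing the user ID keeps each user on the same model across requests.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < fraction * 10_000


def answer(user_id: str, prompt: str, original: ModelFn, candidate: ModelFn,
           fraction: float = 0.10) -> Tuple[str, str]:
    """Return (label, reply); the label lets you track cost and quality per model."""
    if in_candidate_bucket(user_id, fraction):
        try:
            return "candidate", candidate(prompt)
        except Exception:
            # Keep a fallback: retry the failed request on the original model.
            return "fallback", original(prompt)
    return "original", original(prompt)
```

Recording the returned label alongside token counts and quality metrics gives you the per-model data needed to compare the two before raising `fraction`.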
