How much does the OpenAI embedding API cost?

OpenAI embedding pricing: text-embedding-3-small costs $0.02 per 1M tokens, text-embedding-3-large costs $0.13 per 1M tokens, and text-embedding-ada-002 costs $0.10 per 1M tokens. The small model offers the best value for most use cases.

What is the cheapest embedding API?

OpenAI text-embedding-3-small at $0.02 per 1M tokens is the cheapest paid option among the models compared here. Google's text-embedding-004 offers a free tier for low-volume use. Cohere embed-v3 at $0.10 per 1M tokens offers the best multilingual support at a competitive price.

How much does it cost to embed 1 million documents?

Assuming 500 tokens per document (roughly 375 words), 1 million documents come to 500M tokens, and embedding them costs: OpenAI small: $10, OpenAI large: $65, Cohere v3: $50, Google v4: free tier or ~$0.50. Each re-embedding for a model upgrade costs the full amount again.
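A minimal sketch of this indexing arithmetic, using the per-1M-token prices quoted on this page. The dictionary keys are illustrative labels, not necessarily the providers' API model identifiers, and the prices are assumptions to re-check against each provider's current pricing. Google is left out because this page gives only an approximate figure for it.

```python
# Per-1M-token list prices quoted on this page (USD).
# Assumption: prices change over time; verify before relying on them.
PRICE_PER_1M_TOKENS = {
    "openai-text-embedding-3-small": 0.02,
    "openai-text-embedding-3-large": 0.13,
    "openai-text-embedding-ada-002": 0.10,
    "cohere-embed-v3": 0.10,
}


def indexing_cost(num_docs: int, tokens_per_doc: int, price_per_1m: float) -> float:
    """One-time cost to embed a corpus: total input tokens x per-token price."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_1m


if __name__ == "__main__":
    # 1M documents x 500 tokens each = 500M tokens to embed.
    for model, price in PRICE_PER_1M_TOKENS.items():
        print(f"{model}: ${indexing_cost(1_000_000, 500, price):,.2f}")
```

Because only input tokens are billed, doubling either the document count or the tokens per document doubles the bill.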

Which embedding model is best for RAG?

For English-only RAG, OpenAI text-embedding-3-large ($0.13/1M) offers the highest retrieval quality, with 3072 dimensions. For multilingual RAG, Cohere embed-v3 ($0.10/1M) supports 100+ languages. For budget RAG, OpenAI text-embedding-3-small ($0.02/1M) delivers about 90% of the quality at 85% lower cost.
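The 85% figure follows directly from the two list prices per 1M tokens:

```latex
\frac{\$0.13 - \$0.02}{\$0.13} = \frac{0.11}{0.13} \approx 0.846
\;\Rightarrow\; \text{about } 85\% \text{ less per token}
```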

How do I reduce embedding API costs?

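5 ways to reduce embedding costs: 1) Use text-embedding-3-small instead of large (85% savings). 2) Batch API calls (up to 2048 inputs per request) to cut request overhead. 3) Use shorter chunks (256 tokens vs 1000) with little overlap; embedding is billed per token, so overlapping text is paid for twice. 4) Cache embeddings to avoid re-embedding unchanged text. 5) Use dimension reduction (1536d instead of 3072d) for text-embedding-3-large, which cuts vector storage and search costs rather than the per-token API price.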
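Tip 2 can be sketched as a helper that splits a corpus into request-sized batches. The 2048 limit is the figure given above; `batched_inputs` is a hypothetical name, and the actual API call is omitted so the sketch runs offline. Batching saves round trips, not tokens, and a real deployment may also need to respect any token limits the provider enforces, which this sketch does not check.

```python
from typing import Iterator, List, Sequence

# Per-request input limit cited on this page (assumption: check current docs).
MAX_INPUTS_PER_REQUEST = 2048


def batched_inputs(texts: Sequence[str],
                   batch_size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` texts.

    Each list would be sent as the input of a single embeddings request.
    The tokens billed are the same as sending the texts one at a time.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


if __name__ == "__main__":
    docs = [f"document {i}" for i in range(5000)]
    sizes = [len(batch) for batch in batched_inputs(docs)]
    print(len(sizes), sizes)
```

With 5,000 documents this produces three requests instead of 5,000.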

Embedding API Cost Calculator

Compare embedding costs across OpenAI, Cohere, and Google. Estimate RAG pipeline costs, document indexing spend, and find the cheapest embedding model for your use case.

Inputs: embedding model, tokens per document (roughly 375 words per 500 tokens), number of documents, and queries per day (each query is about 100 tokens).

Embedding Cost Estimate

Outputs: one-time indexing cost, query cost per request, daily query cost, monthly and annual embedding cost, cost per document, and total tokens to embed.
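The estimate can be reproduced with a few lines of arithmetic. This sketch assumes a flat 30-day month, uses the ~100 tokens per query stated above, and treats the monthly and annual figures as query-embedding costs only, with indexing reported separately as a one-time cost; the page does not say exactly how its calculator combines them, so this is an assumption.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # assumption: flat 30-day month
DAYS_PER_YEAR = 365


@dataclass
class EmbeddingEstimate:
    total_tokens_to_embed: int
    indexing_cost: float      # one-time
    cost_per_document: float
    cost_per_query: float
    daily_query_cost: float
    monthly_cost: float       # query embeddings only (assumption)
    annual_cost: float        # query embeddings only (assumption)


def estimate_embedding_cost(price_per_1m: float, tokens_per_doc: int,
                            num_docs: int, queries_per_day: int,
                            tokens_per_query: int = 100) -> EmbeddingEstimate:
    """Embedding APIs bill input tokens only, so every figure is tokens x price."""
    per_token = price_per_1m / 1_000_000
    total_tokens = tokens_per_doc * num_docs
    cost_per_query = tokens_per_query * per_token
    daily = queries_per_day * cost_per_query
    return EmbeddingEstimate(
        total_tokens_to_embed=total_tokens,
        indexing_cost=total_tokens * per_token,
        cost_per_document=tokens_per_doc * per_token,
        cost_per_query=cost_per_query,
        daily_query_cost=daily,
        monthly_cost=daily * DAYS_PER_MONTH,
        annual_cost=daily * DAYS_PER_YEAR,
    )


if __name__ == "__main__":
    print(estimate_embedding_cost(0.02, 500, 1_000_000, queries_per_day=10_000))
```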


RAG Pipeline Cost Calculator

Estimate total RAG costs: embedding + retrieval + generation in one view

Inputs: embedding model, generation model, documents in the vector DB, RAG queries per day, context tokens per query (the retrieved chunks sent to the generator), and output tokens per response.

RAG Pipeline Monthly Cost

Outputs: embedding cost for queries, generation input cost, generation output cost, total monthly RAG cost, and cost per RAG query, plus a cost breakdown showing embedding and generation as percentages of the total alongside the one-time indexing cost.
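A sketch of the monthly RAG arithmetic: query embeddings plus generator input and output tokens, each multiplied by its per-1M-token price. Assumptions: `context_tokens` stands for the whole generator prompt, one-time indexing is excluded, a month is 30 days, and the generation prices in the usage block are hypothetical placeholders rather than any real model's list price.

```python
def rag_monthly_cost(queries_per_day: int, embed_price_per_1m: float,
                     gen_input_price_per_1m: float, gen_output_price_per_1m: float,
                     context_tokens: int, output_tokens: int,
                     query_tokens: int = 100, days: int = 30) -> dict:
    """Monthly RAG API spend: query embeddings + generator input + generator output."""
    monthly_queries = queries_per_day * days
    per_1m = 1_000_000
    embedding = monthly_queries * query_tokens * embed_price_per_1m / per_1m
    gen_input = monthly_queries * context_tokens * gen_input_price_per_1m / per_1m
    gen_output = monthly_queries * output_tokens * gen_output_price_per_1m / per_1m
    total = embedding + gen_input + gen_output
    return {
        "embedding": embedding,
        "generation_input": gen_input,
        "generation_output": gen_output,
        "total": total,
        "per_query": total / monthly_queries if monthly_queries else 0.0,
        "embedding_share": embedding / total if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical generation prices, for illustration only.
    result = rag_monthly_cost(
        queries_per_day=1_000, embed_price_per_1m=0.02,
        gen_input_price_per_1m=1.00, gen_output_price_per_1m=4.00,
        context_tokens=2_000, output_tokens=300,
    )
    for name, value in result.items():
        print(f"{name}: {value:.6f}")
```

The embedding share depends heavily on context size and the generation model's price, so it is worth plugging in your own numbers.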

Embedding API Pricing Explained

Embedding models convert text into numerical vectors for similarity search. Unlike chat/completion models, embedding APIs charge only for input tokens; there's no output cost. This makes embedding significantly cheaper than generation, but costs scale linearly with the total tokens you embed (documents × tokens per document).

Embedding Model Comparison

OpenAI text-embedding-3-small ($0.02/1M tokens): Best value. 1536 dimensions. Supports dimension reduction to 512d to save vector storage. Ideal for most English RAG applications.
OpenAI text-embedding-3-large ($0.13/1M tokens): Best quality. 3072 dimensions (reducible to 256d). 10% better retrieval accuracy than small. Worth it for high-stakes search.
Cohere embed-v3 ($0.10/1M tokens): Best multilingual. 1024 dimensions. Supports 100+ languages. Built-in compression. Best for non-English RAG.
Google text-embedding-004: Free tier for low volume. 768 dimensions. Good for prototyping and small projects.
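OpenAI documents that text-embedding-3 vectors can be shortened either through the API's `dimensions` parameter or by truncating a full vector and re-normalizing it. Below is a minimal sketch of the manual route and of the storage it saves. The function names are hypothetical, the storage figure assumes raw float32 vectors (4 bytes per dimension) and ignores index overhead, and the per-token API price is the same whatever size you keep.

```python
import math
from typing import List, Sequence


def shorten_embedding(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale to unit L2 norm."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = [float(x) for x in vec[:dims]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


def float32_storage_gb(num_vectors: int, dims: int) -> float:
    """Raw storage for float32 vectors (4 bytes per dimension), in GB."""
    return num_vectors * dims * 4 / 1e9


if __name__ == "__main__":
    print(shorten_embedding([3.0, 4.0, 12.0], 2))
    for dims in (3072, 1536, 1024, 256):
        print(dims, f"{float32_storage_gb(1_000_000, dims):.3f} GB")
```

Re-normalizing matters because cosine and dot-product search assume unit-length vectors; a truncated vector is no longer unit length.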

How to Reduce Embedding Costs

Use text-embedding-3-small: 85% cheaper than large with 90% of the quality. Start here and upgrade only if retrieval quality is insufficient.
Reduce dimensions: text-embedding-3-large can be shortened from 3072d to 1536d, 1024d, 512d, or 256d. 1024d is often sufficient and cuts vector storage by about two-thirds; the per-token API price does not change.
Optimize chunk size: 256-512 token chunks balance retrieval quality with embedding cost. Smaller chunks mean more vectors to store and more inputs to embed, and any chunk overlap adds billed tokens.
Batch API calls: Embed up to 2048 inputs per request. Reduces API overhead and can improve throughput.
Cache embeddings: Store embeddings in your vector DB and re-embed a document only when it changes. At query time, embed only the incoming query, never the stored documents.
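Caching can be as simple as keying each vector by the model name and a hash of the exact text, so unchanged chunks are never sent to the API twice. In this sketch `embed_fn` is a hypothetical stand-in for whatever function calls your provider, and the store is an in-memory dict; in practice the vectors would live in your vector DB.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class EmbeddingCache:
    """Embed only texts not seen before for this model (in-memory sketch)."""

    def __init__(self, embed_fn: Callable[[List[str]], List[Vector]], model: str):
        self._embed_fn = embed_fn  # hypothetical: wraps the real embedding API call
        self._model = model
        self._store: Dict[str, Vector] = {}
        self.texts_embedded = 0    # texts actually sent to embed_fn (i.e. billed)

    def _key(self, text: str) -> str:
        # Include the model name so switching models never reuses old vectors.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{self._model}:{digest}"

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        keys = [self._key(t) for t in texts]
        # Unique, uncached texts in first-seen order (dicts keep insertion order).
        missing: Dict[str, str] = {}
        for key, text in zip(keys, texts):
            if key not in self._store and key not in missing:
                missing[key] = text
        if missing:
            vectors = self._embed_fn(list(missing.values()))
            for key, vector in zip(missing, vectors):
                self._store[key] = vector
            self.texts_embedded += len(missing)
        return [self._store[key] for key in keys]


if __name__ == "__main__":
    def fake_embed(batch: List[str]) -> List[Vector]:
        return [[float(len(t))] for t in batch]  # placeholder vectors

    cache = EmbeddingCache(fake_embed, model="text-embedding-3-small")
    cache.embed(["alpha", "beta", "alpha"])
    cache.embed(["beta", "gamma"])
    print(cache.texts_embedded)
```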

Embedding vs Generation Cost

In a typical RAG pipeline, embedding costs are 5-15% of total API spend, and generation (the LLM call) dominates. At scale, however, one-time indexing costs can be significant: indexing 10M documents at 500 tokens each (5B tokens) with OpenAI small costs $100, while generation at 1K queries/day can cost $150+/month depending on the model.
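The 10M-document indexing figure for text-embedding-3-small works out as follows at $0.02 per 1M tokens:

```latex
\begin{aligned}
\text{tokens} &= 10^{7}\ \text{documents} \times 500\ \text{tokens/document} = 5 \times 10^{9} \\
\text{indexing cost} &= \frac{5 \times 10^{9}}{10^{6}} \times \$0.02 = 5000 \times \$0.02 = \$100
\end{aligned}
```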

