Cheapest AI API for Speech-to-Text
Find the cheapest AI API for speech-to-text transcription. We compared 7 providers, with prices starting at $0.004 per minute.
Speech-to-Text API Cost Ranking
Providers ranked by monthly cost, assuming async batch processing and a 30-day month. The baseline workload is 300 minutes of audio per day.
Top Picks by Volume
Small Project (300 min/day)
- Deepgram Nova-2: $38.70/mo
- OpenAI Whisper: $54.00/mo
- AssemblyAI Async: $76.50/mo
Business (2K min/day)
- Deepgram Nova-2: $258/mo
- OpenAI Whisper: $360/mo
- AssemblyAI Async: $510/mo
Enterprise Volume (10K min/day)
- Deepgram Nova-2: $1,296/mo
- OpenAI Whisper: $1,800/mo
- Google Cloud STT: $2,880/mo
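Every tier figure follows the same arithmetic: minutes per day × 30 days × per-minute rate. The sketch below reproduces the 300 min/day ranking. It assumes the per-minute async rates implied by the figures on this page: $0.0043 for Deepgram Nova-2, $0.006 for OpenAI Whisper, and $0.0085 for AssemblyAI async. It also assumes a flat 30-day month. `RATES_PER_MIN`, `monthly_cost`, and `rank_providers` are illustrative names. Check each provider's current price list before relying on these rates.

```python
# Per-minute async rates implied by the figures on this page.
# Assumption: these may be out of date; verify against current provider pricing.
RATES_PER_MIN = {
    "Deepgram Nova-2": 0.0043,
    "OpenAI Whisper": 0.006,
    "AssemblyAI Async": 0.0085,
}

DAYS_PER_MONTH = 30  # assumption: flat 30-day billing month


def monthly_cost(minutes_per_day: float, rate_per_min: float) -> float:
    """Monthly cost in USD for a steady daily audio volume, rounded to cents."""
    return round(minutes_per_day * DAYS_PER_MONTH * rate_per_min, 2)


def rank_providers(minutes_per_day: float) -> list[tuple[str, float]]:
    """Return (provider, monthly cost) pairs, cheapest first."""
    costs = [(name, monthly_cost(minutes_per_day, rate)) for name, rate in RATES_PER_MIN.items()]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, cost in rank_providers(300):
        print(f"{name}: ${cost:,.2f}/mo")
```

At 300 min/day this prints $38.70, $54.00, and $76.50 per month. `rank_providers(2000)` returns the $258, $360, and $510 shown in the Business tier.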
Strategy: Async + Realtime Hybrid
Use a hybrid approach: async processing for recorded content and real-time streaming for live applications.
Smart STT Pipeline (500 min/day)
- 70% recordings → Deepgram async ($0.0043/min): $45.15/mo
- 30% live calls → Deepgram realtime ($0.0077/min): $34.65/mo
- Total with hybrid: $79.80/mo (vs $115.50/mo using realtime for everything)
The hybrid approach saves about 31% compared with streaming all audio in real time. Most recorded content can be processed asynchronously at the lower rate.
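The hybrid comparison can be checked with a few lines of arithmetic. This sketch uses the Deepgram rates quoted in the pipeline ($0.0043/min async, $0.0077/min realtime) and a 30-day month. The function name `hybrid_cost` and the `live_share` parameter are illustrative. At these rates, streaming all 15,000 monthly minutes costs $115.50.

```python
# Rates quoted in the pipeline above (assumption: Deepgram prices at time of writing).
ASYNC_RATE = 0.0043     # USD per minute, pre-recorded / batch
REALTIME_RATE = 0.0077  # USD per minute, streaming
DAYS_PER_MONTH = 30     # assumption: flat 30-day billing month


def hybrid_cost(minutes_per_day: float, live_share: float) -> dict[str, float]:
    """Compare an async/realtime split against streaming all audio in real time."""
    monthly_minutes = minutes_per_day * DAYS_PER_MONTH
    live_minutes = monthly_minutes * live_share          # must be streamed
    recorded_minutes = monthly_minutes - live_minutes    # can go through async batch
    hybrid = recorded_minutes * ASYNC_RATE + live_minutes * REALTIME_RATE
    all_realtime = monthly_minutes * REALTIME_RATE
    return {
        "hybrid": round(hybrid, 2),
        "all_realtime": round(all_realtime, 2),
        "savings_pct": round(100 * (1 - hybrid / all_realtime), 1),
    }


print(hybrid_cost(500, live_share=0.30))
```

The result is $79.80 for the hybrid pipeline against $115.50 for streaming everything, a saving of roughly 31%. The savings grow as the live share falls, because every minute moved to async is billed at about 44% less.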
Key Factors When Choosing an STT API
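- Per-minute pricing: Most STT APIs bill per minute of audio processed. Async (batch) processing is typically 40-60% cheaper than real-time streaming. For example, Deepgram's $0.0043/min async rate is about 44% below its $0.0077/min realtime rate. If you don't need instant results, use async.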
- Accuracy can matter more than price: At scale, even a 1-point accuracy improvement can save hours of manual correction. Deepgram Nova-2 and OpenAI Whisper lead in accuracy benchmarks. The cheapest option isn't always the best value if its accuracy is poor.
- Speaker diarization: For meetings and multi-speaker content, speaker diarization (who said what) is essential. Deepgram, AssemblyAI, and Google all offer this. It may cost extra ($0.01-0.02/min additional).
- Language support: OpenAI Whisper supports about 100 languages. Deepgram focuses on English and major European and Asian languages. Google Cloud STT has the broadest language coverage. Choose based on your target languages.
- Custom vocabulary: For technical, medical, or industry-specific content, custom vocabulary support improves accuracy. AssemblyAI and Google have the best custom vocabulary options.
- File size limits: Some APIs cap the size of each uploaded file (e.g., 25 MB for the OpenAI Whisper API). For long recordings, split the audio into chunks or choose a provider with much higher limits (Deepgram, Google).
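To stay under a per-file limit such as Whisper's 25 MB, you first need to decide where to cut a long recording. Below is a minimal planning sketch. It assumes a roughly constant bitrate, so each segment's size scales with its duration, and it treats 25 MB as 25,000,000 bytes, the more conservative reading. `plan_chunks` and its `safety` headroom factor are hypothetical. The actual cutting is left to your audio tooling. In practice you may prefer to move each cut to a nearby pause so words are not split across chunks.

```python
import math

MAX_UPLOAD_BYTES = 25_000_000  # assumption: 25 MB read as decimal megabytes (conservative)


def plan_chunks(file_bytes: int, duration_s: float,
                max_bytes: int = MAX_UPLOAD_BYTES,
                safety: float = 0.9) -> list[tuple[float, float]]:
    """Split [0, duration_s] into equal-length (start, end) segments in seconds.

    Assumes a roughly constant bitrate, so segment size is proportional to
    segment duration. `safety` leaves headroom for container overhead and
    bitrate variation.
    """
    if file_bytes <= max_bytes:
        return [(0.0, duration_s)]  # fits in a single upload
    n_chunks = math.ceil(file_bytes / (max_bytes * safety))
    step = duration_s / n_chunks
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(n_chunks)]


# A 90-minute recording at 128 kbps is about 86.4 MB (16,000 bytes/s × 5,400 s).
print(plan_chunks(86_400_000, 5400.0))
```

For that 90-minute file, the planner yields four 22.5-minute segments of roughly 21.6 MB each, all under the limit.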