Blog · Jun 7, 2026

Prompt Engineering to Reduce AI API Costs by 50%

8 techniques that actually work — with real examples using GPT-5, Claude Sonnet 4.6, and DeepSeek V3.2.

Most developers optimize their AI stack by switching to cheaper models. That's the obvious move — but it's not the biggest lever. Prompt engineering alone can cut your API costs by 30-70% without changing models, infrastructure, or architecture.

Here are 8 techniques we've seen work across hundreds of production deployments. Each one includes a before/after example with real token counts and cost calculations.

1. Output Length Control — Save 30-50%

The single biggest cost driver is output tokens. Most models charge 3-10x more for output than input. If your prompt generates 500 tokens when you only need 100, you're paying 5x too much.

// Before: verbose prompt → 500 output tokens

"Analyze this customer review and provide a detailed sentiment analysis with explanation, confidence score, key themes, and actionable recommendations for the product team."

// After: focused prompt → 80 output tokens

"Classify this review as positive/negative/neutral. Reply with JSON: {sentiment, confidence: 0-1, one_word_reason}"

Cost impact on GPT-5 ($10/M output): 500 tokens → 80 tokens = $0.0042 savings per request. At 10K requests/day, that's $1,260/month saved.

2. System Prompts Over Few-Shot — Save 10-20%

Few-shot examples eat input tokens fast. Each example is 50-200 tokens. Using 5 examples costs 250-1,000 input tokens per request. A well-written system prompt achieves the same result in 50-100 tokens.

// Before: 5 few-shot examples → 800 input tokens

"Classify the sentiment. Examples: 'Great product!' → positive. 'Terrible service' → negative. 'It's okay' → neutral. 'Love it' → positive. 'Worst experience' → negative."

// After: system prompt → 120 input tokens

"Classify text sentiment as positive, negative, or neutral. Output single word."

3. Structured Output Formats — Save 20-40%

When you ask for natural language explanations, the model generates verbose responses. When you ask for JSON or structured data, responses are 2-5x shorter and more consistent.

// Before: natural language → 200 output tokens

"Tell me about this product's features and pricing."

// After: structured output → 60 output tokens

"Extract product info as JSON: {name, features: [], price, currency}. No explanation."

4. Prompt Caching (Repeat Prefixes) — Save 50-90%

OpenAI, Anthropic, and Google all support prompt caching. If your system prompt + context is the same across requests, the cached portion costs 50-90% less. The key is keeping your prefix consistent.

How it works: If your system prompt is 500 tokens and you send 1,000 requests/day with the same prefix, the cached portion (500 tokens) costs $0.000015/token instead of $0.00015/token on GPT-5. That's $20/month saved just from caching.

5. Model Routing — Save 50-80%

Not every request needs GPT-5 or Claude Opus. Route simple tasks (classification, extraction, formatting) to budget models, and reserve premium models for complex reasoning.

Task Type Instead Of Use Savings
Sentiment classification GPT-5 ($1.25/$10) DeepSeek V3.2 ($0.23/$0.34) 82%
Email categorization Claude Sonnet 4.6 ($3/$15) GPT-4o mini ($0.15/$0.60) 95%
Data extraction GPT-5 ($1.25/$10) DeepSeek V4 Flash ($0.14/$0.28) 89%
Complex reasoning GPT-5 ($1.25/$10) Keep GPT-5

6. Batch Processing — Save 50%

OpenAI's Batch API costs 50% less than the real-time API. If your use case can tolerate 24-hour turnaround (data processing, report generation, content moderation), batch is a no-brainer.

Cost impact: GPT-5 drops from $1.25/$10 to $0.625/$5 per 1M tokens. For a workload processing 10M tokens/day, that's $187/month saved.

7. Response Length Limits — Save 20-40%

Most APIs support a max_tokens parameter. Setting a reasonable limit prevents runaway output costs. If you only need 200 tokens, don't let the model generate 2,000.

Pro tip: Set max_tokens to 1.5x your expected output length. This catches edge cases without wasting tokens on rambling responses.

8. Prompt Compression — Save 30-60%

Remove filler words, use abbreviations, and compress instructions. The model understands compressed prompts just as well.

// Before: 85 tokens

"I would like you to please analyze the following customer review and provide me with a detailed sentiment analysis. Please include the overall sentiment, a confidence score from 0 to 1, and a brief explanation of why you chose that sentiment."

// After: 32 tokens (62% reduction)

"Analyze review sentiment. Output: {sentiment, confidence: 0-1, reason}"

Combined Impact: Real-World Example

Let's say you're running a customer support chatbot with GPT-5, processing 5,000 requests/day:

Metric Before After
Input tokens/request 800 350
Output tokens/request 400 120
Model GPT-5 DeepSeek V3.2
Daily cost $25.00 $0.63
Monthly cost $750 $18.90

Total savings: $731.10/month (97.5% reduction) — from prompt optimization + model routing combined.

Calculate your exact savings

Use our cost calculator to compare models and estimate how much you'd save with these techniques.

Open Cost Calculator

Quick Reference: Which Technique Saves Most?

Highest Impact

Model routing (50-80%) + output control (30-50%)

Easiest to Implement

Response length limits + structured output

Most Overlooked

Prompt caching + batch processing

Best ROI

Prompt compression (30-60% with 5 min effort)

Pricing data verified Jun 7, 2026. Use our cost calculator to estimate savings for your specific workload. See also: 12 Ways to Reduce AI API Costs and AI API Caching Strategies.