Model vs model, real cost
LLM API cost comparisons
The per-token rate is the easy part. What actually decides your bill is the input/output mix of your workload. Each comparison below runs both models across the same four real jobs — and stays current, because every page reads live from our pricing data instead of a blog post that went stale six months ago.
- Comparisons: 10
- Workloads each: 4
- Pricing: Live
Every head-to-head
Claude Opus 4.8 vs GPT-5.5
Anthropic vs OpenAI: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
Claude Opus 4.8 vs Gemini 3.1 Pro
Anthropic vs Google: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
GPT-5.5 vs Gemini 3.1 Pro
OpenAI vs Google: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
Claude Sonnet 4.6 vs GPT-5.4
Anthropic vs OpenAI: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
Claude Sonnet 4.6 vs Gemini 3.1 Pro
Anthropic vs Google: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
GPT-5.4 vs Gemini 3.1 Pro
OpenAI vs Google: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
GPT-5.4 Mini vs Gemini 3.5 Flash
OpenAI vs Google: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
Claude Haiku 4.5 vs GPT-5.4 Mini
Anthropic vs OpenAI: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
DeepSeek V4 Pro vs Claude Sonnet 4.6
DeepSeek vs Anthropic: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
DeepSeek V4 Pro vs GPT-5.5
DeepSeek vs OpenAI: cost per 1,000 calls across chat, RAG, long-doc summary, and bulk classification.
How we compare
Every comparison prices the same four workloads, so the input-heavy vs output-heavy trade-off between two models is visible at a glance. A model that's cheapest on short chats can be the expensive one on long-document summaries — the mix is the whole game.
Short chat turn: 1K in / 500 out (a typical assistant reply)
RAG answer: 8K in / 800 out (retrieved context + grounded answer)
Long-doc summary: 50K in / 2K out (summarize a long document)
Bulk classification: 2K in / 50 out (label/route at high volume)
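To make the input/output mix concrete, here is a minimal sketch of the arithmetic behind a cost-per-1,000-calls figure. It assumes rates are quoted in USD per million tokens and that "1K" means 1,000 tokens. The two rate cards, `MODEL_A` and `MODEL_B`, are hypothetical numbers chosen only to show the crossover; they are not the prices of any model listed above, and the function names are illustrative rather than taken from our pricing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-token pricing in USD per 1,000,000 tokens (assumed unit)."""
    input_per_mtok: float
    output_per_mtok: float


# The four workloads from this page as (input tokens, output tokens).
# Assumption: "1K" is read as 1,000 tokens, not 1,024.
WORKLOADS = {
    "Short chat turn": (1_000, 500),
    "RAG answer": (8_000, 800),
    "Long-doc summary": (50_000, 2_000),
    "Bulk classification": (2_000, 50),
}

# Hypothetical rate cards, NOT real prices for any listed model.
MODEL_A = Rates(input_per_mtok=1.00, output_per_mtok=10.00)  # cheap input, pricey output
MODEL_B = Rates(input_per_mtok=2.00, output_per_mtok=4.00)   # pricier input, cheap output


def cost_per_1k_calls(rates: Rates, tokens_in: int, tokens_out: int) -> float:
    """USD cost of 1,000 calls that each use the given token counts."""
    per_call = (
        tokens_in * rates.input_per_mtok + tokens_out * rates.output_per_mtok
    ) / 1_000_000
    return per_call * 1_000


def compare(a: Rates, b: Rates) -> dict[str, tuple[float, float]]:
    """Price every workload on both rate cards: name -> (cost_a, cost_b)."""
    return {
        name: (cost_per_1k_calls(a, t_in, t_out), cost_per_1k_calls(b, t_in, t_out))
        for name, (t_in, t_out) in WORKLOADS.items()
    }


if __name__ == "__main__":
    for name, (cost_a, cost_b) in compare(MODEL_A, MODEL_B).items():
        cheaper = "A" if cost_a < cost_b else "B"
        print(f"{name:<20} A=${cost_a:8.2f}  B=${cost_b:8.2f}  cheaper: {cheaper}")
```

With these made-up rates, model B comes out cheaper on the short chat turn (where output is a third of the tokens), while model A comes out cheaper on the three input-heavy workloads. Neither rate card is cheaper across the board, so the ranking depends on the workload.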