Question 1

Why does the cheapest LLM depend on the workload?

Accepted Answer

Because providers price input and output tokens differently, and output often costs 3–5x more than input. A model that's cheapest for short, input-heavy chats can be the most expensive for long-document summaries that generate lots of output. Each comparison here prices the same four real workloads (chat, RAG, long-doc summary, bulk classification), so the trade-off is visible instead of hidden behind a single per-token headline rate.
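To make the flip concrete, here is a minimal Python sketch of per-request pricing. The model names, rates, and token counts are made-up placeholders chosen only to show the effect; they are not real provider prices or the workload profiles used on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price in USD per one million tokens (hypothetical values below)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are billed at separate rates."""
    return (
        input_tokens * rates.input_per_mtok
        + output_tokens * rates.output_per_mtok
    ) / 1_000_000


# Hypothetical models: model_a has cheap input but output at 5x the input rate;
# model_b has pricier input but cheaper output.
MODELS = {
    "model_a": Rates(input_per_mtok=1.00, output_per_mtok=5.00),
    "model_b": Rates(input_per_mtok=2.00, output_per_mtok=3.00),
}

# Hypothetical (input_tokens, output_tokens) per request.
WORKLOADS = {
    "input_heavy": (2_000, 100),
    "output_heavy": (1_000, 4_000),
}


def cheapest(workload: str) -> str:
    """Return the name of the model with the lowest per-request cost."""
    input_tokens, output_tokens = WORKLOADS[workload]
    return min(
        MODELS,
        key=lambda name: request_cost(MODELS[name], input_tokens, output_tokens),
    )


if __name__ == "__main__":
    for name in WORKLOADS:
        print(name, "->", cheapest(name))
```

With these placeholder numbers, model_a wins the input-heavy request and model_b wins the output-heavy one, even though model_a has the lower input rate.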

Question 2

Are these prices current?

Accepted Answer

Yes. Every comparison reads live from our pricing data module rather than being typed into a blog post, so the numbers update when provider pricing changes. We also re-verify the underlying rates against each provider's published pricing page on a regular cadence.

Question 3

Does prompt caching change the comparison?

Accepted Answer

Significantly, for input-heavy workloads. Cached input tokens are billed at roughly 10% of the normal input rate by most providers, so a model with a large repeated system prompt can become far cheaper once caching is applied. Use the LLM API Cost Calculator to model your own cache-hit ratio.
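As a rough illustration of how the cache-hit ratio feeds into input cost, consider the Python sketch below. It assumes cached tokens are billed at a flat 10% of the normal input rate (the approximate figure quoted above) and ignores any cache-write surcharge or minimum cacheable length a provider may apply. The function name and parameters are hypothetical, not part of any provider SDK or of the calculator itself.

```python
def input_cost(
    input_tokens: int,
    cache_hit_ratio: float,
    input_per_mtok: float,
    cached_multiplier: float = 0.10,  # assumption: cached input billed at ~10%
) -> float:
    """Input-side cost in USD when a share of the input tokens is served from cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached_tokens = input_tokens * cache_hit_ratio
    fresh_tokens = input_tokens - cached_tokens
    return (
        fresh_tokens * input_per_mtok
        + cached_tokens * input_per_mtok * cached_multiplier
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical rate of $2.00 per million input tokens.
    for ratio in (0.0, 0.5, 0.9):
        print(f"hit ratio {ratio:.0%}: ${input_cost(1_000_000, ratio, 2.00):.2f}")
```

Under these assumptions the input bill scales by 1 - 0.9 × hit ratio, so a 90% hit ratio cuts input cost to about 19% of the uncached figure. Output tokens are unaffected, which is why the effect matters most for input-heavy workloads.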

Question 4

Which model should I default to in production?

Accepted Answer

Pick the one that's cheapest on the workload that dominates your actual traffic, not the one with the lowest headline rate. Find your dominant workload in the four profiles below, then read the verdict on the relevant comparison page. The cheapest model often flips between chat-heavy and summary-heavy products.
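If your traffic is a mix rather than a single dominant workload, one way to apply the same idea is to weight per-request cost by request volume. The Python sketch below does that; the model names, per-request costs, and traffic counts are all hypothetical placeholders, not figures from the comparison pages.

```python
def monthly_cost(cost_per_request: dict[str, float], monthly_requests: dict[str, int]) -> float:
    """Total monthly spend: per-request cost times request count, summed over workloads."""
    return sum(cost_per_request[workload] * count for workload, count in monthly_requests.items())


# Hypothetical per-request costs in USD for two placeholder models.
COSTS = {
    "model_a": {"chat": 0.0025, "summary": 0.0210},
    "model_b": {"chat": 0.0043, "summary": 0.0140},
}


def cheapest_for(monthly_requests: dict[str, int]) -> str:
    """Return the model with the lowest total cost for the given traffic mix."""
    return min(COSTS, key=lambda model: monthly_cost(COSTS[model], monthly_requests))


if __name__ == "__main__":
    chat_heavy = {"chat": 900_000, "summary": 100_000}
    summary_heavy = {"chat": 100_000, "summary": 900_000}
    print("chat-heavy product:", cheapest_for(chat_heavy))
    print("summary-heavy product:", cheapest_for(summary_heavy))
```

With these placeholder numbers the chat-heavy mix favors model_a and the summary-heavy mix favors model_b, which is the kind of flip described above.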

LLM API cost comparisons

Every head-to-head

Claude Opus 4.8 vs GPT-5.5

Claude Opus 4.8 vs Gemini 3.1 Pro

GPT-5.5 vs Gemini 3.1 Pro

Claude Sonnet 4.6 vs GPT-5.4

Claude Sonnet 4.6 vs Gemini 3.1 Pro

GPT-5.4 vs Gemini 3.1 Pro

GPT-5.4 Mini vs Gemini 3.5 Flash

Claude Haiku 4.5 vs GPT-5.4 Mini

DeepSeek V4 Pro vs Claude Sonnet 4.6

DeepSeek V4 Pro vs GPT-5.5

How we compare

Short chat turn

RAG answer

Long-doc summary

Bulk classification

Frequently asked questions

Go deeper