Independent data study · 25 models

The real cost of an LLM API call in 2026

We took one identical task and priced it across every major large-language-model API — 25 models from OpenAI, Anthropic, Google, Meta, DeepSeek, Groq and more. The gap between the cheapest and most expensive way to do the exact same thing is 526×. Prices verified 2026-05-28.

The headline finding

A single API task — 3,000 tokens of context in, a 500-token answer out — costs $0.19 per 1,000 runs on Llama 3.1 8B Instant (Groq), and $100.00 per 1,000 runs on o3-Pro. Same input, same output length, same task. The only thing that changed is which model you called — and that decision alone swings your bill by 526×. The median model lands at $5.50. 5 of the 25 models come in under a dollar per 1,000 tasks.

This is why "what does AI cost" has no single answer. The cost of intelligence in 2026 is not a number — it's a 526× range, and where you land in that range is an engineering decision, not a vendor one.

The full ranking

Every model in our pricing dataset, cheapest to most expensive, for the standard task.

1. Llama 3.1 8B Instant (Groq): $0.19
2. Gemini 3.5 Flash: $0.38
3. GPT-5.4 Mini: $0.75
4. Command R: $0.75
5. Jamba 1.5 Mini: $0.80
6. DeepSeek V4 Pro: $1.74
7. Llama 3.3 70B (Groq): $2.17
8. o4-mini: $2.75
9. Llama 3.1 70B: $3.08
10. Llama 3.3 70B (Together): $3.08
11. Llama 3.3 70B (Fireworks): $3.15
12. Sonar: $3.50
13. Claude Haiku 4.5: $5.50
14. Mistral Large: $9.00
15. Jamba 1.5 Large: $10.00
16. Llama 3.1 405B: $11.50
17. Gemini 3.1 Pro: $12.00
18. Command R+: $12.50
19. GPT-5.4: $15.00
20. Claude Sonnet 4.6: $16.50
21. Sonar Pro: $16.50
22. Grok 2: $22.50
23. Claude Opus 4.7: $27.50
24. GPT-5.5: $30.00
25. o3-Pro: $100.00

Cost per 1,000 tasks of 3,000 input + 500 output tokens, USD, at list price. Hosted open-weight models (Together, Fireworks, Groq) reflect that provider's rate, not self-hosting.
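
The headline statistics can be re-derived from the ranking itself. The sketch below simply copies the 25 per-1,000-task figures from the list above and recomputes the spread, the median, and the count of sub-dollar models; the variable names are illustrative.

```python
from statistics import median

# Cost per 1,000 standard tasks (USD), copied in rank order from the ranking above.
costs = [
    0.19, 0.38, 0.75, 0.75, 0.80, 1.74, 2.17, 2.75, 3.08, 3.08,
    3.15, 3.50, 5.50, 9.00, 10.00, 11.50, 12.00, 12.50, 15.00, 16.50,
    16.50, 22.50, 27.50, 30.00, 100.00,
]

spread = max(costs) / min(costs)                      # most expensive / cheapest
median_cost = median(costs)                           # 13th of 25 values
under_a_dollar = sum(1 for c in costs if c < 1.00)    # models below $1 per 1,000 tasks

print(f"spread: {spread:.0f}x, median: ${median_cost:.2f}, under $1: {under_a_dollar}")
```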

Why output tokens, not input, decide your bill

The instinct is to worry about how much context you send. The data says worry about how much the model writes back. Across these 25 models, output is typically priced 3–5× higher than input, because output tokens are generated sequentially, one at a time, while the prompt can be processed in parallel. A model that looks cheap on input can be expensive in practice if it's verbose. The single most reliable way to cut an AI bill isn't a cheaper model; it's a shorter, structured output: ask for JSON, cap max_tokens, and stop the model from "explaining its answer" when you only need the answer.
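
To make the output effect concrete, here is a minimal sketch using hypothetical rates of $2 per million input tokens and $10 per million output tokens (a 5× ratio, chosen for illustration and not taken from any vendor in the dataset). It shows what share of the standard task's bill comes from output, and what trimming the answer from 500 to 200 tokens would save.

```python
def cost_per_1k(in_tokens, out_tokens, in_rate, out_rate):
    """USD per 1,000 tasks. Rates are USD per million tokens.

    Dividing by 1e6 (per-million rates) and multiplying by 1e3 (1,000 tasks)
    collapses to a single division by 1,000.
    """
    return (in_tokens * in_rate + out_tokens * out_rate) / 1_000

# Hypothetical rates for illustration only: output priced 5x input.
IN_RATE, OUT_RATE = 2.00, 10.00

verbose = cost_per_1k(3_000, 500, IN_RATE, OUT_RATE)  # the standard task
terse = cost_per_1k(3_000, 200, IN_RATE, OUT_RATE)    # same prompt, shorter answer

# Output is 500 of 3,500 tokens, but a much larger share of the cost.
output_share = (500 * OUT_RATE) / (3_000 * IN_RATE + 500 * OUT_RATE)
saving = 1 - terse / verbose

print(f"verbose: ${verbose:.2f}, terse: ${terse:.2f}")
print(f"output share of bill: {output_share:.0%}, saving from shorter output: {saving:.0%}")
```

Under these assumed rates, output is about 14% of the tokens but close to half of the bill, which is why capping the answer length moves the total so much.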

The same task, three shapes

Cost ranking isn't fixed — it shifts with the shape of the workload. A chat reply, a retrieval answer, and a long-document summary stress different parts of the price card:

Workload (tokens): cheapest; most expensive; spread
Short chat reply (1k in · 300 out): Llama 3.1 8B Instant (Groq) at $0.07; o3-Pro at $44.00; 595×
RAG answer (8k in · 600 out): Llama 3.1 8B Instant (Groq) at $0.45; o3-Pro at $208.00; 464×
Long-doc summary (30k in · 800 out): Llama 3.1 8B Instant (Groq) at $1.56; o3-Pro at $664.00; 425×

Per 1,000 tasks, USD, list price. Long-context workloads punish models that price input highly; short replies are dominated by output rates.
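
The three workload shapes can be reproduced with a few lines of arithmetic. The per-million-token rates below are assumptions: the article does not quote them, and they were chosen because they reproduce the figures in the table. Confirm real rates on each vendor's pricing page before relying on them.

```python
# USD per million tokens as (input, output). ASSUMED rates, not quoted in the
# article; they are consistent with the per-1,000-task figures shown above.
RATES = {
    "Llama 3.1 8B Instant (Groq)": (0.05, 0.08),
    "o3-Pro": (20.00, 80.00),
}

# (input tokens, output tokens) for each workload shape in the table.
WORKLOADS = {
    "Short chat reply": (1_000, 300),
    "RAG answer": (8_000, 600),
    "Long-doc summary": (30_000, 800),
}

def cost_per_1k(in_tokens, out_tokens, in_rate, out_rate):
    """USD per 1,000 tasks; rates are USD per million tokens."""
    return (in_tokens * in_rate + out_tokens * out_rate) / 1_000

spreads = {}
for name, (n_in, n_out) in WORKLOADS.items():
    cheap = cost_per_1k(n_in, n_out, *RATES["Llama 3.1 8B Instant (Groq)"])
    dear = cost_per_1k(n_in, n_out, *RATES["o3-Pro"])
    spreads[name] = round(dear / cheap)
    print(f"{name}: ${cheap:.2f} vs ${dear:.2f}, spread {spreads[name]}x")
```

The spread narrows as input grows because, under these assumed rates, the cheaper model's output premium over input is smaller than the expensive model's, so input-heavy workloads close part of the gap.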

The frontier tier is its own race

If you specifically need a flagship model, the spread is still wide. Among the models labelled frontier-tier in our dataset, the cheapest is Mistral Large at $9.00 per 1,000 tasks, while GPT-5.5 costs $30.00: a 3.3× difference inside the "premium" bracket alone. Paying more does not reliably buy you more capability; it often just buys a different vendor's margin. Benchmark on your task before assuming the expensive model is the better one.

Want your own number, not ours? Run your real token counts through the LLM API Cost Calculator — or skip the token math entirely and pick a workload in the AI Cost-per-Task Calculator.

Prompt caching: the discount most teams leave on the table

Every price above is the uncached rate. For any workload that resends the same context — a fixed system prompt, the same RAG documents, a few-shot template — most vendors now bill the repeated portion at roughly 10% of the input rate, and a few go far lower. On a workload where 80% of the input is reused, caching can cut total cost 30–60% without changing the model. It does nothing for genuinely novel input, which is why it rewards architecture (stable prompt prefixes) more than it rewards model choice.
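
As a rough illustration of the caching arithmetic, the sketch below prices the standard task with 80% of the input billed at 10% of the input rate. The rates ($2 in, $10 out per million tokens) are hypothetical, and the model is deliberately simplified: it ignores any cache-write surcharge, minimum cacheable length, or cache expiry that a given vendor may apply.

```python
def cached_cost_per_1k(in_tokens, out_tokens, in_rate, out_rate,
                       reuse_fraction=0.0, cached_multiplier=0.10):
    """USD per 1,000 tasks with part of the input served from cache.

    Rates are USD per million tokens. `reuse_fraction` is the share of input
    tokens that hit the cache; `cached_multiplier` is the fraction of the
    input rate charged for those tokens (assumed 10% here).
    """
    cached = in_tokens * reuse_fraction
    fresh = in_tokens - cached
    total = (fresh * in_rate
             + cached * in_rate * cached_multiplier
             + out_tokens * out_rate)
    return total / 1_000

# Hypothetical rates for illustration only.
IN_RATE, OUT_RATE = 2.00, 10.00

uncached = cached_cost_per_1k(3_000, 500, IN_RATE, OUT_RATE)
cached = cached_cost_per_1k(3_000, 500, IN_RATE, OUT_RATE, reuse_fraction=0.80)
saving = 1 - cached / uncached

print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}, saving: {saving:.0%}")
```

The saving depends on how much of the bill is input to begin with: the more output-heavy the workload, the less caching can help.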

Methodology

We price one standardised task — 3,000 input tokens and 500 output tokens — using each vendor's published per-token list price, then multiply to a per-1,000-task figure. We use list prices (not negotiated or committed-use discounts), uncached input, and the vendor's own region-neutral USD rate. Hosted open-weight models reflect the named host's price, not the cost of self-hosting the weights, which has entirely different economics. Reasoning models are priced on visible tokens only; their hidden reasoning tokens bill as output and will increase real-world cost beyond the figure shown. Every rate links to the vendor's public pricing page in our source dataset, and the whole set is re-verified weekly — this study was last confirmed 2026-05-28.
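
The caveat about reasoning models can be sketched numerically. The example below assumes rates of $20 in and $80 out per million tokens (not quoted in the article, but consistent with the $100.00 figure for o3-Pro) and a purely hypothetical 2,000 hidden reasoning tokens per task; real reasoning-token counts vary by model and prompt.

```python
def cost_per_1k(in_tokens, visible_out, in_rate, out_rate, hidden_reasoning=0):
    """USD per 1,000 tasks; rates are USD per million tokens.

    Hidden reasoning tokens are added to the visible output because,
    per the methodology, they bill at the output rate.
    """
    billed_out = visible_out + hidden_reasoning
    return (in_tokens * in_rate + billed_out * out_rate) / 1_000

# ASSUMED rates, chosen to match the listed $100.00 per 1,000 tasks.
IN_RATE, OUT_RATE = 20.00, 80.00

listed = cost_per_1k(3_000, 500, IN_RATE, OUT_RATE)
# 2,000 hidden reasoning tokens per task is a hypothetical value.
with_reasoning = cost_per_1k(3_000, 500, IN_RATE, OUT_RATE, hidden_reasoning=2_000)

print(f"listed: ${listed:.2f}, with hidden reasoning: ${with_reasoning:.2f}")
```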

This data is free to cite and reference under CC BY 4.0 — a link back to this page is appreciated. Figures are list prices that change often; confirm against the vendor before committing a budget.

FAQ

Which LLM API is cheapest in 2026?

For our standard task, Llama 3.1 8B Instant (Groq) at $0.19 per 1,000 tasks. But "cheapest" depends on workload shape and on whether you can cache context, so the ranking above shifts by task. The cheapest model that meets your quality bar is the only one that matters; price is necessary, not sufficient.

Is GPT cheaper than Claude or Gemini?

It depends entirely on tier. Budget models from all three labs cost a fraction of their flagships, and the cheapest flagship in this study isn't always the one you'd guess — see the ranking. Compare the specific models you'd actually deploy, not the brands.

How often do these prices change?

Frontier model prices are the fastest-moving dataset we track — they change on the order of weeks as new models launch and labs compete on price. That's why this page is regenerated from a weekly-verified dataset rather than hand-written numbers.

Prices verified 2026-05-28. Independent analysis, not financial advice — confirm current rates on each vendor's pricing page before sizing a workload.
