Why is output so much more expensive than input?

Generating tokens is sequential and compute-intensive, while the prompt can be processed in parallel. Most vendors price output tokens 3–5× higher than input. Token for token, what the model writes costs several times what you feed it, so short, structured outputs are one of the biggest cost levers you control.

Do reasoning models cost more?

Substantially. Reasoning models bill their hidden 'thinking' tokens as output, and their output rates are already the highest on the board. In this study the most expensive option is o3-Pro at $100.00 per 1,000 tasks — over 526× the cheapest model for the identical task.

How does prompt caching change the math?

For workloads that resend eligible context, some providers offer a discounted cached-input rate. The discount, minimum cache length, and expiry rules vary by model. Use the calculator's model-specific rate and validate the eligible share from production logs.

2026 LLM API Pricing Study: Real Cost Per Task

The headline finding

A single API task (3,000 tokens of context in, a 500-token answer out) costs $0.19 per 1,000 runs on Llama 3.1 8B Instant (Groq), and $100.00 per 1,000 runs on o3-Pro. Same input, same output length, same task. The only thing that changed is which model you called, and that decision alone swings your bill by 526×. The median model lands at $5.50. Only 1 of the 20 models comes in under a dollar per 1,000 tasks.

This is why "what does AI cost" has no single answer. The cost of intelligence in 2026 is not a number — it's a 526× range, and where you land in that range is an engineering decision, not a vendor one.

The full ranking

Every model in our pricing dataset, cheapest to most expensive, for the standard task. The bar is relative to the most expensive option.

Rank	Model	Cost per 1,000 tasks
1	Llama 3.1 8B Instant (Groq)	$0.19
2	DeepSeek V4 Pro	$1.74
3	Llama 3.3 70B (Groq)	$2.17
4	Mistral Large 3	$2.25
5	Command R	$2.25
6	Sonar	$3.50
7	Llama 3.3 70B (Together)	$3.64
8	GPT-5.4 Mini	$4.50
9	Grok 4.3	$5.00
10	Claude Haiku 4.5	$5.50
11	o4-mini	$5.50
12	Gemini 3.5 Flash	$9.00
13	Gemini 3.1 Pro	$12.00
14	Command R+	$12.50
15	GPT-5.4	$15.00
16	Claude Sonnet 4.6	$16.50
17	Sonar Pro	$16.50
18	Claude Opus 4.8	$27.50
19	GPT-5.5	$30.00
20	o3-Pro	$100.00

Cost per 1,000 tasks of 3,000 input + 500 output tokens, USD, at list price. Hosted open-weight models (Together and Groq) reflect that provider's rate, not self-hosting.
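
The headline figures can be re-derived from the ranking itself. The sketch below copies the 20 published per-1,000-task costs into a list and recomputes the median, the cheapest-to-most-expensive spread, and the number of models under a dollar. The variable names are ours, chosen for illustration.

```python
from statistics import median

# Cost per 1,000 tasks (USD), copied from the ranking above, cheapest first.
costs = [0.19, 1.74, 2.17, 2.25, 2.25, 3.50, 3.64, 4.50, 5.00, 5.50,
         5.50, 9.00, 12.00, 12.50, 15.00, 16.50, 16.50, 27.50, 30.00, 100.00]

median_cost = median(costs)                    # mean of the 10th and 11th values
spread = max(costs) / min(costs)               # most expensive / cheapest
under_a_dollar = sum(c < 1.00 for c in costs)  # count of models below $1.00

print(f"median ${median_cost:.2f}, spread {spread:.0f}x, under $1: {under_a_dollar}")
# prints: median $5.50, spread 526x, under $1: 1
```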

Why output tokens, not input, decide your bill

The instinct is to worry about how much context you send. The data says worry about how much the model writes back. Across these 20 models, output is often priced above input because generating tokens is sequential and compute-intensive. A model that looks cheap on input can still be expensive in practice if it is verbose. One reliable cost lever is a shorter, structured output: ask for JSON, cap max_tokens, stop the model from "explaining its answer" when you only need the answer.
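
To see how much output length moves the bill, here is a minimal sketch that holds the input fixed at 3,000 tokens and varies only the number of output tokens. The rates ($3.00 and $15.00 per million tokens) and the three output lengths are hypothetical values picked for illustration, not figures from the dataset; substitute your own model's list price.

```python
# HYPOTHETICAL rates in USD per million tokens; replace with your model's list price.
IN_RATE, OUT_RATE = 3.00, 15.00
INPUT_TOKENS = 3_000

def cost_per_1k_tasks(out_tokens):
    """USD for 1,000 tasks with a fixed input size and a given output length."""
    per_task = (INPUT_TOKENS * IN_RATE + out_tokens * OUT_RATE) / 1_000_000
    return per_task * 1_000

# Illustrative output lengths: an unprompted explanation, a plain answer, capped JSON.
for label, out_tokens in [("verbose explanation", 1_500),
                          ("standard answer", 500),
                          ("compact JSON", 150)]:
    print(f"{label:>20}: {out_tokens:>5} output tokens -> ${cost_per_1k_tasks(out_tokens):.2f}")
```

Under these assumed rates the input cost stays at $9.00 per 1,000 tasks in every row, so the whole difference between rows comes from output length.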

The same task, three shapes

Cost ranking isn't fixed — it shifts with the shape of the workload. A chat reply, a retrieval answer, and a long-document summary stress different parts of the price card:

Workload	Tokens	Cheapest	Most expensive	Spread
Short chat reply	1k in · 300 out	Llama 3.1 8B Instant (Groq) $0.07	o3-Pro $44.00	595×
RAG answer	8k in · 600 out	Llama 3.1 8B Instant (Groq) $0.45	o3-Pro $208.00	464×
Long-doc summary	30k in · 800 out	Llama 3.1 8B Instant (Groq) $1.56	o3-Pro $664.00	425×

Per 1,000 tasks, USD, list price. Long-context workloads punish models that price input highly; short replies are dominated by output rates.
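
The way the spread shifts with workload shape can be sketched in a few lines. The token counts come from the table above. The per-million-token rates are an assumption: this page does not quote them, and they were chosen because they appear to reproduce the table's figures, so confirm them against each vendor's pricing page before relying on them.

```python
# ASSUMED rates in USD per million tokens: (input, output). Not quoted in this study.
RATES = {
    "Llama 3.1 8B Instant (Groq)": (0.05, 0.08),
    "o3-Pro": (20.00, 80.00),
}

# Workload shapes from the table: (input tokens, output tokens).
WORKLOADS = {
    "Short chat reply": (1_000, 300),
    "RAG answer": (8_000, 600),
    "Long-doc summary": (30_000, 800),
}

def cost_per_1k_tasks(in_tokens, out_tokens, in_rate, out_rate):
    """USD for 1,000 tasks given per-million-token rates."""
    per_task = (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    return per_task * 1_000

for name, (n_in, n_out) in WORKLOADS.items():
    cheap = cost_per_1k_tasks(n_in, n_out, *RATES["Llama 3.1 8B Instant (Groq)"])
    dear = cost_per_1k_tasks(n_in, n_out, *RATES["o3-Pro"])
    print(f"{name}: ${cheap:.2f} vs ${dear:.2f}, spread {dear / cheap:.0f}x")
```

The spread narrows as the input share grows because, under these assumed rates, the two models differ more on output price (1,000×) than on input price (400×).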

The frontier tier is its own race

If you specifically need a flagship model, the spread is still wide. Among the models labelled frontier-tier in our dataset, the cheapest is Grok 4.3 at $5.00 per 1,000 tasks, while GPT-5.5 costs $30.00 — a 6× difference inside the "premium" bracket alone. Paying more does not reliably buy you more capability; it often just buys a different vendor's margin. Benchmark on your task before assuming the expensive model is the better one.

Want your own number, not ours? Run your real token counts through the LLM API Cost Calculator — or skip the token math entirely and pick a workload in the AI Cost-per-Task Calculator.

Prompt caching: the discount most teams leave on the table

Every price above is the uncached rate. For any workload that resends the same context — a fixed system prompt, the same RAG documents, or a few-shot template — some providers offer a model-specific cached-input rate. The discount, minimum cache length, and expiry rules differ. Measure the eligible repeated share in production and apply the corresponding rate; caching does nothing for genuinely novel input.
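
As a rough way to size the effect, the sketch below blends cached and uncached input rates by the share of the prompt that is eligible for caching. Every number in it is hypothetical (a $3.00 uncached rate, a $0.30 cached rate, an 80% repeated share), and it deliberately ignores cache-write surcharges, minimum cache lengths, and expiry, all of which vary by provider.

```python
def input_cost_with_caching(in_tokens, in_rate, cached_rate, cached_share):
    """USD input cost per task; rates are USD per million tokens.

    cached_share is the fraction of input tokens billed at the cached rate (0.0 to 1.0).
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    cached_tokens = in_tokens * cached_share
    uncached_tokens = in_tokens - cached_tokens
    return (uncached_tokens * in_rate + cached_tokens * cached_rate) / 1_000_000

# HYPOTHETICAL values: $3.00/M uncached, $0.30/M cached, 80% of the prompt repeated.
baseline = input_cost_with_caching(3_000, 3.00, 0.30, cached_share=0.0)
cached = input_cost_with_caching(3_000, 3.00, 0.30, cached_share=0.8)

print(f"input cost per 1,000 tasks: ${baseline * 1000:.2f} uncached, "
      f"${cached * 1000:.2f} with caching")
```

Setting cached_share to 0.0 reproduces the uncached figure, which matches the point that caching does nothing for genuinely novel input.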

Methodology

We price one standardised task — 3,000 input tokens and 500 output tokens — using each vendor's published per-token list price, then multiply to a per-1,000-task figure. We use list prices (not negotiated or committed-use discounts), uncached input, and the vendor's own region-neutral USD rate. Hosted open-weight models reflect the named host's price, not the cost of self-hosting the weights, which has entirely different economics. Reasoning models are priced on visible tokens only; their hidden reasoning tokens bill as output and will increase real-world cost beyond the figure shown. Every rate links to the vendor's public pricing page in our source dataset. The last completed full-dataset check was 2026-07-14; confirm material decisions against the linked vendor source.
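
The pricing rule described above, including the reasoning-token caveat, can be sketched as follows. The function name, the rates ($2.00 and $8.00 per million tokens), and the 2,000 hidden reasoning tokens are hypothetical and only there to show the mechanics; actual reasoning-token counts vary by model and prompt and have to be read from the provider's usage data.

```python
def cost_per_1k_tasks(in_tokens, out_tokens, in_rate, out_rate, reasoning_tokens=0):
    """USD per 1,000 tasks; rates are USD per million tokens.

    reasoning_tokens are billed at the output rate, as the methodology notes.
    """
    billed_output = out_tokens + reasoning_tokens
    per_task = (in_tokens * in_rate + billed_output * out_rate) / 1_000_000
    return per_task * 1_000

# HYPOTHETICAL rates and reasoning-token count, for illustration only.
IN_RATE, OUT_RATE = 2.00, 8.00

visible_only = cost_per_1k_tasks(3_000, 500, IN_RATE, OUT_RATE)
with_reasoning = cost_per_1k_tasks(3_000, 500, IN_RATE, OUT_RATE, reasoning_tokens=2_000)

print(f"visible only: ${visible_only:.2f}, with reasoning: ${with_reasoning:.2f}")
```

With these assumed numbers the visible-token figure is $10.00 per 1,000 tasks and the billed figure is $26.00, which is the kind of gap the published ranking does not capture for reasoning models.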

This data is free to cite and reference under CC BY 4.0 — a link back to this page is appreciated. Figures are list prices that change often; confirm against the vendor before committing a budget.

FAQ

Which LLM API is cheapest in 2026?

For our standard task, Llama 3.1 8B Instant (Groq) at $0.19 per 1,000 tasks. But "cheapest" depends on workload shape and on whether you can cache context, so the ranking above shifts by task. The cheapest model that meets your quality bar is the only one that matters; price is necessary, not sufficient.

Is GPT cheaper than Claude or Gemini?

It depends entirely on tier. Budget models from all three labs cost a fraction of their flagships, and the cheapest flagship in this study isn't always the one you'd guess — see the ranking. Compare the specific models you'd actually deploy, not the brands.

How often do these prices change?

Frontier model prices are the fastest-moving dataset we track — they can change as new models launch and providers revise their rate cards. This page is generated from the dated source dataset rather than duplicated hand-written price examples.

Prices verified 2026-07-14. Independent analysis, not financial advice — confirm current rates on each vendor's pricing page before sizing a workload.
