How much does it cost to run an AI chatbot?

The honest 2026 breakdown — model choice, prompt caching, conversation length, and the three mistakes that quietly triple your bill. Pricing verified 2026-05-28.

The short answer

A typical customer-support chatbot handling 1,000 conversations per day at about five turns each costs, in mid-2026:

  • $50–200/month on an efficient model (Claude Haiku 4.5, GPT-5.4 Mini, Gemini 3.5 Flash) with prompt caching
  • $200–800/month on a balanced model (Claude Sonnet 4.6)
  • $800+/month on a frontier model (GPT-5.5, Claude Opus 4.7)

That's a 10–20× spread for the exact same workload. The number you land on is decided by four levers, and most teams get at least two of them wrong. Here's each one.

Want a number for your exact volume? The AI Cost-per-Task Calculator has a chatbot template — set your conversations/day and turns, and it models the cost across every major model instantly.

Lever 1 — Model choice (the 20× lever)

The price gap between efficient and frontier models in 2026 is enormous. On output tokens — which dominate chatbot cost because the model talks back — you're looking at roughly:

  • Gemini 3.5 Flash: $0.30 per million output tokens
  • GPT-5.4 Mini: $0.60 per million
  • Claude Haiku 4.5: $5.00 per million
  • Claude Sonnet 4.6: $15.00 per million
  • Claude Opus 4.7: $25.00 per million
  • GPT-5.5: $30.00 per million
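
To put those per-million prices into monthly terms, here is a rough Python sketch. It covers output tokens only and uses the workload from this article (1,000 conversations per day, five turns each, ~250-token replies as in the worked example further down). The 30-day month and the function name are assumptions made for illustration.

```python
# Output prices in USD per million tokens, copied from the list above.
OUTPUT_PRICE_PER_M = {
    "Gemini 3.5 Flash": 0.30,
    "GPT-5.4 Mini": 0.60,
    "Claude Haiku 4.5": 5.00,
    "Claude Sonnet 4.6": 15.00,
    "Claude Opus 4.7": 25.00,
    "GPT-5.5": 30.00,
}


def monthly_output_cost(price_per_m, conversations_per_day=1000, turns=5,
                        reply_tokens=250, days=30):
    """Output-token cost only; input tokens are deliberately left out.

    `days=30` is an assumed month length, not a figure from the article.
    """
    tokens = conversations_per_day * days * turns * reply_tokens
    return tokens / 1_000_000 * price_per_m


for model, price in OUTPUT_PRICE_PER_M.items():
    print(f"{model}: ${monthly_output_cost(price):,.2f}/month in output tokens")
```

Input tokens are not included here, so treat the printed figures as the output share of the bill rather than a total.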

The strategic insight most teams miss: a support or FAQ chatbot rarely needs a frontier model. Answering "what's your refund policy" from a knowledge base is not a reasoning-hard task. Efficient models handle it indistinguishably from frontier models in blind tests. Paying GPT-5.5 prices for FAQ answers is the most common — and most expensive — mistake in production chatbots.

When does a frontier model earn its price? When the chatbot must reason over ambiguous, multi-step, or high-stakes queries — medical triage, legal interpretation, complex troubleshooting where a wrong answer is costly. For those, the premium is justified. For everything else, start cheap and only upgrade if quality testing demands it.

Lever 2 — Prompt caching (the 40–55% lever)

Every chatbot has a system prompt — the instructions, tone, and often a chunk of knowledge base — sent on every single turn. It's identical each time. Without caching, you pay full input price to re-send it on every message.

Prompt caching fixes this. Anthropic, OpenAI, Google, and DeepSeek all let you cache the repeated portion and pay roughly 10% of the normal input rate for it. For a chatbot with a 600-token system prompt and five-turn conversations, caching typically cuts 40–55% of total input cost.
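
The arithmetic behind that range can be sketched in a few lines of Python. This is a simplified model, not any provider's billing logic: it assumes cached tokens cost 10% of the normal input rate, borrows the ~100-token user messages and ~250-token replies from the worked example later in this article, and ignores cache-write surcharges and any provider-specific minimum cacheable length. Both function names are made up for illustration.

```python
def input_cost_saving(cached_fraction, cached_price_ratio=0.10):
    """Fraction of input spend saved when `cached_fraction` of input tokens
    is billed at `cached_price_ratio` times the normal input rate."""
    return cached_fraction * (1.0 - cached_price_ratio)


def system_prompt_share(system_tokens=600, user_tokens=100,
                        reply_tokens=250, turns=5):
    """Share of one conversation's input tokens that is the repeated system prompt."""
    total = 0
    for turn in range(1, turns + 1):
        # Each turn re-sends the system prompt, all earlier exchanges,
        # and the new user message.
        history = (turn - 1) * (user_tokens + reply_tokens)
        total += system_tokens + history + user_tokens
    return system_tokens * turns / total


share = system_prompt_share()
print(f"system prompt share of input: {share:.1%}")
print(f"saving if only the system prompt is cached: {input_cost_saving(share):.1%}")
print(f"saving if 55% of input is cached: {input_cost_saving(0.55):.1%}")
```

Under these assumptions, caching the system prompt alone saves a little under 40% of input cost. Reaching the upper part of the range requires some of the conversation history to hit the cache as well.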

This is free money that teams routinely leave on the table. If you're running a chatbot without prompt caching enabled, that is almost always the first optimization to make — bigger impact than switching models in many cases.

Lever 3 — Conversation length (the hidden multiplier)

Here's the trap that surprises people. A chatbot doesn't cost "X per conversation." It costs X per turn, and conversation history grows with each turn because the model needs the full thread as context.

By turn five, you're sending the system prompt plus four previous exchanges plus the new message. So a ten-turn conversation doesn't cost twice a five-turn one — it can cost three to four times as much, because later turns carry much more context.
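
A quick way to see that growth is to count input tokens turn by turn. The Python sketch below assumes every turn re-sends the system prompt, the full history, and the new user message, with the token sizes from the worked example later in this article (600-token system prompt, ~100-token user messages, ~250-token replies). The function name is hypothetical.

```python
def conversation_input_tokens(turns, system_tokens=600,
                              user_tokens=100, reply_tokens=250):
    """Total input tokens billed across one conversation of `turns` turns."""
    total = 0
    for turn in range(1, turns + 1):
        # History is every earlier user message plus every earlier reply.
        history = (turn - 1) * (user_tokens + reply_tokens)
        total += system_tokens + history + user_tokens
    return total


five = conversation_input_tokens(5)
ten = conversation_input_tokens(10)
print(f"5 turns:  {five:,} input tokens")
print(f"10 turns: {ten:,} input tokens")
print(f"ratio:    {ten / five:.2f}x")
```

With these sizes, five turns come to 7,000 input tokens and ten turns to 22,750, a ratio of about 3.25×. Output tokens grow only linearly with turns, so the ratio for the whole bill depends on how input and output are priced.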

The practical lever: keep conversations focused. A chatbot that resolves issues in three turns is not just better UX — it's materially cheaper to run than one that meanders through ten. Design for resolution, not engagement.

Lever 4 — Volume (and why batch doesn't apply here)

Cost scales linearly with conversation volume, which is obvious. The less obvious point: real-time chatbots can't use the batch API discounts (typically 50% off) that classification and summarization workloads can, because users are waiting for a response. So a chatbot's per-token cost is the full real-time rate. Plan around that.

Worked example: 1,000 conversations/day

Let's cost a realistic support chatbot: 600-token system prompt, five turns per conversation, ~100-token user messages, ~250-token responses, 55% of input cached.

  • Gemini 3.5 Flash: roughly $40–70/month
  • GPT-5.4 Mini: roughly $60–110/month
  • Claude Haiku 4.5: roughly $150–250/month
  • Claude Sonnet 4.6: roughly $400–700/month
  • Claude Opus 4.7 / GPT-5.5: roughly $800–1,400/month

Same chatbot. Same users. Same conversations. The only difference is the four levers above. This is why "how much does a chatbot cost" has no single answer — it has a range that you control.

These ranges come straight from the AI Cost-per-Task Calculator. Plug in your real numbers — it models caching and conversation growth automatically. For raw token-level pricing across 25+ models, see the LLM API Cost Calculator.
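
If you'd rather see the token arithmetic than trust a black box, the estimate can be approximated with a short Python sketch. This is not the calculator's actual logic: it is a simplified model using the workload above, an assumed 30-day month, and cached input billed at 10% of the normal rate. The prices in the example call are placeholders, not quotes for any real model, so substitute current vendor rates.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    conversations_per_day: int = 1000
    turns: int = 5
    system_tokens: int = 600
    user_tokens: int = 100
    reply_tokens: int = 250
    cached_fraction: float = 0.55     # share of input tokens served from cache
    cached_price_ratio: float = 0.10  # cached input billed at ~10% of normal
    days_per_month: int = 30          # assumption, not from the article


def monthly_cost(w, input_price_per_m, output_price_per_m):
    """Estimated monthly USD cost; prices are USD per million tokens."""
    # Every turn re-sends the system prompt, the prior exchanges,
    # and the new user message.
    input_per_conv = sum(
        w.system_tokens + (t - 1) * (w.user_tokens + w.reply_tokens) + w.user_tokens
        for t in range(1, w.turns + 1)
    )
    output_per_conv = w.turns * w.reply_tokens
    conversations = w.conversations_per_day * w.days_per_month

    # Blend the full input rate with the discounted cached rate.
    effective_input_price = input_price_per_m * (
        (1 - w.cached_fraction) + w.cached_fraction * w.cached_price_ratio
    )
    input_cost = conversations * input_per_conv / 1_000_000 * effective_input_price
    output_cost = conversations * output_per_conv / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Placeholder prices for illustration only.
estimate = monthly_cost(Workload(), input_price_per_m=2.00, output_price_per_m=10.00)
print(f"estimated monthly cost: ${estimate:,.2f}")
```

The sketch ignores cache-write surcharges, retries, and retrieved context, so expect it to differ somewhat from a real bill.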

The three mistakes that triple a chatbot bill

  1. Defaulting to a frontier model. Teams reach for the most capable model "to be safe" and pay 10–20× more than needed. Start with an efficient model; upgrade only if quality testing forces you to.
  2. Not enabling prompt caching. Re-sending the same system prompt at full price on every turn. Free 40–55% saving left unclaimed.
  3. Letting conversations sprawl. Every extra turn carries the whole thread. Long chats cost super-linearly. Design for fast resolution.

Fix all three and a chatbot bill commonly drops by 60–80% with no loss in answer quality. That's the difference between a feature that's "too expensive to ship" and one that's comfortably profitable.

FAQ

What's the cheapest way to run a chatbot?

An efficient model (Gemini 3.5 Flash, GPT-5.4 Mini, or DeepSeek V4 Pro) with prompt caching enabled and conversations designed to resolve quickly. That combination is often 15–20× cheaper than a frontier model with no caching and sprawling chats.

Should I use one model for everything?

Not necessarily. A common pattern is "model routing": send simple queries to a cheap model and escalate only genuinely hard ones to a frontier model. This can cut costs substantially while preserving quality where it matters. It adds engineering complexity, so weigh it against your volume.
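
To gauge whether routing is worth the effort, you can estimate the blended price per million output tokens for a given escalation rate. The Python sketch below uses two output prices from the list earlier in this article (Gemini 3.5 Flash at $0.30 and GPT-5.5 at $30.00) and assumes both routes produce similar token counts per query. The function name and the escalation rates are illustrative.

```python
def blended_price(cheap_price, frontier_price, escalation_rate):
    """Average price per million tokens when `escalation_rate` of queries
    go to the frontier model and the rest stay on the cheap one."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return (1 - escalation_rate) * cheap_price + escalation_rate * frontier_price


cheap, frontier = 0.30, 30.00  # USD per million output tokens
for rate in (0.05, 0.10, 0.25):
    price = blended_price(cheap, frontier, rate)
    print(f"{rate:.0%} escalated: ${price:.2f} per million output tokens")
```

Because the frontier price is so much higher, the escalation rate dominates the result: even a small share of escalated queries accounts for most of the blended cost.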

Do embedding/RAG costs change this?

If your chatbot retrieves from a knowledge base (RAG), add the retrieval-call cost on top. Embedding costs for indexing are usually trivial (rounding error vs generation). The bigger RAG cost is the retrieved context you stuff into each prompt — see the RAG template in our cost calculator.

How accurate are these estimates vs my real bill?

Within 10–20% for typical workloads. Variance comes mainly from your actual token distribution and any enterprise volume pricing you have negotiated; batch discounts do not apply to real-time chat, so don't count on them. Log a week of real usage for precision, then model it.

Pricing verified 2026-05-28. This is general guidance, not a quote — always confirm current rates on the vendor pricing page before committing to a workload.
