Self-host vs API breakeven

"Just run our own model and stop paying per token" is right only above a certain volume — and the line moves once you count the GPU you pay for around the clock but use part of the day. Find the token volume where a dedicated host actually beats the managed API.

API cost / mo: $1,500 (at 500M tokens × $3/M)

Breakeven volume: 600M tokens/mo (where self-host = API)

Cheaper option: API (by ~$300/mo at this volume)

At 500M tokens/month you are below the 600M-token self-host breakeven, so the managed API is cheaper and you skip the ops burden entirely.

This compares raw token cost only. Self-hosting also carries engineering time, idle-GPU waste (a host you pay for 24/7 but use 30% of the day erases much of the saving), scaling headaches, and model-update lag. Treat the breakeven as the floor volume to even consider self-hosting — not the point at which it automatically wins.

The breakeven, in one line

Self-hosting trades a per-token price for a fixed monthly cost, so it wins only when your volume is high enough that the equivalent API bill would exceed that fixed cost. The crossover is simply fixed monthly host cost ÷ API price per token. Below it, pay per token; above it, a dedicated host can be cheaper — with one big caveat below.
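As a minimal sketch of that one-line formula, the function below divides a fixed monthly host cost by the API price. The function and parameter names are illustrative, and the $1,800 and $3-per-million inputs are the figures used in the worked example on this page, not recommendations.

```python
def breakeven_tokens(fixed_monthly_cost_usd: float,
                     api_price_per_million_usd: float) -> float:
    """Monthly token volume at which the API bill equals the fixed host cost."""
    if api_price_per_million_usd <= 0:
        raise ValueError("API price must be positive")
    # Price is quoted per million tokens, so scale the result back to tokens.
    return fixed_monthly_cost_usd * 1_000_000 / api_price_per_million_usd


if __name__ == "__main__":
    tokens = breakeven_tokens(1_800, 3.0)
    print(f"Breakeven: {tokens / 1e6:.0f}M tokens/month")  # Breakeven: 600M tokens/month
```

Raising the host cost or lowering the API price both push the breakeven volume up, which is why an honest all-in host figure matters.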

Why idle time is the real decider

An API charges you for exactly the tokens you use. A GPU charges you for every hour it exists, busy or not. If your traffic is spiky (heavy during the day, quiet at night), a host you keep busy only 30% of the time costs more than three times its nominal rate per hour of useful work (1 ÷ 0.3 ≈ 3.3×), which can erase the entire saving. Steady, high-utilization workloads are where self-hosting genuinely pays; bursty ones usually favour the API well past the raw breakeven.
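A rough way to see the idle-time effect is to convert the fixed host cost into an effective price per million tokens actually served. The sketch below assumes a hypothetical host that could serve 1.5B tokens a month if it ran flat out; that capacity figure is an assumption for illustration only, so substitute a measured throughput for your own hardware.

```python
def idle_cost_multiplier(utilization: float) -> float:
    """How many times the nominal rate you pay per hour of useful work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1 / utilization


def effective_price_per_million(host_monthly_cost_usd: float,
                                full_capacity_tokens_per_month: float,
                                utilization: float) -> float:
    """Host cost spread over the tokens actually served at this utilization."""
    served = full_capacity_tokens_per_month * idle_cost_multiplier(utilization) ** -1
    return host_monthly_cost_usd / served * 1_000_000


if __name__ == "__main__":
    HOST_COST = 1_800          # all-in monthly host cost from the worked example
    CAPACITY = 1_500_000_000   # hypothetical tokens/month at 100% utilization
    for u in (1.0, 0.6, 0.3):
        price = effective_price_per_million(HOST_COST, CAPACITY, u)
        print(f"{u:.0%} busy: {idle_cost_multiplier(u):.2f}x nominal, "
              f"${price:.2f} per million tokens")
```

Under these assumed numbers the host undercuts a $3-per-million API when fully busy, but at 30% utilization its effective price climbs above the API rate.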

A worked example

At 500M tokens a month and a $3-per-million blended API rate, the API costs about $1,500/month. A dedicated host at $1,800/month all-in is more expensive here; you don't cross breakeven until ~600M tokens. Push volume to 1.2B tokens and the API would be ~$3,600 while the host stays $1,800: now self-hosting saves ~$1,800/month, provided the GPU is busy enough to deliver that throughput. The decision is volume and utilization, never volume alone.
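The arithmetic in this example can be reproduced with a few lines. This is a sketch of the raw token-cost comparison only; the function name is illustrative, and it deliberately ignores utilization and ops time.

```python
def compare(tokens_per_month: float,
            api_price_per_million_usd: float,
            host_monthly_cost_usd: float):
    """Return (api_cost, cheaper_option, monthly_difference) on raw token cost."""
    api_cost = tokens_per_month / 1_000_000 * api_price_per_million_usd
    diff = api_cost - host_monthly_cost_usd
    if diff > 0:
        cheaper = "self-host"
    elif diff < 0:
        cheaper = "API"
    else:
        cheaper = "tie"
    return api_cost, cheaper, abs(diff)


if __name__ == "__main__":
    for volume in (500_000_000, 1_200_000_000):
        api_cost, cheaper, saving = compare(volume, 3.0, 1_800)
        print(f"{volume / 1e6:.0f}M tokens: API ${api_cost:,.0f} vs host $1,800 "
              f"-> {cheaper} cheaper by ${saving:,.0f}")
    # 500M tokens: API $1,500 vs host $1,800 -> API cheaper by $300
    # 1200M tokens: API $3,600 vs host $1,800 -> self-host cheaper by $1,800
```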

How to use it

Estimate your real monthly token volume, your blended API rate (price your exact model mix in the LLM API Cost Calculator), and an honest all-in host cost including ops. If you're below breakeven, the answer is easy. If you're above it, pressure-test utilization before committing — and remember the API also buys you instant access to new models and zero maintenance.
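One way to arrive at a blended API rate is a volume-weighted average across the models you actually call. The sketch below assumes a hypothetical two-model mix with made-up prices, chosen only so the result lands on the $3-per-million figure used above; real prices vary by provider and by input versus output tokens.

```python
def blended_rate_per_million(mix):
    """Volume-weighted average price per million tokens.

    mix: iterable of (tokens_per_month, price_per_million_usd) pairs.
    """
    mix = list(mix)
    total_tokens = sum(tokens for tokens, _ in mix)
    if total_tokens <= 0:
        raise ValueError("total token volume must be positive")
    total_cost = sum(tokens / 1_000_000 * price for tokens, price in mix)
    return total_cost / total_tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical mix: illustrative volumes and prices, not real quotes.
    mix = [
        (400_000_000, 2.00),  # cheaper model, most of the traffic
        (100_000_000, 7.00),  # pricier model, a fifth of the traffic
    ]
    print(f"Blended rate: ${blended_rate_per_million(mix):.2f} per million tokens")
```

Feeding the blended rate, rather than any single model's list price, into the breakeven keeps the comparison tied to what you would really be billed.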

Frequently asked questions

When is self-hosting an LLM cheaper than using an API?

Only above a breakeven volume. Self-hosting swaps a per-token price for a fixed monthly GPU + operations cost, so it wins only once your usage is high enough that the equivalent API bill would exceed that fixed cost. Below the breakeven, the managed API is cheaper and far less work; above it, a dedicated host can save money — if you keep it busy.

What is the breakeven volume for self-hosting?

It's the fixed monthly self-host cost divided by the API price per token. If a host costs $1,800/month all-in and the API charges $3 per million tokens, breakeven is 600 million tokens per month. Process less than that and the API is cheaper; process more and self-hosting starts to pay — assuming high utilization.

What costs does self-hosting add beyond the GPU rental?

The GPU is the visible cost; the hidden ones decide the outcome. They include engineering time to deploy and maintain the stack, idle-GPU waste (you pay 24/7 but may use a fraction of the day), autoscaling and reliability work, and the lag in getting new model versions that a managed API ships on day one. A host used 30% of the time erases most of the headline saving.

Should I self-host to save money?

Treat the breakeven as a floor, not a green light. If you're well below it, don't — the API is cheaper and simpler. If you're well above it and your traffic is steady enough to keep GPUs busy, model the full cost (including ops and idle time) before switching. Many teams that 'should' self-host on paper still come out ahead on a managed API once their real utilization is counted.

Independent analysis, not infrastructure advice. Compares raw token cost; validate against your real utilization and ops overhead before switching.
