How to find the cheapest cloud GPU

The same GPU can cost 8.6× more on one cloud than another — so "cheapest" is less about the card and more about where and how you rent it. Here's the playbook, with a live price table across 20 providers (verified 2026-06-05).

The short answer

The cheapest cloud GPU is the lowest-priced card that still fits your model's memory and throughput need — rented on a specialist or marketplace cloud, on spot/interruptible capacity if your workload can tolerate it. Today the floor across our index is $0.35/hr for the RTX 4090 at Vast.ai. The expensive way to do this is to default to a hyperscaler and over-spec the card.

Why the same GPU has an 8.6× price range

Take the H100, the reference training card. On our index it runs from $1.65/hr at Vast.ai to $14.19/hr at Google Cloud: identical silicon, an 8.6× spread. The gap isn't performance; it's what's bundled around the card. Specialist clouds sell the GPU and little else. Hyperscalers wrap it in their platform, networking, enterprise support, and compliance, and charge for the whole bundle, which is valuable if you need it and pure overhead if you don't.
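The spread figure is just the highest tracked rate divided by the lowest. A minimal sketch of that arithmetic, using only the two H100 rates quoted above (the function name `price_spread` is illustrative, not part of any index tooling):

```python
def price_spread(rates_per_hour):
    """Return (cheapest provider, priciest provider, max/min ratio).

    rates_per_hour maps provider name -> $/hr for one GPU model.
    """
    cheapest = min(rates_per_hour, key=rates_per_hour.get)
    priciest = max(rates_per_hour, key=rates_per_hour.get)
    ratio = rates_per_hour[priciest] / rates_per_hour[cheapest]
    return cheapest, priciest, ratio


# Only the two endpoints quoted in the text; a real index would hold more providers.
h100 = {"Vast.ai": 1.65, "Google Cloud": 14.19}
cheapest, priciest, ratio = price_spread(h100)
print(f"{cheapest} -> {priciest}: {ratio:.1f}x")
```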

On-demand vs spot vs reserved

Three pricing models, three different jobs:

  • On-demand: the GPU is yours until you release it. Highest hourly rate, zero interruption risk. Default for interactive work, demos, and anything you can't checkpoint.
  • Spot / marketplace: interruptible capacity at a deep discount (commonly 40–70% off). It can be reclaimed with little notice, so it fits checkpointed training and fault-tolerant batch jobs. The single biggest lever on a training bill.
  • Reserved / committed: you commit to weeks or months up front for a lower effective rate. Cheapest per hour if you keep the card busy the whole term; a trap if utilization is spiky.
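To see how the three models trade off, here is a rough sketch that compares one month's bill under each. Everything beyond the $1.65/hr H100 rate is an assumption for illustration: the 50% spot discount (inside the 40–70% range above), the $1.20/hr reserved rate, and the 10% of spot hours spent redoing interrupted work are all hypothetical.

```python
HOURS_IN_MONTH = 730  # average hours in a calendar month


def monthly_bill(on_demand_rate, busy_hours, spot_discount, reserved_rate,
                 rework_fraction=0.10):
    """Compare one month's bill under the three pricing models.

    spot_discount:   fraction off the on-demand rate (0.5 means 50% off).
    reserved_rate:   hypothetical committed $/hr, billed for every hour of the month.
    rework_fraction: assumed extra spot hours spent redoing interrupted work.
    """
    return {
        "on_demand": on_demand_rate * busy_hours,
        "spot": on_demand_rate * (1 - spot_discount) * busy_hours * (1 + rework_fraction),
        "reserved": reserved_rate * HOURS_IN_MONTH,
    }


for busy in (200, 700):
    bill = monthly_bill(on_demand_rate=1.65, busy_hours=busy,
                        spot_discount=0.5, reserved_rate=1.20)
    cheapest = min(bill, key=bill.get)
    print(busy, {k: round(v, 2) for k, v in bill.items()}, "->", cheapest)
```

With these assumed inputs the reserved commitment costs $876 for the month however little you use it, so it only beats on-demand once the card is busy for more than about 531 hours.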

Live: cheapest cloud GPU by model

Lowest tracked on-demand rate for each major card, re-verified weekly.

GPU          VRAM     Cheapest $/hr   Best provider     Spread
B200         192GB    $4.99           Lambda            2.9×
H200         141GB    $2.60           GMI Cloud         1.7×
H100         80GB     $1.65           Vast.ai           8.6×
A100 80GB    80GB     $0.78           Thunder Compute   7.4×
L40S         48GB     $0.72           Spheron           1.2×
RTX 4090     24GB     $0.35           Vast.ai           2.0×
RTX 5090     32GB     $0.76           Spheron           1.0×

Standard on-demand pricing, per single GPU, USD · verified 2026-06-05

The hidden costs that erase a cheap rate

A low $/hr is necessary, not sufficient. Three things quietly add back the saving:

  • Idle time. You pay for every hour the GPU exists, not every hour you use it. A card you keep busy 30% of the day effectively costs more than three times its headline rate (1 / 0.30 ≈ 3.3×). Utilization, not the sticker, is the real price.
  • Egress & storage. Moving datasets in and checkpoints out — and parking them between runs — can rival the compute line on data-heavy jobs. Specialist clouds often charge less here too.
  • Reliability tax. The cheapest marketplace node is no bargain if it's reclaimed mid-epoch and you lose hours of progress. Checkpoint aggressively, or pay up for on-demand where interruption is unacceptable.
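The idle-time and egress points above can be folded into one number: dollars per hour of useful work. A minimal sketch, assuming the card is billed around the clock for a 730-hour month; the $40 of monthly egress and storage charges is a made-up figure, not a quoted price.

```python
HOURS_IN_MONTH = 730  # average hours in a calendar month


def effective_rate(headline_rate, utilization, extras_per_month=0.0):
    """Cost per *useful* GPU hour for a card that is billed around the clock.

    utilization:      fraction of billed hours doing real work (0 < u <= 1).
    extras_per_month: assumed egress + storage charges in USD (hypothetical input).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    billed = headline_rate * HOURS_IN_MONTH + extras_per_month
    useful_hours = HOURS_IN_MONTH * utilization
    return billed / useful_hours


# RTX 4090 at the $0.35/hr floor, kept busy 30% of the time.
print(round(effective_rate(0.35, 0.30), 2))
# Same card with a hypothetical $40/month of egress and storage on top.
print(round(effective_rate(0.35, 0.30, extras_per_month=40.0), 2))
```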

How to actually choose

  1. Right-size the card. Match VRAM and throughput to the model; don't rent a B200 to serve something an L40S handles.
  2. Match the pricing model to the workload. Checkpointed training → spot. Interactive or can't-fail → on-demand. Steady long-running → consider reserved.
  3. Start specialist, escalate only if you need the platform. Reach for a hyperscaler when you genuinely need its networking, compliance, or managed services — not by default.
  4. Compare live. Rates move; check the GPU Price Index before each significant run.
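Step 1 can be sketched as a simple filter over the price table: keep the cards with enough VRAM, then take the lowest rate. This is only an illustration built on the 2026-06-05 snapshot above. It checks memory alone, so throughput, availability, and pricing model still need a human decision, and the `Card` type and `cheapest_fit` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Card(NamedTuple):
    name: str
    vram_gb: int
    rate: float      # cheapest tracked on-demand $/hr
    provider: str


# Snapshot of the table above (verified 2026-06-05); rates move, so refresh before use.
CARDS = [
    Card("B200", 192, 4.99, "Lambda"),
    Card("H200", 141, 2.60, "GMI Cloud"),
    Card("H100", 80, 1.65, "Vast.ai"),
    Card("A100 80GB", 80, 0.78, "Thunder Compute"),
    Card("L40S", 48, 0.72, "Spheron"),
    Card("RTX 4090", 24, 0.35, "Vast.ai"),
    Card("RTX 5090", 32, 0.76, "Spheron"),
]


def cheapest_fit(min_vram_gb: int) -> Optional[Card]:
    """Return the lowest-priced card with at least min_vram_gb, or None if none fits."""
    fits = [c for c in CARDS if c.vram_gb >= min_vram_gb]
    return min(fits, key=lambda c: c.rate) if fits else None


for need in (24, 32, 80, 100):
    print(need, "GB ->", cheapest_fit(need))
```

On this snapshot a 32GB requirement resolves to the L40S rather than the RTX 5090, because the larger card happens to be listed at a lower rate. That is the kind of result that is easy to miss when comparing by card name alone.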

Cheaper still: should you rent at all?

If your goal is running an LLM rather than training one, the cheapest option may be not renting a GPU at all. A managed API bills only for the tokens you use, while a rented GPU bills around the clock — so the API is usually cheaper until you reach real, steady volume. Find your crossover with the self-host vs API breakeven calculator, and price your model mix in the LLM API cost calculator.
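As a back-of-the-envelope version of that crossover, the sketch below finds the monthly token volume at which an always-on rented GPU costs the same as a per-token API. It assumes the GPU is billed for all 730 hours of the month and can actually serve that volume, and the $2.00 per million tokens is a hypothetical blended API price rather than any provider's quote.

```python
HOURS_IN_MONTH = 730  # average hours in a calendar month


def breakeven_tokens_per_month(gpu_rate_per_hour, api_price_per_million_tokens):
    """Tokens per month at which an always-on GPU and a per-token API cost the same.

    Below this volume the API is cheaper; above it the rented GPU wins,
    provided the card can really sustain that throughput (not modeled here).
    """
    gpu_monthly = gpu_rate_per_hour * HOURS_IN_MONTH
    return gpu_monthly / api_price_per_million_tokens * 1_000_000


# H100 at $1.65/hr versus a hypothetical API at $2.00 per million tokens.
tokens = breakeven_tokens_per_month(1.65, 2.00)
print(f"{tokens / 1e6:.1f}M tokens/month")
```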

Frequently asked questions

What is the cheapest cloud GPU to rent?

For raw $/hr, older or consumer-class cards (RTX 4090, L40S, A100) on specialist and marketplace clouds are cheapest, often under $1/hr. For modern training silicon, an H100 starts around $1.65/hr on a marketplace versus $10–14/hr on a hyperscaler. The cheapest card that actually fits your model's VRAM and throughput need is the one to pick — paying for a B200 to serve a 7B model is the most common way to overspend.

Why is the same GPU so much cheaper on some clouds?

It's the same silicon — the price gap is packaging, not performance. Specialist GPU clouds and marketplaces compete on raw price and pass through cheap capacity; hyperscalers (AWS, Azure, Google) bundle the card with their platform, networking, support, and enterprise SLAs and price several times higher. For a self-contained training or inference job, the specialist rate is usually all you need.

Is spot or on-demand cheaper for GPUs?

Spot and marketplace supply is cheaper — often 40–70% below on-demand — but it can be reclaimed with little notice, so it suits checkpointed training and fault-tolerant batch work. On-demand guarantees the GPU stays yours and is the right default for interactive workloads and anything you can't afford to have interrupted.

Should I rent a GPU or just use an LLM API?

If you're running an existing model rather than training one, a managed API is usually cheaper until you hit real scale, because a rented GPU bills every hour it exists while an API bills only per token used. Renting wins above a breakeven volume and only if you keep the card busy — model your own crossover before committing.

Independent analysis, no vendor influence. Prices are standard on-demand rates per single GPU, re-verified weekly (last 2026-06-05); spot, committed-use, and negotiated rates differ. Published under CC BY 4.0 — cite freely with a link.
