03 · LLM Launches & Updates

The new LLM pricing math: how to cut your API bill without changing models

Token prices dropped again, but the real savings are in routing, caching, and context discipline. A practical framework for lowering cost per task.

By Priya Nandakumar — Editor, Models & Infrastructure

Published Jun 5, 2025 · Updated Jun 10, 2025 · 8 min read

speka.info/llm-updates/gpt-pricing-tiers/

Illustration: Comparison chart of LLM API pricing tiers and cost-per-task

Quick Answer

The fastest way to cut your LLM bill is usually not switching to a cheaper model. It is routing simple tasks to small models, caching repeated context, and trimming prompt bloat. In workloads with plenty of simple requests and long shared prompts, these changes together can cut cost per task by half or more without a measurable drop in output quality.

Key Takeaways

01 Cost per task, not price per token, is the number that matters.
02 Route easy requests to a small model and reserve the frontier model for hard ones.
03 Prompt caching can sharply reduce the cost of context you send on every call.
04 Context discipline, sending only what’s needed, is the cheapest optimisation available.

Table of Contents

Stop optimising the wrong number
Route, don’t default
Cache what repeats
Trim the context

Every few months a lab cuts token prices and the internet celebrates. But if your bill is climbing, the price per token is rarely the problem — how you spend those tokens is. The teams with the lowest cost per task are not always on the cheapest model; they are the ones being deliberate about routing, caching, and context.

Stop optimising the wrong number

Price per token is a vendor metric. The number that shows up on your invoice is cost per task: what a completed unit of work costs once you count every token it consumes, including the prompt, the output, and any retries. A model at half the price that needs three attempts and a bloated prompt can cost more than a pricier one that gets it right the first time.
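To make the comparison concrete, here is a minimal Python sketch of cost per task. The prices, token counts, and attempt counts are illustrative assumptions, not any vendor’s real rates, and the sketch assumes each retry resends the full prompt and produces a full-length output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical list prices in USD per million tokens."""
    input_per_m: float
    output_per_m: float


def cost_per_task(pricing: Pricing, prompt_tokens: int, output_tokens: int, attempts: int = 1) -> float:
    """Total spend for one completed task, counting every attempt (retries included)."""
    per_attempt = (prompt_tokens * pricing.input_per_m + output_tokens * pricing.output_per_m) / 1_000_000
    return per_attempt * attempts


# Illustrative numbers only, not real vendor prices.
pricier = Pricing(input_per_m=2.00, output_per_m=8.00)
cheaper = Pricing(input_per_m=1.00, output_per_m=4.00)  # half the price per token

# Pricier model: tight 2,000-token prompt, right on the first try.
cost_pricier = cost_per_task(pricier, prompt_tokens=2_000, output_tokens=500, attempts=1)
# Cheaper model: bloated 6,000-token prompt and three attempts before an acceptable answer.
cost_cheaper = cost_per_task(cheaper, prompt_tokens=6_000, output_tokens=500, attempts=3)

print(f"pricier model: ${cost_pricier:.4f} per task")
print(f"cheaper model: ${cost_cheaper:.4f} per task")
```

With these made-up numbers, the model at half the per-token price costs three times as much per task ($0.0240 versus $0.0080), because the bloated prompt and the retries multiply its spend.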

Route, don’t default

Many applications send everything to their best model out of habit. A large share of real requests, such as classification, extraction, and short rewrites, can often be handled well by a small, cheap model. Routing those away from the frontier tier is frequently the single biggest saving available, and if you check output quality on a sample before switching, users are unlikely to notice.
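A router does not have to be sophisticated to pay off. The sketch below is a rule-based router in Python. The model identifiers, the set of “simple” task types, and the length cutoff are all hypothetical placeholders, and it assumes your application already knows each request’s task type.

```python
# Hypothetical model identifiers; substitute whatever tiers your provider offers.
SMALL_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Task types the article describes as usually fine on a small model.
SIMPLE_TASKS = {"classification", "extraction", "short_rewrite"}

# Assumed cutoff: very long inputs go to the frontier tier even for simple task types.
MAX_SMALL_MODEL_CHARS = 4_000


def route(task_type: str, prompt: str) -> str:
    """Pick a model tier for a request using simple, explicit rules."""
    if task_type in SIMPLE_TASKS and len(prompt) <= MAX_SMALL_MODEL_CHARS:
        return SMALL_MODEL
    # Anything unrecognised or genuinely hard defaults to the frontier model.
    return FRONTIER_MODEL


sample_requests = [
    ("classification", "Is this ticket about billing or shipping? ..."),
    ("extraction", "Pull the invoice number from: ..."),
    ("multi_step_reasoning", "Plan a data migration that ..."),
]
for task_type, prompt in sample_requests:
    print(task_type, "->", route(task_type, prompt))
```

A common refinement is to try the small model first and escalate to the frontier model only when its output fails a validation check. Either way, measure quality on a sample before and after switching.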

The cheapest token is the one you never send.

Cache what repeats

If your prompts share a long fixed preamble (a system prompt, a document, a set of examples), prompt caching lets you pay full price for that prefix once, with some providers a small premium, and a reduced rate on later calls that reuse it before the cache expires. For chat apps and document workflows with heavy repeated context, this alone can reshape a bill.
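Here is the input-cost arithmetic for a shared prefix, as a Python sketch. The 1.25x cache-write and 0.10x cache-read multipliers are assumptions chosen for illustration; providers differ (some apply discounts automatically with no write premium). The sketch also assumes every call arrives before the cache expires and that the cached prefix is byte-identical and sits at the start of the prompt.

```python
def input_cost(
    calls: int,
    prefix_tokens: int,
    suffix_tokens: int,
    price_per_m: float,
    cached: bool,
    write_mult: float = 1.25,  # assumed premium for writing the cache on the first call
    read_mult: float = 0.10,   # assumed discount for reading the cached prefix afterwards
) -> float:
    """Input spend in USD for `calls` requests that share the same leading prefix."""
    base = price_per_m / 1_000_000
    if not cached:
        # Every call pays full price for the whole prompt.
        return calls * (prefix_tokens + suffix_tokens) * base
    # First call writes the prefix to the cache; the variable suffix is always full price.
    first = (prefix_tokens * write_mult + suffix_tokens) * base
    # Remaining calls read the prefix from the cache at the reduced rate.
    rest = (calls - 1) * (prefix_tokens * read_mult + suffix_tokens) * base
    return first + rest


# Hypothetical workload: 10,000-token shared prefix, 500 variable tokens, 100 calls, $3 per million.
no_cache = input_cost(100, 10_000, 500, 3.00, cached=False)
with_cache = input_cost(100, 10_000, 500, 3.00, cached=True)
print(f"without caching: ${no_cache:.4f}")
print(f"with caching:    ${with_cache:.4f}")
```

Under these assumptions, input spend drops from $3.15 to about $0.48, because after the first call the 10,000-token prefix is billed at a tenth of the base rate.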

Trim the context

Sending an entire document when a model needs two paragraphs is the most common form of waste. Retrieve and pass only what the task requires. Context discipline needs no new vendor features, and its savings compound on every single call.
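A minimal sketch of context trimming in Python, assuming the document splits into paragraphs on blank lines. It scores paragraphs by plain word overlap with the question, a deliberately crude stand-in for the embedding or BM25 retrieval a production system would more likely use; the stopword list and k=2 are arbitrary choices for the example.

```python
import re

# Tiny, illustrative stopword list so common words do not dominate the overlap score.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "do", "i", "how", "in", "on", "with", "your"}


def content_words(text: str) -> set[str]:
    """Lowercased words minus stopwords; a crude stand-in for real retrieval."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def select_context(question: str, document: str, k: int = 2) -> str:
    """Keep only the k paragraphs that share the most content words with the question."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    query = content_words(question)
    # sorted() is stable even with reverse=True, so ties keep their document order.
    ranked = sorted(paragraphs, key=lambda p: len(query & content_words(p)), reverse=True)
    keep = set(ranked[:k])
    # Re-emit the selected paragraphs in document order so the excerpt reads naturally.
    return "\n\n".join(p for p in paragraphs if p in keep)


document = """A refund is issued within 14 days of the return arriving.

Our headquarters moved to a new office in 2021.

Shipping is free on orders over $50.

To request a refund, email support with your order number."""

excerpt = select_context("How do I get a refund?", document)
print(excerpt)
print(f"sent {len(excerpt)} of {len(document)} characters")
```

Only the two refund paragraphs reach the model; the headquarters and shipping paragraphs never get sent.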

We track every price change and tier shift as it happens in LLM Launches & Updates.

Frequently asked questions

Should I always use the cheapest model?

No. Match the model to the task: small models for simple work, frontier models for genuinely hard reasoning. Optimise cost per completed task, not price per token.

What is prompt caching?

A provider feature that lets you reuse a fixed prefix of a prompt across calls at a reduced rate, cutting the cost of repeated context like system prompts or documents.

How much can these techniques save?

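It varies with your traffic mix and how much context repeats. For workloads with many simple requests and long shared prompts, routing and caching together can cut cost per task by half or more without reducing output quality.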

Written by

Priya Nandakumar

Priya covers the frontier-model market and previously worked in ML infrastructure. She builds the cost models she writes about and checks vendor pricing pages weekly.

