Devstars
Blog
Date: 05/03/2026
There’s a conversation about AI model costs that most businesses aren’t having yet, but they will be.
It’s not “should we use AI?” That ship has sailed. It’s “why is our AI bill so high, and what on earth are we getting for it?”
When we built our first team of AI agents at Devstars, we made every mistake going. We defaulted to the most powerful model for everything, ran tasks sequentially when they could run in parallel, and wondered why we were burning through £62 before lunch.

A bit of attention, a bit of model-routing logic, and that bill dropped to around £10 a day. Same output quality. Same capabilities. Just smarter decisions about which brain to use for which job.
Here’s what we learned.
Before any pricing makes sense, you need to understand how AI is actually billed.
AI models don’t charge by the word, the query, or the hour. They charge by the token.
A token is roughly 0.75 of a word, or about 4 characters. So “digital marketing” is approximately 3 tokens. A 1,000-word blog post is around 1,300 tokens.
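That back-of-envelope arithmetic is easy to sketch. The function below uses the ~0.75-words-per-token rule of thumb from this article; real tokenisers vary by model, so treat it as a budgeting estimate only.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75-words-per-token heuristic.

    Real tokenisers differ by model; this is only a back-of-envelope
    figure for budgeting, not an exact count.
    """
    words = len(text.split())
    return round(words / 0.75)

# "digital marketing" -> 2 words -> ~3 tokens,
# matching the example in the text above.
print(estimate_tokens("digital marketing"))
```

A 1,000-word blog post comes out at 1,333 tokens with this heuristic, in line with the "around 1,300 tokens" figure above.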
There are two types, and they're priced very differently: input tokens (everything you send to the model, including your prompt and any context) and output tokens (everything the model generates in response). Output tokens cost several times more than input tokens.
Why the gap? Generating output is computationally heavy. The model is creating something new, token by token, using considerable processing power. Reading your input is comparatively light work.
This matters enormously when you’re designing AI workflows. If you can ask for a shorter, structured answer rather than a flowing essay, you save real money. A model that returns a JSON object rather than paragraphs of explanation costs a fraction of the price for the same underlying analysis.
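To see the effect in numbers: here is a minimal comparison of a flowing-essay answer against a terse JSON answer, priced at this article's Sonnet output rate (£12.00 per million output tokens). The token counts are illustrative assumptions, not measurements.

```python
# Output tokens dominate cost, so compact structured answers save money.
# Price is the article's Sonnet output figure; token counts are made up
# for illustration.
SONNET_OUTPUT_PRICE = 12.00 / 1_000_000  # GBP per output token

def output_cost(tokens: int) -> float:
    """Cost in GBP of generating the given number of output tokens."""
    return tokens * SONNET_OUTPUT_PRICE

essay = output_cost(400)      # paragraphs of explanation
structured = output_cost(60)  # the same analysis as a terse JSON object

print(f"essay: £{essay:.6f}, json: £{structured:.6f}")
```

Per call the difference looks trivial, but multiplied across thousands of calls a day it is exactly the gap between a £10 bill and a £60 one.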
All AI model costs below are quoted per million tokens as of early 2026. Think of a million tokens as roughly 750,000 words, or about 750 thousand-word blog posts.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| Claude Opus | £4.00 | £20.00 | Complex reasoning, nuanced strategy |
| Claude Sonnet | £2.40 | £12.00 | Balanced quality and cost |
| Claude Haiku | £0.80 | £4.00 | Fast, repetitive tasks |
Anthropic’s tiered naming is straightforward: Opus is the flagship, Sonnet is the workhorse, Haiku is the sprinter. The gap between Opus and Haiku is 5x on output. That is the difference between £10/day and £50/day at the same volume. See Anthropic’s official pricing for the latest figures.
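The tier gap is easiest to see as a daily bill. The sketch below prices a hypothetical day's workload against each Anthropic tier using the table above; the 2M-input / 0.5M-output workload is an invented example, not a real measurement.

```python
# Daily cost at a fixed workload, using the article's per-million-token
# prices for Anthropic's tiers. The workload figures are illustrative.
PRICES = {  # model: (input GBP/1M tokens, output GBP/1M tokens)
    "opus":   (4.00, 20.00),
    "sonnet": (2.40, 12.00),
    "haiku":  (0.80, 4.00),
}

def daily_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total GBP cost for one day's input and output token volume."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Hypothetical day: 2M tokens read, 0.5M tokens generated.
for model in PRICES:
    print(model, round(daily_cost(model, 2_000_000, 500_000), 2))
```

On that workload, Opus comes to £18/day against Haiku's £3.60, the same work at a fifth of the price wherever the cheaper tier is good enough.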
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| GPT-5 Mini | £0.20 | £1.60 | High-volume classification, simple tasks |
| GPT-4o Mini | £0.12 | £0.48 | Cheap, fast, surprisingly capable |
| GPT-5 | £1.00 | £8.00 | General reasoning, content |
| GPT-4o | £2.00 | £8.00 | Multimodal, vision, nuanced tasks |
| GPT-5.2 | £1.40 | £11.20 | Coding, agentic tasks |
GPT-4o Mini is extraordinary value for the right tasks. It costs almost nothing and handles classification, routing, summarisation, and structured data extraction reliably well. GPT-4 classic, by contrast, used to cost £24/£48 per million tokens — most people had no idea they were paying that.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| Gemini 2.0 Flash-Lite | £0.06 | £0.24 | Ultra-cheap volume tasks |
| Gemini 2.5 Flash-Lite | £0.08 | £0.32 | Fast lightweight tasks |
| Gemini 2.5 Flash | £0.28 | £1.12 | Research, long-context analysis |
| Gemini 2.5 Pro | £1.00 | £8.00 | Deep reasoning, complex analysis |
| Gemini 3 Pro | £1.60 | £9.60 | Flagship capability |
Google’s Gemini Flash models have enormous context windows (you can feed in entire documents, websites, even videos) at prices that make Opus look like a luxury car. For research-heavy tasks where you need to process large amounts of text, Gemini Flash is often the right call.
Stop thinking about specific model names and start thinking in tiers.
Tier 1 — The Strategists (Opus, GPT-5.2 Pro, Gemini 3 Pro)
These are your senior partners. Expensive, deliberate, exceptional at nuanced reasoning. Use them sparingly for the decisions that genuinely require deep thinking. Final strategy documents. Complex competitive analysis. High-stakes content where quality is non-negotiable.
Tier 2 — The Workhorses (Sonnet, GPT-5, Gemini 2.5 Pro, GPT-4o)
This is where most of your AI work should live. Balanced capability and cost. Good enough for the vast majority of tasks, excellent for most. First drafts, analysis, client reports, research synthesis, content production at scale.
Tier 3 — The Sprinters (Haiku, GPT-4o Mini, GPT-5 Mini, Gemini 2.5 Flash-Lite)
Fast and cheap. Deceptively capable for the right tasks. Routing decisions, classification, structured data extraction, summarisation, simple Q&A. When you need to process thousands of items, this is your tier.
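Thinking in tiers translates directly into routing code. Here is a minimal sketch of a tier router; the task names and model choices are illustrative assumptions, not Devstars' actual routing table.

```python
# Map task types to tiers, and tiers to a default model. Unknown tasks
# fall back to the workhorse tier rather than the expensive one.
TASK_TIERS = {
    "strategy_doc": 1,          # strategists: high-stakes reasoning
    "competitive_analysis": 1,
    "first_draft": 2,           # workhorses: most day-to-day work
    "client_report": 2,
    "classification": 3,        # sprinters: cheap, high-volume tasks
    "summarisation": 3,
    "keyword_check": 3,
}

TIER_MODELS = {
    1: "claude-opus",
    2: "claude-sonnet",
    3: "claude-haiku",
}

def route(task_type: str) -> str:
    """Pick a model for a task; unknown tasks default to tier 2."""
    tier = TASK_TIERS.get(task_type, 2)
    return TIER_MODELS[tier]
```

The important design choice is the fallback: defaulting unknown tasks to the mid tier, not the flagship, is what stops "safer" from quietly becoming "most expensive".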
When we built our AI agent team at Devstars, we started naively. Every task went to the best model because it felt safer. Why take risks with a cheaper model?
The problem is that “safer” is relative. Yes, Opus produces slightly better output than Haiku for some tasks. But for a task like “does this piece of content contain a keyword?” or “summarise this in three bullet points,” Haiku does the job just as well at a fraction of the cost.
Here’s what we changed: we mapped each workflow to the appropriate model tier instead of defaulting to the flagship, asked for compact structured outputs rather than essays, and cached repeated prompts so we never paid for the same answer twice.
The result: costs dropped from around £32 to roughly £8 a day. Same agents, same tasks, same quality output. Just smarter model selection.
Here’s the reality check.
A startup running AI agents, processing content, analysing customer queries, and generating reports can easily spend £1,000–£3,000 a month on AI model costs if they are not paying attention. Most of that spend comes down to two mistakes: using expensive models for low-value tasks, and generating more output tokens than necessary.
A few hours of audit — mapping each workflow to the appropriate model tier, compressing outputs, implementing caching — can cut that by half or more without any reduction in quality. That’s real money, especially when you’re scaling.
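Of the audit levers above, caching is the simplest to implement: identical prompts should never be paid for twice. Here is a minimal in-memory sketch, where `call_model` is a stand-in for whatever client function you actually use, not a real library API.

```python
import hashlib

# Minimal in-memory response cache, keyed on model + prompt.
# call_model is a placeholder for your real client call.
_cache: dict[str, str] = {}

def cached_call(model: str, prompt: str, call_model) -> str:
    """Return a cached response for an identical model+prompt pair,
    calling the model only on a cache miss."""
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(model, prompt)
    return _cache[key]
```

In production you would add an expiry policy and persistence, and note that some providers also offer server-side prompt caching at a discount, which stacks with this.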
If you’re just starting with AI tools, don’t default to the most powerful model. Start with a mid-tier model (Sonnet, GPT-4o, Gemini 2.5 Pro), see what quality you’re getting, then decide whether you need to step up or can step down.
If you’re already running AI workflows, audit them. For each task, ask: Does this genuinely require Tier 1, or am I paying premium prices out of habit?
If you’re building an AI-powered product or service, build model routing from the start. It’s far easier to do at the design stage than to retrofit later.
And if your AI bill is creeping up and you’re not sure why, get someone to look at it. The savings are usually sitting in plain sight.
AI model costs have been falling consistently. GPT-4, which cost £24/£48 per million tokens at launch, has effectively been replaced by models that are faster, better, and a fraction of the price. This trend will continue.
The implication: lock in flexible architecture. Don’t hardcode a specific model into your systems. Use an abstraction layer that lets you swap models when better options emerge. Which, in this market, is roughly every six months.
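The abstraction can be as simple as one level of indirection: call sites ask for a role, and a single config maps roles to whatever the current best model is. Role names and model strings below are illustrative assumptions.

```python
# Call sites never name a model directly; they name a role. Swapping a
# model when a better option ships is then a one-line config change.
MODEL_CONFIG = {
    "strategist": "claude-opus",
    "workhorse": "claude-sonnet",
    "sprinter": "claude-haiku",
}

def model_for(role: str) -> str:
    """Resolve a role to the currently configured model."""
    return MODEL_CONFIG[role]

# When a better workhorse ships, change one line of config:
# MODEL_CONFIG["workhorse"] = "some-newer-model"
```

In a larger system the same idea usually lives in an environment variable or a config file rather than code, so model swaps need no redeploy at all.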
Stuart Watkins is the founder of Devstars LWDA, a Jersey-based digital agency specialising in GEO, technical SEO, and AI-powered growth systems. If you want to get your AI model costs under control, our OpenClaw AI agent platform in Jersey is a practical starting point. That is worth a conversation.
Currently scheduling strategic partnerships for Q1-Q2 2026. Limited spaces remain.
Get a free technical consultation and project roadmap. We’ll assess your requirements and provide transparent pricing for your growth-stage development needs.
Call: +44 020 8898 3993