Is AI Actually Cheap?

The Economics Don’t Add Up (Yet)

Hi friends,

This week it's just Pete and me, getting into something we'd both noticed over the last few weeks - a real surge in hostility toward AI, from booed graduation speakers to organised pushback against data centres. We wanted to pull apart where that anger is actually coming from, how much of it is fair, and what it can miss. It took us somewhere deeper than the usual "AI good / AI bad", and straight into this week's essay on whether AI is anywhere near as cheap as it feels - and who's really paying for it.

Three things in AI for SMEs this week: how AI is turning pricing into a real margin lever, the adoption gap most small businesses are still stuck in, and why the "AI is killing graduate jobs" headlines don't quite hold up.

We’re still running the Free AI Audit at otherstuff.ai/ai-audit. If you're trying to work out what AI means for your business, it's a useful place to start.

Now to this week’s Good Stuff.

Is AI Actually Cheap? The Economics Don’t Add Up (Yet).

Peter Steinberger, the creator of OpenClaw and now an engineer at OpenAI, recently racked up USD $1.3 million in API costs in a single month by running around 100 Codex instances simultaneously on an open-source project.

His bill covered 603 billion tokens across 7.6 million requests over 30 days. It is the most visible example yet of what happens when agentic software development runs without budget constraints, and of how quickly costs escalate when agents operate continuously at scale.

Steinberger posted a screenshot of his bill on X, showing $1,305,088.81 charged to the OpenAI API, with GPT-5.5 as the primary model.

Luckily for Steinberger, his new employer, OpenAI, is covering the cost. Steinberger joined OpenAI back in February 2026, and this spending is being treated as a research investment to understand what software development might look like when token economics are not a limiting factor.

Peter Steinberger X Post – source: X

But this raised some interesting questions, so Pete and I chatted about this on this week’s episode of The Good Stuff.

1. The Price You See is a Lie, Probably.

Interestingly, Steinberger later clarified the $1.3M was from using Codex's Fast Mode, which burns credits significantly faster, and that disabling it would drop the raw API cost to around USD $300,000.

So a swing of roughly 4x sits inside the choice of execution mode. The "real" compute cost and the "billed" number differ by a factor that has little to do with the actual work getting done.
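For anyone who wants to sanity-check those numbers, here's a rough back-of-envelope sketch in Python. It uses only the figures quoted above: the $1,305,088.81 bill, 603 billion tokens, 7.6 million requests, 30 days, and the roughly $300,000 estimate with Fast Mode off. The variable names are ours, and the per-token figure is a blended average that ignores any difference between input, output, and cached tokens, so treat it as indicative only.

```python
# Figures quoted in the essay (the standard-mode number is an approximation).
BILL_USD = 1_305_088.81
TOKENS = 603_000_000_000
REQUESTS = 7_600_000
DAYS = 30
STANDARD_MODE_USD = 300_000

# Blended averages: these do not separate input, output, or cached tokens.
usd_per_million_tokens = BILL_USD / (TOKENS / 1_000_000)
usd_per_request = BILL_USD / REQUESTS
tokens_per_request = TOKENS / REQUESTS
usd_per_day = BILL_USD / DAYS

# How much of the bill is explained by the execution mode alone.
fast_mode_multiplier = BILL_USD / STANDARD_MODE_USD

print(f"Blended rate:         ${usd_per_million_tokens:.2f} per million tokens")
print(f"Average request:      ${usd_per_request:.2f} ({tokens_per_request:,.0f} tokens)")
print(f"Average daily spend:  ${usd_per_day:,.0f}")
print(f"Fast Mode multiplier: {fast_mode_multiplier:.2f}x")
```

On those figures the blended rate comes out a little over $2 per million tokens, the average request costs about 17 cents, and the Fast Mode multiplier is about 4.35x. No single request is expensive. The bill comes from 7.6 million of them.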

The cost of a fixed unit of intelligence, for the same task and same capability, has been falling by something like an order of magnitude a year. So cost per token for a given capability tier has been falling fast, but cost per task has been rising. There are a few reasons for that.

One reason is Jevons paradox - make a resource cheaper to use and you end up using it more, so your total consumption explodes. The classic case is coal - more efficient steam engines led to more coal burned, not less.

The cheaper each token gets, the more lavishly we throw them at problems, so the bills rise even as unit cost falls.

Another reason is one that Pete highlighted on the pod: the frontier model quietly running five times in the background. This has a name, inference-time scaling, where you spend more compute at inference to get a better answer.

There’s also capability expansion.

The same task now routes through vastly more tokens than it used to, because the frontier model’s way of getting a good answer is via reasoning chains, multi-sampling, tool calls, and agentic loops running in the background. So while you’re paying less per token, you’re likely paying far more per task, even for the same amount of work.

In fact, a reasoning model can spend many multiples of the output tokens a plain completion would, before you even count multi-sampling, tool calls, and retrieval.

So both narratives are true at once, just measured against different denominators. The price of a fixed unit of intelligence is collapsing while the frontier of what you attempt with it expands to eat the savings and then some.

You should be getting an efficiency dividend but you never really get to bank it. Every time intelligence gets ten times cheaper, you spend the saving on ten times more intelligence for the same task.
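The "different denominators" point is easier to see with numbers. Here's a small Python sketch where cost per task is simply price per token multiplied by tokens per task. Every number in it is hypothetical and chosen for illustration: a starting price of $10 per million tokens, 2,000 tokens per task, a 10x yearly price drop (the order-of-magnitude figure mentioned earlier), and an assumed 15x yearly growth in tokens per task.

```python
def cost_per_task(price_per_million_usd: float, tokens_per_task: float) -> float:
    """Cost of one task = price per token x tokens consumed by the task."""
    return price_per_million_usd * tokens_per_task / 1_000_000


# All values below are hypothetical, for illustration only.
PRICE_DECLINE_PER_YEAR = 10   # per-token price falls 10x a year
TOKEN_GROWTH_PER_YEAR = 15    # assumed growth in tokens spent per task

price = 10.0    # USD per million tokens in year 0
tokens = 2_000  # tokens per task in year 0

rows = []
for year in range(3):
    rows.append((year, price, tokens, cost_per_task(price, tokens)))
    price /= PRICE_DECLINE_PER_YEAR
    tokens *= TOKEN_GROWTH_PER_YEAR

for year, p, t, c in rows:
    print(f"Year {year}: ${p:.2f}/M tokens, {t:,} tokens/task, ${c:.3f} per task")
```

In this toy setup the per-token price falls 100x over two years while the cost per task more than doubles. If token growth were exactly 10x a year, cost per task would stay flat, which is the "never get to bank it" case. Anything above 10x is the "and then some".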

2. So Who Pays and Who Profits?

But there’s another way of looking at it. The cost of intelligence is deflating by design, and that deflation is a gift to users and a guillotine for the labs that produce it. The question worth exploring mightn’t be whether the economics work, but rather who's holding the bill versus who's capturing the value.

Right now, the cost of tokens is being heavily subsidised by the frontier labs and most of us know that isn’t sustainable long term.

The labs are pricing below their own marginal cost in order to race ahead and capture the market. So your stock $20 a month subscription loses them money on a heavy user, and their investors’ capital covers the gap.

For reference, a USD $200/month Codex Pro plan can currently provide anywhere up to USD $5,000 in API-equivalent value, and Codex and Claude Code subsidise inference well below API rates to win adoption.
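To put that plan-versus-API gap in proportion, here's a tiny Python sketch using the two figures above ($200 a month, up to $5,000 of API-equivalent usage). The function names are ours. One important caveat: this measures the discount against API list prices, not against the lab's actual cost of serving the tokens, which isn't public.

```python
PLAN_PRICE_USD = 200            # monthly plan price quoted above
MAX_API_EQUIVALENT_USD = 5_000  # upper-bound API-equivalent usage quoted above


def value_multiple(api_equivalent_usd: float, plan_price_usd: float = PLAN_PRICE_USD) -> float:
    """How many dollars of API-priced usage each subscription dollar buys."""
    return api_equivalent_usd / plan_price_usd


def discount_vs_api(api_equivalent_usd: float, plan_price_usd: float = PLAN_PRICE_USD) -> float:
    """Fraction saved relative to paying API list price for the same usage."""
    if api_equivalent_usd <= 0:
        raise ValueError("api_equivalent_usd must be positive")
    return 1 - plan_price_usd / api_equivalent_usd


print(f"Value multiple:  {value_multiple(MAX_API_EQUIVALENT_USD):.0f}x")
print(f"Discount vs API: {discount_vs_api(MAX_API_EQUIVALENT_USD):.0%}")
```

At the top end, a heavy user is getting 25 times what they pay for, or a 96% discount on list price. A user who only consumes $200 of API-equivalent usage gets no discount at all, which is why the subsidy is concentrated on the heaviest users.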

This is fairly classic platform economics.

Uber in 2015, AWS in the early days: platform land-grabs have routinely priced well below cost to gain market share quickly. The effect is that token price is temporarily untethered from actual cost for strategic reasons, the same way an Uber fare in 2015 was a terrible basis for predicting what a ride costs now.

So, relentless per-token deflation means the model labs are in a capital-incinerating Red Queen race. They’re spending billions to train a frontier model whose pricing power evaporates in roughly twelve months, while users get a massive subsidised wealth transfer, at least for as long as the subsidies continue.

When the subsidy ends, and it will, prices will reprice toward actual cost and the downstream benefit will shrink.

The idea of a Red Queen race comes from Lewis Carroll’s novel Through the Looking-Glass, where the Red Queen tells Alice that in her country "it takes all the running you can do, to keep in the same place." You sprint flat out and your position doesn't improve, because the ground itself is moving under you.

Evolutionary biologists borrowed it for arms races, where a predator gets faster, so the prey gets faster, and after a thousand generations both are dramatically faster and neither is winning. All the energy goes into not falling behind rather than into getting ahead.

I think the frontier labs may be trapped in a similar way.

A lab spends an enormous sum on training compute, talent, and data, to reach the frontier. For a brief period they have the best model and real pricing power, but two things erode it almost immediately.

First, competitors tend to achieve parity pretty soon after, usually within months, because once it's known that a capability is achievable, reproducing it is far cheaper than discovering it.

Second, their own future model, and the broader efficiency curve, make today's frontier capability dramatically cheaper to serve within a year. So what they spent billions to build commands a premium for a few months and then collapses toward commodity pricing, right as they’re spending the next multiple of billions on the new model that will repeat the cycle.

They’re running flat out to stay in the same place.

This is also what distinguishes it from a normal capital-intensive business, in that the asset depreciates against their own success.

A chip fab is expensive, but a fab you built in 2020 is still pumping out value in 2026. A frontier model's economic moat has a half-life measured in months, even though the model still works fine. You're not amortising the cost over a decade; you're amortising it over the window before the next release resets the price floor.

Importantly though, the Red Queen race only bites at the frontier. Models are already good enough for many of the commercial tasks we throw them at and we don’t really need them to materially improve for those tasks.

The moment a task is "good enough" on a model that exists today, that capability stops depreciating against new releases, because you have no reason to move that job to a newer model. Last year's model does it, runs cheaper every month as inference costs fall, and insofar as that task is concerned, a frontier sprinting ahead is irrelevant.

A huge and growing share of real-world tasks are ‘solved and commoditising,’ and for those tasks the economics are cheap, stable, and improving. The labs' problem is they can only charge a premium for the shrinking band of work that still needs the frontier, while the commoditised base, where most actual value today gets used, earns commodity margins.

The other side of this is the tasks that commoditise first tend to be the high-volume, low-stakes ones, like drafting, summarising, classifying, and routing. Those are exactly the tasks that were low-value-per-task to begin with. So, the work people pay a premium for may disproportionately remain the work that still benefits from the frontier, where the marginal cost of renting the best is trivial against the value at stake.

What the Jevons paradox implies is that the task universe itself isn't fixed. As intelligence gets cheaper, whole categories of work that were never worth attempting become viable, and those new categories are born at the frontier, where they need the best model and will pay for it.

The frontier premium just migrates. It keeps abandoning solved ground and re-homesteading on newly-possible ground. The labs can never rest on an installed base, because their entire monetisable surface is the perpetually-moving edge of the frontier. The day they stop expanding the universe of doable tasks is the day their premium collapses onto commoditising ground with everyone else.

3. How Do You Prepare for the Subsidy Being Cut?

The big theme here is that the model is the inherently commoditising, deflating, and migrating part of the stack, the component whose price collapses in twelve months and whose frontier keeps moving. So, welding your business to any single model, or any single lab, is betting on the one part of the system guaranteed to change underneath you.

So preparation isn't really about the model at all, but rather about not being held captive to it.

That matters because a rented model can be repriced overnight when the subsidy ends. It can also be deprecated, so the version your workflow depends on simply vanishes at the whim of the lab. And it can be re-aligned into refusing things it did last week.

If your entire business is fused to one provider, their terms become your terms. That's not really a model-quality risk, but rather a dependency risk, and it's almost completely unpriced right now, precisely because the subsidy makes single-provider life feel so cheap and easy.

The hedge is optionality.

Remain model agnostic, able to swap in the cheapest good-enough model for each task at any time: a frontier model when a task genuinely needs it, a cheap commoditised one when it doesn’t, a different provider entirely the day one of them changes the deal, and private inference when you don’t want Anthropic knowing everything about your business.
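For the more technically minded, here's a minimal Python sketch of what "cheapest good-enough model" could look like as a routing rule. It is an illustration under stated assumptions, not how Wingman or any particular product works. The model names, providers, prices, and quality scores are all made up, and in practice the quality score would come from your own evaluations on your own tasks.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    provider: str
    quality: int                   # hypothetical 1-10 score from your own evals
    usd_per_million_tokens: float  # hypothetical blended price
    private: bool = False          # True if inference runs on infrastructure you control


# Entirely hypothetical catalogue, for illustration only.
CATALOGUE = [
    ModelOption("frontier-large", "provider-a", 9, 15.00),
    ModelOption("mid-tier", "provider-b", 7, 1.50),
    ModelOption("small-open-weights", "self-hosted", 5, 0.20, private=True),
    ModelOption("mid-open-weights", "self-hosted", 7, 2.50, private=True),
]


def pick_model(
    min_quality: int,
    *,
    require_private: bool = False,
    excluded_providers: Iterable[str] = (),
    catalogue: Sequence[ModelOption] = CATALOGUE,
) -> ModelOption:
    """Return the cheapest model that is good enough for the task."""
    excluded = set(excluded_providers)
    candidates = [
        m for m in catalogue
        if m.quality >= min_quality
        and m.provider not in excluded
        and (m.private or not require_private)
    ]
    if not candidates:
        raise LookupError("no model in the catalogue meets the requirements")
    return min(candidates, key=lambda m: m.usd_per_million_tokens)


print(pick_model(5).name)                                     # routine task
print(pick_model(9).name)                                     # needs the frontier
print(pick_model(7, excluded_providers={"provider-b"}).name)  # a provider changed the deal
print(pick_model(7, require_private=True).name)               # sensitive data
```

The point of the design is that the task states what it needs (a quality floor, a privacy requirement) and never names a model. Swapping providers then means editing the catalogue, not rewriting the workflow.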

If the model is interchangeable, then the durable part that's actually yours was never really the model, but the layer above it - the workflow, the accumulated context in your business, your processes, the environment where your people and your agents actually do the work together.

That's where your edge lives, and unlike the model, it doesn't deflate or get deprecated.

The new, exciting, frontier work will always be out there at the rented edge, and that's fine, you can rent it. The durable, ownable value sits in the operating layer, and that's the part worth building to own.

An operating layer that stays model-agnostic, lets you swap providers freely, and actually makes humans and agents work together well is hard to build and run.

Many people end up doing the opposite - welding their business processes to one provider's tools, like Claude Cowork, because it's the path of least resistance, quietly taking on all the dependency risk that comes with it.

Building an operating environment where people and agents can do their best work together inside a shared business context, while remaining model agnostic, is a problem worth solving, and it's the one we've been building Wingman around. Get in touch if you’d like to learn more about Wingman.

We got into this and more this week in the Big Episode 59 of The Good Stuff.

Three Things in AI for SMEs.

1. AI is turning pricing into a profit lever small businesses can finally pull — Margin Up

The Small Business & Entrepreneurship Council calls pricing one of the most powerful but historically underused levers in small-business profitability — and its 2026 data shows AI is making that once labour-intensive work simple enough for a lean team to actually do. McKinsey's research points the same way: firms that redesign how they work around AI, rather than bolting it on, report roughly 10% higher profit margins.

What it means for you: The margin gain isn't in the tool, it's in what you do with it. Revisiting how you price — not just automating admin — is where the profit actually shows up.

SBE Council →

2. Most small businesses are still just experimenting with AI — Capital Up

A new SAS study found nearly 70% of small and mid-sized businesses are still in the "experimental" or "opportunistic" phase — trying tools without folding them into a real strategy. The barrier isn't access anymore; it's data, alignment, and a plan.

What it means for you: While most competitors are stuck piloting, getting even one process genuinely embedded is where the lead opens up. Fewer, deeper integrations beat another round of experiments.

BenefitsPro →

3. The "AI is killing graduate jobs" story is shakier than the headlines — Risk Down

Despite the doom, US employers project graduate hiring up 5.6% for the class of 2026 (NACE), 77% of grads landed a role within three months (up from 63% a year earlier), and IBM is tripling its entry-level hiring — while moving juniors off routine tasks toward client-facing work.

What it means for you: It echoes what we got into on the pod: AI is reshaping tasks, not deleting jobs wholesale. The roles that hold up lean on judgment and relationships, not grunt work — worth weighing if you're hiring or re-skilling.

CNBC →

That's all for this week.

If this resonated, we’d love for you to forward this newsletter to someone who might enjoy exploring these ideas too. See you next week!

Cheers,
Pete & Andy