AI Works Better Inside a System

Models need structure to do useful work

Hi friends,

This week Pete and I welcomed returning guest Deadman Oz to The Good Stuff, and we discussed how much of the recent progress in AI comes from the models themselves getting smarter and how much comes from improvements to the harness: the systems and structure wrapped around the model that govern and support how it does the work. That's what this week's essay gets into.

Three things in AI for SMEs this week: why the AI agents that actually work are the narrow, well-structured ones, the shadow AI risk quietly building as staff paste company data into chatbots, and what fresh Fed research says about how far real AI use is running ahead of the official numbers.

Now to this week’s Good Stuff.

Models or Harnesses?

It’s fashionable to debate whether Opus 4.8 is better than GPT5.5 and which models small businesses should be using for their work, but a better question is where the model should sit inside your work.

Businesses already understand the difference between intelligence and process.

A business applies processes, systems, approvals and hierarchy to support people, because we optimise for reliable, dependable work. Even highly skilled people produce variable work when there is no system around them.

The same should be true of AI.

A law firm doesn’t rely on intelligence alone. It has matter intake, document review, version control, layers of review and approval. Equally, a sales team doesn’t rely on intelligence alone. It has qualification frameworks, CRM processes, reporting and escalation paths.

The question is whether AI should be treated any differently.

The Orchestration Question

In any system that combines a language model with tools and a series of steps to work through, something has to decide what happens at each step. Let’s call that ‘control flow’. It can live in one of two places.

In the first, control flow lives inside the model.

The agent reasons about the goal, selects an action, executes it, observes the result, and reasons again about what to do next, looping until it judges itself complete.

The model is the orchestrator, and every step in the system is decided by inference. Inference just means one run of the model, where you give it an input and it gives you an output back. This is the "agent is the computer" design.
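To make the shape of that loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: stub_model stands in for a real model call, run_tool stands in for a real tool, and the step cap is a safety net we added rather than part of any particular agent framework.

```python
from typing import Callable

def stub_model(transcript: list[str]) -> str:
    # Hypothetical stand-in for one inference call: it reads the transcript
    # so far and returns the next action. A real model is far less tidy.
    results_seen = sum(1 for line in transcript if line.startswith("result:"))
    return "done" if results_seen >= 3 else f"search step {results_seen + 1}"

def run_tool(action: str) -> str:
    # Hypothetical tool; a real one might search the web or query a CRM.
    return f"result: output of '{action}'"

def run_agent(goal: str, model: Callable[[list[str]], str], max_steps: int = 10) -> list[str]:
    transcript = [f"goal: {goal}"]
    for _ in range(max_steps):        # safety cap only; the model still chooses every step
        action = model(transcript)    # inference decides what happens next
        if action == "done":          # the model judges itself complete
            break
        transcript.append(f"action: {action}")
        transcript.append(run_tool(action))   # observe the result, then loop
    return transcript

agent_transcript = run_agent("research twenty companies", stub_model)
for line in agent_transcript:
    print(line)
```

Notice that nothing in run_agent knows how many steps there will be or what they are. That is the flexibility, and it is also the risk.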

In the second, control flow lives inside deterministic code.

The code moves through a fixed sequence of steps that you designed in advance, and calls the model only at the specific points where genuine judgment is required, like scoring a lead or ranking a shortlist.

The model answers, and the code decides what to do with it.

In this approach the model has been demoted from driving the process to advising it. You give up some flexibility, because the code can only walk paths you designed ahead of time. In return you get predictability, because the spine of the system is now deterministic.

Everything else is handled by the code around it. That surrounding code is the harness. It decides when to call the model and what to do with each reply.

This is the "computer is the computer, and it can talk" design.

For example, let's say you ask the system to research twenty companies and flag the ones that might make good sales prospects.

In the agent-is-the-computer version, you hand the model the goal and let it run. It decides to start with company one, searches the web, reads what comes back, decides that wasn't enough so it searches again, moves to company two, decides company two needs a different approach, and so on.

When it works, it adapts intelligently to whatever each prospect throws up. When it doesn't, it might research twelve companies instead of twenty, or change what counts as a ‘good’ prospect halfway through, and you often can't tell why.

In the computer-is-the-computer version, you write a loop that runs the same three fixed steps for each of the twenty companies:

fetch the data,
then ask the model to score it against a defined bar,
then file it based on the score.

The model does the analytical work around scoring the prospect, but it never decides the flow.

The loop ensures all twenty get researched in the same way, and the scoring step is the only place variability appears. You've traded the ability to improvise per-company for the certainty that the process runs the same way every time and is auditable end-to-end.
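The fetch, score and file loop described above might look something like this in Python. Treat it as a sketch under stated assumptions: fetch_company_data and stub_score are hypothetical placeholders (the second stands in for the model call), and the bar of 50 is an arbitrary number chosen for the example.

```python
from typing import Callable

def fetch_company_data(name: str) -> str:
    # Hypothetical fetch step; a real one might call a data provider.
    return f"profile for {name}"

def stub_score(profile: str) -> int:
    # Hypothetical stand-in for the model call, returning a score from 0 to 100.
    # It is deterministic here so the example is repeatable; a real model is not.
    return (len(profile) * 7) % 101

def run_pipeline(companies: list[str], score: Callable[[str], int], bar: int = 50):
    prospects, rejected = [], []
    for name in companies:                    # the code decides scope and order
        profile = fetch_company_data(name)    # step 1: fetch the data
        result = score(profile)               # step 2: the model scores it against the bar
        bucket = prospects if result >= bar else rejected
        bucket.append((name, result))         # step 3: file it based on the score
    return prospects, rejected

companies = [f"Company {i}" for i in range(1, 21)]
prospects, rejected = run_pipeline(companies, stub_score)
print(len(prospects) + len(rejected))  # every company is processed exactly once
```

The model's answer can vary from run to run, but the number of companies processed, the order, and the filing rule cannot, because they live in the code.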

Why the locus of control matters so much

This isn't merely a stylistic preference. It determines how reliable the system can be, and it does so through compounding.

Start with non-determinism. A model call is not like a normal software function. The same input can produce a different answer on a different run, because the model is choosing from a range of possible answers rather than returning one fixed result.

On its own, a single slightly-unreliable step isn't alarming. The problem is what happens when you chain them together. To end up where you intended you need every step in the chain to land correctly, and the odds of that are each step's odds multiplied together.

A step that works nine times in ten sounds fine, but twenty of them in a row leaves you on the intended path only about an eighth of the time. What matters is how many uncertain decisions you've stacked end to end.
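The arithmetic behind that figure is short enough to check yourself. This sketch assumes, as the paragraph above does, that every step succeeds independently and with the same odds; real chains are messier, so read the result as an illustration rather than a measurement. The function name chance_on_path is ours.

```python
def chance_on_path(per_step: float, steps: int) -> float:
    # Probability that every step in the chain lands correctly,
    # assuming the steps are independent and equally reliable.
    return per_step ** steps

one_step = chance_on_path(0.9, 1)
twenty_steps = chance_on_path(0.9, 20)

print(f"1 step:   {one_step:.0%}")
print(f"20 steps: {twenty_steps:.1%}")  # roughly 12%, close to one in eight
```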

In the agent-is-the-computer design, every step is resolved by the model, so the chain of uncertain decisions is as long as it can be.

Imagine every step as a coin that mostly lands heads and occasionally tails. One bad flip is manageable. The problem is that a long autonomous run asks the model to keep flipping, and each flip becomes another chance for the work to veer away from the original task.

The harness design keeps the chain short.

The deterministic steps add no uncertainty, and the model is called only where the task cannot be reduced to a fixed rule, so the only uncertain links in the chain are the ones that genuinely need judgment.

You also want to be able to see how the system got to its result, not just the result.

When the model owns the steps, the path is improvised at runtime and exists only as a by-product of the model’s reasoning. You can observe that it arrived somewhere, but the route is mostly a trail of breadcrumbs that can be hard to follow.

When the code owns the steps, the path is fully defined.
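One way to picture a fully defined path is a runner that records every step as it goes. The sketch below is an assumption-heavy illustration: run_with_trace, the step names, the company name and the fixed score of 72 are all made up for the example, with the score standing in for a model call.

```python
from typing import Any, Callable

Step = tuple[str, Callable[[Any], Any]]

def run_with_trace(company: str, steps: list[Step]) -> tuple[Any, list[tuple[str, Any]]]:
    trace = []
    value: Any = company
    for name, step in steps:          # the path is whatever this list says, nothing more
        value = step(value)
        trace.append((name, value))   # record what each step produced
    return value, trace

steps: list[Step] = [
    ("fetch", lambda company: f"profile for {company}"),
    ("score", lambda profile: 72),    # hypothetical stand-in for the model call
    ("file", lambda score: "prospect" if score >= 50 else "rejected"),
]

outcome, trace = run_with_trace("Acme Ltd", steps)
for name, value in trace:
    print(f"{name}: {value}")
```

If the outcome looks wrong, the trace tells you which step to inspect, and only one of those steps involved a model.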

Constraint is the active ingredient

If long chains of model decisions cause the system to wander, the mitigation is to reduce the number of open-ended decisions you ask the model to make.

There are three useful patterns to examine here.

The first is the pipeline. A pipeline is a fixed set of steps designed in advance, with the output of each step handed cleanly to the next. The model is slotted in only where the task needs interpretation rather than a fixed rule.

You define the shape of the work up front rather than letting it emerge as the model reasons. That gives the model a smaller question to answer, and gives the system less room to drift.

The sales prospecting loop from earlier is a good example of this.

In its raw form, the model decides the scope, method, order, and even what ‘good’ means, all as it runs. As a pipeline, those decisions become fixed structure, and the only thing left to the model is the score.

The second is the agent swarm. The obvious-looking alternative is to let several agents sort out roles among themselves and pass work back and forth, the way a human team self-organises.

This tends to multiply the problem though. Every handoff between agents is one more dice roll, on top of all the rolls already happening inside each agent, and in practice they often get confused and spend their effort coordinating rather than working.

The third is the disposable agent, which applies the same idea to memory. Rather than one long-lived agent that accumulates instructions over a long run and that you then hope keeps honouring them, you spin up a fresh agent for each unit of work, hand it only the context that unit needs, and discard it when it's done.
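Here is a rough sketch of the difference, in Python. The stub_agent function is a hypothetical stand-in that simply reports how much context it was handed; the point is the shape of the two loops, not the agent itself.

```python
def stub_agent(context: list[str]) -> str:
    # Hypothetical agent call: it works on the last item in its context
    # and reports how many context items it had to carry.
    return f"handled '{context[-1]}' with {len(context)} context items"

def run_long_lived(units: list[str], instructions: str) -> list[str]:
    context = [instructions]
    results = []
    for unit in units:
        context.append(unit)                 # context keeps growing across the run
        results.append(stub_agent(context))
    return results

def run_disposable(units: list[str], instructions: str) -> list[str]:
    results = []
    for unit in units:
        context = [instructions, unit]       # fresh context: only what this unit needs
        results.append(stub_agent(context))  # the context is discarded after each unit
    return results

units = ["invoice 1", "invoice 2", "invoice 3"]
long_lived = run_long_lived(units, "follow the house style")
disposable = run_disposable(units, "follow the house style")
print(long_lived[-1])
print(disposable[-1])
```

In the long-lived version the instructions are a shrinking share of everything the agent is carrying. In the disposable version they are always half of it.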

So pipelines constrain the flow of work, swarms are the cautionary case of what happens when constraint is missing, and disposable agents constrain how much context any one agent has to carry.

The important thing is to give the model the aspects of the work it’s useful for, and keep the rest in the system around it.

The steelman

The strongest case for the agent-as-computer view has three parts.

The first is accessibility.

Most users don’t want to think in pipelines or specify scaffolding. They want to state an objective in plain language and have the system work towards it. On this view, lower reliability is the price of not having to build anything.

The second is generality.

A pipeline works because someone designed the steps in advance for a known kind of task. A general-purpose agent has no such luxury. The range of possible requests is too broad, so past a certain point the model has to assemble the route itself.

The third is that drift may be a temporary phenomenon.

The harness design treats drift as something to contain. The opposing view, which Deadman Oz raised, is that as models improve, each step becomes more reliable and long chains of model decisions become less fragile. The proposed mechanism is reasoning, where the model keeps checking whether it is still on track and corrects itself when it wanders.

Predictability is the product

A business applies processes, systems, approvals and hierarchy, precisely because those structures mitigate risk and inconsistency in the work. Even highly skilled, excellent people produce variable output when there's no system around them. That structure is what makes the output consistent and reliable. This is a normal, accepted part of how good businesses run.

The harness is the same idea applied to AI. It's the surrounding system that makes the output more reliable, consistent and dependable, the same way it does for human work.

This is why the harness matters more than the model for a lot of business work.

A weaker model inside a well-built system can outperform a stronger model that has been handed the whole process, because the performance is coming from structure as much as intelligence. The model still matters, but often less than the structure around it.

Frontier models will keep opening up new kinds of work that weren't possible before, but for most of the work in a business today, a commoditised model inside a good harness will more than do the job.

This is the problem we've been building Wingman to solve. Get in touch if you’d like to learn more about Wingman for your business.

We got into this and more this week in the Big Episode 60 of The Good Stuff.

Three Things in AI for SMEs

1. The AI agents that actually work are the narrow, boring ones — Margin Up

After a year of hype, the 2026 question about agents for small business is no longer whether they belong in the toolkit, but which ones deliver and which just drain budget. The pattern that works is a stack of narrow agents, each owning one defined job across support, sales, research or admin, with a human kept on anything involving trust, money or legal risk. (Sources: Kaizen AI Consulting, Mean CEO's BLOG)

What it means for you: The margin comes from output, not headcount. Give one repeatable process real structure, point a narrow agent at it, and keep yourself on the exceptions.

Kaizen AI Consulting →

2. Real AI use is running well ahead of the official numbers — Capital Up

Fresh St. Louis Fed work this month untangles a striking gap: firm surveys put AI adoption at roughly 18% by the end of 2025, while around 43% of workers report using AI on the job in early 2026. Much of that gap comes down to how the surveys ask the question, so the headline firm-adoption figures meaningfully undercount what's really happening. (Sources: TechInformed, St. Louis Fed)

What it means for you: The edge goes to whoever moves now. Don't read comfort into "only a fifth of firms use AI" headlines, because your competitors' teams are likely further along than their own numbers show.

St. Louis Fed →

3. Shadow AI is the exposure building under the hype — Risk Down

Cyber executives are warning that staff are feeding company secrets into AI chatbots without realising where the data goes, leaving businesses exposed as they race to adopt. Mimecast's 2026 research found 80% of organisations worry about data leaking through generative AI, yet 60% still have no strategy to address it. (Sources: The European, Mimecast)

What it means for you: You can drive this risk down without killing the productivity. Banning the tools just pushes the use out of sight, so lead with visibility, give your team a sanctioned private option, and put a one-page policy around it. (Source: Barracuda Networks)

Barracuda →

That's all for this week.

If this resonated, we’d love for you to forward this newsletter to someone who might enjoy exploring these ideas too. See you next week!

Cheers,
Pete & Andy