Sometimes the right answer is not an LLM.

The most valuable thing a senior team can say is “you don’t need this.” A field guide to diagnosing the actual problem before reaching for a model.

A prospective client once arrived with a budget, a deadline, and a sentence: “We want an AI copilot for our operations team.” It was a good sentence. It was also the wrong place to start. Two weeks later we’d shipped a gradient-boosted forecasting model, declined most of the budget, and the operations team had what they actually needed. No copilot in sight.

This happens more than the industry likes to admit. The pressure to “do something with AI” is real, and a large language model is the most legible thing to point a board at. But the job of a senior partner is not to build the thing you asked for. It’s to make sure the thing you build is the thing that solves your problem.

Diagnose before you build

Most failed AI projects didn’t fail at execution. They failed at framing — the moment someone decided what to build before understanding what was broken. A model is an answer. Before you commit to an answer, it’s worth being precise about the question.

We treat the first phase of any engagement as a diagnosis, not a kickoff. The deliverable isn’t a model; it’s clarity about whether one is warranted, and which one. Often the most expensive part of the work — an LLM, a vector store, an eval harness, an inference budget — turns out to be unnecessary.

The instinct to reach for the most powerful tool is the same instinct that makes a problem expensive to solve.

The questions we ask first

Before any architecture, we work through a short list. None of them are about models:

  • What does the output actually need to do? A decision, a ranking, a forecast, and a paragraph of prose are four different problems, and each has its own right tool.
  • Is the data tabular? If the real shape of the problem is structured rows and columns, a classical model will usually be cheaper, faster, more accurate, and far easier to explain than anything generative.
  • How wrong can it afford to be? Error tolerance dictates everything downstream — oversight, evaluation, and whether a probabilistic system is appropriate at all.
  • Who has to trust it? A regulator, a clinician, and an internal analyst have very different relationships with a black box.
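To make the checklist concrete, here is a minimal sketch of the four questions written as a triage function. Everything in it is an assumption made for illustration: the names `OutputKind`, `Diagnosis`, and `triage` are hypothetical, and the rules are a crude simplification of a conversation, not a procedure we run mechanically.

```python
from dataclasses import dataclass
from enum import Enum


class OutputKind(Enum):
    """The four output shapes named in the first question."""
    DECISION = "decision"
    RANKING = "ranking"
    FORECAST = "forecast"
    PROSE = "prose"


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical record of the answers to the four questions."""
    output_kind: OutputKind
    data_is_tabular: bool
    error_tolerance: str  # "low" or "high": how wrong it can afford to be
    audience: str         # e.g. "regulator", "clinician", "internal analyst"


def triage(d: Diagnosis) -> list[str]:
    """Return starting-point notes, not a verdict. The rules are illustrative."""
    notes = []
    if d.output_kind is OutputKind.PROSE:
        notes.append("Output is language: a generative model is a candidate.")
    elif d.data_is_tabular:
        notes.append("Structured rows and columns: try a classical model first.")
    else:
        notes.append("Non-tabular input, non-prose output: compare both approaches.")
    if d.error_tolerance == "low":
        notes.append("Low error tolerance: plan oversight and evaluation up front.")
    # Assumption: regulators and clinicians are treated as high-scrutiny audiences.
    if d.audience in {"regulator", "clinician"}:
        notes.append("High-scrutiny audience: favour a system that can be interrogated.")
    return notes


if __name__ == "__main__":
    answers = Diagnosis(OutputKind.FORECAST, True, "low", "internal analyst")
    for note in triage(answers):
        print("-", note)
```

The function returns notes rather than a single recommendation on purpose: the questions narrow the options, and the final call still depends on context the four fields cannot capture.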

What “no” actually buys you

Declining to build the obvious thing is not caution for its own sake. It buys three concrete things. Lower cost: a classical model has no token bill and can often run on hardware you already own. Higher trust: a system your team can interrogate is a system they’ll actually use. Faster delivery: the right tool, scoped to the real problem, can ship in weeks, not quarters.
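The token bill is simple arithmetic, and it is worth doing before the architecture diagram. The sketch below assumes a flat per-token price and a steady request volume; the function name `monthly_token_cost` and the figures in the usage example are placeholders made up for illustration, not real prices or client data.

```python
def monthly_token_cost(
    requests_per_month: int,
    tokens_per_request: int,
    price_per_million_tokens: float,
) -> float:
    """Estimate a monthly token bill under a flat per-token price (an assumption)."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens * price_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # Placeholder inputs only; substitute your own volumes and your provider's pricing.
    estimate = monthly_token_cost(
        requests_per_month=100_000,
        tokens_per_request=2_000,
        price_per_million_tokens=5.0,
    )
    print(f"Estimated monthly token cost: {estimate:,.2f}")
```

A classical model served in-house has no equivalent line item, though it still carries hosting and maintenance costs that this sketch does not model.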

The opposite — saying yes to everything — is how organisations end up with an impressive demo and a maintenance burden nobody can justify. We’d rather cancel our own statement of work than hand you that.

A note on incentives: an agency paid by the size of the build has every reason to make the build bigger. We try to work against that incentive. The engagements we’re proudest of are often the smallest ones: the times we talked a client out of spending money.

None of this is anti-AI

We build with large models constantly, and when the problem genuinely calls for language understanding, retrieval, or generation, we go all in, with evaluation, guardrails, and the operational rigour that production demands. The point isn’t to avoid the technology. It’s to deserve it. A model you reached for because it fit the problem will almost always outperform one you reached for because it was in the headlines.

So before the architecture diagram, before the inference budget, before the kickoff: tell us what’s broken. Sometimes the honest answer is a model. Sometimes it’s three weeks and a far simpler one. Either way, you’ll know why.

Yoonefi — Studio notes