Four Pillars
Four questions a board should ask before commissioning another agentic AI pilot.
Most agentic AI programmes fail not because the technology disappoints, but because the operating model around it was never honestly mapped. A diagnostic, designed for boards and executive committees to run on themselves before the next pound is committed.
By Gartner’s estimate, more than forty percent of agentic AI programmes currently underway in enterprise will be cancelled by the end of 2027. The reasons given — escalating costs, unclear value, weak risk controls — sound technical, but they are not. They are operating-model failures dressed up in technical language. The technology, in almost every case, did what it was supposed to do. The organisation around it did not.
Boards reading this will recognise the pattern in their own pilots. A promising proof-of-concept that never scaled. A use case that worked in the sandbox and broke in production. An ambitious six-month transformation that quietly became an indefinite one. The instinct, when these stall, is to ask harder technical questions: which platform, which model, which vendor. The instinct is wrong. The questions that decide whether an agentic programme produces value are organisational, not technical, and they are the same four questions in every firm we have worked with.
This is not a hypothesis. It is the recurring conclusion of the research that has matured around agentic AI deployment in the last eighteen months. Microsoft’s 2026 Work Trend Index found that organisational factors — culture, manager support, talent practices — account for sixty-seven percent of measured AI impact, more than twice the impact of individual factors like mindset and skill. McKinsey’s 2025 work on agentic organisations identified governance, work design, and learning systems as the three categories of intervention that separated value-producing programmes from cancelled ones. Bain reached a structurally similar conclusion in its 2025 Technology Report. The technology, in this body of evidence, is not the variable that explains success or failure. The operating model is.
The four questions that follow form the spine of the Fuchsia Agentic Operating Model — the framework we use to diagnose where a firm actually is, before any redesign work begins. They are deliberately framed not as features to install but as honest questions a board should be able to answer about itself. Most cannot, which is the point. The exercise of answering them carefully, in good faith, surfaces more useful information about a firm’s readiness than any number of pilot reviews.
The four questions
Pillar I (Work Architecture): Is your work designed for human cognition, or for what agents make possible?
Most processes inside mid-market firms were designed around the scarcity of human attention: sequential handoffs, departmental silos, queues. Agents make that scarcity assumption obsolete. The question is whether your highest-volume workflows have been mapped end-to-end in the last twelve months with that assumption explicitly removed. Not optimised for speed; redesigned for a world where intelligence is no longer the constraint.
Pillar II (Human-Agent Interface): Where does human judgement sit, and is the answer written down?
Agents act. People decide. Where the boundary falls (what an agent may resolve autonomously, what must escalate, what is reviewed afterwards) is the most consequential design decision in an agentic programme. In most firms it has not been designed at all. It has emerged by accident, is described nowhere a regulator could read it, and varies subtly from one team to another in ways no one has the standing to challenge.
Pillar III (Owned Intelligence): Does what your agents learn become institutional memory, or does it disappear?
An agent that has processed ten thousand cases should perform measurably better than one that has processed ten. Most agents in production today do not, because the learning (the corrections, the patterns, the edge cases) was never captured anywhere reusable. The firm has paid for the cases without keeping the intelligence. Worse, when a senior subject-matter expert leaves, the firm loses the only memory of what was actually learned.
Pillar IV (Embedded Governance): Can you reconstruct, on demand, why an agent made the decision it did?
For an agent decision made thirty days ago, can your firm produce the reasoning trace, the data used, the policy constraints applied, and the human approvals that wrapped it? If the answer is anything other than yes, the accountability question under the Senior Managers and Certification Regime (SMCR) remains unanswered. The next supervisory dialogue, whether prompted by a complaint, a thematic review, or simply the FCA's biennial AI adoption survey, will surface that fact uncomfortably.
How to use the four questions
The discipline these questions enforce is simple but unusual: they require honest answers about the present, not optimistic answers about the future. The most common failure mode, when a board first runs through them, is to answer the question the firm wishes it could answer rather than the one it can. Pillar I becomes “we have a process improvement programme” rather than “no, our top five workflows have not been mapped end-to-end with the assumption of abundant intelligence.” Pillar IV becomes “we have a governance framework” rather than “no, we could not, today, reconstruct the reasoning behind the agent decision a customer is complaining about.”
Resisting this gravity is what makes the diagnostic useful. A board that runs through the questions seriously will, in our experience, surface three things. One, that the gap between current state and the operating model an agentic programme actually requires is wider than it had assumed. Two, that the gap is not evenly distributed — almost every firm is materially stronger on one or two pillars and materially weaker on the others, in patterns that correlate loosely with sector but vary widely within it. Three, and most importantly, that the pillars are interdependent: weakness in governance (Pillar IV) makes scaling impossible regardless of how good the work architecture (Pillar I) looks on paper, and strength in owned intelligence (Pillar III) is undermined by the absence of a clear human-agent interface (Pillar II), because there is no reliable feedback loop for the learning to compound through.
The right next move, once these are visible, is to pick the weakest pillar, not the strongest, and treat it as the load-bearing redesign. Most firms instinctively reach for the pillar they already understand best, because progress feels easier there, the team is already in place, and the board paper writes itself. This is the wrong move. The pillar whose redesign produces the most value is the one currently producing the most drag. A programme that strengthens the weakest pillar makes the entire system perform better even before the next pilot is commissioned.
This sequencing logic is not intuitive, and it cuts against a particular kind of executive instinct: the instinct that says we should build on our strengths. In a system whose parts perform independently of one another, building on strengths is sound advice. In an interdependent system, where the weakest component constrains the whole, it is precisely the wrong advice.
What this looks like in practice
Two recent examples, composited from our work and conversations with mid-market firms, illustrate how the diagnostic plays out when it is taken seriously. In both cases the firm came to the exercise expecting one answer and left with a different one.
The weakest pillar wasn’t where the executive team thought.
The firm, an £800m specialty insurer, had three agent prototypes underway in claims triage. The CTO’s view, going into the diagnostic, was that the firm’s weakest pillar was Embedded Governance — the regulatory pressure he was under from the CRO made that the obvious answer. The four questions, run honestly, surfaced something different. Governance was thin, but it wasn’t where the actual value drag was. The drag was in Owned Intelligence.
The three prototypes had been built by three different teams: underwriting operations, group transformation, and an embedded squad from a systems integrator. None of them shared training data, decision logs, or learnings. Each was relearning, from zero, what the others had already worked out. The firm had paid for the same intelligence three times, and the marginal performance improvement of any one prototype after twelve months was indistinguishable from the variance of the first month.
The redesign was not another pilot. It was a shared knowledge layer that the existing agents could query. It cost less than a single new pilot, could be deployed in eight weeks, and immediately lifted the performance of all three existing systems. The diagnostic redirected the firm's twelve-month investment away from a fourth pilot and toward an infrastructure layer that made the first three actually work.
The pilot was fine. The architecture around it wasn’t.
For a £2bn wealth manager, the diagnosis was different. The agent pilots — in client onboarding and suitability reporting — were producing visibly useful output. The COO was pleased with what the technology team was building. But the four questions surfaced two problems simultaneously, both in the human-side pillars.
Pillar II was undocumented. Each agent had an escalation policy, but the policies had been written by the team that deployed each agent, varied subtly from one to another, and were nowhere consolidated for the senior manager who would be accountable under SMCR. Pillar IV was the more acute problem: second-line risk had no way to challenge what the agents did, the COO had no view of agent activity beyond a weekly screenshot pasted into a slide deck, and the firm could not have produced a reasoning trace for any individual decision on demand.
The redesign was not a new platform but a monitoring and governance layer over the existing one, paired with a written human-agent interface policy that consolidated the escalation rules into a single document the senior manager could sign. Within sixty days, the second line could intervene in real time. The COO could answer the question the board was about to ask. And the next pilot — in regulatory reporting — was deployed three weeks faster than it would have been, because the governance architecture it would need already existed.
Neither firm needed a new agentic platform. Neither needed a more sophisticated model. Both needed the operating model around the agents they already had to be honestly designed. This is, in our experience, the typical pattern: the pilots are usually fine. The architecture around them is what fails.
Who should run this diagnostic
A question several readers will be asking at this point is whose job this actually is. The answer matters, because in most firms the question of “who owns agentic AI” has not been settled cleanly — and the wrong owner produces the wrong diagnosis.
The diagnostic is most usefully run by the Chief Operating Officer, with active participation from the Chief Risk Officer and the Chief Compliance Officer, and supported (rather than led) by the Chief Technology Officer. The reasoning is structural. The four pillars are operating-model questions: workflow design, accountability, institutional learning, governance. They are not technology questions. When the diagnostic is led by the CTO or Head of AI, it tends to converge on platform decisions and vendor selection — both of which are downstream of the operating-model question and considerably less consequential. When it is led by the COO with the CRO and CCO at the table, the conversation tends to surface the SMCR, Consumer Duty, and operational resilience tensions that determine whether an agentic programme can actually scale inside a regulated firm.
For boards, the implication is practical: if you are about to commission an agentic AI programme, look at the room and ask who is leading the work. If the answer is the CTO with the COO supporting, the diagnosis will be technical and the redesign will be small. If the answer is the COO with the CTO supporting, the diagnosis will be structural and the redesign will be material. The composition of the room determines the depth of the answer.
Boards considering this work often ask whether running the diagnostic delays the programme. It does not. Firms that have mapped themselves against the four pillars consistently deploy the next pilot faster, because they know which redesign work to do alongside it and which part of that work to do first. The diagnostic is a force multiplier on the work that follows, not a substitute for it.
A final thought
The seductive thing about agentic AI is that it produces visible activity quickly. A pilot can be stood up in weeks. A demo can be impressive in hours. The unseductive thing about operating-model redesign is that the work is largely invisible: workflow maps, escalation policies, governance scaffolding, knowledge layers. None of it shows up in a board pack as a screenshot. It produces, instead, the kind of slow compounding that becomes visible only over quarters, when the firm that did the work begins to scale and the firm that did not begins to stall.
Boards making the case for agentic AI investment, both internally to their executive committees and externally to their shareholders, tend to over-index on the visible work. The pilots in flight. The use cases identified. The vendor selected. They under-index on the invisible work, because there is no demo to show. This is, predictably, where the next eighteen months of disappointment will come from. The pilots in flight today will look much the same in eighteen months. The architecture around them, or its absence, will be the variable that separates the firms that scale from the firms that quietly cancel.
Four questions. Answered honestly, before the next pound is committed. Most of the value in an agentic programme is decided here, not in the technology selection that follows. The firms that understand this in the next six months will look very different in eighteen months from the ones that do not.
Run the four pillars on your own firm
The Fuchsia Agentic Readiness Assessment is the diagnostic workbook we use during a Tier I engagement. It expands the four questions above into a structured instrument with rubrics, evidence prompts, and a maturity score — suitable for board or executive use. Available on request.