Gartner’s June 2025 forecast is unusually specific: more than 40% of agentic AI projects will be canceled by the end of 2027 — due to escalating costs, unclear business value, and inadequate governance. The same analyst note observes that only about 130 of the thousands of vendors claiming agentic AI capabilities are real — the rest are doing what Gartner calls “agent washing”: rebranding RPA scripts, chatbots, and assistants as agents.
The number is striking. But to anyone who has tried to deploy a multi-agent system into a real enterprise environment over the last 18 months, the surprise is that the failure rate isn’t higher.
The projects that fail don’t fail because the models can’t reason. They fail because the data architecture, governance layer, and operating model around the agent were never built. The agent shipped on top of foundations that couldn’t hold it. The 40% is a foundation problem, not a model problem.
Why agentic projects fail
Four failure modes account for most of the casualties.
Architecture deficit. The agent needs to read consistent state, write idempotent actions, and recover from partial failures. None of that works when the data layer is a tangle of inconsistent schemas, brittle ETL, and undocumented integrations. The agent makes calls; the calls land in the wrong place; the team patches the agent; the agent gets weirder. Six months in, no one can explain its behavior and the project gets shelved.
Hype-driven scope. The most-canceled projects are the ones that began as “AI-powered” reframings of work the buyer already had a solution for — a chatbot that became an “agent,” an RPA script that became a “workflow agent,” an assistant that became an “autonomous co-pilot.” The actual agent layer adds cost without adding capability. The first procurement review catches it.
Governance as afterthought. A surprising number of pilot projects ship without a written human-in-the-loop policy, without model-version pinning per inference, without drift dashboards, and without a tested rollback runbook. When the agent does something unexpected in production — and they do — no one knows whether to stop it, roll it back, escalate it, or ignore it. The next incident is the last one before cancellation.
No eval substrate. “We’ll know it’s working when users like it” is not an evaluation methodology. Without continuous evals — prompt regression, tool-use audits, jailbreak attempts, bias measurements — quality drift is invisible until a customer-facing failure makes it everyone’s problem.
The order matters
Every one of those failure modes traces back to the same root cause: organizations deploy agents before the substrate underneath can support them.
Our thesis, and the order we engage every project in, is: data architecture first, then the application surfaces that actually need rebuilding, then agents where they pay off.
This is contrarian in 2026 because most consultancies want to start with the most expensive piece — the agent. The architecture review is unglamorous; it doesn’t make a press release; it slows down the demo. But it’s the single intervention that prevents a project from joining the 40%.
The architecture review is unglamorous; it doesn’t make a press release; it slows down the demo. But it’s the single intervention that prevents a project from joining the 40%.
There is a reason the architecture-first sequence is so rare in services pitches: it’s harder to sell. It involves saying not yet to the work the buyer most wants to start. Boutiques and Big-4 alike find it easier to take the agent engagement and figure out the architecture as they go. We don’t.
What architecture-first actually looks like
Concretely, an architecture-first engagement opens with:
- An audit of the data layer the agent will read from and write to — schema consistency, freshness, observability, lineage.
- A decision about the agentic runtime. Our default is Amazon Bedrock AgentCore, because it’s framework-agnostic (CrewAI, LangGraph, LlamaIndex, Strands), model-agnostic, and ships with the production primitives — memory, identity, secure tool access via an MCP gateway — that we’d otherwise build ourselves.
- A decision about the orchestration framework. LangGraph for state-machine workflows; Temporal underneath when the agent needs to survive crashes, restarts, or 60-second-plus execution windows.
- A decision about the eval substrate. Braintrust by default; alternatives are Langfuse and Helicone. Evals are written before the agent ships, not after.
- A written human-in-the-loop policy that names which decisions the agent makes alone, which it escalates, and what the escalation surface looks like.
- Model-version pinning per inference, structured inference logging, drift dashboards, and a rollback runbook that has been tested at least once.
None of this is exotic. All of it is missing from the projects that get canceled.
We design these systems on the Model Context Protocol because it’s the only agent protocol the frontier has agreed on — Anthropic donated MCP to the Linux Foundation in 2025, and OpenAI, Google, Microsoft, AWS, Cloudflare, and Bloomberg are all behind it. By early 2026, more than three-quarters of enterprise AI teams reported running at least one MCP-backed agent in production. MCP-native isn’t a positioning claim; it’s the only sensible default.
The discipline of decline
The other reason we don’t end up in the 40% is what we won’t take on.
We don’t accept agentic engagements without an architecture review first. We don’t reframe chatbots or RPA scripts as agents to make a deal feel bigger. We don’t start engagements with fewer than 8 weeks of runway, because nothing that matters can be built in 4.
Saying no to work is a discipline most consultancies have lost. Recovering it is, in our view, the cheapest insurance policy any buyer of agentic AI can take out in 2026.
The shape of an engagement
The first conversation we have with a new client about agentic AI is almost always our Architecture Readiness Sprint: two weeks, fixed scope, deliverable is an audit of the data architecture the agent will sit on, plus a written 12-month roadmap for what to build, in what order. About a third of these conclude that the right next step isn’t an agent at all — it’s six to twelve weeks of data architecture work first.
That answer alone has prevented more projects from joining the 40% than any framework, model, or runtime we’d recommend.
→ Request a scoped proposal for an Architecture Readiness Sprint