Why Most Finance AI Pilots Die (And What the Survivors Do Differently)
Most Finance AI programs are built on the wrong unit of work. They count use cases and miss the two things that actually move the P&L — the end-to-end process redesign, and the finance ontology underneath it that turns a generalist model into a domain expert.
Ask a Finance leader how their AI program is going, and you will usually get a count.
Number of use cases live. Number of pilots in flight. Number of processes "AI-enabled." Maybe a percentage — "we've AI-enabled 40% of the close." The count is the language the organization has trained itself to speak, because it's the easiest thing to measure and the easiest thing to report.
The problem is that the count has almost nothing to do with whether the program is actually working.
Walk into any large Finance function today and you will find dozens of AI pilots, a handful of production deployments, and a CFO who genuinely cannot tell you whether any of it is meaningfully changing the business. Not because the CFO is disengaged. Because the unit of measurement is wrong.
The pilots that die, die for reasons that are mostly not technical. Here's the pattern I see in the market — what kills them, what the survivors do instead, and what almost no one is yet doing well.
The use-case trap.
This is the one I think about the most. And the one I think is actively slowing down Finance AI at the enterprise level.
The whole industry has borrowed the "use case" frame from personal AI. At enterprise scale, it doesn't translate.
Use cases are the right unit for personal AI. When you're using Claude to draft an email, summarize a board pack, clean up a spreadsheet, or pull together a memo, each of those is exactly the right unit of work. Small, incremental efficiency gains. You ship one, feel the benefit, move on. Stack a hundred of them and you've meaningfully changed how you work, because the distance between the small gain and your experience of work is zero.
Use cases do not work the same way at enterprise scale.
That "small wins stack up" math collapses the moment you move from a single person to an organization. A 3% efficiency gain on a Finance sub-process, multiplied by however many people touch it, does not show up on a P&L the CFO feels. A 5% accuracy lift on an ML forecast feeding one piece of FP&A is not a material change to how planning works for the business. You do not move enterprise OpEx in a meaningful way by stacking incrementalism.
This is why so many Finance AI programs feel vaguely frustrating from the CFO seat. You've funded a wall of use cases. Machine learning for forecasting over here. LLM summarization at month-end over there. A copilot in the close. A chatbot in the service center. Each one works in isolation. None of them add up to anything the CFO can point at and say, "this is how we reinvented Finance."
The reason is simple: use cases are the wrong unit. Finance does not run on use cases. It runs on processes.
The controllership close is not a use case. It's a process. Planning is a process. Reporting is a process. Tax, treasury, order-to-cash, procure-to-pay — these are all processes. Each has ten, twenty, fifty "use cases" buried inside it. Bolting AI onto those use cases one at a time, without redesigning the process they live in, is like putting a better engine in a horse-drawn carriage. You'll get incremental lift. You will not get a car.
What moves the needle is an end-to-end process redesign with AI assumed in the stack from day one.
The right question isn't "what AI can we apply to this step?" The right question is:
"What does this process look like if we rebuild it from scratch, assuming AI is a first-class part of the stack?"
What steps disappear? What consolidates? What work moves from human to agent? What operating model does Finance run underneath it? What does the org chart look like when this lands?
That is a very different kind of project. Harder to scope. Harder to sell up. Harder to pilot. It requires the CFO and the CIO genuinely partnered, not handing off. And it is the only kind of Finance AI work that moves the P&L in a way the CEO notices.
The finance ontology — the secret sauce nobody funds.
Redesigning the process is necessary. It is not sufficient.
The second thing that separates Finance AI programs that create real value from the ones that quietly die is something almost no one is talking about yet: the finance ontology — the domain layer underneath the AI.
Most Finance AI today is generalist AI politely pointed at Finance. The model is smart. It can read, summarize, draft, and synthesize. But it has no structured understanding of your chart of accounts, your entity hierarchy, your consolidation rules, your close cycle, the difference between an accrual and a reclass, why a flux explanation is inadequate if it doesn't tie to a driver, or what materiality means in the context of a specific P&L.
That is the gap between a generalist and a specialist. And it is the difference between AI that produces plausible output and AI that produces decision-grade output.
The analogy I keep coming back to: generalist AI is an analyst. The finance ontology is what turns it into a PhD.
An analyst with no accounting background can pattern-match across a pile of reports and write you a summary. It will sound reasonable. A seasoned controller or FP&A leader will read three sentences and immediately know the summary misses the point — the flux that actually matters, the intercompany elimination that shifted the answer, the accrual timing that explains the variance. The analyst is doing the surface-level work. The domain expert is doing the work that actually informs the decision.
If you want Finance AI that operates at the domain expert level, you cannot skip the ontology. The way I think about it, it lives in four layers (sketched as data below):
- The semantic foundation — your business architecture and driver models; the connective tissue of how your business actually fits together.
- The calculation and logic layer — KPIs, dimensions, the single source of truth for how every meaningful metric is defined and computed.
- The interpretation layer — materiality thresholds, vocabulary, narrative patterns. The part that separates a technically correct answer from one a controller would actually sign.
- The governance and security layer — how AI routes to the right data, respects permissioning, and honors your controls. Security built into the ontology itself, not bolted on at the tool layer.
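To make the four layers less abstract, here is a deliberately tiny sketch of what one slice of an ontology could look like if you wrote it down as data. All of it is hypothetical: the names, the formula, the thresholds, and the roles are invented for illustration, not a reference schema.

```python
# A hypothetical, deliberately tiny slice of a finance ontology: one element
# per layer. Every name, formula, and threshold below is invented for illustration.
from dataclasses import dataclass

@dataclass
class Driver:                    # 1. Semantic foundation: how the business fits together
    name: str
    influences: list[str]        # downstream P&L lines this driver moves

@dataclass
class KPI:                       # 2. Calculation and logic: one definition, one source of truth
    name: str
    formula: str                 # the canonical definition every tool must reuse
    dimensions: list[str]        # e.g., legal entity, product line, period

@dataclass
class InterpretationRule:        # 3. Interpretation: when a variance is worth explaining
    metric: str
    materiality_abs: float       # flag fluxes above this absolute amount...
    materiality_pct: float       # ...or above this share of the base
    narrative_hint: str          # how a controller would frame the explanation

@dataclass
class AccessPolicy:              # 4. Governance and security: permissioning in the ontology itself
    metric: str
    allowed_roles: list[str]

ontology = {
    "drivers": [Driver("headcount", ["opex.compensation", "opex.benefits"])],
    "kpis": [KPI("gross_margin", "(revenue - cogs) / revenue",
                 ["entity", "product_line", "period"])],
    "interpretation": [InterpretationRule(
        "gross_margin", materiality_abs=250_000, materiality_pct=0.02,
        narrative_hint="tie the flux to a named driver, not a restated delta")],
    "access": [AccessPolicy("gross_margin", ["fp&a", "controller", "cfo"])],
}
```

The point is not this particular schema. The point is that when the model answers a flux question, it consults this layer instead of guessing, so "gross margin" and "material" mean exactly one thing in every answer it gives.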
This is the part that does not show up in a vendor demo. It does not get a line item in a pilot budget. It is invisible on the roadmap. And it is the thing that most reliably separates "we deployed AI" from "AI is now part of how Finance runs."
Build the ontology, and everything else compounds. Skip it, and you are asking a generalist to do a specialist's job, forever.
The execution patterns that still kill pilots.
Everything above is about scoping and architecture. Plenty of pilots die on execution too, and the execution failures are the ones Finance leaders recognize fastest, because they have seen them in every technology wave before this one.
IT built it. Finance didn't champion it. The pilot gets scoped by IT, built by IT, demoed by IT, and dropped on Finance's doorstep. Finance was "engaged" — someone attended three steering committees — but nobody senior in Finance is personally accountable for adoption. The demo is impressive. The usage is zero. The pilot has no champion network to carry it through the integration friction that always shows up after launch.
Nobody said what "worked" was supposed to look like. The project started with a capability, not an outcome. "We're going to apply AI to forecasting" is not an objective — it is an activity. Without a baseline and a target, you cannot answer the only question that matters at the end: did this actually do anything? And an invisible pilot dies at the next budget review, because the only defense it has is "we deployed it." That is not a defense.
Both of these are solvable. Neither solves itself, and neither is solved by adding more use cases to the backlog.
What the survivors actually do.
The pilots that survive — the ones that compound into real change — share a pattern. It is not complicated, and none of it is about the model.
They start with a clear value objective. Before any code. Before any vendor. A named number the Finance leader is personally accountable for moving. Cycle time. Decision velocity. Forecast quality. OpEx. Something real, on the CFO's scorecard.
They have the right business sponsor. Not IT. Not a middle-management committee. A Finance leader senior enough to defend the program through the inevitable internal friction, and close enough to the work to scope it well.
They think bigger than use cases. They pick one or two end-to-end processes and redesign them with AI assumed in the stack. They fund the ontology work alongside it, because they understand that is where generalist AI becomes a specialist.
They let momentum compound. This is the part that surprises people: the data does not have to be perfect. The ontology does not have to be complete. The process redesign does not have to be finished to start producing value. You can start scrappy, ship something useful, and let the program build a snowball of credibility, proof points, and internal demand. The companies that wait for perfect data wait forever. The companies that start messy and iterate compound faster than anyone expects.
They treat adoption as the hardest problem. This is the one I would underline twice if I could. Every other piece of this — the scope, the ontology, the architecture, the model — is knowable, buildable, solvable. Adoption and change management is the one thing that consistently defeats otherwise excellent programs. Finance teams are trained by their profession to be careful. That is a feature, not a bug. The programs that win treat adoption as a first-class engineering problem, not a post-launch afterthought. They invest in enablement. They bring champions through the build. They sequence the rollout to create early wins. And they measure adoption as seriously as they measure the model's accuracy.
If you are running a Finance AI program today, the easiest thing to do is keep counting. Count your use cases. Count your pilots. Count your deployed capabilities. It feels productive.
The more uncomfortable — and more valuable — move is to stop counting and start asking different questions.
What process are we actually reinventing? What does the finance ontology underneath it need to look like? Who in Finance owns the outcome? What is the number we are going to move? How do we build momentum from where we actually are, not where we wish we were?
The survivors answer those questions first. The model comes last.