Why ML Projects Collapse and How to Diagnose Them

More than 80% of AI projects fail. That is twice the failure rate organizations see with non-AI IT projects. In 2025, 42% of companies abandoned most of their AI initiatives, up from 17% the prior year. The average organization scrapped 46% of its proofs of concept before they reached production.

These numbers are sobering, but they obscure something more useful: The failure modes are diagnosable and often fixable. The problem is that most postmortems focus on technical shortcomings when the real breakdowns happen elsewhere.

Researchers at RAND interviewed 65 data scientists and engineers to identify root causes. The most common driver of failure had nothing to do with technology. Projects collapsed because stakeholders misunderstood or miscommunicated what problem needed solving.

This dynamic plays out with particular intensity in identity resolution projects. The business case for matching customer records sounds straightforward: Reduce duplicate outreach, enable better marketing and build a unified view of customers across touchpoints. Teams build matching algorithms, test accuracy metrics and watch models perform well on paper. Then the product ships, and nobody uses it.

An identity graph product at a major automotive software company failed twice under two different teams before finally shipping. The first diagnostic finding had nothing to do with the model: There was no alignment between cross-functional partners on who should own the project. Identity graphs require information from multiple product teams, and without clear ownership, accountability diffused across the organization. The breakthrough required spinning up a dedicated tiger team with executive support and a single mandate.

The second question that mattered was: Why didn’t users trust the output?

The second major failure pattern RAND identified was insufficient data quality. One interviewee put it bluntly: “80% of AI is the dirty work of data engineering. You need good people doing the dirty work — otherwise their mistakes poison the algorithms.”

Identity resolution is particularly vulnerable here. Customer records arrive in inconsistent formats, with typos, outdated information and conflicting identifiers. The matching algorithm is only as good as the data it ingests. Teams that focus on matching logic while treating data quality as someone else’s problem are building on sand.

There’s also an expectation management problem. Dealership users look at the data and build a narrative based on a handful of instances where matches are incorrect. A 95% accuracy rate means nothing if the 5% of errors are the ones users remember. This requires proactive communication about what the system can and cannot do, not just better algorithms.
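A quick calculation shows why a small error rate can dominate perception. The sketch below uses hypothetical figures (20,000 customer records, a user who looks at 50 records a week) that are not from the project described here, and it assumes errors are independent of one another, which is a simplification.

```python
def expected_errors(records: int, accuracy: float) -> float:
    """Expected number of incorrect matches among `records`."""
    return records * (1.0 - accuracy)


def chance_of_seeing_error(reviewed: int, accuracy: float) -> float:
    """Probability that at least one of `reviewed` records is wrong,
    assuming errors are independent (a simplifying assumption)."""
    return 1.0 - accuracy ** reviewed


# Hypothetical figures for illustration only.
total_records = 20_000
weekly_reviews = 50
accuracy = 0.95

print(round(expected_errors(total_records, accuracy)))
print(round(chance_of_seeing_error(weekly_reviews, accuracy), 2))
```

Under these assumptions, a 95% accurate system still leaves roughly 1,000 wrong matches in the data, and a user reviewing 50 records has better than a 90% chance of hitting at least one of them in a given week.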

The diagnostic approach that works is to map data flows from source systems through the integration layer before touching model architecture. Identify where records degrade, where duplicates enter the system and where the ground truth needed for training simply doesn’t exist. This mapping often reveals that the ‘ML problem’ is actually a data infrastructure problem wearing a technical disguise.
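One way to start that mapping is to profile the same records at each stage of the flow and compare the results. The following is a minimal sketch, not the tooling used in the project: the stage names, the `email` match key and the sample rows are all hypothetical.

```python
from collections import Counter


def profile_stage(records, key_field, required_fields):
    """Summarize one pipeline stage: row count, share of rows with a
    missing required field, and share of rows that repeat an earlier key."""
    total = len(records)
    if total == 0:
        return {"rows": 0, "missing_rate": 0.0, "duplicate_rate": 0.0}
    # A field counts as missing when it is absent, None or an empty string.
    missing = sum(
        1 for r in records
        if any(not r.get(f) for f in required_fields)
    )
    # Every occurrence of a key beyond the first counts as a duplicate.
    key_counts = Counter(r.get(key_field) for r in records)
    duplicates = sum(count - 1 for count in key_counts.values())
    return {
        "rows": total,
        "missing_rate": missing / total,
        "duplicate_rate": duplicates / total,
    }


# Hypothetical snapshots of the same data at two points in the flow.
stages = {
    "source_crm": [
        {"email": "ana@example.com", "phone": "555-0100"},
        {"email": "bo@example.com", "phone": "555-0101"},
    ],
    "integration_layer": [
        {"email": "ana@example.com", "phone": "555-0100"},
        {"email": "ana@example.com", "phone": ""},
        {"email": "bo@example.com", "phone": "555-0101"},
        {"email": "", "phone": "555-0102"},
    ],
}

for name, rows in stages.items():
    print(name, profile_stage(rows, "email", ["email", "phone"]))
```

In this toy data the source system is clean, while the integration layer shows half its rows missing a required field and a quarter repeating a key. A jump like that between stages points at the pipeline rather than the model.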

The Shiny Object Problem

RAND’s research also identified what it called the ‘shiny object’ problem: Engineers sometimes chase the latest techniques even when simpler solutions would work.

For the identity graph, transparency came through multiple layers. A backend system tracked match rates for internal monitoring. A vendor feed provided external validation. Most importantly, users saw suggested matches in the interface before committing to a customer record. If they disagreed with the suggestion, they could provide feedback directly in the app. Trust, not accuracy, was the binding constraint.
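The suggest-then-confirm loop described above can be sketched as a small data structure that keeps each user decision next to the suggestion it refers to. This is an illustration only: `MatchSuggestion`, `record_feedback` and `acceptance_rate` are hypothetical names, and the source does not describe how the product stored feedback.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MatchSuggestion:
    record_id: str
    candidate_id: str
    score: float                      # model confidence, 0.0 to 1.0
    accepted: Optional[bool] = None   # None until a user responds
    comment: str = ""


def record_feedback(s: MatchSuggestion, accepted: bool, comment: str = "") -> None:
    """Store the user's decision next to the suggestion it refers to."""
    s.accepted = accepted
    s.comment = comment


def acceptance_rate(suggestions: List[MatchSuggestion]) -> Optional[float]:
    """Share of reviewed suggestions that users accepted; None if none reviewed."""
    reviewed = [s for s in suggestions if s.accepted is not None]
    if not reviewed:
        return None
    return sum(s.accepted for s in reviewed) / len(reviewed)


# Hypothetical suggestions; the third has not been reviewed yet.
suggestions = [
    MatchSuggestion("r1", "c9", 0.97),
    MatchSuggestion("r2", "c4", 0.81),
    MatchSuggestion("r3", "c7", 0.66),
]
record_feedback(suggestions[0], accepted=True)
record_feedback(suggestions[1], accepted=False, comment="Different household")

print(acceptance_rate(suggestions))
```

Tracking acceptance separately from model score gives a direct measure of trust: a high-scoring model with a low acceptance rate is a signal to look at what users see, not at the algorithm.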

Rebuilding credibility after failed ML attempts requires changing the conversation. Stop talking about model accuracy. Start talking about business outcomes. What would it mean for a dealership to identify repeat service customers across locations? How much would they save by eliminating duplicate mailings to the same household? These questions surface requirements that accuracy metrics never capture.

The phData guide on identity resolution describes the core challenge: Fragmented customer identities spread across different channels lead to data mix-ups and incomplete profiles, making it difficult to know who is doing what and where. Solving this requires more than a good algorithm. It requires understanding the specific workflows where identity matters and designing the product around those moments.

The same diagnostic framework applies beyond identity resolution. At a state insurance fund, I inherited a fraud detection model with abysmal precision. The investigation unit was drowning in false positives and had essentially stopped trusting the alerts.

The instinct was to retrain with more features or tune thresholds. Instead, the diagnostic questions were: What problem is this solving, and for whom? The model was built to ‘detect fraud’, but that’s not a user problem. The actual users were fraud analysts who needed to prioritize their queue and maintain credibility with leadership. They didn’t need more alerts. They needed actionable alerts with enough context to act on.

Mapping the data supply chain revealed that the model had strong predictive capability but operated as a black box. It flagged transactions without explaining why. The fix required going back to first principles, mapping out the scenarios that warranted flags and surfacing the reasoning to users. Trust returned when analysts could see and verify the logic.
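Surfacing the reasoning can be as simple as attaching the scenarios that fired to each alert. The sketch below is a rule-based stand-in written for illustration, not the model described above: the three scenarios, their thresholds and the claim fields are all hypothetical, and real scenarios would come from the fraud analysts themselves.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Rule = Tuple[str, Callable[[Dict], bool]]

# Hypothetical scenarios and thresholds, for illustration only.
RULES: List[Rule] = [
    ("Claim amount above 10,000", lambda c: c["amount"] > 10_000),
    ("Filed within 30 days of policy start",
     lambda c: c["days_since_policy_start"] < 30),
    ("Provider has prior confirmed fraud",
     lambda c: c["provider_prior_fraud"]),
]


@dataclass
class Alert:
    claim_id: str
    reasons: List[str] = field(default_factory=list)


def evaluate(claim: Dict, min_reasons: int = 2) -> Optional[Alert]:
    """Return an Alert listing every scenario that fired, or None when
    fewer than `min_reasons` scenarios fired."""
    reasons = [label for label, check in RULES if check(claim)]
    if len(reasons) < min_reasons:
        return None
    return Alert(claim["id"], reasons)


# Hypothetical claims.
suspicious = {"id": "C-1", "amount": 12_500,
              "days_since_policy_start": 12, "provider_prior_fraud": False}
routine = {"id": "C-2", "amount": 800,
           "days_since_policy_start": 400, "provider_prior_fraud": False}

print(evaluate(suspicious))
print(evaluate(routine))
```

Requiring more than one scenario before raising an alert is one way to cut queue volume, and the list of reasons gives an analyst something concrete to verify or dispute.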

The pattern transfers to ML ranking at a major retailer. Trust remains paramount. The diagnostic starts with identifying who the system serves: End customers and sellers. The goal is building a fair and equitable marketplace for both. Data pipelining follows, with explainability at every layer. Every offer’s probability of winning is tracked, creating visibility into opportunities for improvement. Changes go through rigorous A/B experiments before production deployment. Decisions are logged and tied back to customer, product and business goals.

A diagnostic framework for struggling ML projects follows a consistent pattern. Start by asking what problem the system solves and for whom. Map the data supply chain from source to model to output. Look for where users lose trust or stop engaging. Resist the urge to improve the model until the current failure mode is understood.

The 80% failure rate for AI projects reflects an industry-wide pattern, but it isn’t inevitable. Failed projects leave traces. The teams that ship successfully aren’t necessarily the ones with better models. They’re the ones who diagnosed what actually went wrong before building anything new.
