📈 Improvement Concepts

Root Cause Analysis

Root cause analysis (RCA) is the family of structured techniques used to identify why a problem occurred — not just what happened, but why the system produced that outcome. Finding the root cause is necessary but not sufficient. The fix must operate at the right level of the system, and Bootstrap CUSUM must verify that it worked.

StepChangeAnalysis.com  ·  Concepts series  ·  June 2026

What root cause analysis is

Root cause analysis is a structured investigation methodology applied after an adverse event, near-miss, or persistent problem. Its purpose is to identify the fundamental cause — the root cause — that, if addressed, would prevent recurrence. Not the proximate cause (what immediately triggered the event), not the contributing factors (what made it worse), but the underlying condition that made the event possible.

The distinction matters because most organisations respond to problems at the proximate cause level. A patient falls: the immediate response is to put up the bed rails. A medication error occurs: the immediate response is to retrain the nurse. These responses address the proximate cause — they may prevent this specific event in this specific way, but they leave the underlying condition unchanged. The next event of the same type will occur through a slightly different proximate cause, and the cycle repeats.

Root cause vs proximate cause vs contributing factor

Proximate cause: The immediate trigger of the event. What happened just before the adverse outcome. A necessary but not sufficient explanation — it tells you the mechanism but not the cause.

Contributing factor: A condition that increased the likelihood or severity of the event but did not directly cause it. Fatigue, understaffing, poor lighting, time pressure. Important context but not the root cause.

Root cause: The fundamental system condition without which the event either could not have occurred or would have been far less likely. Addressing the root cause prevents recurrence of the entire class of events, not just this specific instance.


The RCA tool family

Several complementary tools exist for root cause analysis. Each is suited to different types of problems and different organisational contexts.

● Linear chains

The 5 Whys

Ask why five times to trace a linear causal chain from symptom to root cause. Simple, fast, requires no special equipment. Developed by Toyota.

Best for: single-cause problems with a clear causal chain. Process failures where one thing led to another.
Limitation: produces a single causal chain. Misses multiple interacting causes. Different teams can reach different answers for the same problem.
● Multiple causes

Fishbone (Ishikawa) Diagram

Maps multiple categories of potential causes onto a diagram shaped like a fishbone, with the problem at the head. Developed by Kaoru Ishikawa at Kawasaki in the 1960s.

Best for: complex problems with multiple contributing causes across different categories. Team-based analysis sessions.
Limitation: generates hypotheses, not confirmed causes. Requires follow-up investigation to verify which branches are active.
● System failures

Fault Tree Analysis (FTA)

Top-down logical diagram that maps all possible combinations of failures that could produce an undesired top event. Uses Boolean logic gates (AND, OR). Standard in aerospace and nuclear industries.

Best for: safety-critical systems where all failure pathways must be identified and quantified. High-consequence, low-frequency events.
Limitation: complex and time-intensive. Requires detailed system knowledge. Overkill for most healthcare improvement work.
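
The AND/OR gate logic can be illustrated with a minimal sketch. It assumes every basic event is independent and appears only once in the tree, which real fault trees often violate. The class names, the example tree and all probabilities below are hypothetical, chosen only to show how the gates combine.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class BasicEvent:
    """A leaf failure with an assumed (hypothetical) probability."""
    name: str
    probability: float


@dataclass
class Gate:
    """A logic gate combining child events; kind is "AND" or "OR"."""
    kind: str
    inputs: List[Union["Gate", BasicEvent]]


def probability(node: Union[Gate, BasicEvent]) -> float:
    """Probability that the node occurs, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        # All inputs must occur together: multiply the probabilities.
        result = 1.0
        for p in child_probs:
            result *= p
        return result
    if node.kind == "OR":
        # Any input is enough: 1 minus the chance that none of them occurs.
        none_occur = 1.0
        for p in child_probs:
            none_occur *= 1.0 - p
        return 1.0 - none_occur
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical tree: the top event needs a labelling failure AND a skipped check.
labelling_failure = Gate("OR", [
    BasicEvent("label misread", 0.10),
    BasicEvent("label missing", 0.20),
])
top_event = Gate("AND", [labelling_failure, BasicEvent("second check skipped", 0.50)])

top_probability = probability(top_event)  # 0.28 * 0.50, about 0.14
```

The AND gate is where a system-level barrier shows its value: lowering the probability of any one input to an AND gate lowers the top event probability in proportion, whichever labelling failure occurred.
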
● Near-misses

Significant Event Analysis (SEA)

A reflective, team-based review of significant events — including near-misses and good outcomes — to learn from what happened. Standard in UK primary care.

Best for: primary care and community settings. Learning from both adverse events and positive outcomes. Building a culture of reflection.
Limitation: variable quality depending on team facilitation skills. Can become a tick-box exercise without strong facilitation.

The fishbone (Ishikawa) diagram

Figure: Fishbone (Ishikawa) diagram, the standard root cause analysis diagram. Six categories of causes lead to the problem (effect) at the head: People (skills, training, behaviour), Process (methods, procedures, steps), Equipment (tools, technology, systems), Environment (place, conditions, culture), Materials (supplies, information, data) and Management (decisions, policy, oversight).

Each bone represents a category of potential causes. Sub-branches identify specific causes within each category. The fishbone generates hypotheses — the 5 Whys then tests each one.

The fishbone diagram — also called the Ishikawa diagram or cause-and-effect diagram — was developed by Kaoru Ishikawa at Kawasaki Heavy Industries in the 1960s and is now used across healthcare, manufacturing, and service industries worldwide. It provides a structured way to brainstorm and organise potential causes across multiple categories simultaneously.

Figure: Fishbone (Ishikawa) Diagram, 6M Categories. The problem statement (effect) sits at the head, which is where the analysis starts. Branches above and below the spine are categories of potential causes: People (skills, training, staffing), Process (steps, procedures), Equipment (tools, technology), Environment (workplace, culture), Management (decisions, resources) and Materials (supplies, information). Each sub-bone is a specific hypothesis to investigate within its category.

The 6M categories

The most widely used fishbone structure in manufacturing and healthcare uses six categories — the 6Ms. In healthcare the categories are sometimes adapted to the 4Ps (People, Process, Place, Policy) or to specific clinical frameworks.

| Category | Manufacturing original | Healthcare equivalent | Examples of causes |
| --- | --- | --- | --- |
| Man / People | Operator skills, training | Staff knowledge, fatigue, communication | Insufficient training, unclear roles, handover failures |
| Machine / Equipment | Tools, machinery | Medical devices, IT systems, connectors | Equipment not available, alert fatigue, incompatible connectors |
| Method / Process | Procedures, work instructions | Clinical protocols, care pathways | No standard procedure, procedure not followed, outdated guideline |
| Material | Raw materials, components | Medications, supplies, patient information | Look-alike/sound-alike drugs, missing information, supply chain failures |
| Measurement | Inspection methods | Monitoring, audit, reporting | No monitoring system, measurement error, metric not tracked |
| Mother Nature / Environment | Temperature, humidity | Ward culture, staffing levels, time pressure | Understaffing, interruptions, normalisation of deviance |
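
A fishbone can also be kept as a simple data structure, so that every sub-bone stays marked as an unconfirmed hypothesis until follow-up investigation verifies it. The sketch below is illustrative only: the class names, the shortened category labels and the problem statement are assumptions, and the example causes are taken from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Shortened labels for the six categories (an assumption for this sketch).
CATEGORIES = ("People", "Equipment", "Process", "Material", "Measurement", "Environment")


@dataclass
class Hypothesis:
    """One sub-bone: a candidate cause that still needs checking."""
    description: str
    verified: bool = False


@dataclass
class Fishbone:
    """A problem statement with one list of hypotheses per category."""
    problem: str
    bones: Dict[str, List[Hypothesis]] = field(
        default_factory=lambda: {category: [] for category in CATEGORIES}
    )

    def add(self, category: str, description: str) -> Hypothesis:
        """Attach a new, unverified hypothesis to a category bone."""
        if category not in self.bones:
            raise ValueError(f"unknown category: {category!r}")
        hypothesis = Hypothesis(description)
        self.bones[category].append(hypothesis)
        return hypothesis

    def unverified(self) -> List[Tuple[str, str]]:
        """List (category, description) pairs still awaiting investigation."""
        return [
            (category, h.description)
            for category, hypotheses in self.bones.items()
            for h in hypotheses
            if not h.verified
        ]


# Hypothetical problem statement, with causes borrowed from the table.
diagram = Fishbone("Medication given by the wrong route")
handover = diagram.add("People", "Handover failures")
diagram.add("Equipment", "Incompatible connectors")
diagram.add("Process", "Outdated guideline")
handover.verified = True  # set only after follow-up investigation confirms it
open_items = diagram.unverified()
```

Keeping the verified flag separate from the brainstorm reflects the limitation noted earlier: the diagram generates hypotheses, not confirmed causes.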

When to use which tool

| Situation | Recommended tool | Why |
| --- | --- | --- |
| Single adverse event with a clear sequence of events | 5 Whys | Fast, simple, follows the causal chain directly |
| Complex event with multiple contributing causes across different departments or systems | Fishbone diagram | Captures multiple categories simultaneously, good for team sessions |
| Recurring pattern of similar events across multiple sites or time periods | Bootstrap CUSUM + RCA | CUSUM identifies the pattern and dates it; RCA explains the cause |
| Safety-critical system where all failure pathways must be mapped | Fault tree analysis | Systematic, quantifiable, maps all pathways including combinations |
| Learning from near-misses in primary care or community settings | Significant Event Analysis | Reflective format, culturally accessible, covers positive events too |

Finding the root cause is necessary but not sufficient

Root cause analysis is widely assumed to lead naturally to prevention. Find the cause, fix the cause, prevent recurrence. In practice the chain frequently breaks at the third link. Two specific failures account for most of this.

The fix operates at the wrong level. The RCA correctly identifies the root cause at the system level. The fix is implemented at the process or output level because the system-level fix is too expensive, too slow, or outside the authority of the team conducting the analysis. The root cause remains unchanged. The event recurs through a different proximate cause. Another RCA is conducted. The pattern repeats. Joiner’s Levels of Fix is the diagnostic tool for this failure: if the fix is at Level 1 or Level 2 but the root cause is at Level 3, the fix will not prevent recurrence.

The fix is never verified. The fix is implemented and assumed to work. No pre-specified outcome measure was defined before the fix. No Bootstrap CUSUM prediction was made. When the next review occurs, the team reports that the action was completed — not that the outcome changed. Completing an action and changing an outcome are not the same thing. Without a pre-specified test, the improvement is asserted rather than confirmed.

The NHS RCA cycle that produces no change

A Never Event occurs. An RCA is conducted. A corrective action plan is produced. The actions are completed and signed off. The event occurs again the following year. Another RCA is conducted. The same root causes are identified. A similar action plan is produced. This cycle, documented in multiple NHS investigations, is the direct consequence of RCA without Joiner-level awareness and without Bootstrap CUSUM verification. The root cause is found, a Level 1 or Level 2 fix is applied, the system remains unchanged, and the event recurs. Bootstrap CUSUM on NHS Never Events data shows the result: 17.5 events per year, unchanged for 15 years, across thousands of individual RCA investigations.


Psychological “Why” frameworks — why people ask why differently

The 5 Whys is a logical technique. But the question “why?” also has a psychological dimension that determines whether an RCA reveals the true root cause or a socially acceptable one.

In organisations where fear is present — where pointing out problems or naming system failures carries personal risk — the 5 Whys produces a sanitised causal chain that stops at the level where blame becomes uncomfortable. The questioning process appears rigorous. The conclusions are systematically incomplete. The real root cause — the management system, the accountability structure, the incentive that produced the behaviour — is never named because naming it carries too high a personal cost.

Psychological safety is not a pre-condition for asking why. It is a pre-condition for the answers being honest. Amy Edmondson’s research on psychological safety in healthcare teams showed precisely this: teams with low psychological safety reported fewer errors, not because they made fewer errors, but because they were less willing to report them. RCA conducted in conditions of low psychological safety produces fewer root causes found, not because there are fewer root causes, but because the investigation stops before reaching the ones that are uncomfortable to name.

This is why Going to the Gemba is a pre-condition for honest RCA in complex organisations: the senior person who goes to where the problem manifests, in a culture of psychological safety, hears what the front line actually knows — not what the front line thinks it is safe to say. Deming’s Point 8 (drive out fear) is not a management philosophy. It is a prerequisite for root cause analysis to reach the root.

RCA in the NHS — why it frequently fails to prevent recurrence

The NHS conducts thousands of Root Cause Analyses every year through the Serious Incident framework, the Patient Safety Incident Response Framework (PSIRF), and clinical audit processes. The volume of RCA activity is not in question. The effectiveness is.

Three structural features of NHS RCA produce the recurring-event pattern:

1. Individual event focus without pattern analysis. Each RCA analyses one specific event. The systemic pattern — that the same type of event recurs at the same rate year after year — is visible only when multiple events are analysed as a series. Bootstrap CUSUM on the series answers the question that individual RCA cannot: has the rate of this type of event structurally changed? If it has not, the individual RCAs have not produced system change.

2. Action completion measured, not outcome change. NHS governance frameworks typically require trusts to report that RCA actions have been completed. They do not require trusts to demonstrate that completing those actions changed the outcome. The accountability framework measures activity, not effect. This is precisely the process measure vs outcome measure confusion: completing an action plan is a process measure. Reducing the event rate is an outcome measure. The NHS reports the former and calls it improvement.

3. PSIRF and the shift toward system learning. The Patient Safety Incident Response Framework (2022) is a genuine improvement on previous frameworks. It explicitly moves away from individual RCA of every serious incident toward a system-level analysis of themes and patterns. It is the right direction. The analytical tool that makes system-level pattern analysis rigorous — Bootstrap CUSUM applied to the event series — is not yet routinely used within it.
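
The pattern question in point 1 (has the rate of this type of event structurally changed?) can be sketched in code. The version below assumes the commonly described bootstrap change-point procedure: take the cumulative sum of deviations from the mean, measure its range, and compare that range against the ranges from many random reorderings of the same data. It is not necessarily how the StepChange Analyzer is implemented. The function names, the single-change-point simplification and the example series are all hypothetical.

```python
import random
from typing import List, NamedTuple, Sequence


class ChangePoint(NamedTuple):
    confidence_pct: float  # share of shuffles with a smaller CUSUM range
    change_index: int      # 0-based index of the first period after the change
    mean_before: float
    mean_after: float


def cusum_path(series: Sequence[float]) -> List[float]:
    """Cumulative sum of deviations from the series mean, starting at 0."""
    mean = sum(series) / len(series)
    path = [0.0]
    for value in series:
        path.append(path[-1] + (value - mean))
    return path


def cusum_range(series: Sequence[float]) -> float:
    """Range of the CUSUM path; larger when the level shifts part-way through."""
    path = cusum_path(series)
    return max(path) - min(path)


def bootstrap_cusum(series: Sequence[float], n_boot: int = 1000, seed: int = 0) -> ChangePoint:
    """Estimate one candidate change point and a bootstrap confidence level."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    observed = cusum_range(series)
    shuffled = list(series)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # reordering destroys any real shift in level
        if cusum_range(shuffled) < observed:
            smaller += 1
    path = cusum_path(series)
    # The largest excursion of the path marks the last period before the change.
    change_index = max(range(1, len(series)), key=lambda i: abs(path[i]))
    before, after = list(series[:change_index]), list(series[change_index:])
    return ChangePoint(
        confidence_pct=100.0 * smaller / n_boot,
        change_index=change_index,
        mean_before=sum(before) / len(before),
        mean_after=sum(after) / len(after),
    )


# Hypothetical quarterly event counts with a drop after the eighth quarter.
events_per_quarter = [18, 17, 19, 16, 18, 17, 19, 18, 12, 11, 13, 10, 12, 11, 12, 13]
result = bootstrap_cusum(events_per_quarter)
```

A low confidence value on a real event series would be the quantitative form of the claim above: the individual RCAs have not produced a detectable change in the rate.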


Closing the loop with Bootstrap CUSUM

📊 The complete RCA + Bootstrap CUSUM cycle

Step 1 — Identify the pattern with Bootstrap CUSUM. Apply Bootstrap CUSUM to the event rate series (incidents per month, adverse events per quarter, never events per year). If the process is stable with no change point, individual events are common cause variation — the system is producing them routinely. This tells you the problem is systemic, not episodic. If an upward change point appears, something specific changed and made things worse. Date the change point — that narrows the investigation window.

Step 2 — Identify the root cause with RCA. Use the 5 Whys or fishbone diagram to trace the causal chain. Apply the Joiner test at the end: is the proposed fix at Level 1 (output), Level 2 (process), or Level 3 (system)? If it is at Level 1 or 2, ask whether a Level 3 cause exists that the analysis has not yet reached.

Step 3 — Implement a Level 3 fix. The fix must address the root cause at the system level. Physical redesign where possible (making the wrong action impossible), structural change where physical redesign is not feasible, economic or accountability mechanism change where structural redesign is not feasible.

Step 4 — Pre-specify the Bootstrap CUSUM test. Before implementing the fix, state in writing: we expect a Bootstrap CUSUM change point in [outcome measure — the event rate series] within [Z] time periods at [Y]% confidence. This is the commitment that makes the verification step meaningful.

Step 5 — Verify with Bootstrap CUSUM. Run Bootstrap CUSUM on the event rate series periodically after the fix. When a downward change point appears at the predicted confidence level, the fix is confirmed. When it does not appear within the expected lag window, the root cause analysis was incomplete — return to Step 2.
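
Steps 4 and 5 can be sketched as a written prediction plus a check against it. In this minimal illustration the [Z] and [Y] placeholders from Step 4 become the hypothetical fields max_lag_periods and min_confidence_pct. The Detection record stands in for whatever a Bootstrap CUSUM run reports, and its fields are assumptions, not the output format of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    """The Step 4 commitment, written down before the fix goes live."""
    outcome_measure: str
    max_lag_periods: int       # [Z]: periods allowed for the change to appear
    min_confidence_pct: float  # [Y]: confidence level required


@dataclass(frozen=True)
class Detection:
    """Assumed summary of one Bootstrap CUSUM run on the event rate series."""
    change_index: int      # 0-based period where the new level starts
    confidence_pct: float
    mean_before: float
    mean_after: float


def verify(prediction: Prediction, fix_index: int, periods_observed: int,
           detection: Optional[Detection]) -> str:
    """Compare a CUSUM result with the pre-specified prediction (Step 5)."""
    window_end = fix_index + prediction.max_lag_periods
    if (
        detection is not None
        and detection.confidence_pct >= prediction.min_confidence_pct
        and detection.mean_after < detection.mean_before  # downward change only
        and fix_index <= detection.change_index <= window_end
    ):
        return "confirmed"
    if periods_observed > window_end:
        # The whole lag window has been observed without the predicted change.
        return "return to Step 2"
    return "keep monitoring"


# Hypothetical example: fix implemented at period 12, change expected within 6 periods.
prediction = Prediction("falls per month", max_lag_periods=6, min_confidence_pct=95.0)
outcome = verify(prediction, fix_index=12, periods_observed=20,
                 detection=Detection(14, 98.0, 17.75, 11.75))
```

The three return values mirror the text: a confirmed fix, an incomplete root cause analysis, or a lag window that has not yet closed. Only the first counts as evidence that the outcome changed.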


Related concepts

📈 Part of the StepChange improvement concepts library

This concept sits within a broader framework for understanding why improvement programmes succeed or fail. Start with Why Nothing Changes for the full picture, or go to Start Here for a guided introduction to the method.