📊 Concept · Variation Theory

Common Cause vs Special Cause — Which Type of Variation Do You Have?

This is the single most important distinction in improvement work. Confuse the two and your response will be wrong: you will either tamper with a stable system or ignore a real signal. This page tells you which type you have, how the StepChange Analyzer (Bootstrap CUSUM) detects the difference, and exactly what to do next.

StepChangeAnalysis.com · Open the StepChange Analyzer

▶ Key rule — common cause vs special cause

If the StepChange Analyzer (Bootstrap CUSUM) detects a change point, you have a special cause signal — something specific shifted the system. If it detects no change point, the variation is common cause — the system is stable and reacting to it will make things worse.

The response to each type is opposite. Special cause: find and address the specific cause. Common cause: do not tamper — redesign the system or wait. Applying the wrong response to either type produces worse outcomes than doing nothing.

☰ Contents

The two types — at a glance
How to tell which type you have
The two mistakes — and why both make things worse
What to do next — decision table
Real examples

The two types — at a glance

Common cause variation

The normal, expected variation of a stable system. The process is doing exactly what it is designed to do. The ups and downs you see month to month are noise — predictable within a range, produced by the system itself.

Bootstrap CUSUM result: no change point
System state: stable
Correct response: do not tamper
To improve: redesign the system

Special cause variation

A non-random signal that something specific changed. A sustained shift to a new level, a spike, a trend. Something outside the system’s normal behaviour caused this — a specific, identifiable event.

Bootstrap CUSUM result: change point detected
System state: shifting
Correct response: investigate the cause
To improve: address the specific cause

The distinction was first made precisely by Walter Shewhart in 1931 — he called them “chance causes” and “assignable causes.” Deming developed it into a management principle. Joiner built it into a practical decision framework. The terminology has changed but the insight is the same: variation has two sources, and they require opposite responses.

How to tell which type you have

The StepChange Analyzer (Bootstrap CUSUM) does this automatically. Upload your time-series data, run the analysis, and the result is unambiguous:

📊 Reading the Bootstrap CUSUM result

Change point detected: You have a special cause signal. The cumulative sum crossed the statistical threshold, which means the system shifted from one stable level to another at a specific, dateable point. The algorithm returns that date. Your job is to investigate what changed in the system at or just before that date.

The change point may be an improvement (the metric moved in the right direction) or a deterioration (it moved in the wrong direction). Both are special causes. Both require investigation before attribution — the date is the starting point, not the conclusion. Go to Change detected: what to do next.

No change point: You have common cause variation. The system is stable, fluctuating within its normal range. The variation you can see is real, but it is produced by the system itself, not by a specific external event.

This has two very different meanings: the stable level may be acceptable (hold steady, monitor) or unacceptable (the system needs redesigning, not reacting to). In either case, do not tamper — launching a new intervention in response to a single bad month is reacting to noise. Go to No change detected: what to do next.

Approaching threshold: A special cause may be forming, so wait for confirmation. The cumulative sum is trending but has not yet crossed the threshold. Adding a new intervention now would muddy the signal. Set a review date and re-run as more data arrives. Go to No change detected: decide to wait.
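To make the three outcomes concrete, here is a minimal sketch of one common way to build a bootstrap CUSUM test. It is not the StepChange Analyzer's actual implementation, which this page does not describe. The function names, the 1,000 shuffles, and the 0.95 cut-off are all assumptions chosen for illustration.

```python
import random

def cusum(values):
    """Cumulative sum of deviations from the series mean, starting at 0."""
    mean = sum(values) / len(values)
    sums = [0.0]
    for v in values:
        sums.append(sums[-1] + (v - mean))
    return sums

def cusum_range(values):
    """Max minus min of the CUSUM path; large when the level shifts."""
    path = cusum(values)
    return max(path) - min(path)

def bootstrap_cusum(values, n_boot=1000, threshold=0.95, seed=0):
    """Return (confidence, change_index or None).

    confidence: share of shuffled series whose CUSUM range is smaller than
    the observed one. Shuffling destroys any time ordering, so it shows how
    large the range gets from noise alone.
    change_index: 0-based index of the first observation at the new level,
    reported only when confidence reaches the (assumed) threshold.
    """
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    confidence = smaller / n_boot
    if confidence < threshold:
        return confidence, None
    path = cusum(values)
    # The path is furthest from zero at the last point of the old level,
    # so that position is also the index where the new level starts.
    change_index = max(range(1, len(path)), key=lambda i: abs(path[i]))
    return confidence, change_index

# Illustrative data only: twelve months around 10, then twelve around 14.
monthly = [10, 11, 9, 10, 12, 9, 10, 11, 9, 10, 11, 10,
           14, 15, 13, 14, 16, 13, 14, 15, 13, 14, 15, 14]
confidence, change_index = bootstrap_cusum(monthly)
```

In this sketch, a confidence at or above the cut-off corresponds to "change point detected", and a low confidence to "no change point". A confidence that sits just below the cut-off and rises on re-runs is roughly what "approaching threshold" describes. Mapping `change_index` back to a calendar date is left to the caller.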

The two mistakes — and why both make things worse

Shewhart identified two types of error. Deming's funnel experiment demonstrates the first of them, tampering. The two are not symmetric (Mistake 1 is far more common in practice), but both produce worse outcomes than the correct response.

Mistake	What it looks like	Why it makes things worse
Mistake 1: Treating common cause as special cause	Reacting to every bad month with an action plan. Calling a meeting when the metric dips. Changing the process in response to normal fluctuation. Launching a new initiative because last quarter was worse than the previous one.	Each reaction adds variation to a stable system; Deming called it tampering. The system becomes less predictable. Staff learn that numbers trigger reactions regardless of whether they contain a signal. Data reporting becomes political. Performance worsens on average.
Mistake 2: Treating special cause as common cause	Dismissing a genuine shift as “just noise.” Not investigating a sustained deterioration. Assuming a real improvement will sustain itself without standardising the conditions that caused it.	A deterioration goes unaddressed and compounds. An improvement drifts back because its cause was never identified or standardised. Special causes that are not identified recur, sometimes repeatedly, sometimes with increasing severity.
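The claim that tampering adds variation can be checked with a small simulation of the funnel experiment. This is a sketch under stated assumptions: misses are drawn from a normal distribution, the target is position 0, and "tampering" means moving the funnel by the opposite of the last miss (the adjustment usually described as Deming's Rule 2). The function and parameter names are illustrative.

```python
import random
import statistics

def funnel(n_drops, adjust, sigma=1.0, seed=1):
    """Simulate marble drops through a funnel aimed at target 0.

    adjust=False: leave the funnel alone after every drop.
    adjust=True:  after each drop, move the funnel from its current
                  position by the opposite of the last miss.
    Returns the list of landing positions.
    """
    rng = random.Random(seed)
    funnel_pos = 0.0
    landings = []
    for _ in range(n_drops):
        landing = funnel_pos + rng.gauss(0.0, sigma)
        landings.append(landing)
        if adjust:
            funnel_pos -= landing  # "correct" for the miss from target 0
    return landings

untouched = statistics.pstdev(funnel(20_000, adjust=False))
tampered = statistics.pstdev(funnel(20_000, adjust=True))
ratio = tampered / untouched
```

Under these assumptions each tampered landing combines the current random miss with the previous one, so the variance doubles and `ratio` comes out close to the square root of 2. The well-meant correction makes a stable process about 40% more spread out.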

Why Mistake 1 is so persistent in public services

Monthly performance reviews that respond to every data point with an action plan are institutional Mistake 1. The pressure to “do something” in response to a bad number is real and understandable, but if the number is common cause variation, any action taken is a reaction to noise. Bootstrap CUSUM applied to the metric will show a single flat level with no change point: no action plan has changed the system. The activity generated is real. The improvement is not.

What to do next — decision table

Bootstrap CUSUM result	Type of variation	Level assessment	Next step
Change point — improvement direction	Special cause	Better than before	Investigate the cause. Standardise if genuine. Check balancing measures. Path A →
Change point — deterioration direction	Special cause	Worse than before	Investigate the cause. Triage and stabilise. Apply Joiner levels to find the right response level. Path A →
No change point — stable at acceptable level	Common cause	Yes	Hold steady. Do not tamper. Monitor with Bootstrap CUSUM. Path B →
No change point — stable at unacceptable level	Common cause	No	Redesign the system — not the process. Stratify, experiment, disaggregate. Path C →
No change point — intervention recently applied	Common cause so far	Unclear — too soon	Wait. Do not add further interventions. Re-run as more data arrives. Path B →
No change point — multiple interventions applied	Common cause	No	Interventions have not reached the constraint. Move up a Joiner level. Joiner levels →
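The table can also be read as a small decision function. The sketch below is one way to encode it. The function name, argument names, and the order in which rows are checked are assumptions, since the table does not say which row wins when more than one applies.

```python
def next_step(change_point, direction=None, level_acceptable=None,
              recent_intervention=False, multiple_interventions=False):
    """Map an analysis result to (type of variation, next step).

    Rows are checked in an assumed order of precedence: change point
    first, then intervention history, then the acceptability of the level.
    """
    if change_point:
        if direction == "improvement":
            return ("special cause",
                    "Investigate the cause. Standardise if genuine. "
                    "Check balancing measures.")
        if direction == "deterioration":
            return ("special cause",
                    "Investigate the cause. Triage and stabilise. "
                    "Apply Joiner levels.")
        raise ValueError("direction must be 'improvement' or 'deterioration'")
    if multiple_interventions:
        return ("common cause", "Move up a Joiner level.")
    if recent_intervention:
        return ("common cause so far",
                "Wait. Do not add further interventions. "
                "Re-run as more data arrives.")
    if level_acceptable is None:
        raise ValueError("say whether the stable level is acceptable")
    if level_acceptable:
        return ("common cause", "Hold steady. Do not tamper. Keep monitoring.")
    return ("common cause",
            "Redesign the system. Stratify, experiment, disaggregate.")

# Example: stable at an unacceptable level, nothing tried recently.
variation, action = next_step(False, level_acceptable=False)
```

Writing the table this way makes one point explicit: the first question is always whether a change point exists, and the acceptability of the level only matters after that.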

Real examples

📋 What each type looks like in practice

Special cause (improvement direction): NHS A&E, the COVID effect. March 2020 produced a sharp downward change point in A&E attendances, not because the system improved but because people stopped attending during the pandemic. Bootstrap CUSUM correctly detected this as a special cause shift. It was not a system improvement; it was an external event with a specific date. When the special cause was removed (restrictions lifted), the metric returned toward its previous level. The change point was real. The attribution (“the system improved”) would have been wrong.

Special cause (deterioration direction): DOAC adverse drug reactions, the rivaroxaban controversy. A detectable upward change point in DOAC-related adverse reactions in 2016 corresponds to the publication of concerns about rivaroxaban monitoring. A specific, dateable external event caused a sustained shift. The correct response was to investigate the specific cause (rivaroxaban monitoring failure) and address it specifically, not to redesign anticoagulation prescribing across the board. See Anticoagulation safety.

Common cause (stable at the wrong level): NHS A&E four-hour performance 2012–2026. Bootstrap CUSUM on 184 monthly observations finds four structural stages of decline, each beginning with a special cause shift downward. Within each stage, however, the metric is in common cause variation: stable, predictable, and unresponsive to the multiple Level 1 and Level 2 interventions applied. Each action plan, each improvement programme, each turnaround team produced activity but no change point. The system was stable at the wrong level. The correct response was system redesign at Level 3. The response applied was repeated Mistake 1. See Why nothing has worked.

Common cause (stable at an acceptable level): A well-run process held stable. A pharmacy dispensing error rate has been at 0.3 per 1,000 items for 18 months. Bootstrap CUSUM shows no change point. The rate is within the acceptable threshold. The correct response is to continue monitoring, not to launch a new improvement programme because last month’s rate was 0.4 and this month’s was 0.2. Both are common cause variation. Reacting to either is tampering.
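A rough calculation shows why 0.4 and 0.2 are unremarkable in the pharmacy example. The example gives no dispensing volume, so the sketch below assumes a hypothetical 10,000 items per month and treats the monthly error count as Poisson-distributed. Both are illustrative assumptions, not facts from the case.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k, lam):
    """Probability of k or fewer events when lam are expected."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

ITEMS_PER_MONTH = 10_000   # hypothetical volume, not given in the example
RATE_PER_1000 = 0.3        # long-run rate from the example

# At 0.3 per 1,000 items, 10,000 items give about 3 errors a month.
expected_errors = RATE_PER_1000 * ITEMS_PER_MONTH / 1000

# A rate of 0.4 per 1,000 means 4 errors; a rate of 0.2 means 2 errors.
p_at_least_4 = 1 - poisson_cdf(3, expected_errors)
p_at_most_2 = poisson_cdf(2, expected_errors)
```

Under these assumptions a month with four or more errors turns up roughly a third of the time, and a month with two or fewer a little more often than that. Neither figure is evidence that anything changed. With a different monthly volume the probabilities shift, but the reasoning is the same.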

Run the test — find out which type you have

Upload your time-series data to the StepChange Analyzer. Bootstrap CUSUM will detect whether a structural shift has occurred, date it precisely if so, and give you the unambiguous starting point for the right response.

▶ Open the StepChange Analyzer

