
Special Cause Variation — Eliminate It Before Improving the System

Joiner’s rule, drawn from Deming and Shewhart: eliminate special cause variation first, and only then try to improve the now-stable system. Acting on a special cause as if it were a systemic problem, or ignoring it and “improving” a system that is not yet stable, typically produces worse outcomes than doing nothing. The Bootstrap CUSUM step-change framework is built on this distinction.

StepChangeAnalysis.com  ·  Source: Joiner, Fourth Generation Management, p.138

Two types of variation — and why the distinction matters

Every process produces variation. The critical question is not whether variation exists but what kind it is. Shewhart’s original insight, developed by Deming and then by Joiner, is that variation comes from two fundamentally different sources — and they require fundamentally different responses.

Common cause variation
What it is: The normal, expected variation inherent in a stable system. The process is doing exactly what it is designed to do, and this is the result. Predictable within a range.
Source: The system itself: its design, its inputs, its structural conditions.
Correct response: Do not tamper. To reduce it, you must change the system. See Joiner’s approach to common cause variation.

Special cause variation
What it is: An unexpected, non-random signal that something outside the normal system has occurred. A spike, a shift, a one-off event. The process is behaving differently from its normal pattern.
Source: A specific, identifiable cause (an event, an error, a change in conditions) that is external to the system’s normal operation.
Correct response: Identify and address the specific cause. Do not change the system to accommodate it.

The consequences of confusing the two are severe. Treating common cause variation as if it were a special cause — reacting to every dip and spike — is tampering: it adds variation to a stable system and makes performance worse. Treating special cause variation as if it were common cause — accepting an unusual event as “just noise” — means missing a signal that something specific has gone wrong or, occasionally, right.
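Deming’s funnel experiment is the classic demonstration of tampering. The sketch below is a simulation under stated assumptions, not a model of any real process: outputs are aimed at a target of 0, common cause noise is normal with standard deviation 1, and the tampering rule (Deming’s “Rule 2”) moves the setting by the negative of each result’s deviation from target. The function name `simulate` is hypothetical.

```python
import random
import statistics


def simulate(n: int, tamper: bool, seed: int = 1) -> list[float]:
    """Simulate n outputs of a stable process aimed at a target of 0.

    Assumption: each output is the current setting plus N(0, 1) common cause
    noise. With tamper=True the operator applies Deming's funnel Rule 2:
    after every output, move the setting by the negative of that output's
    deviation from target.
    """
    rng = random.Random(seed)
    setting = 0.0
    outputs = []
    for _ in range(n):
        y = setting + rng.gauss(0.0, 1.0)
        outputs.append(y)
        if tamper:
            setting -= y  # "correct" for the last result (target is 0)
    return outputs


hands_off = statistics.stdev(simulate(20_000, tamper=False))
tampered = statistics.stdev(simulate(20_000, tamper=True))
print(f"hands off: {hands_off:.3f}")  # close to 1.0
print(f"tampered:  {tampered:.3f}")   # close to 1.414 (the square root of 2)
```

The extra variation comes from the adjustment itself: each correction responds to noise that will not recur, so every new output carries the previous period’s noise, reversed, on top of its own. The standard deviation rises by roughly 41% even though nothing about the underlying system has changed.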


What special cause variation looks like

Special cause variation appears as a data point or sequence of data points that cannot be explained by the system’s normal behaviour. In Bootstrap CUSUM terms, a sustained special cause shows up as a shift in the process mean (a change point) whose origin is a specific, identifiable event rather than a deliberate redesign of the system itself.

🔎 Common forms of special cause variation

Spike: A single extreme data point. One month’s data is far outside the normal range, caused by industrial action, a severe weather event, a system outage, or a data entry error. The process returns to its previous level in the following period. Bootstrap CUSUM will typically not flag a single spike as a structural change point; the algorithm is designed to detect sustained shifts, not one-off events.
Step: A sustained shift to a new level caused by a specific, identifiable event: a new system was introduced, a key staff member left, a supplier changed, a ward closed. The process has genuinely moved to a new level, but the cause is a specific event, not a system redesign. Bootstrap CUSUM will detect this as a change point. The investigation (see Path A: What to do next) determines whether the cause is a system improvement or a special cause that needs to be understood and, if adverse, corrected.
Trend: A consistent drift in one direction over several periods. This is not random fluctuation but systematic movement, caused by cumulative degradation (equipment wearing out, staff attrition), cumulative improvement (learning curve effects), or ongoing external pressure. Because Bootstrap CUSUM looks for step changes, it typically represents a trend as one or more change points placed where the cumulative deviation from the overall mean is largest, so the dates it reports approximate the drift rather than mark a single event.
Artefact: A special cause in the measurement system, not the underlying process: a change in how the data is coded, a new reporting system, a change in the definition of the metric. The process has not changed; the measurement of it has. This is the most dangerous special cause because it produces a genuine change point signal that has no operational meaning. See the investigation checklist for how to rule this out.
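To make the four forms concrete, the sketch below generates hypothetical monthly series (not real data) that share identical common cause noise and differ only in the special cause pattern added on top. The function names, the level of 50, the noise standard deviation of 2 and the shift size of 5 are illustrative assumptions. Comparing the first six months with the last six shows that a spike leaves the level unchanged, while a step and an artefact produce exactly the same numbers.

```python
import random
import statistics


def make_series(form: str, n: int = 36, at: int = 18, size: float = 5.0,
                seed: int = 7) -> list[float]:
    """Return a hypothetical monthly series showing one form of special cause.

    All forms share the same common cause noise (mean 50, sd 2), so the only
    difference between them is the pattern added on top.
    """
    rng = random.Random(seed)
    base = [50.0 + rng.gauss(0.0, 2.0) for _ in range(n)]
    if form == "spike":  # one extreme month, then back to normal
        return [x + (3 * size if i == at else 0.0) for i, x in enumerate(base)]
    if form in ("step", "artefact"):  # sustained shift from month `at` on
        return [x + (size if i >= at else 0.0) for i, x in enumerate(base)]
    if form == "trend":  # steady drift reaching `size` by the last month
        return [x + size * i / (n - 1) for i, x in enumerate(base)]
    raise ValueError(f"unknown form: {form}")


def level_change(series: list[float], k: int = 6) -> float:
    """Mean of the last k points minus the mean of the first k points."""
    return statistics.mean(series[-k:]) - statistics.mean(series[:k])


for form in ("spike", "step", "trend", "artefact"):
    print(f"{form:9s} level change: {level_change(make_series(form)):+.2f}")
```

Nothing in the numbers separates a real step from an artefact: the two series are identical by construction. That is the practical meaning of “a genuine change point signal that has no operational meaning”, and why an artefact has to be ruled out by investigating how the data was produced rather than by further analysis.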

Joiner’s rule — the correct sequence

Joiner states the rule precisely on page 138 of Fourth Generation Management: eliminate special cause variation before attempting to improve a stable system. The sequence matters. Trying to redesign a system that is not yet stable — that is still experiencing special cause events — produces unpredictable results, because you cannot measure the effect of your intervention against a baseline that keeps moving.

The Joiner sequence

Step 1 — Detect. Is the variation in your data common cause or special cause? Use Bootstrap CUSUM or a control chart to determine whether the process is stable (common cause only) or unstable (one or more special causes present).

Step 2 — Address special causes first. If special causes are present, identify each one and address it specifically; do not change the system to accommodate it. Eliminate adverse special causes, or standardise favourable ones, so the system returns to stable, predictable behaviour.

Step 3 — Confirm stability. Once special causes have been addressed, confirm the process is now stable — fluctuating within a predictable range with no further change points. Bootstrap CUSUM on recent data should show a flat line.

Step 4 — Then improve the system. Only now is a system-level improvement meaningful, because you have a stable baseline to measure against. Any structural change you introduce should produce a detectable change point, and you will have far stronger grounds to attribute it to your intervention rather than to a residual special cause.
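Step 1 and Step 3 both come down to one question: is the process stable? The page names a control chart as an alternative to Bootstrap CUSUM, so here is a minimal sketch of an XmR (individuals) chart, a common choice for one value per period. The function names are hypothetical, and the run rule uses one common convention (eight consecutive points on one side of the centre line); other sources use different run lengths.

```python
import statistics


def xmr_limits(data: list[float]) -> tuple[float, float, float]:
    """Return (lower limit, centre line, upper limit) for an XmR chart."""
    centre = statistics.fmean(data)
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar


def special_cause_signals(data: list[float], run_length: int = 8) -> list[int]:
    """Indices that signal a special cause under two common rules:
    a point outside the natural process limits, or a point that completes
    a run of `run_length` or more consecutive points on one side of the
    centre line."""
    lower, centre, upper = xmr_limits(data)
    signals = set()
    run_side, run = 0, 0
    for i, x in enumerate(data):
        if x < lower or x > upper:
            signals.add(i)
        side = (x > centre) - (x < centre)  # +1 above, -1 below, 0 on the line
        if side == 0:
            run = 0
        elif side == run_side:
            run += 1
        else:
            run = 1
        run_side = side
        if run >= run_length:
            signals.add(i)
    return sorted(signals)


def is_stable(data: list[float]) -> bool:
    """Stable in Shewhart's sense: no special cause signals at all."""
    return not special_cause_signals(data)


history = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 10, 12]
print(is_stable(history))                     # True
print(special_cause_signals(history + [20]))  # [12]
```

In the Joiner sequence, a check like this runs twice: once at Step 1 to find the special causes, and again at Step 3 on data collected after they have been addressed, before any system redesign begins.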

⚠️ Why order matters — the baseline problem

If you attempt a system improvement while special causes are still present, you cannot interpret the result. Suppose you introduce a new discharge protocol and the following month’s data improves significantly. Was that your protocol — or the resolution of the industrial action that had been suppressing performance for the previous three months? Without first eliminating the special cause and stabilising the baseline, you cannot answer that question. Attribution is impossible. The improvement programme gets credited for a change it may not have caused, and the underlying structural issue remains unaddressed.
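The arithmetic behind the baseline problem can be shown with hypothetical numbers (illustrative only, not real data). Suppose on-time discharge performance is stable around 80%, industrial action depresses months 7 to 9, and the new protocol starts in month 10, the same month the action ends, while having no real effect at all.

```python
import statistics

# Hypothetical monthly on-time discharge percentages (not real data).
before = [80, 81, 79, 80, 82, 78, 70, 69, 71]  # months 1-9
after = [80, 81, 79, 80, 82, 78]               # months 10-15, protocol in place
special_cause_months = {7, 8, 9}               # depressed by industrial action

# Naive comparison: the special cause months drag the "before" mean down.
naive_gain = statistics.mean(after) - statistics.mean(before)

# Comparison against the stable baseline only.
stable_before = [x for month, x in enumerate(before, start=1)
                 if month not in special_cause_months]
clean_gain = statistics.mean(after) - statistics.mean(stable_before)

print(f"naive before/after gain: {naive_gain:+.1f} points")      # +3.3
print(f"gain against stable baseline: {clean_gain:+.1f} points")  # +0.0
```

The whole apparent gain of about 3.3 points belongs to the end of the special cause. Excluding those months only works here because the example tells us which months they are; in practice that knowledge comes from first detecting and investigating the special cause, which is the point of Joiner’s sequence.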


How Bootstrap CUSUM detects special cause variation

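Bootstrap CUSUM is designed specifically to detect structural shifts: sustained moves from one stable level to another. That defines its relationship to special cause variation. A special cause that produces a step or a sustained trend shows up as a dated change point; a one-off spike usually does not, and neither does ordinary common cause fluctuation.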
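The page does not spell out the algorithm, so the following is a minimal sketch assuming the widely described formulation from Wayne Taylor’s change-point analysis, which may differ in detail from the StepChange Analyzer. It accumulates deviations from the overall mean, measures the range of that cumulative sum, and compares it with the ranges obtained after randomly reordering the data many times. Reordering destroys any time structure, so if the observed range beats most reordered ranges, a shift in level is likely. The function names, the 1,000 bootstrap samples and the seeds are illustrative assumptions.

```python
import random
import statistics


def cusum_range(data: list[float]) -> tuple[float, int]:
    """Range of the cumulative sum of deviations from the mean, and the
    estimated change point: the index where the new level starts."""
    mean = statistics.fmean(data)
    sums = [0.0]
    for x in data:
        sums.append(sums[-1] + (x - mean))
    # |S| is largest just before the shift, so the last point of the old
    # level is data[k - 1] and the first point of the new level is data[k].
    k = max(range(len(sums)), key=lambda i: abs(sums[i]))
    return max(sums) - min(sums), k


def bootstrap_cusum(data: list[float], n_boot: int = 1000,
                    seed: int = 0) -> tuple[float, int]:
    """Return (confidence that a change occurred, estimated change index)."""
    rng = random.Random(seed)
    observed, k = cusum_range(data)
    shuffled = list(data)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # reordering destroys any time structure
        if cusum_range(shuffled)[0] < observed:
            below += 1
    return below / n_boot, k


# Hypothetical data: 24 months of noise (mean 50, sd 2), and the same
# months with a sustained +10 shift starting at index 12.
rng = random.Random(42)
stable = [50 + rng.gauss(0, 2) for _ in range(24)]
stepped = stable[:12] + [x + 10 for x in stable[12:]]
print(bootstrap_cusum(stable))
print(bootstrap_cusum(stepped))
```

For the stepped series the confidence should be close to 1, with the change located at index 12. For a stable series the confidence behaves like a random number between 0 and 1, falling below a 0.95 threshold about 95% of the time: that is the “flat line” case. Taylor’s approach finds multiple changes by repeating the test on each segment either side of a detected change; that recursion is left out of this sketch.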

The flat line is information too

If Bootstrap CUSUM returns no change point, that is a finding: the system is stable. It may be stable at an unacceptably poor level — in which case the task is system redesign (see common cause variation). Or it may be stable at an acceptable level — in which case the task is to hold it there and resist the temptation to tamper. A flat line from Bootstrap CUSUM on a stable system is not a failure of the method. It is the honest answer to an honest question.


The two mistakes — and their consequences

Mistake 1: Treating common cause as special cause
What it looks like: Reacting to every dip in the data with an intervention. Calling a meeting every time a metric falls below target. Changing the process in response to normal variation.
Consequence: Tampering. Each reaction adds variation to the system. Performance becomes less predictable, not more. Staff learn that the response to data is always an intervention, so data reporting becomes political rather than analytical.
Joiner’s term: Tampering with a stable system

Mistake 2: Treating special cause as common cause
What it looks like: Accepting an unusual event as “just noise.” Not investigating a sustained shift. Attributing a genuine change point to random variation because it is inconvenient to investigate.
Consequence: Missing a signal. An improvement goes uncredited and unstandardised, and drifts back. A deterioration goes unaddressed and compounds. Special causes that are not identified recur.
Joiner’s term: Ignoring a signal in a changing system
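Mistake 1 is often written into reporting rules. Assuming a stable process with normally distributed monthly values and a target equal to the process mean (hypothetical conditions chosen to keep the arithmetic exact), the standard library’s `NormalDist` gives the share of months that would trigger a reaction under three rules. Every one of those reactions is a false alarm, because by assumption nothing has changed.

```python
from statistics import NormalDist

# Assumptions (hypothetical): the process is stable, monthly values are
# normally distributed, and the target equals the process mean.
z = NormalDist()  # standard normal; thresholds below are in standard deviations

false_alarm_rate = {
    "react whenever a month is below target": z.cdf(0.0),
    "react when a month is more than 1 sd below target": z.cdf(-1.0),
    "react only outside 3-sigma limits (either side)": 2 * (1 - z.cdf(3.0)),
}
for rule, p in false_alarm_rate.items():
    print(f"{rule}: {p:.2%} of months, about {12 * p:.2f} false alarms a year")
```

The first rule reacts in half of all months, roughly six interventions a year in a system that has not changed; the second still reacts about twice a year; the 3-sigma rule reacts about once every 30 years. Each unnecessary reaction is an opportunity to tamper.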

Deming argued that most problems in an organisation belong to the system rather than to special causes, so most of the management interventions he observed were Mistake 1: reactions to common cause variation treated as if they were special causes. The result was systems that were more variable than they needed to be, with staff who had learned that numbers trigger reactions regardless of whether those numbers contained a real signal.


Once special causes are eliminated — what next

Once special cause variation has been identified and addressed, and the process is confirmed stable, you face a different question: is the stable level acceptable? Two paths follow.

▶ From stable system — two paths

If acceptable: Hold the system stable and monitor. Use Bootstrap CUSUM as a monitoring tool. The next change point, whether improvement or deterioration, will be detectable against the clean, stable baseline. Do not introduce changes that are not improvements. Resist the pressure to “do something” in response to common cause fluctuations. See Path A: Change Detected for what to do when a future change point appears.
If unacceptable: The system is stable at the wrong level, so now redesign it. This is the domain of common cause variation. Reducing common cause variation requires changing the system itself, not reacting to individual data points. Joiner’s levels of fix apply: the question is at what level the constraint sits and what authority is required to address it. See Joiner’s Levels of Fix for the framework.
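For the “hold and monitor” path, the page recommends Bootstrap CUSUM. A related and widely used option for checking each new month against an already established stable baseline is the one-sided tabular CUSUM (Page’s CUSUM); the sketch below swaps in that technique rather than reproducing the analyzer. It assumes the baseline mean `mu0` and standard deviation `sigma` were estimated from the stable period, and uses k = 0.5 and h = 5 (in units of sigma), common choices in quality-control texts. The function name and data are hypothetical.

```python
from typing import Optional


def tabular_cusum(new_data: list[float], mu0: float, sigma: float,
                  k: float = 0.5, h: float = 5.0) -> Optional[tuple[int, str]]:
    """One-sided upper and lower CUSUMs against a known stable baseline.

    Returns (index, "up" or "down") for the first signal, or None.
    k (the allowance) and h (the decision interval) are in units of sigma.
    """
    allowance, limit = k * sigma, h * sigma
    c_hi = c_lo = 0.0
    for i, x in enumerate(new_data):
        c_hi = max(0.0, c_hi + (x - mu0) - allowance)  # evidence of a rise
        c_lo = max(0.0, c_lo + (mu0 - x) - allowance)  # evidence of a fall
        if c_hi > limit:
            return i, "up"
        if c_lo > limit:
            return i, "down"
    return None


# Hypothetical baseline from a stable period: mean 50, sd 2.
recent = [50, 51, 49, 50, 53, 54, 53, 55, 54, 53]
print(tabular_cusum(recent, mu0=50, sigma=2))  # (7, 'up')
```

No single month in this example falls outside 3-sigma limits of 50 ± 6, yet the CUSUM signals at index 7 because it accumulates small, sustained deviations. That sensitivity to persistent shifts, combined with indifference to isolated fluctuations, is what makes a cumulative sum suitable for holding a stable system without tampering.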

Test your data for special cause variation

Upload your time-series data to the StepChange Analyzer. Bootstrap CUSUM will test whether a structural shift (a step change) is present and estimate when it occurred, so you can investigate the cause.
