Variation and Statistical Process Control
Every process produces variation. The central question in any improvement programme is not whether variation exists — it always does — but what kind of variation it is, where it comes from, and whether any given result is a signal or noise. Statistical Process Control is the family of methods that answers those questions with evidence rather than intuition.
- Tell the difference between common cause and special cause variation — and why it matters in QI.
- Decide when to act on a data point and when acting would make things worse (tampering).
- Read an SPC or run chart using the standard rules used in NHS improvement.
- Know when a change in your data is a genuine signal — not noise, regression to the mean, or seasonal variation.
What variation is
Variation is the inevitable difference between successive outputs of any process. No two patient consultations take exactly the same time. No two batches of a chemical product have exactly the same purity. No two monthly A&E performance figures are identical. This is not a sign that something is wrong. It is a property of every process that has ever existed.
The question variation raises is not “why is this result different from the last one?” — there will always be a difference. The question is: “is this difference within the range that the process normally produces, or is it evidence that something genuinely different has happened?” That is the question Statistical Process Control (SPC) was designed to answer.
Deming and his teacher Walter Shewhart both argued that the failure to understand variation is the single greatest source of waste and mismanagement in any organisation. Not fraud, not laziness, not incompetence — but the systematic misinterpretation of numbers that are simply doing what numbers from any process do: varying.
A manager who does not understand variation will respond to every result as if it were a signal. A good month triggers celebration and premature scale-up. A bad month triggers blame, restructuring, and new initiatives. Neither response is warranted if the variation is common cause — normal process noise. Both responses add cost and disruption without changing the underlying process. Over time, the organisation becomes exhausted by constant reaction to variation that is simply the system behaving normally. This is the management failure that Deming called tampering — and it is almost universal.
The two types of variation
Shewhart made a distinction that is simple to state and profound in its consequences. There are two fundamentally different sources of variation in any process.
Common Cause Variation
Variation inherent to the system as currently designed. Present in every output the process produces. Stable and predictable over time — you can describe its distribution, its mean, its spread. The hundreds of small factors that together constitute “how this process works.”
- Normal fluctuation in waiting times
- Seasonal variation in A&E attendance
- Batch-to-batch variation in a chemical process
- Month-to-month variation in a diagnosis rate
Special Cause Variation
Variation from a specific, identifiable cause that is outside the normal operation of the process. Not present in every output — it appears at a particular time and has a particular cause. Unpredictable. Detectable because it falls outside the range the process normally produces.
- A drug shortage causing a sudden spike in adverse events
- COVID lockdown collapsing the dementia diagnosis rate
- A catalyst change improving hydrogen plant efficiency
- A carbon price floor transforming electricity economics
The management consequences of confusing the two are severe in both directions. Treating common cause variation as special cause (reacting to normal noise as if it were a signal) is tampering. It adds variation and makes the process worse. Treating special cause variation as common cause (ignoring a genuine signal as if it were normal noise) allows a specific problem to persist unaddressed. Both errors are costly. SPC is the tool that distinguishes them.
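The claim that tampering adds variation can be checked with a small simulation. The sketch below is illustrative only: it assumes a process with a target of zero and standard normal noise, and a hypothetical "tampering" policy that shifts the setpoint to compensate for each deviation (a rule of the kind Deming illustrated with his funnel experiment). The function name and parameters are invented for the example.

```python
import random
import statistics

def simulate(n: int, tamper: bool, seed: int = 1) -> list[float]:
    """Simulate n outputs of a process whose target is 0 and whose noise is N(0, 1)."""
    rng = random.Random(seed)
    setpoint = 0.0
    results = []
    for _ in range(n):
        result = setpoint + rng.gauss(0.0, 1.0)
        results.append(result)
        if tamper:
            # Assumed tampering rule: move the setpoint to "correct" the last deviation.
            setpoint -= result
    return results

stable_var = statistics.variance(simulate(20_000, tamper=False))
tampered_var = statistics.variance(simulate(20_000, tamper=True))
print(f"variance left alone:  {stable_var:.2f}")
print(f"variance with tamper: {tampered_var:.2f}")
```

Under this rule each output is the difference of two independent noise terms, so the variance of the tampered process is roughly twice that of the process left alone.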
Shewhart and the control chart
Walter Shewhart, working at Bell Telephone Laboratories in the 1920s, developed the control chart as a practical tool for making this distinction operational. His insight was that any stable process will produce outputs that fall within a predictable range — and that outputs outside that range are statistical evidence that something specific has changed.
Shewhart set the limits at three standard deviations either side of the process mean — the Upper Control Limit (UCL) and Lower Control Limit (LCL). The choice of three sigma is deliberate: it balances the two types of error. Narrower limits produce too many false alarms (treating common cause as special cause). Wider limits miss too many genuine signals (treating special cause as common cause). Three sigma gives a false alarm rate of approximately 1 in 370 for a normally distributed process — rare enough to treat every signal seriously, frequent enough to detect real events.
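The 1 in 370 figure follows directly from the normal distribution. A minimal sketch, assuming a perfectly normal and stable process (the helper name is illustrative):

```python
import math

def two_sided_tail(k: float) -> float:
    """Probability that a normal observation falls more than k sigma from the mean."""
    return math.erfc(k / math.sqrt(2))

false_alarm_rate = two_sided_tail(3.0)
average_run_length = 1.0 / false_alarm_rate

print(f"false alarm rate at 3 sigma: {false_alarm_rate:.5f}")   # 0.00270
print(f"about 1 in {average_run_length:.0f} points")            # about 1 in 370

# For comparison: narrower limits raise the false alarm rate sharply.
print(f"false alarm rate at 2 sigma: {two_sided_tail(2.0):.4f}")
```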
Figure: example control chart. Blue points show common cause variation within the control limits. Orange points show an escalating signal approaching the UCL. The red point is a Rule 1 special cause: a single point beyond 3σ.
Reading a control chart
A control chart has three elements: the data series plotted over time, the centre line (the process mean), and the control limits (UCL and LCL at ±3σ). Reading it correctly requires understanding what each element tells you.
The centre line is the expected value of the process — not a target, not a minimum standard, but the average of what the process actually produces. A result at the centre line is the most expected result. Results above or below are expected too, within limits.
The control limits are not specification limits. They are not the acceptable range set by a standard or a contract. They are the statistical description of what this process naturally produces. A result outside the control limits is evidence that something specific has changed — a signal worth investigating, not a target that has been missed.
The data pattern tells you more than individual points. A single point near the UCL may be common cause variation. Eight consecutive points all above the mean — even if none breaks the UCL — is a statistical signal that the process has shifted. This is the logic behind the Western Electric Rules: patterns of points within the zones are as informative as individual extreme values.
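Two of these rules can be sketched in a few lines. The code below is an illustration, not a full implementation of the Western Electric Rules: it covers only the single point beyond three sigma and the run of eight on one side of the centre line, and it assumes the centre line and sigma are supplied by the caller. Treating a point exactly on the centre line as ending a run is a convention chosen for the example.

```python
def beyond_limits(values, centre, sigma):
    """Rule 1: indices of points more than 3 sigma from the centre line."""
    return [i for i, v in enumerate(values) if abs(v - centre) > 3 * sigma]

def run_same_side(values, centre, length=8):
    """Indices at which a run of `length` or more consecutive points on one side is present."""
    signals, run, last_side = [], 0, 0
    for i, v in enumerate(values):
        side = (v > centre) - (v < centre)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            run += 1
        else:
            run = 1 if side != 0 else 0  # a point on the line resets the run
        last_side = side
        if run >= length:
            signals.append(i)
    return signals

# Made-up series: a sustained run above the centre line, then one extreme point.
values = [10, 9, 11, 10, 9, 11, 11, 11, 12, 11, 11, 12, 11, 11, 25]
print(beyond_limits(values, centre=10, sigma=1.5))   # [14]
print(run_same_side(values, centre=10))              # [12, 13, 14]
```

The run is flagged at index 12, two points before the extreme value appears, which is the sense in which patterns can be as informative as individual points.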
The most common error is treating the control limits as targets or acceptable ranges. A manager sees a result below the LCL and concludes performance is unacceptably low. A clinician sees a result above the UCL and concludes the patient is in crisis. Both may be correct — but only if the process is stable and the limits have been correctly calculated. A result below a poorly calculated LCL is not evidence of failure. A result within limits may still represent a genuine problem if the centre line itself is at an unacceptable level. The control chart tells you whether the process has changed. It does not tell you whether the level is acceptable. Those are different questions requiring different tools.
The SPC chart family
Shewhart’s original control chart has been extended into a family of charts, each suited to different types of data. Choosing the right chart matters: applying the wrong one produces misleading conclusions.
X-mR Chart
The most commonly used SPC chart in healthcare improvement. Plots each observation individually against its calculated control limits. The mR (moving range) chart alongside tracks the variation between consecutive observations.
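A minimal sketch of how X-mR limits are conventionally calculated, using the standard constants 2.66 and 3.267 for moving ranges of two consecutive points. The function name and the sample data are illustrative.

```python
import statistics

def xmr_limits(values):
    """Return centre line and limits for an individuals (X) chart and its moving range chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = statistics.fmean(values)
    mr_bar = statistics.fmean(moving_ranges)
    return {
        "centre": centre,
        "ucl": centre + 2.66 * mr_bar,   # 2.66 = 3 / 1.128
        "lcl": centre - 2.66 * mr_bar,
        "mr_ucl": 3.267 * mr_bar,        # upper limit for the moving range chart
    }

limits = xmr_limits([10, 12, 11, 13, 12])
print(limits)
```

The limits are derived from the average moving range rather than the overall standard deviation, so a sustained shift in the series inflates them less than it would a naive three-standard-deviation calculation.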
Run Chart
Plots observations against the median and uses run rules (typically 6+ consecutive points above or below the median) to flag potential shifts. Simpler than the X-mR and does not assume normality.
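The shift rule can be sketched as follows. This is an illustration under one common convention, assumed here: points lying exactly on the median are skipped, so they neither extend nor break a run.

```python
import statistics

def run_chart_shifts(values, min_run=6):
    """Return the median and the indices at which a run of min_run or more points is present."""
    median = statistics.median(values)
    signals, run, last_side = [], 0, 0
    for i, v in enumerate(values):
        if v == median:
            continue  # assumed convention: points on the median are ignored
        side = 1 if v > median else -1
        run = run + 1 if side == last_side else 1
        last_side = side
        if run >= min_run:
            signals.append(i)
    return median, signals

# Made-up series with a step change half way through.
median, signals = run_chart_shifts([5, 6, 5, 6, 5, 6, 9, 9, 9, 9, 9, 9])
print(median, signals)   # 7.5 [5, 11]
```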
p-Chart / c-Chart
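Designed for count data and proportions. The p-chart tracks the proportion of non-conforming items (e.g. error rate), and its limits vary with sample size. The c-chart tracks counts of events where the opportunity for events is roughly constant from period to period (e.g. incidents per month on the same ward). Rates with a varying denominator, such as incidents per 1,000 bed-days, are charted on the closely related u-chart, whose limits also vary with the denominator.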
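The dependence of the limits on sample size is easiest to see in the p-chart calculation. A minimal sketch using the usual binomial formula, with made-up counts and an illustrative function name:

```python
import math

def p_chart_limits(events, sample_sizes):
    """Return the overall proportion and a (lcl, ucl) pair for each sample size."""
    p_bar = sum(events) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
        # Proportions cannot fall outside 0..1, so the limits are clamped.
        limits.append((max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)))
    return p_bar, limits

p_bar, limits = p_chart_limits(events=[5, 10, 5], sample_sizes=[100, 400, 100])
for n, (lcl, ucl) in zip([100, 400, 100], limits):
    print(f"n={n}: LCL={lcl:.4f}, UCL={ucl:.4f}")
```

The period with 400 observations gets visibly tighter limits than the periods with 100, which is why a p-chart's limits step in and out over time.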
Cumulative Sum Chart
Accumulates deviations from the mean rather than evaluating each observation independently. Specifically designed to detect sustained shifts that are too small to trigger individual Shewhart signals. Bootstrap CUSUM extends this to non-normal data using resampling to derive confidence levels from the data itself.
Why normality matters — and when it breaks
Classical Shewhart control charts assume the data follows a normal distribution — the familiar bell curve. When that assumption holds, the three-sigma limits correctly identify approximately 99.73% of common cause variation, giving a false alarm rate of 0.27% (1 in 370). When the assumption breaks, the limits are wrong and the chart misleads.
Healthcare data is frequently non-normal. Count data (incidents per month, deaths per year), rate data (infections per 1,000 patient days), and proportion data (percentage achieving a target) all follow distributions that differ significantly from normal, especially when counts are small or rates are low. For these series, applying X-mR control limits treats data as if it follows a distribution it does not, producing either too many false alarms or missed signals.
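A worked comparison shows the size of the effect for skewed data. The sketch below assumes, purely for illustration, an exponential distribution with mean 1 (and therefore standard deviation 1) and limits set at the mean plus three standard deviations, then compares the chance of exceeding that upper limit with the same chance for a normal process.

```python
import math

# Normal process: probability of a point above mean + 3 sigma.
normal_upper_tail = 0.5 * math.erfc(3 / math.sqrt(2))

# Exponential process with mean 1 and sd 1: the upper limit is 1 + 3 * 1 = 4,
# and P(X > 4) = exp(-4).
exp_upper_tail = math.exp(-4.0)

print(f"normal:      {normal_upper_tail:.5f}")
print(f"exponential: {exp_upper_tail:.5f}")
print(f"ratio:       {exp_upper_tail / normal_upper_tail:.1f}")
```

For this right-skewed example the upper limit is breached more than ten times as often as the normal theory suggests, while the lower limit (at minus 2) can never be breached at all.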
📊 The Bootstrap CUSUM solution to non-normality
“The Bootstrap CUSUM confidence level is earned from the data — not assumed from theory.”
Bootstrap CUSUM addresses non-normality directly. Rather than assuming a distribution and looking up critical values from a table, it resamples the actual data thousands of times to build an empirical distribution of what the CUSUM statistic looks like under common cause variation. The confidence level — 90%, 95%, 99.7% — is then derived from that empirical distribution.
This means Bootstrap CUSUM is valid for any distribution the data actually follows: normal, Poisson, binomial, right-skewed, or any combination. The only assumption is that the data is independent — that each observation is not directly caused by the previous one. For time-series data where that assumption is approximately met, Bootstrap CUSUM is distribution-free and therefore correct regardless of the data’s shape.
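The resampling idea can be sketched in a few lines. This is a hedged illustration rather than the exact procedure used in the articles on this site: it assumes the formulation in which the test statistic is the range of the CUSUM path and each bootstrap sample is a random reordering of the original observations. All function names and the synthetic data are invented for the example.

```python
import random
import statistics

def cusum_range(values):
    """Range (max minus min) of the cumulative sum of deviations from the mean."""
    mean = statistics.fmean(values)
    total, path = 0.0, [0.0]
    for v in values:
        total += v - mean
        path.append(total)
    return max(path) - min(path)

def bootstrap_confidence(values, n_boot=2000, seed=0):
    """Fraction of reordered series whose CUSUM range is smaller than the observed one."""
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # reordering destroys any shift but keeps the distribution
        if cusum_range(shuffled) < observed:
            below += 1
    return below / n_boot

# Synthetic series: 20 points around 10, then 20 points around 13.
data_rng = random.Random(42)
shifted = [data_rng.gauss(10, 1) for _ in range(20)] + [data_rng.gauss(13, 1) for _ in range(20)]
confidence = bootstrap_confidence(shifted)
print(f"confidence that a shift occurred: {confidence:.3f}")
```

Because the reference distribution is built by reordering the data themselves, no distributional table is needed. The reordering step is also where the independence assumption enters: it is only a fair picture of "no change" if the order of the observations would not matter in a stable process.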
The practical consequence: the NHS A&E performance series, the sepsis mortality series, and the hydrogen plant residual series are all non-normal in different ways. The X-mR chart applied to each would produce misleading control limits. Bootstrap CUSUM produces valid confidence levels for each without modification.
From SPC to Bootstrap CUSUM
Classical SPC asks: is this individual result (or short pattern) outside the range the process normally produces? It is optimised for real-time monitoring — detecting signals as they occur, observation by observation.
Bootstrap CUSUM asks a different question: has the underlying process mean permanently shifted to a new level, and if so when? It accumulates evidence across the entire series rather than evaluating each observation independently. This makes it less sensitive to individual outliers (which may be single-observation special causes) and more sensitive to sustained shifts (which are step-changes in the process mean).
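The "when" part of the question has a simple estimator in the CUSUM framework: the point at which the cumulative sum is furthest from zero marks the last observation before the shift. A minimal sketch, assuming a single shift and using an illustrative function name:

```python
import statistics

def estimate_change_index(values):
    """Return the index of the first observation estimated to belong to the new level."""
    mean = statistics.fmean(values)
    total, best_index, best_abs = 0.0, 0, -1.0
    for i, v in enumerate(values):
        total += v - mean
        if abs(total) > best_abs:
            best_abs, best_index = abs(total), i
    return best_index + 1  # the shift is placed just after the CUSUM extreme

# Made-up series: six points at 10, then four points at 14.
series = [10, 10, 10, 10, 10, 10, 14, 14, 14, 14]
print(estimate_change_index(series))   # 6
```

In practice this estimate is only meaningful once a test such as the bootstrap confidence level has indicated that a shift occurred at all.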
| Question | X-mR / Run chart | Bootstrap CUSUM |
|---|---|---|
| Is this individual result unusual? | ✅ Yes — directly | ❌ Not designed for this |
| Has the process permanently shifted? | ⚠ Partially — run rules help | ✅ Yes — directly, with confidence level and date |
| When did the shift occur? | ❌ Approximate at best | ✅ Precise — dated to within weeks |
| Valid for non-normal data? | ❌ Assumes normality | ✅ Distribution-free |
| Detects low signal-to-noise shifts? | ❌ Poor — discards history | ✅ Strong — accumulates evidence |
| Suitable for real-time monitoring? | ✅ Yes | ⚠ Better for periodic review |
The two approaches are complementary, not competing. For ongoing process monitoring — watching a patient’s INR, monitoring a production line reading by reading — the X-mR chart and Western Electric Rules are the right tools. For evaluating whether an improvement programme produced a genuine structural change, or for retrospective analysis of historical data, Bootstrap CUSUM is more powerful. See Three Charts, Three Stories for a direct comparison applied to the same dataset.
The two mistakes
Shewhart identified two types of mistake that result from misidentifying variation. They are asymmetric in their consequences and in how common they are.
Treating common cause as special cause
Reacting to a result that is within the natural variation of the process as if something specific has gone wrong. Looking for an explanation that does not exist. Introducing a change in response to noise.
This is tampering. It adds variation. Over time it makes the process worse.
Treating special cause as common cause
Dismissing a result that is outside the natural variation of the process as “just one of those things.” Failing to investigate a genuine signal. Allowing a specific problem to persist because it is attributed to background noise.
This allows a specific problem to continue unaddressed.
Shewhart’s three-sigma rule minimises the combined cost of both mistakes for normally distributed processes. Bootstrap CUSUM extends this to non-normal data, allowing the practitioner to choose the confidence level that matches the cost of each mistake in their specific context: 90% when the cost of missing a signal is high (early warning), 99.7% when the cost of a false alarm is high (irreversible action).
Bright Spots — special cause variation worth finding
The two mistakes described above focus on the downside of misidentifying variation. But there is a third failure mode that is equally costly and less often discussed: failing to recognise and investigate positive special cause variation.
A Bright Spot is a unit, team, site, or pathway that is performing significantly better than the rest of the system — not because of random good luck, but because something about how it is structured or how it operates is genuinely different. A Bootstrap CUSUM upward change point, or a value consistently above the upper control limit in a direction that represents improvement, is the statistical signature of a Bright Spot. It is special cause variation — and it is worth investigating for exactly the same reason that a harmful special cause is worth investigating: something specific is producing a result the system normally does not produce.
The key insight is that a Bright Spot is not simply a unit that is doing well. It is a unit whose performance is statistically distinguishable from the common cause variation of the system — a result that the system, operating normally, would not be expected to produce. This distinction matters because it changes the question. If a unit’s performance is within common cause variation, its good results may simply be noise — the same unit may perform poorly next period. If it represents genuine special cause variation, something structural is different. That structural difference is what needs to be identified and replicated.
The concept originates with Marian Zeitlin’s nutritional research (1990), was operationalised by Jerry and Monique Sternin for Save the Children in rural Vietnam, and was brought into mainstream improvement thinking by Pascale, Sternin and Sternin in The Power of Positive Deviance (Harvard Business Press, 2010). Chip and Dan Heath popularised the same idea as “Bright Spots” in Switch (2010). The statistical foundation — distinguishing genuine special cause from common cause variation — is what makes the concept operationally precise rather than anecdotal.
The practical implication for improvement work is significant. Rather than asking only “why is the average so poor?” — a question that leads to system-level analysis but rarely to actionable findings — the Bright Spots question asks: “which units are performing as genuine positive outliers, what is structurally different about how they operate, and how do we replicate that difference?”
🌟 What makes a Bright Spot structural rather than lucky
A unit whose performance appears good in a single period may simply be at the high end of common cause variation. It will likely regress toward the mean. Treating it as a Bright Spot and attempting to replicate its “methods” replicates luck, not structure.
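Regression toward the mean is easy to demonstrate with simulated data. The sketch below assumes 200 hypothetical units that are all structurally identical (same true mean of 50, same noise), so any apparent "top performer" in one period is luck by construction.

```python
import random
import statistics

rng = random.Random(7)
n_units = 200

# Every unit draws from the same distribution in both periods: no real Bright Spots exist.
period1 = [rng.gauss(50.0, 5.0) for _ in range(n_units)]
period2 = [rng.gauss(50.0, 5.0) for _ in range(n_units)]

# Pick the ten best units in period 1 and follow them into period 2.
top = sorted(range(n_units), key=lambda u: period1[u], reverse=True)[:10]
top_p1 = statistics.fmean([period1[u] for u in top])
top_p2 = statistics.fmean([period2[u] for u in top])

print(f"top ten, period 1: {top_p1:.1f}")
print(f"same units, period 2: {top_p2:.1f}")
```

The selected units look outstanding in the period used to select them and ordinary in the next, which is the pattern a single-period ranking cannot distinguish from genuine structural difference.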
A genuine Bright Spot shows one or more of the following in its SPC data:
A Bootstrap CUSUM upward change point — a dated, statistically confirmed permanent shift to a new higher mean, coinciding with an identifiable change in how the unit operates. This is the strongest possible evidence that something structural changed.
Sustained performance above the UCL across multiple periods — a pattern that the Western Electric Rules would identify as a special cause signal. Not a single outlier, but a consistent run of results that the system’s natural variation cannot explain.
A mean that is statistically distinguishable from the system mean when Bootstrap CUSUM is applied to the unit’s series in isolation — a different process operating within the same nominal environment.
The investigation question that follows is precise: what changed in this unit approximately 6–12 months before the Bootstrap CUSUM change point? That is the window in which the structural difference was introduced. The answer (a different staffing model, a different process design, a different physical layout, a different management approach) is the hypothesis to test at scale. The Sepsis Six article applies this logic directly: the national null result conceals substantial trust-level variation, and the Bright Spots, trusts with genuine change points in the direction of improvement in their in-hospital sepsis mortality (for a mortality series, a shift to a lower mean), are the units that hold the answer to what actually works.
Applied at the unit level across a system, Bootstrap CUSUM identifies which units have a structurally different performance trajectory from the rest. This is more informative than ranking by average performance — ranking cannot distinguish a unit that has permanently shifted to a better level from one that happens to be at the high end of common cause variation this period. Bootstrap CUSUM can. A change point at 95% confidence or above, in the direction of improvement, dated to a specific period, is the evidence that warrants a Bright Spot investigation. Everything else is noise.
SPC in practice — worked examples from this site
| Article | Data type | SPC tool used | Key finding |
|---|---|---|---|
| Three Charts, Three Stories | 488 weeks of clinical data — right-skewed count series | X-mR, Run chart, Bootstrap CUSUM compared | X-mR reports flat mean of 22.57 across a journey from 42 to 6. Bootstrap CUSUM finds 6 structural stages. Same data, three completely different stories. |
| NHS A&E | 184 monthly observations — percentage series | Bootstrap CUSUM, 99.7% confidence | Four structural stages of decline. Not one policy intervention detectable as an upward change point. |
| Hydrogen Plant | Industrial efficiency residuals — SNR = 0.28 | Residual Bootstrap CUSUM | X-mR produces no signal across a full year containing a genuine 17% efficiency gap. Bootstrap CUSUM detects it. Signal-to-noise ratio below 1.0 requires accumulation of evidence. |
| Dementia diagnosis | Annual EDDR — N=9, percentage series | Bootstrap CUSUM + X-mR combined | Bootstrap CUSUM: one stage (N too small for stage detection). X-mR moving range: COVID collapse confirmed as special cause on rate-of-change chart. Both tools needed for the full picture. |
| UK Carbon Emissions | Annual emissions (MtCO2e) — N=35 | Bootstrap CUSUM, 90% and 99.7% | Electricity supply: one change point at 2013, 99.8% confidence. Transport: no structural change at 95% across 34 years. Same method, same confidence level, completely different results. |
Related concepts and tools
This concept sits within a broader framework for understanding why improvement programmes succeed or fail. Start with Why Nothing Changes for the full picture, or go to Start Here for a guided introduction to the method.