Regression to the Mean
An extreme result — good or bad — is likely to be followed by a more typical one, regardless of any intervention. This is not a theory. It is a mathematical inevitability whenever measurement contains any random variation. It is also the most commonly misattributed phenomenon in improvement work: the reason treatment gets credit for recovery, inspections get credit for school improvement, and winter programmes get credit for spring.
Never attribute improvement after an extreme period to an intervention without first asking: would the data have returned toward average anyway? Regression to the mean produces improvement without any intervention at all.
Where no control group is available, the main defence is a pre-committed prediction made before the intervention: specify the direction, metric, timing, and confidence threshold. If Bootstrap CUSUM (StepChange Analyzer) then detects a sustained change point at the right time and in the right direction, the improvement is structural. If the metric simply drifts back toward average, regression to the mean was doing the work, not the intervention.
Contents
- What regression to the mean is
- Galton's discovery — heights and "reversion to mediocrity"
- Why it happens — the mathematical inevitability
- Where it appears in improvement work
- The Deming connection — pre-committed predictions
- How Bootstrap CUSUM distinguishes it from genuine change
- The two mistakes it causes
What regression to the mean is
When a random variable produces an extreme value — unusually high or unusually low — the next measurement of the same variable is likely to be closer to its long-run average. This happens not because anything changed, but because extreme values are, by definition, unlikely to be repeated. The more extreme the initial observation, the more strongly the next observation tends to regress toward the mean.
This applies in both directions. An exceptionally good month is likely to be followed by a less exceptional one. An exceptionally bad month is likely to be followed by a less bad one. Neither movement requires explanation. Neither reflects genuine structural change. Both are statistical inevitability.
In statistics, regression toward the mean is the phenomenon where if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean. Furthermore, when many random variables are sampled and the most extreme results are intentionally selected, a second sampling of those selected variables will produce less extreme results — closer to the mean of all variables. (Galton, 1886; formalised by Pearson, 1896.)
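The selection effect in the second half of that definition can be checked with a small simulation. The sketch below is illustrative only: the function name, the number of units, and the noise levels are assumptions, not figures from the text. Each unit has a fixed true level and is measured twice with independent noise; the units that scored highest on the first measurement are then looked at again.

```python
import random
import statistics


def select_and_resample(n_units=1000, top_k=50, true_sd=10.0, noise_sd=10.0, seed=1):
    """Measure every unit twice, then compare both scores for the first-round 'best'."""
    rng = random.Random(seed)
    # Each unit has a stable true level (assumed normally distributed around 100).
    true_levels = [rng.gauss(100.0, true_sd) for _ in range(n_units)]
    # Two measurements of the same units, each with fresh independent noise.
    first = [t + rng.gauss(0.0, noise_sd) for t in true_levels]
    second = [t + rng.gauss(0.0, noise_sd) for t in true_levels]
    # Deliberately select the units with the most extreme first measurement.
    top = sorted(range(n_units), key=lambda i: first[i], reverse=True)[:top_k]
    return {
        "overall_mean": statistics.fmean(first),
        "selected_first": statistics.fmean([first[i] for i in top]),
        "selected_second": statistics.fmean([second[i] for i in top]),
    }


if __name__ == "__main__":
    for name, value in select_and_resample().items():
        print(f"{name}: {value:.1f}")
```

Nothing changes between the two measurements, yet the selected group's second average falls back toward the overall mean. It stays above the overall mean because part of the first score reflected a genuinely higher true level.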
Galton's discovery — heights and "reversion to mediocrity"
Francis Galton stumbled on the phenomenon in the 1880s while measuring the heights of parents and their adult children. He expected tall parents to produce tall children and short parents to produce short children, and they did. But not proportionally. Tall parents produced children who were tall, but on average not as tall as their parents. Short parents produced children who were short, but on average not as short as their parents. In each case the children's heights sat closer to the population average than their parents' did.
Galton called it "reversion to mediocrity" — a wonderfully blunt description. He was puzzled by it, initially suspecting some biological force pulling generations toward average height. The truth, as Karl Pearson later formalised, was simpler and more fundamental: it was a mathematical property of any measurement that contains random variation. No biological mechanism was needed. The phenomenon emerged from the structure of the data itself.
The insight generalised far beyond heights. Anywhere a measurement has both a systematic component (the true underlying level) and a random component (noise, measurement error, natural fluctuation), an extreme value is more likely than a typical one to owe part of its extremeness to the random component. The next measurement draws a fresh random component, so it will tend to be less extreme.
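The systematic-plus-random structure can be written down in a few lines. The notation below is an assumption introduced for illustration: it supposes two measurements of the same unit, noise terms that are independent of each other and of the true level, and joint normality (without normality, the same expression is the best linear prediction).

$$
X_1 = T + E_1, \qquad X_2 = T + E_2, \qquad \operatorname{E}[T] = \mu,\ \operatorname{Var}(T) = \sigma_T^2,\ \operatorname{E}[E_i] = 0,\ \operatorname{Var}(E_i) = \sigma_E^2
$$

$$
\rho = \operatorname{Corr}(X_1, X_2) = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}, \qquad \operatorname{E}[X_2 \mid X_1 = x] = \mu + \rho\,(x - \mu)
$$

Whenever there is any noise, $\rho < 1$, so the expected second measurement lies closer to $\mu$ than the first. For example, with $\sigma_T = \sigma_E$ we get $\rho = 0.5$, and a first measurement 20 above the mean predicts a second measurement only 10 above it. For a single stable process with no variation in the true level ($\sigma_T = 0$), $\rho = 0$ and the expected next value is simply $\mu$.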
Why it happens — the mathematical inevitability
Consider a process with a true underlying mean of 100, subject to random monthly fluctuation of up to about ±20. In any given month, the observed value might be as high as 120 (extremely good) or as low as 80 (extremely bad). Neither value reflects a change in the true underlying level; both reflect random variation around a stable mean.
If you observe a value of 120, what is the most likely next value? Not 120 again — extreme values are unlikely by definition. The most likely next value is something closer to 100. If you observe 80, the most likely next value is something closer to 100. The process has not changed. The true mean has not changed. Only the random component has varied.
The critical point: this happens regardless of what you do between the two measurements. If you intervene after the extreme value, the regression toward the mean will occur anyway — and your intervention will appear to have caused it.
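A short simulation makes the point concrete. This is a sketch under stated assumptions: the process is modelled as independent normal draws with mean 100 and standard deviation 10 (so a value of 80 is a two-standard-deviation event), higher is taken to be better, and the "intervention" does nothing at all. The function name and parameters are hypothetical.

```python
import random
import statistics


def sham_intervention_effect(n_months=100_000, mean=100.0, sd=10.0, trigger=80.0, seed=2):
    """Average value in 'bad' months and in the months that immediately follow them."""
    rng = random.Random(seed)
    # A stable process: the true mean never changes.
    values = [rng.gauss(mean, sd) for _ in range(n_months)]
    before, after = [], []
    for this_month, next_month in zip(values, values[1:]):
        if this_month <= trigger:
            # An extremely bad month triggers the sham intervention,
            # which has no effect on the process whatsoever.
            before.append(this_month)
            after.append(next_month)
    return statistics.fmean(before), statistics.fmean(after)


if __name__ == "__main__":
    bad, following = sham_intervention_effect()
    print(f"mean of triggering months: {bad:.1f}")
    print(f"mean of following months:  {following:.1f}")
```

The months after the sham intervention average close to 100, far better than the months that triggered it. An observer who sees only those before-and-after pairs would credit the intervention with a large improvement.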
The clinical version — why treatments appear to work
A patient develops a symptom severe enough to seek treatment. By definition, their symptom level at the point of seeking treatment is near its worst. The doctor prescribes a treatment. The symptom improves. The treatment receives the credit.
But consider the counterfactual: what would have happened without treatment? For many conditions, symptoms fluctuate naturally. A patient presenting at their worst is statistically likely to improve next week regardless of treatment — because extreme symptom levels are unlikely to persist. Regression to the mean is doing most of the work. The treatment gets the credit.
This is why randomised controlled trials require a control group. The control group experiences the same regression to the mean as the treatment group. Any additional improvement in the treatment group, beyond what the control group experienced, is attributable to the treatment. Without the control group, regression to the mean is invisible — and the treatment appears far more effective than it is.
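The role of the control group can be sketched in the same way. Every number here is an assumption chosen for illustration (symptom scores where higher is worse, an enrolment threshold of 70, a true treatment effect of 5 points), and the function name is hypothetical. Patients enrol only when their symptoms are severe, are randomised to an arm, and are measured again later.

```python
import random
import statistics


def simulate_trial(n_patients=50_000, true_effect=5.0, threshold=70.0, seed=3):
    """Return mean improvement in each arm and the difference between the arms."""
    rng = random.Random(seed)
    improvements = {"control": [], "treatment": []}
    for _ in range(n_patients):
        usual = rng.gauss(50.0, 10.0)            # patient's long-run symptom level
        baseline = usual + rng.gauss(0.0, 10.0)  # score on the day they present
        if baseline < threshold:
            continue                             # only severe presentations enrol
        arm = rng.choice(["control", "treatment"])
        follow_up = usual + rng.gauss(0.0, 10.0)  # fresh natural fluctuation
        if arm == "treatment":
            follow_up -= true_effect             # the only real effect in the model
        improvements[arm].append(baseline - follow_up)
    control = statistics.fmean(improvements["control"])
    treatment = statistics.fmean(improvements["treatment"])
    return control, treatment, treatment - control


if __name__ == "__main__":
    control, treatment, effect = simulate_trial()
    print(f"control improvement:   {control:.1f}")
    print(f"treatment improvement: {treatment:.1f}")
    print(f"difference (estimated treatment effect): {effect:.1f}")
```

Both arms improve substantially because both were enrolled at an extreme. Only the difference between the arms estimates what the treatment added, which is why the uncontrolled before-and-after comparison overstates it.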
Where it appears in improvement work
📋 Common patterns in healthcare and public services
The Deming connection — pre-committed predictions
Deming understood regression to the mean as one of the central obstacles to honest improvement evaluation. His insistence on pre-committed predictions — stating in writing, before an intervention, what change you expect to see and when — is the direct methodological response to the problem.
Without a pre-committed prediction, any improvement after an intervention can be attributed to the intervention, even if regression to the mean is the actual mechanism. The improvement feels real, the attribution feels logical, and the learning is false. The same intervention will be repeated regardless of whether it actually worked — because the evidence of its working was an artefact of measurement, not a signal of structural change.
With a pre-committed prediction, regression to the mean is visible. If the prediction specifies that Bootstrap CUSUM should detect a change point at a particular confidence level within a particular timeframe — and the data instead shows a return toward average without crossing the detection threshold — the prediction has failed. The intervention did not produce structural change. That is honest information, however uncomfortable.
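One way to make such a prediction hard to revise after the fact is to record it as an immutable object before the intervention starts. The sketch below is a hypothetical structure, not part of any tool named here: the class, its field names, and the `is_confirmed` check are all assumptions that simply mirror the four elements of a prediction (direction, metric, timing, confidence threshold).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen: the prediction cannot be edited once recorded
class Prediction:
    metric: str
    direction: str          # "increase" or "decrease"
    recorded_on: date       # when the prediction was written down
    intervention_on: date   # when the intervention starts
    window_start: date      # earliest date a change point is expected
    window_end: date        # latest date a change point is expected
    min_confidence: float   # e.g. 0.95

    def __post_init__(self):
        if self.direction not in ("increase", "decrease"):
            raise ValueError("direction must be 'increase' or 'decrease'")
        if self.recorded_on > self.intervention_on:
            raise ValueError("prediction must be recorded before the intervention")
        if self.window_start > self.window_end:
            raise ValueError("window_start must not be after window_end")
        if not 0.0 < self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be in (0, 1]")

    def is_confirmed(self, change_date: Optional[date],
                     observed_direction: str, confidence: float) -> bool:
        """True only if a change point matches the prediction on every count."""
        if change_date is None:  # no change point detected at all
            return False
        return (self.window_start <= change_date <= self.window_end
                and observed_direction == self.direction
                and confidence >= self.min_confidence)
```

A change point that arrives outside the window, moves in the wrong direction, or falls short of the stated confidence counts as a failed prediction, as does no change point at all.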
Any intervention applied immediately after an extremely bad result will tend to appear to work, because regression to the mean produces improvement regardless. This is why the most confident improvement claims are often the least reliable: an intervention was applied after an unusually bad period, and the inevitable return toward average was then observed. The intervention receives permanent credit for a temporary statistical phenomenon. An intervention with no real effect, applied to a system at its average level, would show no improvement at all, because there is no extreme value to regress from.
How Bootstrap CUSUM distinguishes it from genuine change
Regression to the mean produces a characteristic pattern in time-series data: a return from an extreme value toward the previous stable level. Bootstrap CUSUM distinguishes this from genuine structural change in a precise way.
| Pattern | What it looks like in data | Bootstrap CUSUM result | What it means |
|---|---|---|---|
| Regression to the mean | Extreme value followed by return toward previous stable level. The new level is approximately the same as the pre-intervention level. | No change point. CUSUM shows a temporary excursion that returns to baseline. Or: a change point at the extreme value followed by a second change point as it regresses — ending at approximately the original mean. | The system has not structurally changed. The extreme value was a fluctuation. The intervention (if any) did not produce lasting structural improvement. |
| Genuine structural improvement | Sustained shift to a new stable level that is different from the pre-intervention level. The improvement is maintained through subsequent periods including the next hard season. | A single change point at the appropriate date, sustained. The new level does not return toward the previous mean. Confirmed across at least one full seasonal cycle. | The system has structurally changed. The improvement is not a regression artefact — it is maintained when the random component fluctuates in both directions around the new mean. |
| Mixed — partial regression plus genuine improvement | Improvement after an extreme value, but stabilising at a level better than the pre-intervention baseline rather than returning to it fully. | Change point detected, but the new mean is between the extreme value and the original baseline. Partial regression plus genuine structural shift. | Both effects are present. The intervention produced real improvement, but some of the observed improvement is regression. Bootstrap CUSUM quantifies the genuine structural component. |
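The contrast between the first two rows can be illustrated with a generic bootstrap CUSUM test. This is a minimal sketch of the commonly described procedure (cumulative sums of deviations from the mean, with the observed CUSUM range compared against ranges from reshuffled data). It is an assumption about the general technique, not the StepChange Analyzer's implementation. The function names are hypothetical, and the sketch tests for a single change point only.

```python
import random
import statistics


def cusum_path(values):
    """Cumulative sums of deviations from the series mean (starting value 0 omitted)."""
    mean = statistics.fmean(values)
    total, path = 0.0, []
    for v in values:
        total += v - mean
        path.append(total)
    return path


def cusum_range(values):
    """Range of the CUSUM path, including its starting value of 0."""
    path = cusum_path(values)
    return max(0.0, max(path)) - min(0.0, min(path))


def bootstrap_cusum(values, n_boot=1000, seed=0):
    """Return (confidence that a change occurred, index of first point after it)."""
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        # Reshuffling destroys any ordering, so it shows what CUSUM range
        # the same values would produce if no sustained shift existed.
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    confidence = smaller / n_boot
    # The CUSUM is furthest from zero at the last point before the change.
    path = cusum_path(values)
    last_before = max(range(len(path)), key=lambda i: abs(path[i]))
    return confidence, last_before + 1


if __name__ == "__main__":
    # Sustained shift: 24 points around 100, then 24 points around 90.
    shift = [100 + (-1) ** i for i in range(24)] + [90 + (-1) ** i for i in range(24)]
    # Transient: stable around 100 with one extreme month that then returns.
    transient = [100 + (-1) ** i for i in range(48)]
    transient[24] = 70
    print("sustained shift:", bootstrap_cusum(shift))
    print("single extreme month:", bootstrap_cusum(transient))
```

On the sustained shift, almost no reshuffle produces as large a CUSUM range as the real ordering, so the confidence is close to 1 and the change is located at index 24. On the single extreme month, every reshuffle contains the same extreme value and produces a similar range, so the confidence stays well below a 0.95 threshold. A longer extreme spell could instead show up as two opposing change points, as the first table row notes, which this single-change sketch does not search for.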
The two mistakes it causes
Regression to the mean produces two distinct errors in improvement work — one more common, one more costly.
| Mistake | What happens | Consequence |
|---|---|---|
| Crediting the intervention (most common) | An intervention is applied after an extremely bad result. Performance improves (regresses toward the mean). The intervention receives the credit. The same intervention is rolled out at scale and repeated in future. | Resources and effort are invested in interventions that may have had no structural effect. The true cause of the extreme result remains unaddressed. The next extreme result produces the same pattern, and the same false attribution. |
| Discrediting a genuine improvement (less common, more costly) | A genuine structural improvement is dismissed because it followed a bad period and therefore "must be regression to the mean." The intervention is not scaled or standardised. | A real improvement is lost. The conditions that produced it are not preserved. Performance returns to the previous level, which then appears to confirm the original scepticism, completing a self-fulfilling cycle of under-investment in what actually works. |
Bootstrap CUSUM resolves both mistakes with the same mechanism: the pre-committed prediction combined with the change point test. A genuine structural improvement produces a sustained change point at the predicted time. Regression to the mean produces a temporary excursion without a sustained change point. The distinction is visible in the data if you look for it correctly.
Test your data — genuine change or regression to the mean?
Upload your time-series data to the StepChange Analyzer. If the improvement is structural, Bootstrap CUSUM will detect a sustained change point. If it is regression to the mean, the CUSUM will show a temporary excursion with no confirmed change point.