📈 Reading your results honestly

How to Interpret Bootstrap CUSUM Results

Interpret structural change honestly. Avoid tampering. Know what to do next.

The tool has given you a result. This page tells you what it means — and what the most dangerous misreadings are. Choose your outcome below, then follow the pathway.

✅ Outcome A

Change detected

A structural shift confirmed at your confidence level. Now investigate when, why, and whether it is sustained.

What to do next →

⚠ Outcome B

No change detected

The most common and most important result to get right. Default to waiting — do not tamper.

Decide what to do →

❌ Outcome C

Inconclusive or data problem

Too few points, date format issues, missing values, or mixed frequencies. Fix the data and re-run.

How to fix it →


📊 Not sure what the numbers mean?

If you have run the Analyzer and are looking at stages, confidence levels, means, SD, or the CUSUM chart and want to know what each output means — see Interpreting StepChange Analyzer Output. It explains every number, the slope and turning points of the CUSUM line, the run chart, the X-mR limits, and the settings in plain English.

📚 Read this first if you are new to Bootstrap CUSUM

Three charts, three very different stories

The same data looks completely different through a run chart, a control chart, and Bootstrap CUSUM — and only one of them tells you whether the system has structurally changed. Takes 5 minutes and makes everything below easier to read.

Three charts, three stories →

☰ Contents

Why honest interpretation matters
Change detected — what it means
No change detected — what it means
Common interpretation pitfalls
Signal vs noise — the core distinction
Seasonality and patterns
Multiple change points
Concepts that support interpretation

💡 Why honest interpretation matters

These are common ways good people accidentally fool themselves — and why well-intentioned improvement claims go wrong:

Measuring too soon — before the change has had time to produce a structural shift
Picking the metric that moved rather than the one pre-specified before the intervention
Attributing seasonal improvement to a programme — summer data credited to a winter initiative
Regression to the mean — an extreme result is likely to be followed by a more typical one regardless of any intervention; improvement after a bad period may be statistical inevitability, not structural change
The Hawthorne Effect — behaviour changes when people know they are being measured; metrics improve during observation and may revert when observation ends; the most important confounder in NHS improvement evaluation
Definition changes that shift the number without changing the underlying reality
Claiming structural change without testing sustainability: a Bootstrap CUSUM change point confirms that the system shifted at a point in time, not that the shift is permanent. Only data that stays at the new level for 12–24 months after the intervention, particularly after any external observation or programme support ends, confirms that the change is sustained.

Bootstrap CUSUM guards against these errors, but only when combined with a pre-committed prediction made before the intervention begins. A pre-committed prediction specifies which metric will change, in which direction, by how much, and within what timeframe, before any outcome data is collected. Without it, the Bootstrap CUSUM result can always be reinterpreted after the fact. With it, the data is allowed to say no. See What is a pre-committed prediction?

Change detected — what it means

A change point means the Bootstrap CUSUM algorithm has found statistical evidence that the process mean shifted at a specific date, at your chosen confidence level (typically 90% or 95%). This is a strong result. It means the system, not just the data, appears to have moved to a new level. Whether the new level lasts is a separate question, covered under "Is the change sustained?" below.

What to check first: Is the change in the direction you expected? An upward change point in a measure you are trying to reduce is bad news, not good. A downward change point in harm events is what a successful intervention looks like.

When did the change happen? The date of the change point is as important as the fact of change. If the change point precedes your intervention by several months, something else caused it — and your intervention may be receiving credit it does not deserve. If it follows your intervention with a lag of several months, that is consistent with a genuine structural response, since most system-level changes take time to propagate through the process.

The ULEZ lesson on attribution

Bootstrap CUSUM applied to London air quality data found that the structural improvement in NO2 levels appeared approximately 18 months before the ULEZ was introduced. The intervention received credit for a change that had already occurred. This is the attribution problem in practice: a simple before-and-after comparison shows improvement in the period after an intervention even when the improvement predates it, because it cannot date the shift. See the ULEZ analysis for the full data.

Is the change sustained? A single change point followed by a return to the previous level is not a structural improvement — it is a temporary excursion. Run the full series including post-intervention data to confirm the new level is holding. Structural change and sustained structural change are not the same thing — the Hawthorne Effect can produce genuine change points that reverse when external observation ends. Continue monitoring for 12–24 months after any programme or external support is withdrawn before claiming sustained structural improvement.

What caused the change? Change detected is the beginning of the investigation, not the end. The Bootstrap CUSUM tells you that something changed and when. It does not tell you what changed or why. The next step is causal investigation: what was different in the system at or before that change point date? See What to do next: change detected for the full decision pathway.

No change detected — what it means

This is the result that most improvement programmes dread — and the result they most need. No change detected means the Bootstrap CUSUM has found no statistical evidence that the process mean has shifted. The system is producing the same distribution of results it was producing before the intervention.

This is not the same as saying nothing happened. It means that whatever happened did not produce a structural shift detectable at your chosen confidence level. There are several possible explanations:

The intervention was real but too recent. System-level changes have lag times. If you are looking at data from the first three months after a structural intervention, the change point may not yet be visible. Set a review date rather than acting now.
The intervention addressed the wrong level. A training programme, a policy, or a checklist rarely produces a structural change in outcome measures. The intervention may have improved compliance without changing the system that produces the outcome. See Joiner’s Levels of Fix.
The measure is wrong. You may be measuring a process metric rather than an outcome metric. A process metric can improve while the outcome is unchanged. The question to ask: does this measure capture what the system is actually for?
The theory was incomplete. The intervention addressed one necessary condition but not all of them. The necessary/sufficient test was not applied. See Necessary But Not Sufficient.
The constraint was not addressed. You improved something that is not the bottleneck. See Theory of Constraints.

The most important rule: do not tamper

The most dangerous response to “no change detected” is to immediately launch a new initiative. Deming called this tampering: reacting to a result as if it is a signal when it may simply be the system behaving as it always has. Tampering adds variation to the system and makes future signals harder to detect. The correct response to no change detected is to decide deliberately — either wait with a pre-committed review date, or return to Step 3 of the 7-step method and revisit the root cause analysis. See What to do next: no change detected for the full decision pathway.

Common interpretation pitfalls

Results look surprising or wrong?

Before re-interpreting, check the data. A surprising change point — especially one dated to an implausible period — is often caused by a data issue: mixed date formats, missing rows, a changed denominator, or an outlier. Run the Data Validator on your CSV first, then re-run the Analyzer. See also Fix CSV date format for the most common causes.

Pitfall 1

Declaring success from a single good month

A single month that is better than average is almost always common cause variation — the normal fluctuation of a stable process. Bootstrap CUSUM requires a sustained shift across multiple data points before confirming a change point. A single good result is noise. Acting on it as if it is signal is the definition of tampering. Wait for the Bootstrap CUSUM to confirm the change before declaring success or scaling up.

Pitfall 2

Attributing a change point to the most recent intervention

If a change point appears in your data, the temptation is to attribute it to whatever intervention is most prominent in the narrative. But the change point date is the evidence, and the causal investigation must start from that date, not from the intervention date. Work backwards from the change point: what was different in the system at that time? Several things may have changed at around the same time. The Bootstrap CUSUM tells you when; the investigation must establish why.

Pitfall 3

Using too short a data series

Bootstrap CUSUM requires enough data to distinguish a genuine change from normal variation. As a general rule, fewer than 20 data points produce unreliable results for most processes because the algorithm cannot accumulate enough evidence. If your series is short, the result is inconclusive rather than “no change.” Extend the series if possible, or switch to a finer frequency where the data exists (monthly instead of quarterly) to get more data points.

Pitfall 4

Conflating process metrics with outcome metrics

A process metric measures whether the process was followed. An outcome metric measures whether the system is producing the right result for the people it serves. Bootstrap CUSUM on a process metric will detect when the process changed. It will not tell you whether patient outcomes, customer experience, or system performance changed. Run Bootstrap CUSUM on the outcome measure. Process improvement without outcome improvement is activity without impact. See Types of measures.

Pitfall 5

Stopping the analysis at the first change point

A data series can contain multiple change points. An improvement followed by a deterioration followed by another improvement will show three change points. If you run Bootstrap CUSUM only on the period after your intervention, you may miss a preceding deterioration that inflates the apparent magnitude of the improvement, or a subsequent reversal that undermines it. Always run the full available series and examine all change points in context.

Pitfall 6

Treating a 90% confidence result as certain

Bootstrap CUSUM reports change points at a specified confidence level, typically 90% or 95%. A 90% confidence level means that, if the process had not changed at all, roughly one analysis in ten would still report a shift this large by chance. If you are making a significant resource or policy decision, use 95%. If you are making a preliminary assessment to guide further investigation, 90% is reasonable. Never treat a 90% confidence result as certain, and never treat “below the threshold” as certain absence of change.
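
One way to see what a confidence threshold implies is to run the test many times on data that contains no change at all and count how often it still fires. The sketch below is illustrative only. It assumes a common textbook formulation of the bootstrap test (compare the range of the CUSUM of the real series with the range after random reordering), which may differ in detail from the Analyzer's own implementation. The function names, the series length of 30, and the simulated values are all hypothetical.

```python
import random
from itertools import accumulate


def cusum_range(values):
    """Range (max minus min) of the cumulative sum of deviations from the mean."""
    mean = sum(values) / len(values)
    cusum = [0.0] + list(accumulate(v - mean for v in values))
    return max(cusum) - min(cusum)


def bootstrap_confidence(values, rng, n_boot=200):
    """Share of random reorderings whose CUSUM range is smaller than the original's."""
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    return smaller / n_boot


rng = random.Random(1)  # fixed seed so the run is repeatable
n_series = 200
false_alarms = 0
for _ in range(n_series):
    # A stable process: same mean and spread throughout, so any "change" is a false alarm.
    stable = [rng.gauss(50, 4) for _ in range(30)]
    if bootstrap_confidence(stable, rng) >= 0.90:
        false_alarms += 1

false_alarm_rate = false_alarms / n_series
print(f"Share of stable series flagged at 90%: {false_alarm_rate:.2f}")
```

Under these assumptions the flagged share should land somewhere near one in ten. Raising the threshold to 0.95 in the comparison line lowers it, at the cost of missing more genuine but small shifts.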

Signal vs noise — the core distinction

Every data series contains two things mixed together: signal (genuine information about the state of the system) and noise (random variation around the process mean). The fundamental challenge of improvement measurement is separating them. Standard charts make this separation poorly. Bootstrap CUSUM makes it well.

Signal is a structural shift — the process has moved to a new mean and is likely to stay there. Noise is variation around an unchanged mean — some months are higher, some lower, but the underlying system has not changed. Acting on noise as if it were signal (tampering) adds more variation to the system. Ignoring signal because it looks like noise (failure to act) means genuine improvements or deteriorations go unrecognised and unconfirmed.

The Bootstrap CUSUM algorithm accumulates evidence of change over time. A single data point far from the mean is not enough to confirm a change point — it could be an outlier. A sustained series of data points consistently above (or below) the previous mean, accumulating evidence in the CUSUM statistic, is what triggers the change point detection. This is why Bootstrap CUSUM is more sensitive to real change and more robust against false alarms than visual inspection of a run chart.
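
The accumulation idea can be sketched in a few lines. This is a minimal illustration that assumes one widely used formulation: take the cumulative sum of deviations from the overall mean, measure its range, and compare that range with the ranges obtained after randomly reordering the same values. It is not presented as the Analyzer's exact method. All names and the simulated series are hypothetical, and the helpers are restated here so the block runs on its own.

```python
import random
from itertools import accumulate


def cusum(values):
    """Cumulative sum of deviations from the overall mean."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))


def cusum_range(values):
    """How far the CUSUM line travels: a sustained shift makes this large."""
    line = [0.0] + cusum(values)
    return max(line) - min(line)


def bootstrap_confidence(values, n_boot=1000, seed=0):
    """Share of random reorderings that travel less far than the real ordering."""
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    return smaller / n_boot


def estimate_change_index(values):
    """Index of the last point at the old level: where the CUSUM is furthest from zero."""
    line = cusum(values)
    return max(range(len(line)), key=lambda i: abs(line[i]))


rng = random.Random(42)
stable = [rng.gauss(100, 5) for _ in range(40)]
with_outlier = list(stable)
with_outlier[20] += 25  # one extreme month, no lasting shift
shifted = [rng.gauss(100, 5) for _ in range(20)] + [rng.gauss(110, 5) for _ in range(20)]

for name, series in [("stable", stable), ("one outlier", with_outlier), ("sustained shift", shifted)]:
    print(f"{name:16s} confidence = {bootstrap_confidence(series):.2f}")

shift_confidence = bootstrap_confidence(shifted)
shift_index = estimate_change_index(shifted)
print("estimated last point before the shift:", shift_index)
```

Reordering destroys any time structure while keeping the same values, so an outlier is present in every reordering too and gains no special weight. A sustained shift is different: only the real ordering keeps all the low values together and all the high values together, which is what pushes the CUSUM range beyond what the reorderings produce.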

The Deming funnel experiment

Deming demonstrated the cost of confusing signal and noise with his funnel experiment. A ball is dropped through a funnel onto a target. If you reposition the funnel after each drop in response to where the previous ball landed (Rules 2, 3 and 4, the tampering rules), the balls scatter more widely than if you leave the funnel fixed (Rule 1). The same principle applies to improvement: reacting to each month’s result as if it contains a signal produces a more variable system than leaving it alone. Tampering & Impatience develops this in full.
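
The funnel result is easy to reproduce in simulation. The sketch below is a simplified one-dimensional version under stated assumptions: the target sits at zero, each drop lands at the funnel position plus random error, and the "compensate" rule moves the funnel from its current position by the negative of the last miss (the rule usually numbered Rule 2). The function and variable names are hypothetical.

```python
import random
import statistics


def simulate(rule, n_drops=20000, sigma=1.0, seed=0):
    """Return landing positions relative to a target at 0 for the given funnel rule."""
    rng = random.Random(seed)
    funnel = 0.0
    landings = []
    for _ in range(n_drops):
        landing = funnel + rng.gauss(0, sigma)
        landings.append(landing)
        if rule == "compensate":
            # Tampering: shift the funnel to cancel out the miss just observed.
            funnel -= landing
        # rule == "fixed": leave the funnel where it is.
    return landings


fixed_sd = statistics.pstdev(simulate("fixed"))
compensate_sd = statistics.pstdev(simulate("compensate"))
ratio = compensate_sd / fixed_sd

print(f"fixed funnel spread:        {fixed_sd:.2f}")
print(f"compensating funnel spread: {compensate_sd:.2f}")
print(f"ratio:                      {ratio:.2f}")
```

In this model each landing under the compensating rule combines the current error with the negative of the previous one, so the variance doubles and the spread grows by a factor of about 1.4. Well-meant adjustment makes the results worse, not better.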

Seasonality and patterns

Many processes have seasonal patterns — NHS A&E attendance peaks in winter, retail sales peak in December, agricultural yields vary by season. Seasonal variation is common cause variation: it is a predictable feature of the process, not a signal of structural change.

Bootstrap CUSUM on a series with strong seasonality will sometimes detect the seasonal peak as a change point if the series is short. The practical check: does the apparent change point occur at the same time each year? If so, it is probably seasonal rather than structural. Options for handling seasonality include using annual rather than monthly data (which averages out seasonal variation), using seasonally adjusted data, or running Bootstrap CUSUM on a longer series where seasonal patterns are clearly visible as recurring cycles rather than structural shifts.

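The safest approach for seasonal series is to run Bootstrap CUSUM on annual totals or annual averages rather than monthly data. This removes seasonal noise and makes genuine structural changes more visible, but it also cuts the number of data points by a factor of twelve, so it is only practical where the series covers enough years to remain reliable.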
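
Annual aggregation can be illustrated with a simulated series. This is a sketch on made-up data: a flat process with a repeating twelve-month cycle and a little random noise. The cycle size, noise level, and variable names are assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(3)
n_years = 6

# Monthly values: constant level of 100, a seasonal swing of +/-20, and small random noise.
monthly = [
    100 + 20 * math.sin(2 * math.pi * month / 12) + rng.gauss(0, 3)
    for month in range(12 * n_years)
]

# One average per calendar year: each year contains one full seasonal cycle, so the cycle cancels.
annual = [statistics.mean(monthly[year * 12:(year + 1) * 12]) for year in range(n_years)]

monthly_spread = max(monthly) - min(monthly)
annual_spread = max(annual) - min(annual)

print(f"monthly points: {len(monthly)}, spread {monthly_spread:.1f}")
print(f"annual points:  {len(annual)}, spread {annual_spread:.1f}")
```

The annual averages vary far less than the monthly values because the seasonal cycle cancels within each year. The trade-off is visible in the point counts: six years of monthly data gives 72 points, but only six annual ones, well below the 20-point guideline given under Pitfall 3.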

Multiple change points

A data series that spans several years may contain multiple structural changes: a deterioration, then an improvement, then a further deterioration. Bootstrap CUSUM detects all of them if the series is long enough. Reading a multi-change-point result requires placing each change point in its chronological context:

What was happening in the system at the time of each change point? Policy changes, funding changes, staffing changes, external events — all are candidates. The change point narrows the investigation window; your knowledge of context fills in what happened.
Is the overall trend positive or negative across all change points? Four change points that alternate up-down-up-down may represent a system oscillating around its mean rather than improving. Four change points that all move in the desired direction represent genuine sustained improvement. The cumulative direction matters as much as any individual change point.
What is the current level relative to the starting point? A series that improved, then deteriorated, may now be at a lower level than it started despite showing a recent improvement change point. Bootstrap CUSUM shows the changes; you must judge the absolute level.

The NHS A&E analysis is the most detailed worked example of a multi-change-point series on this site: 15 years of monthly data, four structural stages, and no upward change point despite multiple policy interventions.
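
One common way to find several change points is to apply the single-change test recursively: test the whole series, split it at the most likely change point if the test passes, then repeat on each half. The sketch below assumes that approach and is not a description of how the Analyzer itself works. The 95% threshold, the minimum segment length of 8, all names, and the simulated series are hypothetical.

```python
import random
from itertools import accumulate


def cusum(values):
    """Cumulative sum of deviations from the segment mean."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))


def cusum_range(values):
    line = [0.0] + cusum(values)
    return max(line) - min(line)


def confidence(values, rng, n_boot=500):
    """Share of random reorderings with a smaller CUSUM range than the real ordering."""
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    return smaller / n_boot


def find_change_points(values, rng, offset=0, threshold=0.95, min_len=8):
    """Return indices (in the full series) of the first point at each new level."""
    if len(values) < 2 * min_len or confidence(values, rng) < threshold:
        return []
    line = cusum(values)
    # Only consider splits that leave at least min_len points on each side.
    candidates = range(min_len - 1, len(values) - min_len)
    split = max(candidates, key=lambda i: abs(line[i])) + 1
    left = find_change_points(values[:split], rng, offset, threshold, min_len)
    right = find_change_points(values[split:], rng, offset + split, threshold, min_len)
    return left + [offset + split] + right


rng = random.Random(7)
# Three stages of 20 points each: a rise at index 20, then a return at index 40.
series = (
    [rng.gauss(100, 5) for _ in range(20)]
    + [rng.gauss(115, 5) for _ in range(20)]
    + [rng.gauss(100, 5) for _ in range(20)]
)

change_points = find_change_points(series, rng)
print("detected change points at indices:", change_points)
```

Running only the last 40 points of this series would show a single drop and hide the earlier rise, which is the risk described under Pitfall 5. The full series shows both changes and that the final level is back where it started.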

Concepts that support interpretation

Three Charts, Three Stories
Tampering & Impatience
Variation & SPC
Types of Measures
Behaviour Over Time
False Alarms in Performance Charts
Seasonality Mistaken for Improvement
Interpreting the Output
What to Do Next
Common Cause vs Special Cause
Change Detected: What to Do Next
No Change Detected: What to Do Next