📊 Output guide — what every number means

Interpreting StepChange Analyzer Output

You have run the analysis. The Analyzer has produced numbers, a chart, and a summary bar. This page explains what each output means, what it is telling you about your data, and what to do with it.

StepChangeAnalysis.com  ·  Output guide  ·  June 2026
Method: Bootstrap CUSUM

📚 Five key terms in plain English

Mean (average) — the typical value of your data over a period. If your ward had 12, 15, 11, 14, and 13 incidents over five months, the mean is 13. The Bootstrap CUSUM uses the mean to ask: has the typical level permanently shifted upward or downward? A change in mean is a change in the underlying level of performance — not just a good or bad month.

Median (middle value) — the middle value when all your data points are arranged in order from lowest to highest. If you have 11 monthly values, the median is the 6th value — half the values are above it, half below. The median is used by run charts (see the Run Chart tab). It is more resistant to extreme outliers than the mean — one unusually high or low month shifts the mean but barely moves the median. The Bootstrap CUSUM uses the mean, not the median, because the mean captures the full magnitude of a shift rather than just its direction.
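
A quick way to see the difference is to compute both for the five-month example above, then swap one month for an unusually high value. The 40 below is invented for illustration. This sketch uses Python's standard statistics module:

```python
from statistics import mean, median

# Five months of ward incidents from the example above
typical = [12, 15, 11, 14, 13]
# The same ward with one unusual month (40) in place of 13; illustrative only
with_spike = [12, 15, 11, 14, 40]

print("typical:    mean", mean(typical), "median", median(typical))
print("with spike: mean", mean(with_spike), "median", median(with_spike))
# One extreme month pulls the mean from 13 to 18.4 but moves the median only from 13 to 14
```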

Standard deviation (SD) — the spread: how much your values vary around the mean. A small SD means values are consistent and predictable, close to the mean most of the time. A large SD means values are spread widely, with some months much higher and some much lower. Two wards with the same mean but different SDs have very different processes: one is stable and predictable, the other is volatile. The ±3SD limits show the range within which almost all values (about 99.7%) would fall if the process were stable and its values roughly followed a bell curve.
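
The sketch below, in Python, compares two wards with the same mean (13) but very different spread. Ward B's figures are invented for illustration, and the sketch assumes the sample SD (dividing by n − 1); the Analyzer's exact SD formula is not stated on this page.

```python
from statistics import mean, stdev

def three_sd_band(values):
    """Return (lower, mean, upper) using mean ± 3 × sample SD (an assumption)."""
    m = mean(values)
    s = stdev(values)  # sample SD, n - 1 denominator
    return m - 3 * s, m, m + 3 * s

ward_a = [12, 15, 11, 14, 13]  # consistent
ward_b = [5, 22, 8, 20, 10]    # same mean, much more volatile (invented)

for name, vals in [("Ward A", ward_a), ("Ward B", ward_b)]:
    lower, m, upper = three_sd_band(vals)
    print(f"{name}: mean={m}, SD={stdev(vals):.2f}, band {lower:.1f} to {upper:.1f}")
```

For Ward B the lower band falls below zero, a reminder that ±3SD bands assume a roughly symmetric spread that count data often lacks.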

CUSUM (Cumulative Sum) — a running total of how far your values have drifted from the overall average. Imagine keeping a running score: every month your value is above the mean, add the difference; every month it is below, subtract it.
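
The running score can be written in a few lines of Python. This is a sketch of the textbook CUSUM against the overall mean, run on a small invented series that steps from about 10 to about 20; the Analyzer's internal code may differ in detail.

```python
from itertools import accumulate

def cusum(values):
    """Running total of deviations from the overall mean, starting from 0."""
    overall_mean = sum(values) / len(values)
    return [0.0] + list(accumulate(v - overall_mean for v in values))

stepped = [10, 11, 9, 10, 20, 21, 19, 20]  # invented data, overall mean 15
print(cusum(stepped))
```

The CUSUM falls steadily while values sit below the overall mean, bottoms out at the last low value, then climbs back. It always finishes at zero, because deviations from the overall mean add up to zero.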

Three things to read on the green CUSUM line:

Slope — the steeper the line, the further from average. A steeply rising CUSUM means values are running well above the mean, so the process is performing much better or much worse than average, depending on whether higher values are good or bad for your measure. A gentle slope means values are only slightly above (or below) average. A flat CUSUM means the process is running right at its mean.

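Direction — rising or falling. A rising CUSUM means values are consistently above the mean; a falling CUSUM means they are consistently below it. Because every deviation is measured from the overall mean of the whole series, the deviations add up to zero and the line always finishes back at zero: any sustained rise must be balanced by a fall elsewhere in the series. If the process returns to its mean, the slope flattens; if it drops below the mean, the CUSUM starts to fall.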

Turning points — where the slope changes direction. A turning point on the CUSUM — where it shifts from rising to falling or vice versa — is where the algorithm looks for a potential step change. The Bootstrap test asks: is this turning point large enough to be unlikely by chance? If yes, it is declared a change point. The sharpness of the turn indicates how abruptly the process shifted.

If the CUSUM stays close to zero throughout, the process is varying randomly around its mean — no structural shift. The Bootstrap CUSUM finds the turning point where the drift became too large to be explained by random chance.
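
One common way to turn the CUSUM into a dated change point, as described in Wayne Taylor's change-point analysis, is to treat the point where the CUSUM is furthest from zero as the last point of the old level, and to use the CUSUM's range (its maximum minus its minimum) as the size of the drift. A Python sketch under that assumption; it ignores the Turn Length setting and is not necessarily how the Analyzer does it.

```python
from itertools import accumulate

def cusum(values):
    overall_mean = sum(values) / len(values)
    return [0.0] + list(accumulate(v - overall_mean for v in values))

def turning_point(values):
    """Return (0-based index of the first point of the new stage, CUSUM range)."""
    s = cusum(values)
    # s[i] is the running total after the first i values, so the largest |s[i]|
    # means values[:i] form the old level and values[i] starts the new one
    i = max(range(len(s)), key=lambda k: abs(s[k]))
    s_range = max(s) - min(s)  # size of the swing, used as the bootstrap test statistic
    return i, s_range

stepped = [10, 11, 9, 10, 20, 21, 19, 20]  # invented data
print(turning_point(stepped))  # the fifth value (index 4) starts the new stage
```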

Bootstrap — the honest confidence test. The word “bootstrap” refers to the method used to calculate how confident the algorithm is in a detected change point. Here is the problem it solves: how do you know whether a change point is real or just a fluke of your particular dataset? The bootstrap method answers this by repeatedly shuffling your data into random orders, hundreds or thousands of times, and asking: how often does a change point at least this large appear purely by chance? If one appears in only 1 in 20 random shuffles, the confidence level is 95%. If it appears in only 1 in 370 shuffles, the confidence is 99.7%. The more shuffles performed (the Loops setting), the more precise the estimate. This approach makes no assumptions about the shape of your data: it does not require a bell curve or any particular distribution, which makes it especially suited to NHS data, which is often skewed, counted, or bounded. It does assume that in a stable process the order of the values carries no information, so strong seasonality or month-to-month carry-over (autocorrelation) can produce change points that are not genuine step changes.
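
The shuffle test can be sketched in Python. Assumptions, all illustrative: the test statistic is the CUSUM range (maximum minus minimum), the data are reordered without replacement, and confidence is the share of shuffles whose range is smaller than the original's. The Analyzer's exact procedure may differ.

```python
import random
from itertools import accumulate

def cusum_range(values):
    """CUSUM maximum minus minimum, including the starting 0."""
    overall_mean = sum(values) / len(values)
    s = [0.0] + list(accumulate(v - overall_mean for v in values))
    return max(s) - min(s)

def shuffle_confidence(values, loops=1000, seed=1):
    """Share of random reorderings whose CUSUM range is smaller than the original's."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(loops):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    return smaller / loops

stepped = [10, 11, 9, 10, 11, 10, 9, 10, 11, 10,
           20, 21, 19, 20, 21, 20, 19, 20, 21, 20]  # invented: clear step
alternating = [10, 12] * 10                             # invented: no step at all
print(shuffle_confidence(stepped), shuffle_confidence(alternating))
```

A clean step keeps almost every shuffle below the original range, giving a confidence close to 100%. The alternating series already has the smallest range any ordering can have, so its confidence is 0%.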


The summary bar — the headline result

After the analysis runs, the summary bar appears above the chart. It contains the most important information in the output.

📊 Example summary bar

2 stages detected at 99.7% confidence
N = 48   |   Mean = 142.3   |   SD = 18.7

What each part means:

2 stages detected — the algorithm found one change point, dividing the series into two distinct stages with different means. A single change point creates two stages. Two change points create three stages.

at 99.7% confidence — this is the confidence level of the strongest change point detected. See Confidence level below.

N = 48 — the number of data points the algorithm analysed. This should match the number of data rows in your CSV (excluding the header row) minus any skipped rows.

Mean = 142.3 — the overall mean of the entire series across all stages.

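SD = 18.7 — the standard deviation of the entire series. A large SD relative to the mean suggests high variation; a small SD suggests a stable process. If the series contains a step change, this overall SD is also inflated by the gap between the stage means, so check the within-stage SDs on the Stage Summary tab before concluding that the process is unstable.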

If the summary bar says “1 stage detected”

One stage means the algorithm found no structural change point in your data. The series has one mean throughout. This is a result, not an error — it is telling you that the process has not permanently shifted. See No change detected — what it means for what to do next. See also Decide to wait for the decision rules.


Stages — what a structural change looks like

The Analyzer divides your time series into stages — periods where the process was running at a consistent level. Each stage has its own mean, shown as a horizontal stepped line on the Bootstrap CUSUM chart. The transition between stages is the change point — the date when the process permanently shifted to a new level.

1 stage
No structural change

The process has been running at one consistent level throughout the series. No permanent shift was detected at your confidence threshold, and the overall mean applies to the entire series. Common cause variation is present, but there is no step change.

2 stages
One change point

One structural change point found, dividing the series into two stages with different means. The Stage 1 and Stage 2 means are shown on the chart, and the change point date is the transition between them. Check whether the Stage 2 mean is higher or lower than Stage 1, and whether that direction is an improvement for your measure.

3+ stages
Multiple change points

Multiple structural changes found. Each stage has its own mean. Read the stages in sequence — is the trend consistently up, consistently down, or mixed? Multiple stages are common in long series or series spanning major system changes. See Multiple change points.

The stage count and the confidence level are separate things

The number of stages tells you how many change points were found. The confidence level tells you how certain the algorithm is about the strongest one. A result of “2 stages at 90% confidence” means one change point was found, but the evidence is borderline: a change this large would appear by chance in about 1 in 10 stable series, so it could well be a false alarm. A result of “2 stages at 99.7% confidence” means one change point was found with very strong evidence. See Confidence level below.


Confidence level — how sure is the algorithm?

The confidence level tells you how unlikely it is that a change point of this size would appear by chance in a stable process. It is calculated by the bootstrap resampling procedure: the algorithm shuffles the data many times (set by Loops, 1,000 by default) and measures how often a change point of this magnitude or larger appears in the shuffled data. That proportion is the false alarm rate, and the confidence level is 100% minus it. Strictly, this is not the probability that your particular change point is genuine; it measures how rarely chance alone produces a change this large.

90% confidence
False alarm rate: 1 in 10.
What it means in practice: Sensitive. Will detect smaller changes but will also produce more false alarms: on average, one in every 10 stable series you test will show a spurious change point at this threshold.
When to use: Early-warning monitoring where missing a real signal is more costly than investigating a false one. Safety-critical monitoring.

95% confidence
False alarm rate: 1 in 20.
What it means in practice: Standard. The conventional threshold for most statistical analysis; a change point at 95% confidence is roughly the equivalent of p < 0.05 in traditional statistics. Appropriate for most improvement programme evaluation.
When to use: Most NHS improvement reporting. PDSA Study phase. Standard governance evidence.

99.7% confidence
False alarm rate: about 1 in 370.
What it means in practice: Conservative. Equivalent to a three-sigma signal in statistical process control (the exact three-sigma level is 99.73%, which is where 1 in 370 comes from). Very unlikely to be a false alarm. Appropriate before making irreversible decisions based on the result.
When to use: Before scaling nationally, decommissioning a service, or making major capital investment. High-stakes governance decisions.

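
The false alarm rate is simply 1 ÷ (1 − confidence). A short Python check, with a hypothetical helper name, also shows why 99.7% is quoted as 1 in 370: that figure comes from the unrounded three-sigma level, 99.73%.

```python
def one_in_n(confidence):
    """Express a confidence level (e.g. 0.95) as 'one false alarm in N stable series'."""
    return round(1 / (1 - confidence))

for conf in (0.90, 0.95, 0.997, 0.9973):
    print(f"{conf:.2%} confidence: about 1 false alarm in {one_in_n(conf)} stable series")
```

The rounded 99.7% works out to 1 in 333; the exact three-sigma level of 99.73% gives 1 in 370.
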
The confidence level in the summary bar is the result, not the threshold setting

The Conf % setting in the controls panel is the minimum confidence level required for the algorithm to declare a change point. If you set 95% and the strongest change point found has 97.3% confidence, the summary bar reports “detected at 97.3% confidence”: it shows the actual confidence level of the strongest change point, not just whether it crossed your threshold. If no change point reaches your threshold, the result is “1 stage” regardless of any weaker signals present.


Mean and SD — what the stages tell you

Each stage has its own mean shown on the Bootstrap CUSUM chart as a horizontal stepped line. The Stage Summary tab shows the mean, standard deviation, and number of data points for each stage individually.

📊 Reading the Stage Summary tab

The Stage Summary tab shows a table with one row per stage:

Stage 1
Dates: 2018-01 – 2021-06
N: 42
Mean: 156.2  SD: 12.4
Change point:
Stage 2 — new stage
Dates: 2021-07 – 2024-12
N: 42
Mean: 121.7  SD: 9.8
Change point: 2021-07  99.7%

Start and end dates — the date range of each stage.

N — the number of data points in this stage. A stage with very few points (under 5–6) should be treated with caution — the mean estimate is based on limited data.

Mean — the average value for this stage. Compare Stage 1 mean to Stage 2 mean to understand the direction and magnitude of the change.

SD — the standard deviation within this stage. A much lower SD in Stage 2 than Stage 1 suggests the process is more stable as well as having a different mean.

Change point date — the date the algorithm identifies as the structural transition, positioned at the first data point of the new stage. The real-world cause may predate this by weeks or months, for example when a change in practice takes time to show in the data. Investigate what changed in the system at or before this date.

Confidence — the confidence level of this specific change point.
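
The Stage Summary figures can be reproduced from the raw values once the change points are known. A Python sketch, assuming change points are given as the 0-based position of the first value of each new stage and that SD is the sample SD; the function name and data are illustrative, not part of the Analyzer.

```python
from statistics import mean, stdev

def stage_summary(values, change_points):
    """Split a series at change points and return N, mean and SD for each stage.

    change_points: 0-based positions of the first value of each new stage.
    """
    bounds = [0, *change_points, len(values)]
    rows = []
    for number, (start, end) in enumerate(zip(bounds, bounds[1:]), start=1):
        stage = values[start:end]
        rows.append({
            "stage": number,
            "n": len(stage),
            "mean": mean(stage),
            # stdev needs at least two values; a one-point stage has no spread estimate
            "sd": stdev(stage) if len(stage) > 1 else float("nan"),
        })
    return rows

stepped = [10, 11, 9, 10, 20, 21, 19, 20]  # invented data
for row in stage_summary(stepped, [4]):
    print(row)

# The blue stepped line repeats each stage mean once per data point in that stage
stage_line = [row["mean"] for row in stage_summary(stepped, [4]) for _ in range(row["n"])]
```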


Reading the Bootstrap CUSUM chart

The Bootstrap CUSUM chart is the main output. It contains three elements that are easy to confuse.

Red line with dots
What it is: Your actual data, the values from your CSV plotted over time.
Read against: Left axis (your data values).
Common confusion: This IS your data. It is not the CUSUM. Each dot is one row of your CSV.

Green line
What it is: The Bootstrap CUSUM, the running cumulative sum of deviations from the overall mean.
Read against: Right axis (CUSUM value).
Common confusion: This is NOT a second data series. When your values run above the mean the green line rises; when they run below the mean it falls. A near-vertical drop or rise is the CUSUM accumulating rapidly, not a data error.

Blue stepped line
What it is: The stage means, the mean value for each stage, shown as horizontal steps.
Read against: Left axis (your data values).
Common confusion: Each step represents one stage. The vertical jump between steps is the magnitude of the structural change: the larger the jump, the larger the change.

Reading the green CUSUM line — slope and turning points are the story

Slope tells you how far from average: a steep rise means values are running well above the mean; a gentle rise means slightly above. A steep fall means well below the mean. A flat line means the process is running right at its overall mean.

Turning points indicate potential step changes: wherever the CUSUM changes direction — from rising to falling or vice versa — the algorithm examines whether the shift is large enough to be statistically significant. Each declared change point corresponds to a turning point on the CUSUM line. A sharp, abrupt turn indicates a sudden shift in the process; a gradual curve indicates a slower drift.

Change point date: the vertical step in the blue stage mean line marks the confirmed change point. It is positioned at the first data point of the new stage. Hover over the chart (on desktop) to see exact dates. On mobile, use the Stage Summary tab for precise dates.

The CUSUM arrow guide — ↗ Above Avg  |  ↘ Below Avg  |  → On Avg

The summary bar shows a small CUSUM guide with three arrows. These describe the direction the green CUSUM line is currently moving:

↗ Above Avg — the most recent values are running above the overall mean. The CUSUM is rising. If this continues, a new upward stage may be detected.

↘ Below Avg — the most recent values are running below the overall mean. The CUSUM is falling. If this continues, a new downward stage may be detected.

→ On Avg — the most recent values are close to the overall mean. The CUSUM is flat. The process is stable around its current level.
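
The arrow guide follows from the CUSUM arithmetic: over the most recent points, the CUSUM's slope has the same sign as (recent average − overall mean). A Python sketch of one plausible rule; the window of 3 points and the tolerance of 0.5 SD are illustrative assumptions, not the Analyzer's documented settings.

```python
def cusum_direction(values, window=3, tolerance=0.5):
    """Label the recent CUSUM slope as 'Above Avg', 'Below Avg' or 'On Avg'.

    window and tolerance are illustrative choices, not the Analyzer's settings;
    tolerance is measured in units of the series SD.
    """
    n = len(values)
    overall_mean = sum(values) / n
    sd = (sum((v - overall_mean) ** 2 for v in values) / (n - 1)) ** 0.5
    recent = values[-window:]
    # Over the window the CUSUM changes by sum(v - overall_mean), so its slope
    # has the same sign as (recent average - overall mean)
    gap = (sum(recent) / len(recent) - overall_mean) / sd if sd > 0 else 0.0
    if gap > tolerance:
        return "Above Avg"
    if gap < -tolerance:
        return "Below Avg"
    return "On Avg"

print(cusum_direction([10, 11, 9, 10, 20, 21, 19, 20]))  # invented data
```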


The X-mR tab — individuals and moving range

The X-mR tab shows two complementary charts. The Individuals chart (X chart) plots each data point against natural process limits. The Moving Range chart (mR chart) plots the difference between consecutive data points, with its own limit.

UNPL (Upper Natural Process Limit)
What it means: The upper boundary of expected variation, calculated as mean + 3 × estimated SD using the average moving range method (conventionally written as mean + 2.66 × average moving range). In a stable process, nearly all data points (about 99.7% if the values are roughly bell-shaped) fall between the UNPL and the LNPL.
Signal if: A data point is above this line. This is a special cause signal: something unusual happened in that period specifically.

LNPL (Lower Natural Process Limit)
What it means: The lower boundary of expected variation: mean − 3 × estimated SD, placed symmetrically with the UNPL around the mean.
Signal if: A data point is below this line. This is a special cause signal: unusually low performance or an unusually low event rate for that period.

URL (Upper Range Limit)
What it means: The upper limit for the moving range chart, the maximum expected difference between consecutive data points in a stable process.
Signal if: A moving range is above this line. The change between two consecutive periods was unusually large: the system shifted abruptly rather than gradually.

Mean
What it means: The overall mean of the series, used to calculate the X-mR limits. This is the same as the overall mean in the Bootstrap CUSUM summary bar.

SD
What it means: The standard deviation estimated from the average moving range (conventionally, average moving range ÷ 1.128), not the standard deviation calculated directly from the data. The moving range method is more robust to outliers and step changes.
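
The limits above can be computed with the standard individuals-chart constants: 2.66 for the natural process limits, 3.268 for the upper range limit, and 1.128 to convert the average moving range into an SD estimate. A Python sketch on invented data; whether the Analyzer computes limits per stage, or clamps the LNPL at zero for counts, is not stated here.

```python
def xmr_limits(values):
    """Individuals (X) and moving range (mR) chart limits via the average moving range."""
    centre = sum(values) / len(values)
    # Moving range: absolute difference between each point and the one before it
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "mean": centre,
        "sd": mr_bar / 1.128,            # estimated SD (1.128 is the d2 constant for n = 2)
        "UNPL": centre + 2.66 * mr_bar,  # 2.66 is approximately 3 / 1.128
        "LNPL": centre - 2.66 * mr_bar,
        "URL": 3.268 * mr_bar,           # upper limit for the moving range chart
    }

limits = xmr_limits([10, 12, 11, 13, 12])  # invented monthly values
for name, value in limits.items():
    print(f"{name}: {value:.2f}")
```
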
Bootstrap CUSUM and X-mR answer different questions

The Bootstrap CUSUM answers: has the process permanently shifted to a new mean? It looks at the whole series and finds structural step changes. The X-mR chart answers: is any individual data point (or rate of change) outside the expected range for this process? It identifies specific periods that were unusual. Use Bootstrap CUSUM for programme evaluation. Use X-mR to identify specific events worth investigating.



The Run Chart tab — advantages and limitations

The Run Chart tab plots your data over time with a median line. It applies standard run chart rules to detect non-random patterns — sustained runs above or below the median, or too few direction changes. Run charts are widely used in NHS quality improvement and will be familiar to many improvement practitioners.

What a run chart detects

Run chart rules look for patterns that are unlikely to occur by chance in a stable process. The most common rules are: six or more consecutive points on the same side of the median (a sustained shift); and five or more consecutive points all going in the same direction (a trend). These patterns suggest the process is no longer running randomly around its median — something may have changed.
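
Both rules can be checked in a few lines of Python. This sketch follows common conventions: points exactly on the median are skipped when counting a shift, and consecutive repeated values count once in a trend. The Analyzer's own rule set may differ in detail, and the function names are hypothetical.

```python
from statistics import median

def longest_shift(values):
    """Longest run of consecutive points on one side of the median.

    Points exactly on the median are skipped: they neither extend nor break a run.
    """
    centre = median(values)
    best = run = side = 0
    for v in values:
        if v == centre:
            continue
        this_side = 1 if v > centre else -1
        run = run + 1 if this_side == side else 1
        side = this_side
        best = max(best, run)
    return best

def longest_trend(values):
    """Longest run of points all going up, or all going down.

    Consecutive repeated values are counted once.
    """
    points = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    best = min(len(points), 1)
    run, direction = 1, 0
    for a, b in zip(points, points[1:]):
        step = 1 if b > a else -1
        run = run + 1 if step == direction else 2  # a new direction starts a 2-point run
        direction = step
        best = max(best, run)
    return best

def run_chart_signals(values):
    return {"shift": longest_shift(values) >= 6, "trend": longest_trend(values) >= 5}

print(run_chart_signals([3, 4, 3, 4, 3, 8, 9, 8, 9, 8, 9, 4]))  # invented data
```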

Feature | Run chart | Bootstrap CUSUM
Based on | Median | Mean
Confidence level | Not provided (binary signal only) | Quantified (90%, 95%, 99.7%)
Change point date | Not precise (indicates that a shift has occurred) | Dated precisely to the data point
Magnitude of change | Not quantified | Stage means show the size of the shift
Multiple change points | Difficult (the median shifts with the data) | Handled cleanly as multiple stages
Minimum data needed | 10–12 points for reliable rules | 20+ points recommended
Familiarity in NHS QI | Very high (widely taught and used) | Lower (less familiar but more rigorous)
Best for | Real-time monitoring, clinical team communication, early-stage improvement | Governance reporting, PDSA Study phase, programme evaluation

The run chart’s key limitation — it cannot answer the governance question

A run chart signal says “something non-random is happening.” It cannot say when the change occurred, how large it was, or how confident you can be that it is genuine rather than a false alarm. For a board report, a published evaluation, or a PDSA Study step, you need a dated change point with a confidence level. That is what Bootstrap CUSUM provides.

The practical approach: use the run chart for real-time monitoring and team communication — it is visual, intuitive, and familiar. Use Bootstrap CUSUM when you need the rigorous answer for governance or evaluation.

Advantages of run charts — where they excel

Familiarity: Most NHS QI practitioners know run charts. Presenting one to a clinical team requires no statistical explanation.

Early detection: Run chart rules can signal a shift before enough data exists for Bootstrap CUSUM to confirm it. Six consecutive points above the median can be detected with as few as 12 data points — Bootstrap CUSUM needs 20+ for reliable confidence estimates.

Real-time monitoring: Run charts work well when data arrives one point at a time — each new point either extends or breaks the current run.

No assumptions about distribution: Like Bootstrap CUSUM, run chart rules are non-parametric — they make no assumptions about the shape of your data. Both methods are suited to NHS data that does not follow a normal distribution.

Settings — turn length, loops, and confidence threshold

Three settings control the analysis, and a fourth option, Show 3SD Limits, changes only what the chart displays. Understanding what they do helps you choose the right values for your data.

Turn Length (default 5)
What it does: The minimum number of data points required in a segment before a change point can be detected at either end of it. A turn length of 5 means no change point can be placed within the first or last 5 data points of any segment.
When to change it: Increase to 7 or 8 for short series (under 30 points) to reduce false alarms at the edges. Decrease to 3 or 4 for very long series (100+ points) where early change points matter. Do not set below 3.

Loops (default 1,000)
What it does: The number of bootstrap resamples used to calculate the confidence level. More loops give more stable confidence estimates but a slower analysis. At 1,000 loops, confidence levels are accurate to roughly ±1–2 percentage points.
When to change it: Use 5,000–10,000 for governance reports or published results where precision matters. Use 1,000 for exploratory analysis. Do not use fewer than 500, as the confidence estimate becomes unreliable.

Conf % (default 99.7)
What it does: The minimum confidence level required to declare a change point. Change points below this threshold are not reported as structural changes.
When to change it: Lower to 95% for standard improvement evidence. Lower to 90% for sensitive early-warning monitoring. Keep the default of 99.7% for high-stakes, irreversible decisions. See Confidence level above.

Show 3SD Limits (default Off)
What it does: Adds ±3 standard deviation bands to the Step Change chart, calculated from the overall series SD. These are reference bands showing where about 99.7% of values would fall if the process were stable at the overall mean.
When to change it: Turn it on for a quick visual overview of how spread out the data is. Note that these are overall series bands: if the series has multiple stages, they straddle all stages and are less precise than the moving-range-based limits on the X-mR tab. For rigorous signal detection, use the X-mR tab limits instead.
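
The precision quoted for Loops follows from treating each resample as a yes/no outcome. A Python sketch of the usual binomial approximation; this is an assumption about how the estimate behaves, not a description of the Analyzer's code, and the helper name is hypothetical. Margins are printed in percentage points (pp).

```python
from math import sqrt

def loop_margin(confidence, loops, z=1.96):
    """Approximate 95% Monte Carlo margin (±) on a bootstrap confidence estimate."""
    return z * sqrt(confidence * (1 - confidence) / loops)

for loops in (500, 1000, 5000, 10000):
    margins = ", ".join(f"{c:.1%}: ±{100 * loop_margin(c, loops):.2f} pp"
                        for c in (0.90, 0.95, 0.997))
    print(f"{loops:>6} loops  {margins}")
```

At 1,000 loops this gives about ±1.9 points near 90% and ±1.4 near 95%, in line with the ±1–2 points above, and much less near 99.7%. Because the margin shrinks with the square root of the number of loops, quadrupling the loops only halves it.
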
3SD limits vs X-mR UNPL/LNPL — what is the difference?

Both are “±3 standard deviation” limits but they are calculated differently and answer different questions:

Show 3SD Limits (Step Change chart) — calculated from the simple standard deviation of the whole series. Applied as fixed bands across all stages. Quick to read but less sensitive to the stage structure of the data.

UNPL/LNPL (X-mR tab) — calculated from the average moving range (Shewhart method). More robust to outliers and step changes. The standard method for statistical process control. These are the limits recommended for identifying special cause signals.

Use the 3SD overlay for a quick visual sanity check. Use the X-mR tab limits for formal signal detection and governance reporting.

The width of the 3SD band tells you about variation — not just the mean

The wider the ±3SD band, the greater the variation in your data. A narrow band means values cluster tightly around the mean — a stable, consistent process. A wide band means values are spread widely — a noisy or unstable process.

This matters for two reasons:

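1. Detecting change in variation, not just mean. If your process has become more erratic over time, even if the mean has not shifted, the visual signal is later values swinging closer to (or beyond) the bands than earlier values did. The bands themselves are fixed across the whole series and will not widen, so the moving range chart on the X-mR tab is the clearer place to confirm it. A structural change in variation (not just level) is worth investigating even if Bootstrap CUSUM finds no change point in the mean.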

2. Understanding why the band may look misleadingly wide. If your series has two stages with different means, the overall SD used to calculate the 3SD band is inflated by the gap between Stage 1 and Stage 2, even though within each stage the process may be quite consistent. This is why the X-mR limits give a cleaner picture of within-stage variation: they are calculated from moving ranges, and a step change inflates only the single moving range that spans it. A wide 3SD band alongside narrow X-mR limits is a good signal that the width is driven by the step change, not by genuine process instability.
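
A small invented example in Python makes the point concrete. For a series that steps from about 10 to about 20, the overall SD is several times larger than the moving-range estimate, which in turn is close to the spread within a single stage. The 1.128 divisor is the conventional constant for two-point moving ranges.

```python
from statistics import mean, stdev

stepped = [10, 11, 9, 10, 11, 10, 9, 10,
           20, 21, 19, 20, 21, 20, 19, 20]  # invented: two stages of 8 points

overall_sd = stdev(stepped)
moving_ranges = [abs(b - a) for a, b in zip(stepped, stepped[1:])]
mr_sigma = mean(moving_ranges) / 1.128  # moving-range estimate of the SD
within_sd = stdev(stepped[:8])          # SD inside Stage 1 alone

print(f"overall SD {overall_sd:.2f}, moving-range SD {mr_sigma:.2f}, Stage 1 SD {within_sd:.2f}")
```

The one moving range that spans the step (10) is what lifts the moving-range estimate slightly above the within-stage SD.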

The default confidence threshold of 99.7% is conservative

The Analyzer default is 99.7%, the three-sigma threshold. This is appropriate for high-stakes decisions but may miss genuine change points in shorter series or noisier data. For most NHS improvement programme evaluation, 95% is the appropriate standard. If you are getting “1 stage” results and suspect a genuine change has occurred, try lowering the confidence threshold to 95% and re-running. If a change point appears at 95% but not at 99.7%, it may be a real signal, but the evidence for it is weaker, so treat it with appropriate caution.


Borderline results — when the number is close

Sometimes the confidence level sits close to a threshold — 93%, 96%, 98.5%. These borderline results require careful interpretation.

A borderline result is information, not a failure

A confidence level of 93% does not mean “nearly significant.” It means the algorithm found a change point, but the evidence is not yet strong enough to meet the 95% conventional threshold. This can happen for three reasons:

1. Insufficient data: With fewer than 20–30 data points, the bootstrap cannot estimate confidence levels reliably, and the confidence figure may understate a genuine change. Add more data and re-run.

2. High process variation: If the series has high natural variation (large SD relative to the mean), a genuine step change may not reach 95% confidence until it has been sustained for longer. A borderline figure here does not mean the change is spurious; it honestly reflects how hard it is to separate a real shift from noise in a variable system.

3. A genuine borderline result: The change may be real but small. A 93% confidence change point is worth monitoring. Set a review date 6 months ahead and re-run with the extended series. If it is genuine, confidence will increase as more post-change data accumulates.


When the result looks wrong

If the Bootstrap CUSUM result seems implausible — a change point at an unexpected date, a very low or very high confidence, a stage mean that doesn’t match what you know about the data — the most likely cause is a data problem rather than an algorithm problem.

Symptom: Change point near the very start or end of the series
Most likely cause: An outlier in the first or last few data points is distorting the mean for one stage.
Fix: Check the first and last rows for zero values, blanks, or unusual spikes. Run the Data Validator.

Symptom: Change point at a date when nothing happened
Most likely cause: Mixed date formats causing rows to be out of chronological order, or a missing period creating a false jump.
Fix: Check date format consistency and chronological order. Run the Data Validator.

Symptom: Stage mean much higher or lower than expected
Most likely cause: A zero or blank value being read as a genuine observation, or a comma in a large number creating extra columns.
Fix: Filter the value column for zeros and blanks. Check for comma separators in large numbers. See Fix CSV date format.

Symptom: N is lower than expected
Most likely cause: Rows being skipped because of blank date cells, an unrecognised date format, or header rows within the data.
Fix: Run the Data Validator. Check that your date column is correctly selected in the dropdown.

Symptom: Always “1 stage” even when you expect a change
Most likely cause: The confidence threshold is set too high (99.7%) for the amount of data; or there are fewer than 20 data points; or the change is genuine but common cause variation is masking it.
Fix: Try lowering Conf % to 95%. Check N: if it is under 20, extend the series. Consider whether the expected change was actually structural or may have been seasonal variation.
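
Several of these causes can be spotted with a short script. This Python sketch is not the Data Validator: it assumes a CSV with a header row, the date in the first column and the value in the second, and dates written as YYYY-MM. All of these are illustrative assumptions to adjust for your own file, and the script does not check for missing periods.

```python
import csv
import io
from datetime import datetime

def quick_csv_checks(text, date_format="%Y-%m"):
    """Return (line number, problem) pairs for common data issues."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    problems = []
    previous = None
    for line_no, row in enumerate(data, start=2):  # line 1 is the header
        if len(row) != len(header):
            problems.append((line_no, "wrong number of columns (unquoted comma in a number?)"))
            continue
        date_text, value_text = row[0].strip(), row[1].strip()
        try:
            date = datetime.strptime(date_text, date_format)
        except ValueError:
            problems.append((line_no, f"unrecognised date {date_text!r}"))
            continue
        if previous is not None and date <= previous:
            problems.append((line_no, "date is not later than the previous row"))
        previous = date
        if value_text == "":
            problems.append((line_no, "blank value"))
            continue
        try:
            value = float(value_text)
        except ValueError:
            problems.append((line_no, f"non-numeric value {value_text!r}"))
            continue
        if value == 0:
            problems.append((line_no, "zero value: genuine, or a missing entry?"))
    return problems

sample = (
    "date,value\n"
    "2021-01,120\n"
    "2021-02,\n"
    "2021-04,0\n"
    "2021-03,118\n"
    "2021-05,1,234\n"
    "03/06/2021,121\n"
)
for line_no, problem in quick_csv_checks(sample):
    print(line_no, problem)
```
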
The Data Validator is the first step when results look wrong

Run your CSV through the Data Validator before adjusting settings or re-interpreting results. Most surprising outputs have a data cause that the Validator will identify and explain. Fix the data problem, re-run the Analyzer, and interpret the new result.

Run Bootstrap CUSUM on your data

Upload your CSV, run the analysis, and use this page to interpret every number in the output.

▶ Open the StepChange Analyzer