False Alarms in Performance Charts

A false alarm is when a chart appears to show improvement — or deterioration — but the system has not actually changed. Acting on a false alarm is one of the most common and most costly mistakes in improvement work. This page explains what causes them, what they cost, and the decision rules that protect you from reacting to noise.

StepChangeAnalysis.com · Interpretation guide · June 2026

Method: Bootstrap CUSUM · Open the StepChange Analyzer

Contents

What a false alarm is
What false alarms cost improvement teams
The three most common causes
Decision rules: when to wait vs investigate
How Bootstrap CUSUM reduces false alarms
Related concepts

What a false alarm is

A false alarm occurs when a measurement appears to signal a genuine change in performance — but the underlying system has not actually shifted. The data moved. The process did not.

In statistical terms, a false alarm is a Type I error: declaring that something changed when it did not. Every measurement system has a false alarm rate, which is the probability that normal variation will be misread as a signal. Run charts read by eye and RAG status reports tend to have high false alarm rates because they favour sensitivity over specificity. They flag a lot of movements, and most of those flags are noise.

The classic example

A ward's infection rate drops from 4.2 to 2.8 per 1,000 bed days in a single month. The improvement team celebrates, writes it up, and begins planning how to scale the intervention. Three months later the rate is back at 4.1. Nothing changed — the drop was within the normal variation of a stable process. The team acted on a false alarm, consumed resources, and undermined confidence in the measurement system.

What false alarms cost improvement teams

False alarms are not just a statistical inconvenience. They have real organisational costs that compound over time:

Cost 1

Tampering — making the system worse

When teams react to a false alarm by changing the process, they introduce variation into a system that was stable. Deming called this tampering. The funnel experiment demonstrates it precisely: adjusting the process in response to each result produces more variation, not less. A system subjected to repeated tampering becomes harder to improve because the signal is buried in the noise that tampering created. See Tampering & Impatience.
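The funnel experiment can be sketched as a small simulation. This is a minimal illustration, assuming normally distributed landing error around a target of 0 and the compensating rule often called Rule 2 (move the funnel by the size of the last miss); the function name, `sigma`, `n_drops`, and `seed` are hypothetical choices, not part of any StepChange tool.

```python
import random
import statistics

def funnel_landings(compensate, n_drops=20000, sigma=1.0, seed=1):
    """Simulate marble landings around a target of 0.

    compensate=False: leave the funnel alone.
    compensate=True:  after each drop, move the funnel to offset the last miss.
    """
    rng = random.Random(seed)
    aim = 0.0
    landings = []
    for _ in range(n_drops):
        landing = aim + rng.gauss(0.0, sigma)  # stable process noise
        landings.append(landing)
        if compensate:
            aim -= landing  # react to the single latest result
    return landings

hands_off = statistics.pvariance(funnel_landings(compensate=False))
tampered = statistics.pvariance(funnel_landings(compensate=True))
print("variance ratio (tampered / hands-off):", round(tampered / hands_off, 2))
```

Under these assumptions the compensated process shows roughly twice the variance of the untouched one, which is the sense in which reacting to each result makes a stable system worse.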

Cost 2

Resource waste on interventions that were not needed

Every false alarm that triggers an improvement programme consumes time, money, and attention that could have been directed at a genuine problem. In NHS quality improvement, where improvement capacity is limited and demand is high, the opportunity cost of false alarms is significant. Teams that habitually chase noise have less capacity to address genuine structural problems.

Cost 3

Loss of trust in measurement

When an improvement team declares success and the metric reverts, or when successive initiatives appear to produce results that then disappear, staff learn to distrust the measurement system. This makes future improvement work harder: people discount signals that might be genuine because they have been burned by false alarms before. The measurement system loses credibility precisely when it most needs it.

Cost 4

Improvement fatigue

Teams that repeatedly respond to false alarms experience improvement fatigue: the exhaustion of sustained effort that produces no lasting change. This is one of the most underestimated costs of poor measurement. When people believe their efforts are not working, they disengage — making genuine improvement harder to achieve even when the right intervention is eventually found.

The three most common causes

1. Comparing a single good month to the previous month

A single data point better than the previous one is almost never a signal. In a stable process, roughly half of all months fall above the process mean and half below, and any given month has about a 50% chance of beating the month before it. A single better month is exactly what normal variation produces. The trap is that the human eye is pattern-seeking: it wants to attribute the good month to whatever intervention is most salient in the narrative. One data point is not a change point.
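A quick simulation makes the point concrete. This is a sketch under stated assumptions: the process is stable, monthly values are independent and normally distributed, and a lower value counts as better. The mean, standard deviation, and function name are invented for illustration.

```python
import random

def share_better_than_previous(n_months=10000, mean=4.0, sd=0.7, seed=7):
    """Fraction of months that beat the month before in a stable process.

    Lower is treated as better (for example, an infection rate).
    """
    rng = random.Random(seed)
    rates = [rng.gauss(mean, sd) for _ in range(n_months)]
    better = sum(1 for prev, cur in zip(rates, rates[1:]) if cur < prev)
    return better / (n_months - 1)

print(round(share_better_than_previous(), 2))
```

The result sits close to one half whatever mean and spread are chosen, because nothing about the process has changed between months.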

2. Before-and-after comparisons with a short post-intervention window

Before-and-after comparisons produce false alarms when the comparison window is too short. If an intervention was implemented in January and you compare December to February, you are comparing two single months separated by a process change — but both months are subject to normal variation. A better February proves nothing. Bootstrap CUSUM requires a sustained shift across multiple data points before confirming a change point. This is not a limitation — it is the design.

3. Seasonal patterns misread as improvement

Many healthcare metrics have strong seasonal patterns: A&E performance worsens in winter, infection rates vary by season, referral volumes peak at certain times of year. A seasonal improvement in spring that is compared to a winter baseline will almost always appear as genuine improvement — even if the underlying system has not changed at all. For seasonal series, always compare like-for-like: the same month in successive years, or annual averages. See Seasonality Mistaken for Improvement.
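The like-for-like comparison can be sketched as follows. The monthly figures are made up for illustration (two identical years with a winter peak, where lower is better), and `year_on_year_changes` is a hypothetical helper name.

```python
def year_on_year_changes(monthly, period=12):
    """Difference between each value and the value one full season earlier."""
    return [monthly[i] - monthly[i - period] for i in range(period, len(monthly))]

# Invented series: January to December, repeated for a second year
one_year = [9, 8, 6, 5, 4, 4, 4, 4, 5, 6, 8, 9]
series = one_year + one_year

# Naive comparison: May against the January baseline looks like a big improvement
winter_to_spring = series[4] - series[0]

# Like-for-like: each month against the same month a year earlier
yoy = year_on_year_changes(series)

print("May vs January:", winter_to_spring)
print("Year-on-year changes:", yoy)
```

The naive comparison shows a drop of 5 even though the second year is identical to the first; the year-on-year differences are all zero, which is the honest reading.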

Decision rules: when to wait vs investigate

The practical challenge is not understanding false alarms in theory — it is knowing what to do in real time when a metric moves. These decision rules give you a defensible framework:

✅ Default rule

One or two data points in a different direction: wait and measure. Do not act.

A single month better or worse than average is noise until proven otherwise. The burden of proof is on the signal, not on the null hypothesis. Document the result, note any contextual factors, set a review date, and continue measuring. Acting on one data point is tampering by definition.

When Bootstrap CUSUM confirms a change point: investigate. The algorithm has accumulated evidence across multiple data points and found a sustained shift at your chosen confidence level. At that level the result is unlikely to be a false alarm, so treat it as a genuine signal. Go to Path A: Change detected.

When Bootstrap CUSUM finds no change point: default to waiting. The system is behaving as it always has. Do not launch a new initiative in response to recent variation. Go to Path B: No change detected.

For cases where you are not sure whether a result is a false alarm or a genuine early signal, ask these three questions:

1. Is this sustained across multiple consecutive data points? One point is noise. Three or more consecutive points on the same side of the mean, accumulating evidence in the CUSUM statistic, may be a genuine shift. Any given run of three still has about a one-in-four chance of landing on one side of the mean in a stable process, so treat a short run as a prompt to keep watching rather than as confirmation.
2. Does the change point date match the intervention date? If Bootstrap CUSUM places the change point weeks before your intervention, something else caused it. The intervention is receiving credit for a change it did not produce.
3. Is there a plausible mechanism? Can you explain, in system terms, why this intervention would produce a structural change in this measure at this time? If the mechanism is not clear, the change point may be genuine but unattributed. That is still useful information, but it is not evidence that your intervention worked.
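The rules and questions above can be written down as a small triage helper. This is only a sketch: the function name, the return strings, and the `tolerance` of one data point for "matching" dates are assumptions made for illustration, and the judgement about mechanism still has to come from people who know the system.

```python
def triage(change_confirmed, change_index=None, intervention_index=None,
           mechanism_plausible=False, tolerance=1):
    """Suggest a next step. Indices are positions in the data series."""
    if not change_confirmed:
        # No confirmed change point: keep measuring, launch nothing new
        return "wait"
    if change_index is None or intervention_index is None:
        return "investigate: unattributed"
    if change_index < intervention_index - tolerance:
        # The shift started before the intervention, so something else caused it
        return "investigate: predates intervention"
    if not mechanism_plausible:
        return "investigate: unattributed"
    return "investigate: consistent with intervention"

print(triage(False))
print(triage(True, change_index=5, intervention_index=12, mechanism_plausible=True))
print(triage(True, change_index=12, intervention_index=12, mechanism_plausible=True))
```

The helper only checks for change points that fall before the intervention, since that is the case described above; how to treat a long delay after the intervention is left as a local judgement.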

How Bootstrap CUSUM reduces false alarms

Bootstrap CUSUM is specifically designed to reduce false alarms without sacrificing sensitivity to genuine change. It does this in two ways:

Evidence accumulation. Rather than evaluating each data point independently, Bootstrap CUSUM accumulates deviations from the process mean over time. A single outlier moves the CUSUM statistic once and is not reinforced. A sustained series of consistently above-average results accumulates into a genuine signal. This is why Bootstrap CUSUM is more sensitive to small sustained shifts, and less inclined to react to single points, than visual inspection of a run chart or a Shewhart chart that evaluates each point independently.

Bootstrapped confidence levels. The decision threshold for declaring a change point is not assumed from theory; it is calculated by resampling the actual data thousands of times. The confidence level reflects how rarely a stable process with the same values would produce a pattern this strong by chance. It is not the probability that your intervention worked. At 95% confidence, a stable series has about a 5% chance of producing a false positive. At 99.7%, that chance falls to about 0.3%, or roughly 1 in 333; the familiar 1 in 370 figure corresponds to the three-sigma level of 99.73%.
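Both ideas can be sketched in a few lines. This follows a common published formulation of CUSUM with bootstrapping (compare the range of the CUSUM path with the range obtained after reshuffling the same values); it is an assumption that the StepChange Analyzer works this way, and the function names, `n_boot`, `seed`, `tol`, and the toy data are all invented for illustration.

```python
import random
from itertools import accumulate

def cusum(values):
    """Running total of deviations from the series mean, starting at 0."""
    mean = sum(values) / len(values)
    return [0.0] + list(accumulate(v - mean for v in values))

def cusum_range(values):
    """Spread of the CUSUM path; sustained drift makes it large."""
    path = cusum(values)
    return max(path) - min(path)

def bootstrap_confidence(values, n_boot=2000, seed=0, tol=1e-9):
    """Share of reshuffled series whose CUSUM range is smaller than observed."""
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # same values, order destroyed
        # tol stops floating-point ties from counting as "smaller"
        if cusum_range(shuffled) < observed - tol:
            smaller += 1
    return smaller / n_boot

# Toy data: the same +2 deviation, once versus sustained
spike = [10, 10, 10, 10, 10, 12, 10, 10, 10, 10, 10, 10]
shift = [10, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12]

print("spike:", round(cusum_range(spike), 2), bootstrap_confidence(spike))
print("shift:", round(cusum_range(shift), 2), bootstrap_confidence(shift))
```

With these toy values the single spike produces a CUSUM range of about 1.8 and a confidence of 0, because every reshuffle of a lone spike has the same range. The sustained shift produces a range of 6 and a confidence close to 1, because almost no random ordering of the same values drifts as far.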

The false alarm rate at different confidence levels

90% confidence: about 1 in 10 analyses of a stable series will flag a change point that is not there. Suitable for early warning, not governance decisions.
95% confidence: about 1 in 20. The standard working threshold for improvement programmes.
99.7% confidence: about 1 in 333 (1 in 370 at the three-sigma level of 99.73%). Use when a false positive would trigger a costly or irreversible decision.
The confidence level is your dial for controlling the trade-off between sensitivity and false alarm rate.
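The "1 in N" figures are simply the reciprocal of one minus the confidence level. A short check of the arithmetic, with `one_in_n` as a hypothetical helper name:

```python
def one_in_n(confidence):
    """Convert a confidence level (0 to 1) into a '1 in N' false alarm rate."""
    return 1.0 / (1.0 - confidence)

for level in (0.90, 0.95, 0.997, 0.9973):
    print(f"{level:.2%} confidence: about 1 in {round(one_in_n(level))}")
```

Note that 99.7% works out at about 1 in 333; the widely quoted 1 in 370 belongs to the slightly higher three-sigma level of 99.73%.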

No method eliminates false alarms entirely. But Bootstrap CUSUM gives you a statistically defensible, data-derived threshold rather than a visual impression — which is the difference between honest measurement and wishful thinking.

Related concepts

Interpret Results

Tampering & Impatience

Variation & SPC

Seasonality Mistaken for Improvement

No Change Detected — What to Do
