📚 Reference

Glossary of Key Terms

Definitions of the statistical, analytical, and management concepts used across StepChangeAnalysis.com — with links to the article sections where each term appears in context. Terms may appear in more than one section where relevant to multiple disciplines.

StepChangeAnalysis.com · Last updated May 2026 · 47 terms across 8 sections · All 9 published articles



Bootstrap CUSUM & SPC Charts

Bootstrap CUSUM Core Method

A statistical method for detecting structural change points in time-series data. It accumulates deviations from the process mean over time (the CUSUM line), then uses bootstrap resampling to test whether the observed turning points are statistically significant or could have arisen by chance. Unlike classical CUSUM, the decision threshold is derived directly from the data, which makes the method valid for non-normal data such as counts, rates, and rare events. The confidence level is earned from the data, not assumed from theory.
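The idea can be sketched in a few lines of Python. This is a minimal illustration, not the site's implementation: it assumes the resampling step is a random reshuffle of the series and that the test statistic is the range of the CUSUM line. The function names and the example series are hypothetical.

```python
import random

def cusum(values):
    """Running total of each observation's deviation from the overall mean."""
    mean = sum(values) / len(values)
    total, line = 0.0, []
    for v in values:
        total += v - mean
        line.append(total)
    return line

def cusum_range(values):
    """Distance between the highest and lowest points of the CUSUM line (0 included)."""
    line = cusum(values) + [0.0]
    return max(line) - min(line)

def bootstrap_confidence(values, loops=1000, seed=0):
    """Share of reshuffled series whose CUSUM range is smaller than the observed one.

    Assumption: reshuffling destroys any time ordering, so a reshuffled
    series shows what the CUSUM range looks like when there is no change.
    """
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(loops):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    return smaller / loops

# Hypothetical series: 20 readings near 10, then 20 readings near 13.
data_rng = random.Random(1)
series = [data_rng.gauss(10, 1) for _ in range(20)] + \
         [data_rng.gauss(13, 1) for _ in range(20)]
confidence = bootstrap_confidence(series)
print(f"confidence that a change occurred: {confidence:.1%}")
```

No distribution is assumed anywhere in the sketch: the threshold comes entirely from how the reshuffled versions of the same data behave.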

Explained in: Three Charts → Why Bootstrap? · Hydrogen Plant → Residual CUSUM

See also: X-mR Chart · Run Chart · CUSUM Line · SPC (Improvement Methods)

CUSUM Line (the green line) Reading Charts

The green cumulative sum line is calculated by subtracting the overall series mean from each observation and adding that deviation to a running total. Its slope shows whether the process is above or below its long-run average: rising slope = consistently above the mean; falling slope = consistently below; flat = around the mean. A peak or turning point marks the moment the process changed direction — the most important feature. Steepness indicates how far above or below average the data is running. Changing the confidence level or Turn Length does not move the green line — it is derived entirely from the data.
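A short Python sketch of the calculation described above, using a hypothetical eight-point series chosen so the arithmetic is easy to follow by hand:

```python
def cusum_line(values):
    """Subtract the overall mean from each observation and keep a running total."""
    mean = sum(values) / len(values)
    running, line = 0.0, []
    for v in values:
        running += v - mean
        line.append(running)
    return line

# Hypothetical series: four readings of 12, then four readings of 8 (overall mean 10).
readings = [12, 12, 12, 12, 8, 8, 8, 8]
line = cusum_line(readings)

# The line rises while readings sit above the mean and falls once they drop below it.
# The peak is the last observation before the process changed direction.
turning_point = max(range(len(line)), key=lambda i: abs(line[i]))

print(line)           # [2.0, 4.0, 6.0, 8.0, 6.0, 4.0, 2.0, 0.0]
print(turning_point)  # 3
```

Nothing in the function takes a confidence level or Turn Length as input, which is why those settings cannot move the line.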

Explained in: Three Charts → How to Read the Green CUSUM Line · Hydrogen Plant → Two Different Things on the Same Chart

See also: Stage Boundaries · Change Point (Data section)

Stage Boundaries (the blue lines) Reading Charts

The blue horizontal step-mean lines represent the Bootstrap algorithm’s statistically tested verdict on where genuine structural changes occurred. Unlike the green CUSUM line, stage boundaries depend on the analyst’s chosen confidence level and Turn Length. Increasing confidence demands stronger evidence — fewer but better-supported stages. The stage boundaries show what can be statistically defended; the CUSUM line shows what actually happened.
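Once the boundaries are known, the step-mean lines are simple to compute. The sketch below assumes the boundary positions are supplied as inputs; deciding where they go is the job of the Bootstrap test and is not shown here. The function name and data are hypothetical.

```python
def step_means(values, boundaries):
    """Mean of each stage; `boundaries` lists the index at which each new stage starts."""
    edges = [0] + sorted(boundaries) + [len(values)]
    return [sum(values[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]

readings = [12, 12, 12, 12, 8, 8, 8, 8]

# One accepted boundary at index 4 gives two stages.
means = step_means(readings, boundaries=[4])
print(means)                       # [12.0, 8.0]

# With no boundary accepted, the whole series is one stage at the overall mean.
print(step_means(readings, []))    # [10.0]
```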

See: Three Charts → Choosing Your Confidence Level

See also: Confidence Level · Turn Length

Confidence Level Settings

The minimum statistical weight of evidence required before Bootstrap CUSUM declares a change point. 90%: 1-in-10 chance the detected change is noise — suitable for early warning. 95%: 1-in-20 — standard working threshold. 99.7% (3-sigma): 1-in-370 — use when a false positive would trigger a costly or irreversible action, or for formal governance submissions. The confidence level is earned by resampling the actual data — not looked up from a table assuming normality.
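The "1-in-N" figures follow directly from the confidence level. The Python sketch below shows the conversion, and shows that the 1-in-370 figure corresponds to the exact two-sided 3-sigma coverage of a normal distribution (about 99.73%) rather than to the rounded 99.7%. The helper names are hypothetical.

```python
import math

def one_in_n(confidence):
    """Convert a confidence level (0-1) into the 'one in N' chance of a false signal."""
    return 1.0 / (1.0 - confidence)

def two_sided_normal_coverage(sigmas):
    """Probability that a normal variable lies within +/- `sigmas` standard deviations."""
    return math.erf(sigmas / math.sqrt(2.0))

for level in (0.90, 0.95, 0.997):
    print(f"{level:.1%} confidence -> 1 in {one_in_n(level):.0f}")

three_sigma = two_sided_normal_coverage(3.0)
print(f"3-sigma coverage {three_sigma:.4%} -> 1 in {one_in_n(three_sigma):.0f}")
```

Rounded to 99.7%, the figure would be 1 in 333; the exact 3-sigma coverage gives 1 in 370.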

See: Three Charts → Choosing Your Confidence Level · Hydrogen Plant → Setting the Confidence Level

Turn Length (TL) Settings

The minimum number of observations that a stage must contain before Bootstrap CUSUM will declare a change point. For 3 readings per day where no genuine change could last less than 2 weeks, TL = 3 × 14 = 42. Too low: spurious boundaries from noise. Too high: genuine events missed. Practical rule: set TL to the minimum number of observations you would need to be convinced a genuine change had occurred.
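The Turn Length arithmetic can be written as a small Python helper. The second function is an inference from the definition rather than a documented rule: if every stage must contain at least TL observations, a series of n observations can hold at most n // TL stages. Both function names are hypothetical.

```python
def turn_length(observations_per_day, shortest_change_days):
    """Minimum observations per stage: sampling rate times shortest plausible change."""
    return observations_per_day * shortest_change_days

def max_stages(n_observations, tl):
    """Upper bound on the number of stages, assuming each needs at least `tl` points."""
    return n_observations // tl

tl = turn_length(3, 14)
print(tl)                          # 42

# One year of three readings per day is 1,095 observations.
print(max_stages(3 * 365, tl))     # 26
```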

See: Hydrogen Plant → Setting the Turn Length

Bootstrap Loops (iterations) Settings

The number of times the Bootstrap algorithm randomly resamples the data. More loops = more stable results. At 1,000 loops, marginal change points may appear inconsistently. At 5,000, results stabilise for most datasets. For formal governance submissions or publication, 10,000 loops is recommended. For low signal-to-noise datasets — NHS A&E monthly series (SNR = 0.09) or hydrogen plant residuals (SNR = 0.28) — 5,000 loops minimum is essential.
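One way to see why more loops stabilise the result is the standard error of a proportion estimated by simulation. The Python sketch below assumes each bootstrap loop is an independent pass or fail trial, so the estimated confidence behaves like a binomial proportion. This is a textbook approximation, not a description of the site's convergence check.

```python
import math

def monte_carlo_se(confidence, loops):
    """Standard error of a confidence level estimated from `loops` independent resamples."""
    return math.sqrt(confidence * (1.0 - confidence) / loops)

for loops in (1000, 5000, 10000):
    se = monte_carlo_se(0.95, loops)
    print(f"{loops:>6} loops: 95% estimate wobbles by about +/- {se:.2%}")
```

At 1,000 loops the estimate for a change point sitting near 95% moves by roughly 0.7 percentage points between runs, enough to push a marginal boundary either side of the threshold. At 10,000 loops the wobble falls to roughly 0.2 points.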

See: Three Charts → Bootstrap Convergence · Hydrogen Plant → Bootstrap Loops Note

Residual CUSUM Industrial Monitoring

A CUSUM variant where the tracked variable is the deviation of an observed measurement from what the process should be producing at its current operating point — the residual — rather than the raw measurement. Strips out legitimate variation caused by production rate, load, or throughput, leaving only the condition signal (such as catalyst degradation). Essential whenever the monitored metric is confounded by a legitimate operating variable.
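A minimal Python sketch of the residual approach follows. It assumes the expected value is a straight-line function of one operating variable, fitted by least squares on a baseline period taken to be healthy. The linear model, the baseline choice, the function names, and the data are all illustrative assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_cusum(load, measured, baseline_n):
    """CUSUM of (measured - expected), with 'expected' fitted on the first baseline_n points."""
    slope, intercept = fit_line(load[:baseline_n], measured[:baseline_n])
    residuals = [m - (slope * l + intercept) for l, m in zip(load, measured)]
    mean = sum(residuals) / len(residuals)
    running, line = 0.0, []
    for r in residuals:
        running += r - mean
        line.append(running)
    return residuals, line

# Hypothetical plant: measurement = 2 * load + 5 when healthy, then drifts up by 1 unit.
load = [50, 80, 60, 90, 70, 55, 85, 65, 95, 75, 52, 88]
measured = [2 * l + 5 for l in load[:6]] + [2 * l + 6 for l in load[6:]]

residuals, line = residual_cusum(load, measured, baseline_n=6)
change_index = min(range(len(line)), key=lambda i: line[i])
print(change_index)   # 5: the last observation before the residual stepped up
```

The raw measurement swings by tens of units as load changes, while the residual moves by only one unit. Tracking the residual removes the load effect and leaves the one-unit condition shift visible.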

See: Hydrogen Plant → The Residual CUSUM Method

See also: Signal-to-Noise Ratio (Data section)

X-mR Chart (Shewhart Individuals Chart) SPC Chart

The most commonly used SPC chart in NHS quality improvement. Evaluates each observation independently, discarding all previous history — insensitive to sustained step-changes buried in noise, and invalid for non-normal data where SD exceeds roughly 40% of the mean. CUSUM accumulates evidence; Shewhart charts discard it. The X-mR reports a flat mean of 22.57 across data that started at 42 and ended at 6 — an average of a journey reported as a destination.
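For comparison, the X-mR limits can be computed with the conventional formula: the mean plus or minus 2.66 times the average moving range. The Python sketch below uses a hypothetical series containing a small upward shift halfway through. It is an illustration of the standard calculation, not a reproduction of the chart in the article.

```python
def xmr_limits(values):
    """Lower limit, centre line, and upper limit of an individuals (X) chart."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean, mean + 2.66 * mr_bar

# Hypothetical series: the second half runs about one unit higher than the first.
values = [10, 12, 9, 11, 10, 12, 11, 13, 10, 12, 11, 13]
lower, centre, upper = xmr_limits(values)

# Each point is judged on its own against the limits.
signals = [i for i, v in enumerate(values) if v < lower or v > upper]
print(round(lower, 2), round(centre, 2), round(upper, 2))
print(signals)   # [] : no single point breaches the limits
```

Because every point is tested individually, a shift smaller than the point-to-point noise never produces a limit breach in this example.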

See: Three Charts → Chart 1: The X-mR Chart

See also: SPC (Improvement Methods) · Common Cause Variation (Deming section)

Run Chart SPC Chart

Plots observations against the overall median, using run rules — typically runs of 6+ consecutive points above or below the median — to flag potential shifts. More honest than the X-mR for multi-stage datasets but cannot identify how many structural changes occurred, when each happened, what each was worth, or with what confidence. A staircase described as a slope.
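The run rule can be sketched in Python as follows. The sketch assumes one common convention, that points lying exactly on the median neither extend nor break a run; other conventions exist. The function name and data are hypothetical.

```python
import statistics

def long_runs(values, min_run=6):
    """Return (start_index, length, side) for each run of min_run+ points on one side of the median.

    side is +1 for above the median and -1 for below.
    """
    median = statistics.median(values)
    runs, start, side, length = [], None, 0, 0
    for i, v in enumerate(values):
        if v == median:
            continue  # assumption: points on the median are skipped
        s = 1 if v > median else -1
        if s == side:
            length += 1
        else:
            if length >= min_run:
                runs.append((start, length, side))
            start, side, length = i, s, 1
    if length >= min_run:
        runs.append((start, length, side))
    return runs

# Hypothetical series: seven low readings followed by seven high readings (median 11).
values = [9, 8, 10, 9, 8, 9, 10, 13, 12, 14, 13, 12, 14, 13]
print(long_runs(values))   # [(0, 7, -1), (7, 7, 1)]
```

The rule flags that the series spent long stretches on each side of the median, but the output carries no estimate of the size of the change and no confidence level.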

See: Three Charts → Chart 2: The Run Chart

Deming — Key Concepts

Deming, W. Edwards Person

American statistician and management theorist (1900–1993). Central argument: a result is the output of a process, and you cannot sustainably change a result without changing the process that produces it. His question applied to every management target: “By what method?” Author of Out of the Crisis (1982). His System of Profound Knowledge comprises four lenses: appreciation of a system, understanding of variation, theory of knowledge, and psychology.

See: NHS A&E → Deming’s Critique · Never Events → The Conclusion Deming Would Have Drawn

See also: Common Cause Variation · Tampering · Joiner: Levels of Fix (Systems Thinking)

Common Cause Variation Variation

Variation inherent to the system as designed — the normal, predictable noise produced by the process itself. Examples: seasonal winter dips in NHS A&E performance; month-to-month fluctuation around a stable mean. Can only be reduced by fundamentally changing the system. Responding to it as if it were a specific problem — introducing a new policy after a bad month — is tampering and will make things worse.

See: NHS A&E → Common Cause vs Special Cause · Three Charts → Deming/Shewhart Terminology

See also: Special Cause Variation · Tampering · Common Cause in QI context (Quality Improvement section)

Special Cause Variation Variation

Variation caused by something outside the normal operation of the system: an assignable, specific cause. Can be positive (a genuine structural improvement) or negative (a deterioration from a specific incident). A Bootstrap CUSUM change point surviving at 95% confidence is a statistically significant special cause. The most common management mistake is treating common cause variation as if it were special cause, reacting to noise as if it were a signal.

See: NHS A&E → Common Cause vs Special Cause

See also: Common Cause Variation · Change Point (Data section)

Tampering Failure Mode

Adjusting a system still within its natural (common cause) variation based on a single data point or short run of results, making things worse. NHS A&E policy shows tampering at national scale: each new intervention was layered onto a system that had not yet had time to respond to the last one. Bootstrap CUSUM is designed to prevent tampering by requiring statistically significant evidence before declaring change.

See: NHS A&E → Policy Lag & Tampering

See also: Common Cause Variation · PDSA (Improvement Methods)

“By what method?” Deming Principle

Deming’s three-word challenge to any numerical target. Setting a goal without providing the method, resources, and system changes needed to achieve it earns his verdict: “a numerical goal without a method is nonsense.” Applied to NHS A&E: setting a 95% four-hour target without providing the social care discharge capacity that determines whether it is physically achievable. The question is not what the target is; the question is: by what method will you get there?

See: NHS A&E → Deming’s Critique

See also: Hierarchy of Controls (Safety section) · Joiner: Levels of Fix (Systems Thinking)

Deming’s 14 Points Deming Principle

Deming’s 14 management obligations from Out of the Crisis (1982), describing the system-level changes required for genuine quality transformation. The points most relevant to the articles include: constancy of purpose — a steady, long-term commitment to improvement rather than reacting to short-term pressures; cease dependence on inspection — build quality into the process rather than inspecting failures after the fact; break down barriers between departments — the constraint on NHS A&E performance lies in social care, not in A&E itself; eliminate management by fear — A&E staff held accountable for a target determined by social care capacity experience exactly this; eliminate numerical quotas — setting a 95% four-hour target without changing the system that produces the outcome; and remove barriers to pride in workmanship — staff cannot take pride in results they cannot influence. Together the 14 points describe Joiner’s Level 3 fix in Deming’s own words: change the system, not just the people or the process.

See: Never Events → Joiner: Levels of Fix · Carbon Grid → Meadows, Joiner & COMAH

See also: Deming · Joiner: Levels of Fix (Systems Thinking) · “By what method?”

Joiner, Brian — Deming Disciple Deming Connection

Brian Joiner worked directly with W. Edwards Deming, and Joiner’s Levels of Fix framework is best understood as Deming’s System of Profound Knowledge made operational. Deming identified that most organisational problems are system problems, not people problems: 94% of failures are caused by the system, not the individual. Joiner translated this into a practical three-level diagnostic: Level 1 (fix the output) and Level 2 (fix the process) are where most managers spend their time; Level 3 (fix the system) is what Deming was asking for. Fourth Generation Management (1994) is explicitly a continuation of Deming’s work, extending it into a practical management framework. Joiner’s concept of “constancy of purpose” at the system level directly echoes the first of Deming’s 14 Points.

Full entry: Joiner: Levels of Fix (Systems Thinking section)

See also: Deming · Deming’s 14 Points · “By what method?”

Systems Thinking

Meadows, Donella — Leverage Points Systems Thinking

Donella Meadows identified 12 places to intervene in a system, ranked least to most effective, in Thinking in Systems (2008). Least effective: parameter adjustments (taxes, subsidies, targets). Most effective: paradigm changes, which alter the mindset from which the system’s goals arise. Most people instinctively reach for the least effective leverage points because they are most familiar and least threatening. Most policy interventions operate at leverage point 12 (parameters) when leverage point 10 (the structure of material stocks and flows) or a more effective point is required for genuine structural change.

See: Carbon Grid → Meadows, Joiner & COMAH · Never Events → Policy Timeline

See also: Joiner: Levels of Fix · Hierarchy of Controls (Safety section)

Joiner, Brian — Levels of Fix Systems Thinking

From Fourth Generation Management (1994). Level 1: fix the output — correct problems as they appear without preventing recurrence. Level 2: fix the process — change the process that allowed the problem. Level 3: fix the system — change the system that allowed the faulty process to exist. Most organisations spend most time at Level 1. NHS medication errors, transport emissions, and A&E performance all receive Level 1 and 2 interventions when Level 3 is required.

See: Never Events → Joiner: Levels of Fix · Carbon Grid → Meadows, Joiner & COMAH

See also: Meadows: Leverage Points · Hierarchy of Controls (Safety section) · Joiner as Deming disciple (Deming section) · Leading Indicators — designing the upstream measures that reveal whether a Joiner Level 3 intervention is working before the outcome moves

Goldratt, Eliyahu — Theory of Constraints Systems Thinking

From The Goal (1984). In any system there is always one binding constraint — one weakest link — that limits the throughput of the whole. Improving anything other than the constraint is largely wasted effort. Applied to NHS A&E: the constraint is social care discharge capacity — 13,700 beds per day occupied by patients ready for discharge but with nowhere to go. Targeting A&E performance without addressing social care is managing the wrong part of the system.

See: NHS A&E → The Real Constraint

Senge, Peter — The Fifth Discipline Systems Thinking

Two mechanisms central to the articles: Policy resistance — interventions trigger compensating feedback loops that absorb the change (e.g. raising fuel duty triggers efficiency improvements that maintain total driving). Reinforcing loops — changes that amplify themselves (e.g. carbon price floor made coal uneconomical; renewables filled the gap, further accelerating coal’s exit).

See: Carbon Grid → Why Transport Failed

See also: PDSA (Improvement Methods)

Quality Improvement

Common Cause vs Special Cause Variation QI Concept

See the full entries in the Deming section. In the QI context: the failure to distinguish common cause from special cause variation is the single most common error in healthcare quality improvement reporting. A bad December in A&E (common cause — seasonal, predictable) triggers a new policy. A slightly better April (equally common cause) is claimed as evidence of success. Neither conclusion is supported by the data. Bootstrap CUSUM filters common cause variation automatically, only declaring change when the evidence passes the chosen statistical threshold.

See: Three Charts → Bootstrap CUSUM Analysis

See also: Common Cause Variation (Deming section) · Special Cause Variation (Deming section)

Clinical & Policy Terms

DOACs (Direct Oral Anticoagulants) Clinical

Oral anticoagulants that directly inhibit specific clotting factors, offering more predictable dosing than warfarin without regular INR monitoring. The four licensed in the UK for AF stroke prevention: apixaban, rivaroxaban, dabigatran, edoxaban. Bootstrap CUSUM identifies three structural change points: adverse event rate doubled in 2012 as DOACs arrived before renal contraindications were fully understood; prescription volumes doubled in 2015 when NICE mandated funding; DOAC adverse event rate fell by two-thirds in 2016 following the ROCKET-AF controversy and EMA renal dosing label update.

See: Anticoagulation → The Anticoagulation Revolution

Four-Hour A&E Target NHS Policy

The standard that 95% of A&E patients should be admitted, transferred, or discharged within four hours. Announced by Tony Blair in January 2000; formally introduced in 2004 at 98%; relaxed to 95% in 2010; missed nationally for the first time in July 2015 and not met since. Bootstrap CUSUM on 184 monthly observations finds four structural stages of decline and not one policy intervention visible as an upward change point at 99.7% confidence across 15 years.

See: NHS A&E → The Four-Hour Target

See also: Tampering (Deming section) · “By what method?” (Deming section)

Sepsis Six Bundle Clinical Protocol

Six time-critical interventions — oxygen, blood cultures, IV antibiotics, IV fluids, lactate measurement, urine output monitoring — to be completed within one hour of identifying sepsis. Bootstrap CUSUM applied to NHS sepsis mortality and admission data asks whether the campaign produced a detectable structural change in outcomes. The ratio of sepsis deaths to sepsis admissions is the most meaningful measure — more informative than raw mortality counts, which reflect both case fatality rate and detection rate improvements simultaneously.

See: Sepsis Six → The Bundle · Sepsis Six → The Key Ratio

Carbon Price Floor (Carbon Price Support) Policy

A minimum price for carbon in UK electricity generation, introduced April 2013. Designed as a revenue instrument. Produced the strongest structural change signal in 35 years of UK emissions data: electricity supply emissions fell 55.4% in 11 years, detected at 99.8% Bootstrap CUSUM confidence. Worked because it changed the economics of coal generation simultaneously for every generator (Layer 2 engineering control) without requiring any individual behaviour change. The proof of concept for what a genuine system-level policy mechanism looks like in CUSUM data.

See: Carbon Grid → The Policy That Worked

See also: Hierarchy of Controls (Safety section) · Unintended Consequences (Systems Thinking)

ULEZ (Ultra Low Emission Zone) Policy

London’s Ultra Low Emission Zone, charging non-compliant vehicles in central London. Introduced April 2019; expanded London-wide August 2023. Bootstrap CUSUM on 11 years of Marylebone Road NO₂ data finds four structural stages: an anticipatory effect beginning approximately 18 months before launch; COVID improvement in 2020 (the largest single change); post-COVID recovery; and a further improvement following the London-wide expansion at a site already inside the original zone for four years. The WHO health standard has not yet been achieved despite the legal limit now being met.

See: ULEZ → Key Findings

ARRs — Additional Roles Reimbursement Scheme NHS Policy

An NHS England funding scheme introduced in 2019 as part of the NHS Long Term Plan, paying GP practices and Primary Care Networks to employ non-GP clinical staff alongside traditional practice teams. Roles include clinical pharmacists (the most common), paramedics, first-contact physiotherapists, social prescribers, mental health practitioners, physician associates, dietitians, and occupational therapists. By March 2023, the scheme had funded 17,588 full-time equivalent staff at a cost of £1.027 billion per year, up from 280 FTE in March 2020. The intention was to free GP time for complex cases by routing appropriate work to other clinicians. Bootstrap CUSUM analysis of NHS GP appointments data shows the scheme reached structural scale in mid-2022, producing a detectable step-down in GP workforce share (from 51% to 45% of all appointments) at 99.7% confidence. GP doctor contact rates per 1000 patients remained flat throughout.

See also: GP Appointments Data (GPAD) · Multimorbidity · Theory of Constraints

In context: GP workforce share — what does it mean? · Mid-2022: restructuring, not recovery

Multimorbidity Clinical

The co-existence of two or more long-term health conditions in the same patient. Common combinations include type 2 diabetes with hypertension, heart disease with depression, or chronic obstructive pulmonary disease with musculoskeletal conditions. Multimorbid patients require longer consultations, more frequent contact, and more complex clinical decision-making than patients with a single condition, and cannot safely be managed through simple triage to a specialist role. Over 14 million people in England have multiple long-term conditions, and more than half of all GP consultations are with multimorbid patients. Multimorbidity increased across all older age groups between 2005 and 2019 and continues to rise, driven by improved survival from conditions that previously had higher mortality, rising obesity prevalence and its downstream diseases, and an ageing population. The post-WW2 baby boomer cohort (born 1945–1955), now aged 70–80, is at peak multimorbidity age: this is the demographic demand peak now arriving at the GP surgery. A second peak is arithmetically visible: the 2020s migration cohort will reach the same age in the 2060s and 2070s.

See also: ARRs · Theory of Constraints · Common Cause Variation

In context: Demand up, capacity down · The wood from the trees

GP Appointments Data (GPAD) NHS Data

The NHS England monthly dataset covering every appointment recorded across all GP practices in England, published since September 2018. Records total appointments, appointment mode (face-to-face, telephone, home visit, video), healthcare professional type (GP doctor, nurse, ARRs staff), and appointment status (attended, did not attend). Three major releases (February 2021, October 2022, October 2025) each cover different time windows and use slightly different classification systems, requiring careful stitching for longitudinal analysis. The article on GP appointments on this site stitches all three releases to produce 80 monthly observations from September 2018 to October 2025.

In context: Data notes and methodology

CFAS II (Cognitive Function and Ageing Study II) Research

A large-scale epidemiological study of cognitive function and dementia in people aged 65 and over, conducted across three areas of England (Cambridgeshire, Newcastle, and Nottingham) between 2008 and 2011. CFAS II was led by the Medical Research Council and produced age- and sex-specific prevalence rates for dementia in the English population. These rates are used as the fixed denominators in the Estimated Dementia Diagnosis Rate (EDDR) calculation — applied each month to the current registered GP population to estimate how many people are expected to have dementia. Because the CFAS II rates have not been updated since 2011, and because actual dementia prevalence may have changed since then (evidence suggests a modest decline in age-specific rates due to better cardiovascular health in successive cohorts), the EDDR denominator may be systematically over- or under-estimating true prevalence. CFAS II found lower prevalence rates than the earlier CFAS I study (1991–1993), which led to a revision of national dementia prevalence estimates and a change in the EDDR methodology in 2017/18.

See also: EDDR

In context: Data notes and methodology

EDDR (Estimated Dementia Diagnosis Rate) NHS Data

The national metric used to monitor how many people with dementia have a formal recorded diagnosis. Calculated monthly by NHS England by comparing the number of people aged 65 and over with a coded dementia diagnosis on their GP practice register to the estimated number expected to have dementia, derived from age- and sex-specific prevalence rates from the Cognitive Function and Ageing Study II (CFAS II). Expressed as a percentage. The national target, set by PM David Cameron’s Dementia Challenge in 2012, is 66.7%, i.e. two-thirds of all people estimated to be living with dementia should have a formal diagnosis. The target was first achieved in November 2015. Bootstrap CUSUM on 2017–2025 data finds two stages at 95% confidence: Stage 1 (2017–2020) mean 67.88%, above target; Stage 2 (2020–2025) mean 63.76%, a structural step-down of 4.1 percentage points (about 6.1% in relative terms) driven by COVID diagnostic disruption. As of March 2025, the rate is 65.6%, still below the 66.7% target set 13 years earlier. Important limitation: The denominator is recalculated every month using the current registered GP population applied against CFAS II prevalence rates fixed in 2011. This means the denominator grows automatically as the population ages, regardless of diagnostic activity. A practice diagnosing the same number of patients each month will show a falling EDDR as its patients age. The CFAS II rates have not been updated since 2011, and the 66.7% target is a political round number (two-thirds) rather than a clinically derived figure. Deming would note that this is a metric whose denominator changes for reasons entirely outside the control of those being measured against it.
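The Python sketch below works through two pieces of arithmetic from this entry: the size of the step between the two stage means, and the denominator effect. The practice figures (60 diagnosed patients, an expected count rising from 90 to 94) are hypothetical and chosen only to show the direction of the effect.

```python
def eddr(diagnosed, expected):
    """Diagnosis rate as a percentage: recorded diagnoses over expected cases."""
    return 100.0 * diagnosed / expected

# Step between the two stage means quoted in this entry.
stage1, stage2 = 67.88, 63.76
drop_points = stage1 - stage2                  # absolute change, in percentage points
drop_relative = 100.0 * drop_points / stage1   # change relative to the Stage 1 level
print(f"{drop_points:.2f} percentage points, {drop_relative:.1f}% relative")

# Denominator effect: the register holds steady while the expected count
# grows as the practice population ages (hypothetical numbers).
diagnosed = 60
expected_by_year = [90.0, 92.0, 94.0]
rates = [eddr(diagnosed, e) for e in expected_by_year]
print([round(r, 1) for r in rates])            # [66.7, 65.2, 63.8]
```

In the second example the practice records the same number of diagnoses every year, yet its rate falls from target level to well below it.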

See also: Multimorbidity · CFAS II

In context: The 66% Target: What the Dementia Diagnosis Data Actually Shows

PM’s Challenge on Dementia NHS Policy

A national policy initiative launched by Prime Minister David Cameron in February 2012, with a second phase in 2015, that prioritised improving dementia diagnosis rates, post-diagnostic support, and research funding. The first Challenge set the 66.7% diagnosis rate target and was accompanied by CQUIN (Commissioning for Quality and Innovation) incentives paying NHS organisations to proactively identify patients at risk of dementia and refer them for assessment. The target was first achieved in November 2015. The second Challenge (2015) doubled the funding commitment and extended the programme. Bootstrap CUSUM analysis confirms that the Challenges produced a genuine structural effect: Stage 1 mean 67.88% (2017–2020) was consistently above the target. When the political focus and financial incentives reduced, the rate began a modest drift before COVID created the structural collapse of 2020–2021.

See also: EDDR · CQUIN

In context: The target and the story behind it

Data & Statistics

Change Point Statistics

The moment in a time series when the underlying process mean permanently shifted to a new level. A Bootstrap CUSUM change point is dated within a confidence window and accompanied by a confidence level indicating statistical strength of evidence. The confidence level at which a boundary survives is itself diagnostic: a boundary at 99.7% represents a large change relative to process noise; one appearing only at 95% was smaller and more marginal.

See: Three Charts → Bootstrap CUSUM Analysis

See also: CUSUM Line · Stage Boundaries

Signal-to-Noise Ratio (SNR) Statistics

The ratio of the magnitude of a genuine change (signal) to the background variability of the data (noise). An SNR below 1.0 means the signal is smaller than the noise on any individual observation — undetectable without a method that accumulates evidence. The hydrogen plant case has SNR = 0.28 at the action threshold. This is why the X-mR chart produces not a single signal across a full year containing a genuine, financially significant step change. CUSUM is specifically designed for low-SNR environments.
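A rough Python sketch of why accumulation matters at low SNR. It assumes independent observations with constant noise and uses the standard error of a mean: averaging n points shrinks the noise by the square root of n, so a shift of a given SNR reaches z standard errors after about (z / SNR)² observations. This is a textbook approximation with hypothetical function names, not the site's detection rule.

```python
import math

def snr(shift, noise_sd):
    """Size of the change divided by the standard deviation of the noise."""
    return abs(shift) / noise_sd

def observations_needed(snr_value, z=3.0):
    """Observations to average before the shift equals z standard errors of the mean."""
    return math.ceil((z / snr_value) ** 2)

ratio = snr(shift=0.28, noise_sd=1.0)
print(ratio)                        # 0.28
print(observations_needed(ratio))   # 115
```

On any single observation a shift of 0.28 standard deviations is far inside the noise, but across a hundred or so observations the same shift becomes clear. That is the sense in which a cumulative method suits low-SNR data.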

See: Hydrogen Plant → Monitoring Challenge