The Model for Improvement
Three questions. A cycle. Repeated until the improvement holds. The Model for Improvement is the most widely used quality improvement framework in healthcare and other industries — and the one most frequently applied without rigorous answers to its second, most important question. This page explains the framework, how PDSA cycles operate within it, and why Bootstrap CUSUM is the objective answer to the question every improvement programme must answer but rarely does.
- Answer the three Model for Improvement questions clearly — including the one most teams skip.
- Design a PDSA cycle that produces learning, not just activity.
- Know when your PDSA data shows genuine improvement versus common cause variation.
- Close the loop honestly: use Bootstrap CUSUM to confirm whether a step change has occurred.
Origin and context
The Model for Improvement was developed by Associates in Process Improvement (Gerald Langley, Ron Moen, Kevin Nolan, Thomas Nolan, Clifford Norman, and Lloyd Provost) and published in The Improvement Guide (first edition 1996, second edition 2009). It integrates the Plan–Do–Study–Act cycle, which W. Edwards Deming developed from Walter Shewhart’s earlier cycle, with a structured set of questions that frame the purpose of any improvement effort before the cycle begins.
The model was adopted by the Institute for Healthcare Improvement (IHI) as its primary framework for clinical quality improvement and is now used in healthcare organisations across more than 50 countries. In the UK it underpins NHS improvement programmes, clinical audit, and quality management systems at every level. Beyond healthcare it applies equally to industrial process improvement, policy evaluation, education, and any other setting where a change is being made and its effect needs to be evaluated.
The model has two components: the three questions, and the PDSA cycle. The questions define what you are trying to do. The cycle is how you do it, test it, and learn from it. Neither component is sufficient without the other.
The three questions
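Before any PDSA cycle begins, three questions must be answered: What are we trying to accomplish? How will we know that a change is an improvement? What changes can we make that will result in improvement? They appear simple. They are not. Each one requires specific, pre-committed answers, not general statements of intent.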
The questions must be answered in order. An aim without a measure produces activity without accountability. A measure without an aim produces data without direction. A change theory without both produces action without any possibility of learning. Most programmes jump to Question 3 — what should we change? — before Question 2 has been answered rigorously. The result is that the change is implemented, some data is collected, and a narrative is constructed retrospectively to claim success. Bootstrap CUSUM closes that gap: it makes Question 2 answerable with a pre-specified, objective, dated result.
Question 2 — the one most often answered badly
Question 2 is where the Model for Improvement either produces genuine knowledge or collapses into activity reporting. It requires three things that most improvement programmes do not pre-specify:
The right type of measure. Outcome, process, and balancing measures serve different purposes. Question 2 requires an outcome measure — the result that ultimately matters to the person or system being improved. Process measures (compliance rates, audit scores, training completions) answer a different question: are we doing what we said? That is necessary but not sufficient. Answering Question 2 with a process measure is the most common measurement failure in quality improvement, across every sector.
A stable, attributable measure. The measure must not itself change during the improvement programme. If the recording system, coding practice, or measurement methodology changes at the same time as the intervention, it becomes impossible to separate the clinical or operational effect from the measurement effect. The Sepsis Six analysis on this site is the clearest example: the CQUIN coding incentive changed how sepsis deaths were recorded at the same time as the clinical intervention was implemented, making the primary outcome measure uninterpretable.
A pre-specified statistical test. The test of whether a change is an improvement must be stated before the data is collected. “We will apply Bootstrap CUSUM to [outcome measure] and expect a structural change point to appear within [X] time periods at [Y]% confidence, in the direction of [improvement], coinciding with the implementation of [change].” Without this pre-commitment, any result can be rationalised as confirmation after the fact.
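One way to make that pre-commitment concrete is to record it as data before collection starts. The sketch below is illustrative only: the class name `PreSpecifiedTest`, its fields, and the example values are hypothetical and not part of the Model for Improvement or of any published tool. It simply fills in the template sentence above and refuses a registration dated after data collection begins.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PreSpecifiedTest:
    """Hypothetical record of a Question 2 commitment, written before any data is collected."""
    outcome_measure: str          # the [outcome measure] in the template
    change: str                   # the [change] being implemented
    direction: str                # the [improvement] direction: "decrease" or "increase"
    max_periods: int              # the [X] time periods
    min_confidence_pct: float     # the [Y]% confidence
    registered_on: date           # when the commitment was written down
    data_collection_starts: date  # when outcome data collection begins

    def __post_init__(self):
        if self.direction not in ("decrease", "increase"):
            raise ValueError("direction must be 'decrease' or 'increase'")
        if self.max_periods <= 0:
            raise ValueError("max_periods must be positive")
        if not 0 < self.min_confidence_pct < 100:
            raise ValueError("min_confidence_pct must be between 0 and 100")
        # The whole point of pre-specification: the test is stated first.
        if self.registered_on >= self.data_collection_starts:
            raise ValueError("the test must be registered before data collection starts")

    def statement(self) -> str:
        """Render the commitment using the template wording."""
        return (
            f"We will apply Bootstrap CUSUM to {self.outcome_measure} and expect a "
            f"structural change point to appear within {self.max_periods} time periods "
            f"at {self.min_confidence_pct}% confidence, in the direction of "
            f"{self.direction}, coinciding with the implementation of {self.change}."
        )


# Illustrative values only; not taken from any real programme.
spec = PreSpecifiedTest(
    outcome_measure="example outcome rate",
    change="example change",
    direction="decrease",
    max_periods=6,
    min_confidence_pct=95,
    registered_on=date(2024, 1, 1),
    data_collection_starts=date(2024, 2, 1),
)
print(spec.statement())
```

Because the record is frozen, the thresholds cannot be quietly edited once results arrive; a revised prediction has to be a new, separately dated record.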
In practice, most improvement programmes answer Question 2 with one of three inadequate responses: “we will run regular audits,” “we will track compliance with the new protocol,” or “we will collect staff feedback.” Audits measure current state, not change. Compliance measures process fidelity, not outcome. Feedback measures perception, not result. None of these answers Question 2. None tells you whether the change produced a genuine improvement in the outcome that motivated the programme. All three are useful. None is sufficient.
PDSA — the engine of the Model
Once the three questions are answered, PDSA cycles provide the mechanism for testing changes and building knowledge. The cycle has four phases, each with a specific purpose and a specific relationship to the three questions.
Plan. State the objective of this cycle. Make a specific prediction: if we implement change X, we expect outcome measure Y to shift in direction Z within W time periods. Plan how data will be collected, by whom, where, and when. The prediction is the commitment that makes the Study phase meaningful.
Do. Implement the change, preferably on a small scale first. Collect data as planned. Document unexpected observations, problems, and deviations from the plan; these are as important as the planned data. Begin preliminary analysis.
Study. This is the phase most programmes skip or rush. Compare actual results to the prediction made in Plan. Did the outcome measure change in the predicted direction? At the predicted time? Did anything unexpected happen in the balancing measures? Apply Bootstrap CUSUM to test whether a structural change point occurred.
⚠ The tampering danger: If the outcome measure has not yet moved, it may simply be within the expected lag window, which is not evidence that the intervention failed. Adding new interventions before the lag has elapsed resets the clock. See Tampering and Impatience.
Act. Based on what was learned in Study: adopt the change and standardise it; adapt it and run another cycle with modifications; or abandon it and test a different theory. The Act phase feeds directly into the Plan phase of the next cycle. Learning accumulates across cycles, not within a single one.
The relationship between the three questions and PDSA is precise. Question 1 (what are we trying to accomplish?) defines the aim that governs all cycles. Question 2 (how will we know?) defines the measure and statistical test used in every Study phase. Question 3 (what changes can we make?) defines the change theory tested in each Plan phase. The PDSA cycle is the mechanism by which the theory stated in Question 3 is tested against the measure defined in Question 2, in pursuit of the aim stated in Question 1.
Repeated cycles of learning
A single PDSA cycle rarely produces definitive knowledge. The Model for Improvement is explicitly iterative: each cycle produces learning that informs the next. The diagram below represents how knowledge accumulates across repeated cycles, with each successive cycle building on what the previous one revealed.
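[Diagram: five numbered PDSA cycles in sequence, labelled from “First test” (Cycle 1) through “theory”, “test” and “context” to “adoption” (Cycle 5).]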
The iterative nature of PDSA has an important implication for measurement. A small-scale first test (Cycle 1) may not produce a Bootstrap CUSUM change point — the sample is too small and the signal too weak. This is not evidence that the intervention failed. It is evidence that the test was appropriately cautious. As cycles accumulate and the intervention is tested at wider scale, the Bootstrap CUSUM change point becomes more detectable. Full-scale adoption (Cycle 5 or beyond) should, if the intervention genuinely works, produce a statistically significant structural change in the outcome measure.
Each PDSA cycle should answer a specific question that the previous cycle raised. Cycle 1 might test whether the change is feasible. Cycle 2 tests whether it works in a second context. Cycle 3 tests whether it scales. Each cycle’s Act phase explicitly states what question the next cycle will answer. A programme that runs the same PDSA cycle repeatedly without modifying either the change or the question is not learning — it is confirming what it already knows. The Model for Improvement is a ratchet: each turn should lock in knowledge and advance the position.
Bootstrap CUSUM as the answer to Question 2
Bootstrap CUSUM is the statistical method that makes Question 2 rigorously answerable. It takes the time series of the outcome measure defined in Question 2, accumulated across PDSA cycles, and asks: did the underlying process mean permanently change, and if so when?
The answer is precise: a change point dated to within weeks, accompanied by a confidence level stating the statistical weight of evidence, and a stage mean showing the magnitude of the shift. This is exactly the form of evidence that governance bodies, commissioners, and improvement teams need: not a trend line, not an audit score, not a compliance rate, but a dated, bounded, confidence-quantified structural change.
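To show where those three outputs come from, here is a minimal sketch of one commonly described formulation of bootstrap CUSUM change-point analysis. It is an assumption about the general technique, not the implementation behind the analyses on this site. The function names and the example series are hypothetical, and it looks for a single change point only.

```python
import random
from statistics import mean


def cusum_path(values):
    """Cumulative sum of deviations from the overall mean, starting at 0."""
    centre = mean(values)
    total, path = 0.0, [0.0]
    for v in values:
        total += v - centre
        path.append(total)
    return path


def cusum_range(values):
    """Spread of the CUSUM path; a sustained shift in the mean makes it large."""
    path = cusum_path(values)
    return max(path) - min(path)


def bootstrap_cusum(values, n_boot=1000, seed=0):
    """Estimate a single change point and the confidence that a change occurred."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    observed = cusum_range(values)

    # Shuffling destroys any time ordering. If the observed range beats most
    # shuffled ranges, the ordering itself carries a structural signal.
    shuffled = list(values)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            below += 1
    confidence = below / n_boot

    # The CUSUM is furthest from zero at the last point of the first stage,
    # so the new stage starts at index k (0-based).
    path = cusum_path(values)
    k = max(range(1, len(values)), key=lambda i: abs(path[i]))
    return {
        "confidence": confidence,
        "change_index": k,
        "mean_before": mean(values[:k]),
        "mean_after": mean(values[k:]),
    }


# Illustrative series only: ten periods around 10, then ten periods around 5.
series = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 5, 6, 4, 5, 5, 6, 4, 5, 5, 6]
result = bootstrap_cusum(series)
print(result)
```

Under this formulation the confidence level is the share of shuffled series whose CUSUM range is smaller than the observed one, and the stage means are the averages either side of the estimated change point. Finding several stages would mean repeating the procedure on each segment, which this sketch leaves out.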
📊 How Bootstrap CUSUM answers Question 2
In the Plan phase: State the Bootstrap CUSUM prediction. “Following implementation of [change], we expect Bootstrap CUSUM applied to [outcome measure] to detect a structural change point within [X] time periods at [Y]% confidence.” This is the pre-committed test. It cannot be modified after the data is collected without invalidating the study.
In the Study phase: Run Bootstrap CUSUM on the accumulated outcome data. Three results are possible. A change point at the predicted time, at or above the predicted confidence level: the theory is supported, so proceed to Act with evidence. A change point at a different time: something changed, but the timing does not match this intervention, so investigate what else occurred. No change point: the intervention did not produce a structural change at the specified confidence level, so revise the theory or the scale of implementation and run another cycle.
In the Act phase: The Bootstrap CUSUM result determines the action. A confirmed change point at full-scale implementation justifies adoption and standardisation. A change point only at small scale justifies cautious wider testing. No change point across multiple cycles at adequate scale justifies abandoning the current change theory and testing a different one.
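The three Study-phase outcomes described above can be written as a small decision rule. This is a sketch under stated assumptions: the names `StudyResult` and `classify_study_result` are hypothetical, time is counted in whole periods, and "the predicted time" is taken to mean a change point falling between implementation and the pre-specified maximum lag. A check on the direction of the shift is omitted for brevity.

```python
from enum import Enum


class StudyResult(Enum):
    SUPPORTED = "change point at the predicted time: theory supported"
    DIFFERENT_TIME = "change point at a different time: investigate what else occurred"
    NO_CHANGE_POINT = "no change point: revise the theory or scale and run another cycle"


def classify_study_result(change_period, confidence,
                          implementation_period, max_lag_periods, min_confidence):
    """Map a Bootstrap CUSUM result onto the three Study-phase outcomes.

    change_period is the period index of the detected change point, or None
    if nothing was detected. confidence and min_confidence are fractions.
    """
    # A change point below the pre-specified confidence counts as no change point.
    if change_period is None or confidence < min_confidence:
        return StudyResult.NO_CHANGE_POINT
    lag = change_period - implementation_period
    if 0 <= lag <= max_lag_periods:
        return StudyResult.SUPPORTED
    return StudyResult.DIFFERENT_TIME


# Illustrative values only: change implemented in period 12, up to 6 periods of lag allowed.
print(classify_study_result(14, 0.97, 12, 6, 0.95).value)
print(classify_study_result(5, 0.97, 12, 6, 0.95).value)
print(classify_study_result(None, 0.0, 12, 6, 0.95).value)
```

The thresholds passed in should be the ones committed to in the Plan phase, not values chosen after seeing the result.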
“In God we trust. All others bring data.” — W. Edwards Deming
Bootstrap CUSUM can be applied to all three measure types — outcome, process, and balancing. Applied prospectively within a PDSA cycle, it is the Study step made objective. Applied retrospectively to historical data, it identifies when structural changes occurred and whether their timing is consistent with the intervention timeline. Both uses serve Question 2: together they answer whether the change produced a genuine improvement and whether it did so without creating unintended consequences elsewhere in the system.
The Model in practice — worked examples
| Setting | Q1 — Aim | Q2 — Measure & test | Q3 — Change theory | Bootstrap CUSUM result |
|---|---|---|---|---|
| UK electricity grid | Reduce coal generation emissions | Electricity supply emissions (MtCO2e/year). Bootstrap CUSUM change point expected within 2 years of April 2013. | Carbon price floor makes coal generation uneconomical for all generators simultaneously | Change point 2013, 99.8% confidence. −55.4% in 11 years. Theory confirmed. Full analysis. |
| Clinical sepsis care | Reduce in-hospital sepsis mortality | Age-standardised sepsis mortality rate. Bootstrap CUSUM change point expected within 3–5 years of 2013 rollout. | Sepsis Six bundle — six bedside actions within one hour of recognition | No change point in either public mortality series at 90% confidence. Coding change point at 2013 (95.5%). Outcome measure compromised. Full analysis. |
| Patient safety engineering | Eliminate wrong-route medication administration | Annual wrong-route Never Events. Bootstrap CUSUM change point expected following ENFit connector deployment. | ENFit connectors — physical incompatibility makes wrong-route administration impossible | One stage, no change point. Mean 17.5 events/year across 6 years. Engineering solution not deployed. Layer 3 and 4 interventions only. Full analysis. |
| Industrial process | Detect catalyst degradation before efficiency loss exceeds threshold | Residual CUSUM of efficiency vs predicted at current throughput. Change point expected at catalyst replacement decision threshold. | Residual CUSUM monitoring — strips throughput variation, tracks condition signal only | Change point detected at 17% efficiency gap, SNR = 0.28. X-mR chart detected nothing across the same period. Full analysis. |
| Clinical safety — anticoagulation | Distinguish genuine INR variation requiring dose adjustment from common cause variation that should not be acted on | Bootstrap CUSUM on INR time series per patient. A structural change point justifies dose review. Common cause variation within natural process limits does not. | Warfarin dosing — narrow therapeutic range INR 2.0–3.0. Joiner (Fourth Generation Management, Ch.8, p.128) uses anticoagulation management as his worked example of tampering. | Adjusting warfarin dose in response to INR readings within natural process limits is tampering — acting on common cause variation as if it were special cause. Bootstrap CUSUM on the residual warfarin population shows structural change in adverse event rates following DOAC transition. Full analysis. |
| Economic policy | Understand structural changes in UK GDP growth trajectory | Cumulative GDP index (1949=100). Bootstrap CUSUM applied to identify structural stage changes. | N/A — retrospective analysis, not a prospective intervention | 8 distinct stages at 90% confidence. 60-year structural deceleration invisible in annual growth rate data. Full analysis. |
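
The anticoagulation and industrial rows above both refer to natural process limits and the X-mR chart. As a point of reference, the sketch below uses the standard individuals-chart calculation (mean plus or minus 2.66 times the average moving range). The function names are hypothetical and the readings are made-up numbers for illustration, not patient data.

```python
from statistics import mean


def natural_process_limits(values):
    """Return (lower, centre, upper) limits for an X-mR individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    centre = mean(values)
    # Moving range: absolute difference between consecutive readings.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre, centre + spread


def is_within_limits(values, new_value):
    """True if new_value falls inside the limits computed from values."""
    lower, _, upper = natural_process_limits(values)
    return lower <= new_value <= upper


# Made-up readings, for illustration only.
readings = [2.4, 2.6, 2.5, 2.3, 2.7, 2.5, 2.4, 2.6]
lower, centre, upper = natural_process_limits(readings)
print(round(lower, 3), round(centre, 3), round(upper, 3))
print(is_within_limits(readings, 2.9))
```

A reading inside these limits is, on this view, common cause variation; reacting to it individually is the tampering the table describes.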
How the Model fails in practice
The Model for Improvement is sound. Its application is frequently not. Four failure patterns repeat across every sector where the Model is used.
Starting at Question 3. The most common failure. An improvement team identifies a problem, identifies a solution, and implements it without a specific aim statement (Question 1) or a pre-specified outcome measure and statistical test (Question 2). The change is implemented with enthusiasm. Data is collected informally. A narrative is constructed retrospectively. The programme is reported as successful. No one can say whether the outcome improved, by how much, or whether it was caused by the change. This is not quality improvement. It is quality activity.
Answering Question 2 with a process measure. The team states: “we will know the change is an improvement when compliance reaches 80%.” Compliance reaching 80% tells you the change is being delivered. It does not tell you whether delivery produced the intended outcome. The three measure types are not interchangeable. Using a process measure as a proxy for an outcome measure produces confident reporting on the wrong question. See the Sepsis Six for the clearest example.
Skipping the Study phase. PDSA cycles are run as Plan, Do, then straight to Act, without completing the Study phase. The results are not compared to the prediction. The prediction, if one was made, is forgotten. The Act phase proceeds on the basis of impressions rather than evidence. This is the failure Deming described as management by experience rather than management by knowledge: decisions made from the result of the last cycle rather than from the accumulated learning of all cycles.
Scaling up too early. A small-scale PDSA cycle produces promising results. The change is immediately scaled to the whole organisation or system without running additional cycles to confirm the finding at wider scale and in different contexts. The promising result may have been noise, or context-specific, or the product of a particularly engaged team in a particular setting. At full scale the effect disappears, but by then the change is embedded, the budget is committed, and the governance reporting has already claimed success. The repeated-cycles structure exists precisely to prevent this. Each cycle should confirm the previous finding before scale increases.