📊 Quality Improvement · PDSA · NHS Improvement Method · Joiner p.141

PDSA Cycle: How to Tell If Your Improvement Is Real

When a run chart or SPC chart shows a system is stable but performing at the wrong level, the instinct is to intervene harder. Joiner’s answer — drawn from page 141 of Fourth Generation Management — is to think harder first. Three strategies — Stratify, Experiment, Disaggregate — reveal what the aggregate data conceals: who is already doing it better, whether any tested change actually moves the measure, and which part of the process is driving the result you need to change.

StepChangeAnalysis.com · Source: Joiner, Fourth Generation Management, p.141

The PDSA Study step — done properly

Most NHS improvement programmes implement at scale first and evaluate afterwards. The result: interventions that feel like they worked, data that is inconclusive, and no way to know what to do next. This page shows how to run a genuine PDSA experiment — with a pre-specified prediction, a rigorous Study step, and an honest answer from your own data.

📋 The three strategies at a glance

1. Stratify — find who is already doing it better

Break the aggregate into subgroups. Which ward, team, or trust is achieving better results within the same system? That is a Bright Spot, and it is the starting point for a transferable, testable improvement hypothesis. NHS quality improvement teams use this step to identify positive deviance before designing interventions.

2. Experiment — test whether the change is real

Run a genuine PDSA cycle: write the prediction before implementation, then test it with Bootstrap CUSUM in the Study step. A pre-specified prediction is what separates honest improvement evidence from "the numbers look better this month." The same test lets NHS QI teams confirm whether a structural change has occurred.

3. Disaggregate — find what is driving the measure

Divide the process into its component mechanisms. The aggregate metric combines multiple distinct pathways. Disaggregation identifies which pathway is responsible for most of the problem — so the PDSA experiment targets the right constraint, not the most visible one.

☰ Contents

When the run chart shows no improvement — the problem these strategies solve
Strategy 1 — Stratify: find who is already doing it better
Strategy 2 — Experiment: how to run a PDSA cycle that gives an honest answer
- Checking your prediction with the StepChange Analyser
- A worked example — DOAC review programme
Strategy 3 — Disaggregate: find which pathway is driving your measurement
The sequence — how the three strategies work together in a QI programme
The Deming connection — System of Profound Knowledge
The PDSA cycle — where Bootstrap CUSUM fits in the Study step
NHS applied examples

When the run chart shows no improvement — the problem these strategies solve

A system is stable. The run chart or SPC chart shows common cause variation — the process is in control, but at a level that is not good enough. Previous PDSA cycles have not moved it. Previous improvement programmes have not moved it. The temptation is to try a bigger version of the same intervention, implement at scale, or commission another review.

Joiner’s diagnosis is precise: the problem is not insufficient effort. It is insufficient understanding. The aggregate metric obscures three things that are essential for quality improvement — who is doing it differently, whether any tested change actually works, and which specific mechanism within the process is responsible for most of the problem.

Until all three questions are answered, any intervention is a guess. It may be a well-intentioned, evidence-informed guess — but it is still a guess. Stratify, Experiment, and Disaggregate replace guessing with knowing. They are the method that makes the PDSA Study step mean something.

Why aggregate NHS data blocks quality improvement

An aggregate metric — national A&E four-hour performance, trust-level HSMR, ward-level medication error rate — is a summary. Summaries are useful for tracking direction. They are useless for identifying causes. An aggregate that is stable in common cause variation is telling you the average of a set of processes that may be very different from each other. Some wards or trusts may be performing well. Others may be failing. The average obscures both. Stratification is the act of looking inside the average — and it is the first move in any honest quality improvement investigation.

Strategy 1 — Stratify: find who is already doing it better (positive deviance in QI)

Stratification means breaking the aggregate data into meaningful subgroups and asking whether the variation between those subgroups explains the overall result. The subgroups might be sites, teams, patient groups, time periods, geographies, product types, or any other dimension that might plausibly drive different performance.

The stratification question

If the aggregate metric is stable at an unacceptable level, ask: is everyone performing at this level, or does the aggregate conceal some units performing well and others performing badly?

If performance is uniformly poor across all subgroups, the cause is in the shared system — the conditions, resources, or design that all subgroups have in common. The solution is a system redesign that reaches every subgroup simultaneously.

If performance varies significantly between subgroups, the cause is in what differentiates them. Some subgroups have found a way to perform better within the same overall system. Those subgroups are Bright Spots — and the question becomes: what are they doing differently, and can it be standardised?

🔎 How to stratify

Step 1 Choose the stratification dimensions. Start with the dimensions most likely to explain variation: geography (site, region, trust), patient group (age, diagnosis, complexity), time (season, shift, day of week), and process pathway (route, channel, team). Do not stratify by everything at once — choose the two or three dimensions most plausible given what you know about the system.

Step 2 Run Bootstrap CUSUM on each subgroup separately. A change point that is invisible in the aggregate may be clearly visible in a subgroup. Equally, a change point that appears in the aggregate may be driven by a single outlier subgroup — detectable only by looking at each subgroup independently.

Step 3 Identify the Bright Spots. Which subgroups are performing significantly better than the aggregate? Are they sustaining that performance over time — a genuine structural difference, detectable as a consistently lower level in their Bootstrap CUSUM output — or is it a temporary fluctuation? A Bright Spot that is genuinely structurally different is worth studying. See Bright Spots for the investigation framework.

Step 4 Ask what the Bright Spot is doing differently — and whether it can be standardised. Deming’s question applies: “by what method?” A Bright Spot that cannot describe its own mechanism cannot be replicated. A Bright Spot that can describe a specific, transferable practice is the seed of the next experiment.
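
As a first pass at Steps 2 and 3, the sketch below splits records by subgroup and flags those whose mean sits well below the aggregate. It is a screening step only, not the Bootstrap CUSUM test itself: each flagged subgroup would still need its own change-point analysis to show the difference is sustained. The function name, the `margin` parameter, the ward data, and the assumption that lower values are better are all illustrative, not taken from the source.

```python
from collections import defaultdict
from statistics import mean

def candidate_bright_spots(records, margin=0.20):
    """Flag subgroups whose mean is at least `margin` below the aggregate mean.

    records: iterable of (subgroup, value) pairs; lower values are assumed better.
    Returns (aggregate_mean, {subgroup: subgroup_mean}).
    """
    by_group = defaultdict(list)
    for group, value in records:
        by_group[group].append(value)
    overall = mean(v for values in by_group.values() for v in values)
    cutoff = overall * (1 - margin)
    flagged = {g: mean(vs) for g, vs in by_group.items() if mean(vs) <= cutoff}
    return overall, flagged

# Hypothetical monthly error rates for three wards (illustrative numbers only).
records = [
    ("Ward A", 10), ("Ward A", 12), ("Ward A", 11),
    ("Ward B", 4), ("Ward B", 5), ("Ward B", 3),
    ("Ward C", 11), ("Ward C", 9), ("Ward C", 10),
]
overall, flagged = candidate_bright_spots(records)
print(round(overall, 2), flagged)
```

Here the aggregate mean is about 8.3, which describes none of the three wards well; only Ward B clears the 20% margin and becomes a candidate for closer study.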

⚠️ The stratification trap — explaining away variation

Stratification can be used honestly or defensively. Honestly: to find subgroups that are performing better and understand why. Defensively: to explain away poor performance by finding a reason why “our subgroup is different.” The test is simple — does the stratification lead to a transferable insight that could improve the aggregate, or does it lead to a conclusion that nothing can be done? If the latter, the stratification is not analytical discipline. It is rationalisation.

Strategy 2 — Experiment: how to run a PDSA cycle that gives you an honest answer

Experimentation is the act of testing a change on a small scale, with a pre-specified prediction, before implementing it at scale. It is the direct application of Deming’s PDSA cycle — and it is the strategy most consistently absent from NHS improvement programmes, which tend to implement at scale first and evaluate (or not) afterwards.

The pre-specification is not a bureaucratic formality. It is the mechanism that makes the PDSA Study step honest. Without a pre-specified prediction, any subsequent data movement — including random common cause fluctuation in a run chart — can be interpreted as evidence the intervention worked. This is the most common form of false attribution in quality improvement work, and it is almost invisible because it feels like rigour. It is how improvement programmes produce confident-sounding reports while the underlying measure stays flat.

📝 What a pre-specified experiment looks like

Before implementing a change, state in writing:

What will change: the specific intervention, applied to which subgroup, starting when.
What metric will move: the primary outcome measure — not a process measure, not a proxy, but the metric that matters.
Which metric will move first: the leading indicator, if one exists — and the expected lag between the leading indicator change point and the outcome change point.
The direction: up or down.
The timing: within how many periods of the intervention start do you expect a Bootstrap CUSUM change point to be detectable?
The confidence threshold: 90%, 95%, or 99% — and why.
The balancing measures: what could get worse if this intervention works, which you will monitor at the same time as the primary metric.
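
One way to make the written prediction hard to revise after the fact is to record it as an immutable object that refuses invalid or late entries. This is a minimal sketch, assuming the fields listed above; the class name, field names, and example dates are hypothetical and not part of any tool described here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Prediction:
    change: str
    primary_metric: str
    direction: str              # "up" or "down"
    start: date                 # intervention start date
    detect_within_periods: int  # periods within which a change point is expected
    confidence_pct: int         # 90, 95 or 99
    written_on: date            # date the prediction was written down
    leading_indicator: str = ""
    balancing_measures: tuple = ()

    def __post_init__(self):
        if self.direction not in ("up", "down"):
            raise ValueError("direction must be 'up' or 'down'")
        if self.confidence_pct not in (90, 95, 99):
            raise ValueError("confidence must be 90, 95 or 99")
        if self.detect_within_periods < 1:
            raise ValueError("timing must be at least one period")
        if self.written_on > self.start:
            raise ValueError("prediction must be written before the intervention starts")

# Hypothetical example; the years are placeholders.
p = Prediction(
    change="Pharmacist review of new prescriptions within 24 hours",
    primary_metric="Adverse drug reactions per 1,000 patient-days",
    direction="down",
    start=date(2025, 9, 1),
    detect_within_periods=6,
    confidence_pct=90,
    written_on=date(2025, 8, 15),
    balancing_measures=("omission rate", "time to first dose"),
)
print(p.direction, p.confidence_pct)
```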

If the change point appears at the predicted time, in the predicted direction, in the predicted metric, at the predicted confidence level — and the balancing measures have not deteriorated — the experiment has produced genuine evidence. That is the Study step of PDSA. Act by standardising and scaling. Then Plan the next PDSA cycle.

If the change point does not appear: the intervention did not work at this level for this constraint. Return to Stratify or Disaggregate to refine the understanding before the next experiment. Do not scale the intervention.

Bootstrap CUSUM is the Study step of PDSA

The PDSA cycle is only as strong as its Study step. Without a rigorous method for detecting whether a change produced a genuine structural shift — rather than a temporary fluctuation or a seasonal effect — the Study step defaults to judgement: “it feels like it worked” or “the numbers are better this month.” Bootstrap CUSUM replaces that judgement with a pre-committed statistical test. It answers, at a specified confidence level, whether the data shows a structural change point — and it dates that change point so you can verify it coincides with the intervention. That is the Study step done properly.
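
For readers who want to see the mechanics, here is a minimal sketch of one common formulation of bootstrap CUSUM change-point analysis. It computes the CUSUM of deviations from the mean, takes its range, and asks how often randomly reordered copies of the same data produce a smaller range. Whether the StepChange Analyser computes its result exactly this way is an assumption; the function names, the `n_boot` default, and the fixed seed are illustrative choices.

```python
import random

def cusum(values):
    """Cumulative sum of deviations from the overall mean, starting at 0."""
    mean = sum(values) / len(values)
    running, line = 0.0, [0.0]
    for v in values:
        running += v - mean
        line.append(running)
    return line

def bootstrap_cusum(values, n_boot=1000, seed=0):
    """Return (confidence %, index of the first point at the new level)."""
    rng = random.Random(seed)
    line = cusum(values)
    observed_range = max(line) - min(line)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # reordering destroys any real level shift
        boot = cusum(shuffled)
        if max(boot) - min(boot) < observed_range:
            smaller += 1
    confidence = 100.0 * smaller / n_boot
    # line[i] is the CUSUM after i values, so its extreme marks the last old-level point
    change_index = max(range(1, len(line)), key=lambda i: abs(line[i]))
    return confidence, change_index

shifted = [10] * 12 + [6] * 12  # illustrative series with a step down at index 12
stable = [9, 11] * 12           # illustrative series with no level shift
conf, idx = bootstrap_cusum(shifted)
print(conf >= 99.0, idx)            # True 12
print(bootstrap_cusum(stable)[0])   # 0.0
```

The confidence figure is compared with the threshold fixed in the Plan step. The stable series scores 0% because no reordering can produce a smaller CUSUM range than simple alternation.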

The pre-specified prediction is written before implementation. The StepChange Analyser is how you test it afterwards. Here is the exact sequence:

Step 1 Collect the data in the required format. One row per time period. Two columns: date (YYYY-MM or YYYY for annual data) and the metric value. Include data from before the intervention — you need the pre-intervention baseline so Bootstrap CUSUM can detect where the level shifts. A minimum of 12 pre-intervention periods is recommended; more gives greater detection power.

Step 2 Upload to the StepChange Analyser and set the parameters to match your pre-specification. Set the confidence threshold to exactly the level you committed to in your written prediction — 90%, 95%, or 99%. Do not raise or lower the threshold after seeing the data. Changing the threshold post-hoc is the statistical equivalent of moving the goalposts. Set Turn Length to approximately 10% of your total data length (minimum 5).

Step 3 Click Recalculate Chart. Read the CUSUM line, not the raw data line. The green CUSUM line shows cumulative deviation from the mean. A genuine structural change point appears as a sharp inflection — a clear change in the slope of the CUSUM line — at the point the algorithm identifies as statistically significant. The vertical marker shows the change point date.

Step 4 Compare the change point date to your intervention date. The question is not "did the numbers improve?" It is "did a structural change point appear at the predicted time, in the predicted direction, at the predicted confidence level?" If the change point precedes your intervention, something else caused the shift. If it follows by more than your predicted lag, the intervention may have worked but later than expected — note that and investigate why. If no change point appears, the intervention did not produce a detectable structural change at this confidence level.

Step 5 Check the balancing measures using the same method. Upload the data for each balancing measure and run the same analysis. If a change point appears in a balancing measure in an adverse direction, that is a harm signal — even if the primary metric confirmed the prediction. A confirmed primary change point with an adverse balancing change point means the intervention worked on the target but created a cost elsewhere. Do not scale until that cost is understood.
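
The bookkeeping around Steps 1, 2, 4 and 5 can be sketched in code. This is a hedged illustration, not the Analyser's own logic: the function names, return labels, header handling, and example dates are assumptions. Step 3, the change-point detection itself, is left to the tool, and the direction check is passed in as a flag.

```python
import csv
import io
from datetime import date

def parse_period(text):
    """Parse 'YYYY-MM' or 'YYYY' into the first day of that period."""
    parts = text.strip().split("-")
    return date(int(parts[0]), int(parts[1]) if len(parts) > 1 else 1, 1)

def load_series(csv_text, intervention_start, min_baseline=12):
    """Step 1: read (period, value) rows and check the baseline length."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip()[:4].isdigit():
            continue  # skip blank lines and a header row
        rows.append((parse_period(row[0]), float(row[1])))
    rows.sort()
    baseline = sum(1 for period, _ in rows if period < intervention_start)
    if baseline < min_baseline:
        raise ValueError(f"only {baseline} pre-intervention periods; {min_baseline} recommended")
    return rows

def analyser_settings(prespecified_confidence, n_points):
    """Step 2: threshold exactly as pre-specified; Turn Length about 10% of the data, minimum 5."""
    if prespecified_confidence not in (90, 95, 99):
        raise ValueError("confidence must be 90, 95 or 99")
    return {"confidence": prespecified_confidence,
            "turn_length": max(5, round(0.10 * n_points))}

def classify_timing(change_point, intervention_start, predicted_lag_months):
    """Step 4: compare the detected change point date with the intervention date."""
    if change_point is None:
        return "none"
    lag = ((change_point.year - intervention_start.year) * 12
           + (change_point.month - intervention_start.month))
    if lag < 0:
        return "before"  # something else caused the shift
    return "on_time" if lag <= predicted_lag_months else "late"

def act_decision(timing, direction_ok, adverse_balancing_measures):
    """Step 5: a confirmed primary result still waits on the balancing measures."""
    if timing == "late" and direction_ok:
        return "investigate the delay before scaling"
    if timing != "on_time" or not direction_ok:
        return "do not scale"
    if adverse_balancing_measures:
        return "hold until the cost is understood"
    return "standardise and scale"

# Hypothetical data: Jul 2024 to Feb 2026, intervention on 1 Sep 2025.
start = date(2025, 9, 1)
csv_text = "date,value\n" + "\n".join(
    f"{2024 + m // 12}-{m % 12 + 1:02d},{10 if m < 20 else 6}" for m in range(6, 26))
rows = load_series(csv_text, start)
print(len(rows), analyser_settings(90, len(rows)))
print(act_decision(classify_timing(date(2025, 10, 1), start, 3), True, []))
```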

📋 A worked example — what the written prediction looks like

Before implementing a pharmacy-led DOAC review programme across three wards:

What will change: Pharmacist review of all new DOAC prescriptions within 24 hours of initiation, Wards 7, 8, and 9, commencing 1 September.
Primary metric: Monthly rate of DOAC-related adverse drug reactions per 1,000 patient-days on those three wards.
Leading indicator: Proportion of DOAC prescriptions with documented renal function check at initiation — expected to shift first, within 4–6 weeks.
Direction: Down (reduction in adverse reaction rate).
Timing: A Bootstrap CUSUM change point detectable within 6 months of the intervention start date.
Confidence threshold: 90% — chosen because the dataset is small (monthly data, three wards) and a lower threshold is needed for adequate detection power at this sample size.
Balancing measures: Rate of anticoagulation omissions (risk that review creates delay or omission); time from admission to first DOAC dose (risk of therapeutic gap).

How to check it: On 1 March (six months later), upload the September–February data appended to the pre-intervention baseline to the StepChange Analyser. Set confidence to 90%, X-axis to Month, Y-axis to ADR Rate. Click Recalculate. If a change point appears between September and November and the CUSUM line slopes downward from that point — and the balancing measure uploads show no adverse change point — the prediction is confirmed. Act: standardise and prepare to scale to other wards. Plan the next PDSA.
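
To see why a confirmed reduction shows up as a CUSUM line that slopes downward from the change point, consider a plain CUSUM (without the bootstrap test) on made-up numbers. The rates and years below are hypothetical and chosen only so the arithmetic is easy to follow; they are not results from any ward.

```python
from itertools import accumulate

def cusum_line(values):
    """CUSUM of deviations from the overall mean, with a leading 0."""
    mean = sum(values) / len(values)
    return [0.0] + list(accumulate(v - mean for v in values))

baseline = [8, 9, 10] * 4   # hypothetical ADR rates, Sep 2024 to Aug 2025
after = [6, 5, 7, 6, 5, 7]  # hypothetical ADR rates, Sep 2025 to Feb 2026
line = cusum_line(baseline + after)

# The extreme of the CUSUM line marks where the level changes.
turn = max(range(len(line)), key=lambda i: abs(line[i]))
first_new_month = f"{2024 + (8 + turn) // 12}-{(8 + turn) % 12 + 1:02d}"
slopes_down = line[-1] < line[turn]
print(turn, first_new_month, slopes_down)  # 12 2025-09 True
```

While the rate sits at or above the overall mean the line climbs; once the rate drops below the mean the line falls. The peak therefore dates the shift, and the downward slope after it confirms the predicted direction.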

PDSA phase	What it requires	Bootstrap CUSUM role
Plan	A specific change, a specific prediction, a specific metric, a specific confidence threshold, a specific timeframe. Written down before implementation.	Defines the test: which metric, which threshold, which timeframe. Without this, the Study step has no benchmark.
Do	Implement the change at small scale. Collect data consistently. Do not make additional changes during the Do phase — they muddy the baseline.	Data is collected in the format required for Bootstrap CUSUM input: date and metric value, one row per period.
Study	Test the prediction against the data. Did a change point appear? When? In which metric? At what confidence level? Did balancing measures hold?	Bootstrap CUSUM is the Study step. Run the algorithm at the pre-specified confidence level. Compare the change point date to the intervention date. Check the direction. Check the balancing measures.
Act	If the change point confirmed the prediction: standardise and scale. If not: return to Plan with the new understanding. Do not scale an intervention that did not produce a confirmed change point.	The change point date and magnitude inform the standardisation — you know exactly when the shift occurred and how large it was. Future monitoring runs Bootstrap CUSUM to detect any subsequent reversal.

Strategy 3 — Disaggregate: find which pathway is driving your measurement

Disaggregation means dividing the process into its component mechanisms and managing those mechanisms separately. An aggregate metric — total events, overall rate, national average — is a sum of several distinct pathways, each with its own root cause and its own solution. Managing the total without understanding the components is managing the wrong thing.

Joiner’s insight is that the dominant mechanism — the one pathway responsible for the majority of the total — is rarely obvious from the aggregate. It requires dividing the data by process type, route, mechanism, or cause until the dominant contributor is visible. Once visible, it can be addressed specifically. Fix the dominant mechanism and the aggregate will move. Fix a minor mechanism and the aggregate will barely shift, regardless of how successful the local intervention was.

⛭️ How to disaggregate

Step 1 Map the pathways that contribute to the aggregate metric. For a patient safety metric: list every distinct mechanism by which the adverse event can occur. For a waiting time metric: list every pathway through the system that contributes to total wait. For an error rate: list every step in the process where the error can originate. A fishbone diagram or process map is the right tool here — not statistical analysis. You need to understand the structure before you can measure the components.

Step 2 Measure each pathway separately. What proportion of the aggregate total does each pathway contribute? Run Bootstrap CUSUM on each pathway independently. A pathway that has already improved but is hidden in a stable aggregate is now visible. A pathway that is driving the aggregate upwards is now identifiable.

Step 3 Identify the dominant mechanism. Which pathway contributes the largest proportion of the total? This is the binding constraint within the process — the one that, if addressed, will move the aggregate most. In Goldratt’s terms: this is where the constraint sits within the process itself, once the system boundary question has been answered.

Step 4 Design the intervention for the dominant mechanism — not the aggregate. An intervention designed for the aggregate metric is usually too broad to address any specific mechanism effectively. An intervention designed for the dominant pathway can be precise, testable, and measurable. Run the PDSA experiment on that specific pathway, with Bootstrap CUSUM on the pathway metric — not the aggregate — as the Study step.
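
Steps 2 and 3 amount to a simple ranking of pathway shares. The sketch below shows one way to do it; the pathway names and counts are hypothetical placeholders, and the function name is not from the source.

```python
def rank_pathways(counts):
    """Return [(pathway, count, share of total)] sorted from largest to smallest."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, n, n / total) for name, n in ranked]

# Hypothetical event counts per pathway (illustrative only).
counts = {"prescribing": 12, "dispensing": 30, "administration": 18}
ranking = rank_pathways(counts)
for name, n, share in ranking:
    print(f"{name}: {n} events, {share:.0%} of total")
dominant = ranking[0][0]
```

In this made-up case the dominant mechanism is dispensing at 50% of events, so that pathway's own metric is the one to carry into the PDSA experiment.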

⚠️ The dominant mechanism trap — fixing the wrong pathway

The most common disaggregation failure is addressing a visible or politically salient pathway rather than the dominant one. In NHS wrong-route medication errors, the NRFit connector mandate addressed neuraxial-to-IV misconnection: a real problem with a clear engineering solution, and one that generated significant advocacy. But 16 of 20 wrong-route events in 2023–24 were oral-to-IV: a different mechanism, a different root cause, a different solution. Even an intervention that eliminated neuraxial-to-IV errors entirely would, if oral-to-IV is unchanged, reduce the aggregate by at most 20% (the 4 of 20 events that were not oral-to-IV). That is the dominant mechanism trap: fixing what is visible and politically tractable rather than what is driving the number.
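
The ceiling on what a non-dominant fix can achieve is plain arithmetic, shown here with the 16-of-20 figures quoted above. The function name is illustrative.

```python
def max_aggregate_reduction(pathway_events, total_events):
    """Largest possible fall in the aggregate if this pathway were eliminated entirely."""
    return pathway_events / total_events

total = 20
oral_to_iv = 16
everything_else = total - oral_to_iv  # 4 events, including any neuraxial-to-IV

print(max_aggregate_reduction(everything_else, total))  # 0.2
print(max_aggregate_reduction(oral_to_iv, total))       # 0.8
```

The 20% figure is an upper bound for neuraxial-to-IV, because the four remaining events may include other routes as well.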

The sequence — how the three strategies work together in a quality improvement programme

The three strategies are not independent options — they are a sequence. Each one sets up the next, and skipping one reduces the precision of those that follow.

Stage	Strategy	Question answered	Output
1	Stratify	Is the variation between subgroups or within the system? Who is doing it better?	Bright Spots to study. Hypothesis about what differentiates better-performing subgroups. Candidate intervention derived from observed practice, not theory.
2	Experiment	Does the candidate intervention actually produce a structural change point when tested at small scale?	A Bootstrap CUSUM-confirmed change point — or an honest null result that sends you back to the hypothesis. Either way, knowledge rather than assumption.
3	Disaggregate	Which specific pathway or mechanism within the process is responsible for the dominant proportion of the aggregate result?	A precisely targeted intervention designed for the dominant mechanism — not the aggregate. A PDSA experiment that can detect success or failure in the pathway metric specifically.
Repeat	Iterate	Once a change point is confirmed in the dominant pathway, what is the next constraint?	Return to Stratify at the new baseline. Goldratt’s Step 5 applies: do not let inertia become the next constraint. The constraint will move.

The Deming connection — System of Profound Knowledge

Joiner’s three strategies are a direct application of Deming’s System of Profound Knowledge — specifically its four components, each of which maps precisely onto the Stratify/Experiment/Disaggregate framework.

Deming’s component	What it means	Where it appears in Joiner’s three strategies
Appreciation for a system	Understanding that outcomes are produced by the system, not by individuals. The system has structure, interdependencies, and boundaries. Improving it requires understanding those boundaries.	Disaggregate. Mapping the pathways within the process is the act of understanding the system’s internal structure. Identifying the dominant mechanism is identifying where the system most needs to change.
Knowledge of variation	Distinguishing common cause variation (the system) from special cause variation (a specific event). Not reacting to common cause variation as if it were a special cause. Using statistical methods to tell the difference.	Experiment. Bootstrap CUSUM is the application of knowledge of variation to the Study step of PDSA — detecting whether a genuine structural shift has occurred or whether the observed change is within the system’s normal common cause range.
Theory of knowledge	Knowledge requires a theory — a prediction that can be tested. Data alone is not knowledge. An observation that was not predicted in advance is not confirmed evidence. Improvement requires prediction, test, and update.	Experiment. The pre-specified prediction — direction, metric, timing, confidence threshold — is Deming’s theory of knowledge applied to improvement. Without the pre-specification, the Study step has no theory to test.
Psychology	Understanding how people respond to measurement, management, and change. Fear of data produces gaming. Blame produces concealment. Intrinsic motivation produces genuine improvement.	Stratify. Finding Bright Spots — units that are performing better within the same system — is an act of positive psychology. It reframes the question from “who is failing?” to “who has found a better way?” That is a fundamentally different relationship between measurement and the people being measured.

The PDSA cycle — where Bootstrap CUSUM fits in the Study step

The PDSA cycle and Joiner’s three strategies are not parallel frameworks — PDSA is the operational container inside which Stratify, Experiment, and Disaggregate are run. Each strategy generates a PDSA cycle, and the cycles run in sequence. The worked example above shows a single PDSA 2 cycle (Experiment). The three cycles below show how the full sequence connects end to end — from Bright Spot discovery through to confirmed structural change in the aggregate metric.

🔁 Three PDSA cycles — one for each strategy

PDSA 1 Stratify PDSA — find the Bright Spot.

Plan: Hypothesise which subgroup dimension is most likely to explain variation. Predict that at least one subgroup will show significantly better performance than the aggregate. Define “significantly better” in Bootstrap CUSUM terms: a sustained level at least X% below the aggregate, visible as a structurally different baseline.

Do: Collect the data and stratify it by the chosen dimension.

Study: Run Bootstrap CUSUM on each subgroup. Is any subgroup genuinely structurally better? Is the difference sustained or a fluctuation?

Act: If a Bright Spot is confirmed: investigate what it is doing differently. Produce a transferable hypothesis — a candidate intervention. If no Bright Spot: the variation is in the shared system, not between subgroups. Proceed to disaggregation.

PDSA 2 Experiment PDSA — test the candidate intervention.

Plan: Define the intervention precisely. State the pre-specified prediction: which metric, which direction, within how many periods, at what confidence threshold. Define the balancing measures. Apply to a small number of sites or patients — enough to generate detectable data, small enough to limit harm if the intervention does not work.

Do: Implement. Collect data. Make no other changes during the Do phase.

Study: Run Bootstrap CUSUM at the pre-specified confidence threshold. Did the change point appear? When? Does the date match the intervention? Are balancing measures stable?

Act: If confirmed: standardise the conditions that produced the change point. Prepare to scale. If not confirmed: return to Plan. What was wrong with the hypothesis? Was the intervention at the right level? Was the constraint correctly identified?

PDSA 3 Disaggregate PDSA — scale to the dominant mechanism.

Plan: Having confirmed the intervention works in the experiment, design the scaled application targeted at the dominant mechanism. Define the aggregate change point you expect as a result — the timing, direction, magnitude, and confidence threshold.

Do: Implement at scale across the dominant pathway.

Study: Run Bootstrap CUSUM on both the pathway metric and the aggregate. The pathway change point should appear first. The aggregate shift should follow, with a size proportional to the pathway’s share of the aggregate; the smaller that share, the longer the aggregate change point is likely to take to become detectable.

Act: If both change points confirmed: standardise, monitor, and return to Stratify at the new baseline. Identify the next dominant mechanism. If aggregate does not follow pathway: the pathway share of the aggregate was smaller than estimated, or a compensating adverse change occurred in another pathway. Disaggregate further.
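
The Act-step diagnosis can be checked with simple arithmetic, assuming every other pathway stays unchanged: the aggregate should fall by the pathway's share multiplied by the pathway's own proportional fall. The figures below are hypothetical and the function names are illustrative.

```python
def expected_aggregate_change(pathway_share, pathway_change):
    """Proportional change expected in the aggregate if other pathways are unchanged."""
    return pathway_share * pathway_change

def implied_share(aggregate_change, pathway_change):
    """Share of the aggregate the pathway must hold, given the two observed changes."""
    return aggregate_change / pathway_change

estimated_share = 0.6     # hypothetical estimate made in the Plan step
pathway_fall = 0.5        # hypothetical: pathway metric fell by 50%
observed_aggregate = 0.1  # hypothetical: aggregate fell by only 10%

expected = expected_aggregate_change(estimated_share, pathway_fall)
implied = implied_share(observed_aggregate, pathway_fall)
print(expected, implied)
```

An expected fall of 30% against an observed 10% implies the pathway holds closer to 20% of the aggregate than the estimated 60%, or that another pathway worsened at the same time. Either reading points back to further disaggregation.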

NHS applied examples — the three strategies in practice

📋 How the three strategies have been applied on this site

Wrong-route medication errors

Stratify: Trust-level analysis would reveal whether wrong-route events are distributed uniformly across NHS trusts or concentrated in a minority. Trusts with zero sustained events over multiple years are the Bright Spots: they have achieved something within the same national system that others have not.

Experiment: Any intervention — pharmacy unit-dose dispensing, separated storage, ENFit rollout — should be tested with a pre-specified Bootstrap CUSUM prediction on the pathway metric before national rollout. Previous mandates (NatSSIPs, MSO, colour coding) were implemented at scale without pre-specified predictions and are not detectable as change points in the aggregate data.

Disaggregate: 16 of 20 wrong-route events in 2023–24 were oral-to-IV. The NRFit mandate addresses neuraxial-to-IV — a real problem, but not the dominant mechanism. Managing the aggregate number without addressing the oral-to-IV pathway specifically is addressing the wrong constraint. See Never Events — Wrong Route.

Anticoagulation safety

Stratify: Regional variation in DOAC prescribing appropriateness ranges from 53% to 99% (Bassi 2020). The aggregate national figure conceals regions where almost all prescribing is appropriate alongside regions where half of it is not. The regions performing well are the Bright Spots: they have protocols, pharmacy review processes, or clinical leadership structures that others do not.

Experiment: The COBRRA trial (NEJM, March 2026) is the correct form of this strategy — a pre-specified, controlled experiment testing apixaban versus rivaroxaban with a defined outcome measure. The Bootstrap CUSUM change point in DOAC-specific adverse drug reaction rates at 2016 corresponds to the ROCKET-AF controversy and the subsequent rivaroxaban-to-apixaban switch — a natural experiment with a dateable cause.

Disaggregate: The aggregate adverse event rate combines events from inappropriate prescribing (wrong drug, wrong dose, wrong patient), monitoring failures (renal function not checked), and drug interactions (50% of DOACs not adjusted for renal function by 2019). Each has a different cause and a different solution. See Anticoagulation Safety.

NHS A&E performance

Stratify: The national four-hour performance figure is the average of 136 NHS trusts, some of which consistently outperform the aggregate. Those trusts are the Bright Spots. The question is whether their performance is explained by case-mix (lower-acuity populations) or by genuinely transferable practice. Bootstrap CUSUM on individual trust data would separate structural outperformers from lucky averages.

Experiment: No A&E intervention in 15 years was implemented with a pre-specified Bootstrap CUSUM prediction, and not one improvement change point is detectable in 184 monthly observations. Without pre-specified predictions, it is impossible to know whether any intervention worked, or would have worked at a different scale or in a different context.

Disaggregate: “A&E performance” is an aggregate of at least four distinct bottlenecks: inflow (GP access failure, self-referral), triage and treatment capacity, acute bed availability, and discharge delay (DTOC/NCR). The dominant constraint — blocked discharge — sits outside the trust’s system boundary. Improving any of the in-trust pathways without addressing discharge produces a pathway-level change point that does not appear in the aggregate. See Why Nothing Has Worked.

📋 The three questions — applied to your system

Before designing the next intervention, answer these in order:

Stratify: Is anyone achieving significantly better results within this same system? Under what conditions? What are they doing differently? Run Bootstrap CUSUM on subgroup data before designing the intervention.
Experiment: What specific, small-scale, pre-specified test would confirm whether the candidate intervention actually produces a structural change point? Write the prediction — direction, metric, timing, confidence threshold, balancing measures — before implementation.
Disaggregate: Which specific pathway or mechanism within this process is responsible for the dominant proportion of the aggregate result? Is the planned intervention aimed at that pathway — or at a more visible but smaller contributor?

If any of the three questions cannot be answered before implementation, the intervention is a guess. Stratify, Experiment, and Disaggregate replace guessing with knowing.

Run the Study step

Write the prediction first. Then upload your pre- and post-intervention data to the StepChange Analyser. Set the confidence threshold you committed to in advance. Click Recalculate. The CUSUM line tells you whether a structural change point appeared — and exactly when.

▶ Open the StepChange Analyser