Leading Indicators — Measuring Progress When the Outcome Is 15 Years Away
When a system fix takes 10–15 years to show results in the final outcome, decision-makers need earlier evidence that the intervention is working. This page introduces a six-step method for designing leading indicators — intermediate metrics that are causally connected to the long-term outcome and visible within two to three years. The method is grounded in Deming’s “Production viewed as a system,” Weiss’s Theory of Change, and Joiner’s levels of fix. The central question: if the outcome is 15 years away, what honest evidence of progress can you measure in year two?
- Design leading indicators that are causally connected to the long-term outcome — not vanity metrics.
- Understand why the right metric shifts as a programme matures along its causal chain.
- Recognise a missing change point as a diagnostic finding, not just a disappointment.
- Apply the six-step causal chain method to any long-lead-time system — NHS, education, sport, policy.
SFA — Scottish Football Association (governing body of football in Scotland, based at Hampden Park, Glasgow)
SPFL — Scottish Professional Football League (the top four divisions of Scottish club football, from the Premiership to League Two)
CAS — Club Academy Scotland (the SFA’s framework for governing and part-funding youth player development, covering ages 11–19)
UEFA — Union of European Football Associations (governing body for football in Europe; runs the Champions League and European Championship)
CUSUM — Cumulative Sum (the statistical method used in the StepChange Analyzer to detect structural change points in time-series data)
Contents
- The problem with long lead times
- The six-step method
- The causal chain — Scottish football
- Vanity metrics vs leading indicators
- The resulting metric set
- The primary metric shifts as time passes
- How StepChange Analysis connects
- Generalising the method
- Proof points — what the evidence actually shows
- Links to the Improvement Model and Types of Measures
- References
1. The problem with long lead times
Most serious improvement problems have long causal chains. The intervention happens now. The outcome appears years — sometimes decades — later. In between, multiple steps must occur in sequence, each one necessary for the next.
This creates a governance problem. Decision-makers are asked to invest today in something that will not be verifiable until they have left office. Without intermediate evidence, the argument for sustained investment collapses into faith — and faith does not survive budget cycles, political pressure, or a poor run of short-term results. The standard response is to reach for the outcome measure — the number that everyone understands, that governance bodies demand, and that cannot move for a decade. That response is a belief as much as a choice: my job is to report the outcome. Joiner called this Level 3 Deep — a mindset that no technical fix can dislodge. The shift required is to a different belief: my job is to understand the system that produces the outcome, and to measure it at each stage. That is what this page is about.
A second common response to the governance problem is to pick metrics that are easy to measure but not causally connected to the outcome: activity counts, participation numbers, money spent. These are vanity metrics. They rise readily without driving the outcome, and so give the appearance of progress while the underlying system remains unchanged.
The NHS has measured millions of A&E attendances, GP appointments, and waiting list entries for 25 years. Not one policy intervention in that period is detectable as a structural improvement in StepChange Analysis. The metrics moved. The system did not. — Why Nothing Has Worked
Deming made the same point structurally in 1950. His “Production viewed as a system” diagram — first presented to Japanese engineers in August 1950 and reproduced in Out of the Crisis (1986, p.4) — shows quality flowing from inputs (suppliers A, B, C…) through the production process to the consumer. His conclusion: “Improvement of quality envelops the entire production line, from incoming materials to the consumer.” You cannot improve the output by measuring only the output. The inputs to the system are where improvement is made — and therefore where improvement must be measured. Leading indicators are, in Deming’s terms, the measures of those inputs and the process that transforms them.
The alternative is to design leading indicators: metrics that sit earlier in the causal chain, are measurable within a short window, and genuinely precede and cause the long-term outcome. These are not proxies for success — they are necessary conditions for it.
2. The six-step method
The method works backwards from the desired outcome. It is a simplified application of Theory of Change — formally proposed by Carol Weiss in 1995 and widely used in international development, public health, and social policy precisely because those fields face the same problem: long lead times, multiple causal steps, and decision-makers who need to see progress before the ultimate outcome arrives. Weiss argued that complex programmes are difficult to evaluate because stakeholders give too little attention to the early and mid-term changes — the “mini-steps” — that must happen before a long-term goal can be reached. Making those mini-steps explicit is the method on this page.
Step 1: State the outcome precisely. State the long-term goal in measurable terms. Vague outcomes produce vague metrics. “Improve performance” is not an outcome. “Scotland qualifies for and reaches the knockout stage of a World Cup” is. The precision determines what the causal chain leads to: an imprecise outcome allows the chain to be drawn in multiple directions, making it impossible to identify which intermediate steps are genuinely necessary.
Step 2: Map the causal chain backwards. Starting from the outcome, ask at each step: what must be true immediately before this can happen? Work backwards until you reach the current intervention point. Draw the chain explicitly and do not skip steps. Every arrow represents an assumption; making the chain explicit forces those assumptions into the open, where they can be tested rather than taken on faith.
Step 3: Identify the necessary conditions. For each arrow in the chain, ask: what must be true for this step to reliably produce the next one? These conditions are your candidate metrics. Each must be falsifiable: it must be possible for the condition not to be met. Be suspicious of conditions that are always true regardless of what you do. A condition that cannot fail is not a condition. It is decoration.
Step 4: Filter for measurability and lead time. From your list of necessary conditions, retain only those that: (a) can be measured within two to three years, and (b) use data that already exists or can be collected without a new bureaucracy. Prefer metrics where collection is already happening: you are assembling, not inventing.
Step 5: Test for vanity. Apply this test to every candidate: could this metric improve substantially without the long-term outcome improving at all? If yes, it is a vanity metric. Discard it.
Leading indicator: “Number of Scottish-developed players making professional debuts per year.” Cannot rise without the development pipeline actually producing professional players. Causally connected. Passes the test.
Step 6: Assign ownership and publish the baseline. Every metric must have one named body or role responsible for collecting and publishing it on a fixed schedule. Shared ownership means no ownership. Publish the baseline before the intervention begins, because without a baseline there is no honest before-and-after. The baseline is what StepChange Analysis will eventually test for a structural change point.
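The filtering logic in steps 4 to 6 can be sketched as a simple screen over candidate metrics. This is an illustrative sketch only: the `CandidateMetric` fields, the `rejection_reasons` function, the three-year threshold default, and the values given to the two example candidates are assumptions made for the example, not part of any published tool. The judgement recorded in `can_rise_without_outcome` still has to come from the causal chain itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateMetric:
    name: str
    years_until_visible: float       # step 4(a): how soon the metric can move
    data_available: bool             # step 4(b): data exists or is cheap to collect
    can_rise_without_outcome: bool   # step 5: the vanity test
    owner: Optional[str] = None      # step 6: one named body or role


def rejection_reasons(metric: CandidateMetric, max_years: float = 3.0) -> List[str]:
    """Return the reasons a candidate fails the screen (empty list means it passes)."""
    reasons = []
    if metric.years_until_visible > max_years:
        reasons.append("not measurable within the lead-time window")
    if not metric.data_available:
        reasons.append("needs a new data collection")
    if metric.can_rise_without_outcome:
        reasons.append("vanity metric")
    if metric.owner is None:
        reasons.append("no single named owner")
    return reasons


# Hypothetical scoring of two candidates, for illustration only.
candidates = [
    CandidateMetric("Youth participation numbers", 1, True, True, "SFA"),
    CandidateMetric(
        "Scottish-developed players making professional debuts per year",
        2, True, False, "SPFL",
    ),
]

for c in candidates:
    print(c.name, "->", rejection_reasons(c) or "retain")
```

Returning the list of reasons, rather than a bare pass or fail, keeps the record of why a metric was discarded, which is useful when the same candidate is proposed again later.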
3. The causal chain — Scottish football
The chain below reads from intervention (top) to final outcome (bottom). The right column shows the approximate time before each step becomes visible in data, and identifies the four points where leading indicators can be measured.
Each “measure here” point is a leading indicator. Four of the eight steps in the chain are measurable within five years — well before the final outcome can be honestly tested.
4. Vanity metrics vs leading indicators
The distinction matters because vanity metrics are politically attractive — they move quickly, can be reported at press conferences, and rarely get worse. Leading indicators are harder to game because they are causally connected to something that cannot be faked.
GP appointment numbers rose consistently from 2018 to 2024 and were cited repeatedly as evidence of improvement. StepChange Analysis on the same period found that GP doctor contact rates — the metric causally connected to patient access — showed no structural change at all across those years. The system grew. The outcome the system exists to produce did not move.
Vanity metric: Total GP appointments recorded. Moves easily — add appointment slots, telephone contacts, online bookings. Does not require the patient to actually receive GP care.
Leading indicator: Rate of patients who see or speak to a GP within 48 hours of requesting an appointment. Cannot improve without the GP contact system structurally changing. Causally connected. Non-gameable.
Time to outcome: If the leading indicator shows a structural improvement in year one, the downstream outcome — reduced avoidable emergency attendances driven by unmet primary care need — would be expected to appear in StepChange Analysis within 18–30 months. The lag is predictable and pre-committable before the data arrives.
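Pre-committing the lag amounts to writing down a date window before the data arrives. A minimal sketch, assuming the 18–30 month lag is counted from the month in which the leading indicator's change point is detected; the function names and the example date are hypothetical.

```python
from datetime import date
from typing import Tuple


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, returning the first day of the resulting month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def downstream_window(change_point: date,
                      min_lag_months: int = 18,
                      max_lag_months: int = 30) -> Tuple[date, date]:
    """Window in which the downstream outcome is predicted to show a change point."""
    return (add_months(change_point, min_lag_months),
            add_months(change_point, max_lag_months))


# Hypothetical example: leading-indicator change point detected in January 2026.
start, end = downstream_window(date(2026, 1, 1))
print(f"Pre-committed window: {start:%B %Y} to {end:%B %Y}")
```

Publishing this window alongside the baseline is what turns the lag from an after-the-fact explanation into a testable prediction.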
The same trap appears in every domain with long lead times. In education: school inspection ratings improve while pupil outcomes stagnate. In safety: near-miss reporting rates rise while incident rates remain flat. In football development: youth participation numbers climb while the number of players reaching professional level does not change. The metric moves. The system does not.
5. The resulting metric set — Scottish football development
Applying the six steps produces four leading indicators, each at a different point in the causal chain and each becoming visible between one season and eight years after the intervention begins (see the table), well before the senior team’s tournament results can be honestly tested.
| Metric | Position in chain | Data held by | Visible within |
|---|---|---|---|
| Number of CAS clubs reaching gold licence or above | Environment quality | SFA (annual audit) | 1 season |
| Number of Scottish-developed players making professional debuts per year | Pipeline output | SPFL registration data | 1–2 seasons |
| Number of Scotland age-group teams qualifying for UEFA youth tournaments | International competitiveness | UEFA / SFA records | 2–3 years |
| Percentage of Scotland senior squad from a CAS elite academy | System-to-squad conversion | SFA squad records | 5–8 years |
Each metric passes the vanity test — none can improve without the underlying system changing. Each uses data that already exists. Each has a natural single owner. And each is far enough upstream in the causal chain to be visible well before the senior team’s results change.
6. The primary metric shifts as time passes
Weiss’s “mini-steps” argument points to an insight that is often missed in practice. The four metrics above are not four measures to track simultaneously for 15 years. They are four measures that each become the primary test during a specific window of the causal chain, and then yield primacy to the next one as the programme matures.
At year one, you cannot measure whether Scotland reaches a World Cup knockout round. But you can measure whether CAS licence grades are rising. That is the year-one test. By year three, the meaningful question has shifted to whether the improving environments are producing professional players. By year five, the primary question shifts again to whether those players are internationally competitive. By year eight, the relevant test is whether the senior squad is drawing from the system that was built.
Using the year-one metric as the primary test at year eight tells you almost nothing new. Using the year-eight metric at year one produces no signal — it simply cannot move yet. Each metric is the right measure for a specific window. The evaluation strategy must move along the chain as the programme matures.
The most common failure in long-lead-time improvement is picking one metric at the start and reporting against it for the entire programme — regardless of where the programme has reached in the causal chain. Early on, the chosen metric (often an outcome measure) cannot move yet, so the programme appears to be failing when it may be working as intended. Later, the programme continues reporting an early-chain metric long after the question has shifted to whether the outcome has moved. The metric becomes a governance comfort blanket rather than an honest test.
This sequential logic also applies to Bootstrap CUSUM. You do not run the same CUSUM series for 15 years and wait. You run CUSUM on the metric that is currently the primary test — and when a change point appears there, that is both confirmation that the previous link worked and the signal to shift attention to the next metric. The sequence of change points across the chain, appearing in the right order and with the right lead times between them, is the strongest possible evidence that the causal theory is correct.
If a change point appears in one metric but the next metric fails to follow within its expected window, you have located the broken link precisely — and you know exactly where to investigate rather than re-examining the entire programme.
| Window | Primary metric | Change point confirms | No change point means | Next step |
|---|---|---|---|---|
| Year 1–2 | CAS clubs reaching gold licence or above | Infrastructure investment is reaching academies | Investment not converting: coaching standards or governance failing at club level | Shift to debut counts |
| Year 2–4 | Scottish-developed players making professional debuts | Improved environments are producing players — pipeline working | Environment quality improved but coaching or competitive exposure still failing | Shift to age-group results |
| Year 3–5 | Scotland age-group teams qualifying for UEFA youth tournaments | Players are internationally competitive at youth level | Pipeline producing players but not at international standard | Shift to senior squad composition |
| Year 5–8 | Percentage of senior squad from a CAS elite academy | System is converting academy players to senior international level | Players progressing to professional level but not reaching senior squad: transition pathway failing | Shift to tournament results |
| Year 10–15 | Tournament qualification and knockout stage results | The full causal chain has worked | One or more links failed despite earlier change points — return to chain and identify where sequence broke | Restart: what is now invisible that wasn’t before? |
The “no change point means” column is as important as the confirmation column. A missing change point at any stage is a diagnostic finding, not just a disappointment. It locates the broken link with precision. Without the explicit causal chain, a failure at any stage looks like programme failure. With the chain, it looks like what it actually is: a specific mechanism that is not working, surrounded by mechanisms that are.
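The sequential reading of the table can be sketched as a walk down the chain: skip every metric that already shows a change point, then report either the current primary test or the broken link. The window end years and diagnoses are taken from the table above; the short keys, the `review_chain` function, and the idea of passing detected change points as a dictionary are assumptions made for illustration.

```python
from typing import Dict, Optional

# (key, metric, last year of its window, what a missing change point means)
CHAIN = [
    ("licence", "CAS clubs reaching gold licence or above", 2,
     "investment not converting at club level"),
    ("debuts", "Scottish-developed players making professional debuts", 4,
     "environments improved but coaching or competitive exposure still failing"),
    ("age_group", "Scotland age-group teams qualifying for UEFA youth tournaments", 5,
     "pipeline producing players but not at international standard"),
    ("senior_squad", "Percentage of senior squad from a CAS elite academy", 8,
     "transition pathway to the senior squad failing"),
]


def review_chain(change_points: Dict[str, Optional[int]], current_year: int) -> str:
    """change_points maps a metric key to the programme year of its change point, or None."""
    for key, metric, window_end, diagnosis in CHAIN:
        if change_points.get(key) is not None:
            continue  # link confirmed, so attention moves to the next metric
        if current_year <= window_end:
            return f"Primary test: {metric}"
        return f"Broken link: {metric} ({diagnosis})"
    return "All four links confirmed: shift to tournament results"


# Hypothetical review in programme year 6: licences improved, debuts did not.
print(review_chain({"licence": 2}, current_year=6))
```

The sketch deliberately stops at the first unconfirmed link, because a failure there makes every metric further down the chain uninformative for the time being.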
The six-step method and sequential table above are technical tools. But the thing that prevents them from being used is not technical — it is a belief. The belief is: my job is to report the outcome measure. That belief is held at every level of NHS governance, in every improvement programme, in every sport governing body that measures tournament results without measuring the academy system that produces the players.
Joiner’s Level 3 Deep — fix the mindset — is the shift from that belief to a different one: my job is to understand and measure the system that produces the outcome. This is not a small adjustment to reporting practice. It is a different theory of what management is for. Deming called it the difference between managing by results and managing by method. The outcome measure tells you what the system produced. The causal chain tells you how — and therefore where to act to produce a different result next time.
Without this belief change, every technical improvement to measurement design hits the same governance wall. Decision-makers continue to demand the outcome number regardless of whether it can yet move. Programmes are declared failures before the causal chain has had time to produce a signal. Interventions are abandoned and replaced — Deming’s tampering — resetting the lag clock and making it impossible for any signal to appear. The technical method is necessary. The Level 3 Deep belief change is what makes it possible to use it.
See: Corridor Care 2029 — where three of four NHS initiatives are Level 1 or Level 2 fixes applied to a Level 3 problem, and the pre-committed prediction is that Bootstrap CUSUM will find no structural change point because the mindset driving the measurement has not changed.
7. How StepChange Analysis connects
Once a leading indicator has been collected for long enough — typically 15–20 observations — the StepChange Analyzer can test whether a genuine structural change has occurred, or whether observed movement is within normal variation.
This is the honest test: not “did the number go up?” but “has the process that generates the number structurally changed?” A change point in a leading indicator, at 95% confidence or above, is the earliest honest evidence that the intervention is working at a systemic level — years before the final outcome can be tested.
A flat CUSUM on a leading indicator, with no change point despite the intervention having had time to act, is equally important: it tells you the intervention is not reaching the causal mechanism assumed. That is Joiner’s diagnosis — the fix is at the wrong level of the system.
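For readers who want to see the mechanics, the following is a minimal sketch of one common form of bootstrap CUSUM. It is not the StepChange Analyzer's implementation, which this page does not describe; the function names, the 1,000 reshuffles, and the two example series are all assumptions. The idea: accumulate deviations from the mean, measure how far the cumulative path ranges, and then ask how often a random reordering of the same values produces a smaller range.

```python
import random
from typing import List, Sequence


def cusum_path(values: Sequence[float]) -> List[float]:
    """Cumulative sum of deviations from the series mean, starting at 0."""
    mean = sum(values) / len(values)
    path, running = [0.0], 0.0
    for v in values:
        running += v - mean
        path.append(running)
    return path


def cusum_range(values: Sequence[float]) -> float:
    path = cusum_path(values)
    return max(path) - min(path)


def change_confidence(values: Sequence[float], n_boot: int = 1000, seed: int = 0) -> float:
    """Share of random reorderings whose CUSUM range is smaller than the observed one."""
    rng = random.Random(seed)
    observed = cusum_range(values)
    shuffled = list(values)
    smaller = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) < observed:
            smaller += 1
    return smaller / n_boot


def change_index(values: Sequence[float]) -> int:
    """0-based index of the first observation after the estimated change."""
    path = cusum_path(values)
    return max(range(1, len(path)), key=lambda i: abs(path[i]))


# Hypothetical series: a clear step after ten observations, and one with no step.
step_series = [10] * 10 + [20] * 10
flat_series = [10, 12] * 10

print(change_index(step_series))            # 10
print(change_confidence(step_series) >= 0.95)
print(change_confidence(flat_series) >= 0.95)
```

A confidence at or above 0.95 corresponds to the 95% threshold mentioned above. Reordering destroys any time structure while keeping the same values, which is why movement that is only noise rarely beats its own reshuffles.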
Test your leading indicators with your own data
Upload your leading indicator time series. The Analyzer will show you whether a structural change point has occurred — or whether what you are seeing is noise.
▶ Open the StepChange Analyzer
This is the link between Joiner’s levels of fixes and StepChange Analysis: Joiner tells you where in the system to intervene. The causal chain method tells you what to measure to know if the intervention is working. StepChange Analysis tells you whether what you are seeing in the data is a genuine structural change or noise. Used together, they form a complete improvement analysis framework applicable to any domain with long lead times.
8. Generalising the method
The six steps are domain-independent. The same logic applies directly to NHS and public policy settings:
- NHS workforce planning. Ultimate outcome: adequate staffing ratios in five years. Leading indicators: medical school places filled, foundation training completion rates, specialty training fill rates — all visible within 12–24 months and causally connected to the workforce outcome.
- Education reform. Ultimate outcome: improved literacy at age 16. Leading indicators: proportion of primary teachers trained in structured literacy programmes, phonics screening pass rates at age 6 — measurable years before the secondary outcome appears.
- Public health. Ultimate outcome: reduction in type 2 diabetes incidence in ten years. Leading indicators: pre-diabetes identification rate, referral-to-programme uptake, programme completion — each a necessary condition for the outcome, visible within two years.
- A&E corridor care. Ultimate outcome: elimination of corridor care by 2029. Leading indicator: delayed transfer of care bed-days per 1,000 admissions, which is the actual constraint, not corridor hours logged or SDEC capacity built. See the full pre-committed prediction in Corridor Care 2029.
In every case: map the causal chain backwards, identify necessary conditions, filter for measurability and lead time, test for vanity, assign ownership, publish the baseline. The domain changes. The logic does not.
9. Proof points — what the evidence actually shows
The football development examples used in this page are not hypothetical. Each has a verified source.
France — INF Clairefontaine (1988). The national football centre opened in January 1988, 50km southwest of Paris. Ten years later, France hosted and won the 1998 World Cup, with Clairefontaine serving as the base camp for the winning squad. The centre is one part of a broader network of 16 elite academies supervised by the French Football Federation. The claim that Clairefontaine alone produced the 1998 victory overstates it — it was the flagship of a system, not the system itself. Source: Wikipedia — INF Clairefontaine; FIFA Training Centre interview with FFF Technical Director Hubert Fournier.
Iceland — KSÍ facility programme (mid-1990s planning, built from 2000). The KSÍ began discussions on overcoming its population and climate challenges in the mid-1990s. The first “football house” opened at Keflavík in 2000; eventually 15 such facilities were commissioned, all publicly owned. By January 2016, Iceland had approximately one UEFA-qualified coach per 500 inhabitants, compared to one per 10,000 in England. Iceland qualified for UEFA Euro 2016 — their first major tournament — and reached the quarter-finals. Source: Wikipedia — Football in Iceland; Goal.com — “The Secret Behind the Iceland Miracle.”
Belgium — Project 2000. Following group-stage elimination at UEFA Euro 2000 as co-hosts, the RBFA redesigned its youth development structure. Between 2004 and 2012, Belgium failed to qualify for any major tournament. The investment in domestic youth development produced De Bruyne, Hazard, Lukaku, and Courtois. Belgium topped the FIFA World Rankings for the first time in 2015 and held the top spot for more than three years between September 2018 and March 2022. Source: Sky Sports — Bob Browaeys interview, 2018; UEFA — Developing football in Belgium.
Wales — Dragon Park, Newport (opened 20 April 2013). Dragon Park is the Wales National Football Development Centre, opened by then-UEFA President Michel Platini. The £5m complex was a joint venture between UEFA, the FAW, Sport Wales, and Newport City Council. Wales qualified for UEFA Euro 2016 — their first major tournament since 1958 — and reached the semi-finals. Source: Wikipedia — Dragon Park; UEFA — Developing football in Wales.
10. Links to the Improvement Model and Types of Measures
Joiner’s levels of fixes tell you which level of the system your intervention is targeting. A Level 1 or Level 2 fix applied to a Level 3 problem will not produce a detectable change point in any metric — leading indicator or final outcome. Confirm which level your intervention operates at before designing metrics.
Types of measures — outcome, process, and balancing measures — map directly onto the causal chain. Outcome measures sit at the bottom. Process measures sit in the middle. Leading indicators are process measures chosen specifically because they are measurable early and causally connected to the outcome. Balancing measures — the things that must not get worse — should be designed alongside leading indicators and monitored in parallel.
The practical sequence:
- Apply Joiner’s levels to confirm the intervention is operating at the right level of the system.
- Map the causal chain using the six-step method on this page.
- Identify leading indicators — process measures that are early, causal, and non-vanity.
- Establish baselines and begin collecting data.
- Apply the StepChange Analyzer once sufficient observations exist: typically 15–20 observations, often reached after 18–24 months of regular reporting.
- Use the result as the honest test: not “did the number move?” but “has the system structurally changed?”