What are we actually measuring?


Part of a series: Measuring disease in ALS: a critical appraisal of the ALSFRS-R

Part 1: This article

Part 2: The ALSFRS-R: what it measures, why we use it, and where it fails

Clinical trials in ALS live or die by their outcome measures. The choice of what to measure, how to measure it, and how to model the result determines whether a treatment effect is detectable, whether a regulatory submission is credible, and ultimately whether an effective drug reaches patients or languishes in a file cabinet. The underlying difficulty is how we measure the consequences of neuronal loss in living people over time.

This series is about that measurement problem. Specifically, it is about the Revised ALS Functional Rating Scale (ALSFRS-R): why it became the dominant outcome measure in ALS clinical trials, what it does well, what it does poorly, and what we might do instead — or alongside it.

The measurement problem in neurodegeneration

In oncology, a tumour shrinks or it doesn’t: response is measured by imaging criteria that are widely accepted and comparatively unambiguous. The biology announces itself on the scan. In neurodegenerative disease, the situation is fundamentally different. The pathological process (neuronal loss, protein aggregation, synaptic failure) unfolds invisibly over years, while its functional consequences accumulate gradually and unevenly across biological systems. There is rarely a single biomarker that captures the whole picture. And unlike a shrinking tumour, functional decline has no single legible endpoint; it becomes visible only in aggregate.

This forces a choice. We can measure what is happening at the biological level: protein concentrations in cerebrospinal fluid, axonal loss on imaging, electrophysiological signatures of denervation. Or we can measure what is happening at the functional level: what the patient can and cannot do with their body on a given day. Both approaches have merit. Both have serious limitations. And the relationship between them — between biological disease activity and functional disability — is neither linear nor stable across individuals or time.

A quantitative outcome measure is any instrument designed to translate that complexity into a number. The number is what goes into a statistical model. The statistical model is what informs a regulatory decision. Before committing to a given outcome measure, it is worth asking whether that number truly captures what we intend it to.

What makes a good outcome measure?

Before evaluating any specific instrument, it helps to establish what we are looking for. A good outcome measure in a clinical trial should satisfy several criteria simultaneously, and they are in tension with each other.

It should be sensitive to change — capable of detecting meaningful differences between treated and untreated patients within a feasible trial duration.
It should be reliable — producing consistent results when administered by different raters, or in different settings, or at different times of day.
It should be valid — actually measuring what it claims to measure, and not something adjacent to it.
It should exhibit interval-level measurement properties, meaning that a one-unit change at the bottom of the scale represents the same quantum of disease progression as a one-unit change at the top.
And it should be feasible — cheap enough, fast enough, and simple enough to be administered at scale across multi-site international trials.

No current outcome measure in ALS satisfies all of these criteria fully. The interesting question is which compromises are tolerable, and for whom.

Why not survival?

The most obvious endpoint in a fatal disease is survival. If a drug works, patients should live longer. If it doesn’t, they won’t. Survival is objective, clinically meaningful, and requires no rater training. It is also, in ALS, a deeply impractical primary endpoint for most trials.

The problem is time. Median survival in ALS from symptom onset is approximately 2–4 years, with enormous heterogeneity.¹ Detecting a meaningful survival benefit — say, a 20% reduction in mortality risk — with adequate statistical power would require following thousands of patients for years. The resulting trials would be prohibitively expensive, logistically nightmarish, and ethically complicated. In an era where patients have access to compassionate use programmes, off-label treatments, and natural history registries, maintaining a clean placebo comparison over years of follow-up is increasingly untenable — patients and families will not accept it, and the regulatory environment does not require it. Survival also integrates everything — disease biology, respiratory management, nutritional support, access to palliative care — making it a noisy signal for the specific effect of a drug on neurodegeneration.
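The scale of the problem can be sketched with Schoenfeld's approximation for the number of deaths a log-rank comparison needs. This is an illustration, not a trial design: it assumes 1:1 randomisation, a two-sided 5% significance level, 80% power, and a hazard ratio of 0.80 standing in for the "20% reduction in mortality risk". The function names and the death probabilities in the loop are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Deaths needed to detect `hazard_ratio` with a two-sided log-rank test.

    Schoenfeld approximation, assuming 1:1 allocation between arms.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def required_patients(events, death_probability):
    """Enrolment needed if a fraction `death_probability` dies during follow-up."""
    return ceil(events / death_probability)


events = required_events(0.80)
print(f"deaths required: {events}")

# Hypothetical probabilities of death within the trial window.
for p in (0.3, 0.5, 0.7):
    print(f"P(death in follow-up) = {p:.0%}: enrol {required_patients(events, p)} patients")
```

Under these assumptions the formula asks for roughly 630 deaths. How many patients that implies depends on how many die during follow-up: if only 30% do, enrolment rises above 2,000, and shortening the trial pushes it higher still.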

For these reasons, survival has largely been relegated to secondary endpoint status in ALS trials, used to support and contextualise results from faster-moving instruments rather than to anchor them.

Why not respiratory function, strength, or biomarkers?

The alternatives are plentiful, and each illuminates a different facet of the disease while obscuring others.

Forced vital capacity (FVC) tracks respiratory muscle decline with high precision and has strong prognostic relevance — respiratory failure is the leading cause of death in ALS.² But FVC is a single-domain measure. A drug that slowed limb function decline without affecting respiratory progression would be invisible to it, or nearly so. It also requires spirometry equipment and trained personnel, limiting its use in remote or resource-constrained settings.
Strength testing — whether by handheld dynamometry, megascore composite, or the older Medical Research Council grading — offers direct measurement of motor output. The ATLIS system and related platforms have improved standardisation considerably.³ But strength tests are sensitive to effort, fatigue, and positioning, require in-person administration, and again measure only one dimension of a multisystem disease.
Biomarkers are where the field’s hopes currently reside. Neurofilament light chain (NfL) in plasma and CSF reflects neuroaxonal damage with impressive sensitivity, correlates with disease progression, and changes in response to treatment in some contexts.⁴ But NfL is not yet a validated surrogate endpoint — the regulatory bar for surrogate endpoints requires demonstrated correlation with clinical outcomes that withstands scrutiny across trials and populations, and that evidence base is still being built. Until that work is done, biomarkers function as supportive evidence rather than as the primary evidentiary standard.

Enter the functional composite scale

Against this backdrop, the appeal of a functional composite scale becomes clear. If the disease affects bulbar function, upper limb function, lower limb function, and respiratory function — and if no single domain captures the whole picture — then a scale that samples across all of those domains simultaneously has an obvious advantage. It is sensitive to changes anywhere in the nervous system. It integrates information across biological subsystems. And if designed correctly, it can be administered quickly, reliably, and without expensive equipment.

This was the logic behind the original ALS Functional Rating Scale, published in 1996, and its revision in 1999 — the ALSFRS-R — which added greater granularity in the respiratory domain.⁵ Over the subsequent two and a half decades, it became the standard primary endpoint in ALS clinical trials worldwide: present in PRO-ACT, AnswerALS, and virtually every randomised controlled trial in the field. Whatever its flaws — and they are substantial, which is the subject of the next three articles in this series — it has the enormous practical advantage of being everywhere. Any new measure will be evaluated against it, and any analytical innovation that can be applied to existing ALSFRS-R data carries immediate leverage across an enormous accumulated evidence base.
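The composite idea can be sketched in a few lines. The ALSFRS-R has 12 items, each rated 0 to 4, summed to a total out of 48. The even split of three items across the four domains named above, the domain labels, and both patient records are assumptions made for illustration; they are not taken from the scale's scoring manual or from real data.

```python
ITEM_MIN, ITEM_MAX = 0, 4
DOMAINS = ("bulbar", "upper_limb", "lower_limb", "respiratory")
ITEMS_PER_DOMAIN = 3  # illustrative assumption: 12 items split evenly


def domain_scores(patient):
    """Sum the item ratings within each domain, validating the inputs."""
    scores = {}
    for domain in DOMAINS:
        items = patient[domain]
        if len(items) != ITEMS_PER_DOMAIN:
            raise ValueError(f"{domain}: expected {ITEMS_PER_DOMAIN} items")
        if any(not ITEM_MIN <= rating <= ITEM_MAX for rating in items):
            raise ValueError(f"{domain}: ratings must lie in {ITEM_MIN}..{ITEM_MAX}")
        scores[domain] = sum(items)
    return scores


def total_score(patient):
    """Composite score: the plain sum of all item ratings (0 to 48)."""
    return sum(domain_scores(patient).values())


# Two hypothetical patients: one with bulbar involvement, one with leg involvement.
patient_a = {"bulbar": [1, 1, 2], "upper_limb": [4, 4, 4],
             "lower_limb": [4, 4, 4], "respiratory": [4, 4, 4]}
patient_b = {"bulbar": [4, 4, 4], "upper_limb": [4, 4, 4],
             "lower_limb": [2, 1, 1], "respiratory": [4, 4, 4]}

for name, patient in (("A", patient_a), ("B", patient_b)):
    print(name, total_score(patient), domain_scores(patient))
```

Both hypothetical patients receive the same total even though different domains are affected. That is the trade a composite makes: decline anywhere moves the number, but the number alone does not say where the decline occurred.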

Understanding what the ALSFRS-R is, what it measures, and where it breaks down is therefore not merely an academic exercise. It is a prerequisite for designing better trials, interpreting existing ones, and — ultimately — getting effective treatments to patients faster.

In Part 2, we examine the scale in detail: its structure, its strengths, and the long list of limitations that have accumulated in the literature over twenty-five years of use.

¹ Longinetti et al. (2019). Epidemiology of amyotrophic lateral sclerosis: an update of recent literature.
² Bourke et al. (2006). Effects of non-invasive ventilation on survival and quality of life in patients with amyotrophic lateral sclerosis: a randomised controlled trial.
³ Andres et al. (2012). Validation of a new strength measurement device for amyotrophic lateral sclerosis clinical trials.
⁴ Benatar et al. (2023). Neurofilament light chain in drug development for amyotrophic lateral sclerosis: a critical appraisal.
⁵ Cedarbaum et al. (1999). The ALSFRS-R: a revised ALS functional rating scale that incorporates assessments of respiratory function.

Part of a series: Measuring disease in ALS: a critical appraisal of the ALSFRS-R

Part 1: This article

Part 2: The ALSFRS-R: what it measures, why we use it, and where it fails

Part 3: The multidimensionality problem
