Key Points for Decision Makers

Quality-adjusted life-years are the primary health outcome in economic evaluations of disease-modifying therapies in relapsing–remitting multiple sclerosis, calculated using utility weights based on the Expanded Disability Status Scale (EDSS) and utility decrements due to relapses and progression to secondary-progressive multiple sclerosis

The EDSS has several limitations. If it is used as the only disability measure in economic evaluations, the long-term clinical and economic implications of disease-modifying therapies may not be properly assessed

In addition to the EDSS and the occurrence of recent relapses, the time to complete the Timed 25-Foot Walk test for ambulatory function significantly predicts health utility in people with relapsing–remitting multiple sclerosis and secondary-progressive multiple sclerosis, supporting the use of the Timed 25-Foot Walk test to supplement the EDSS and the occurrence of relapses in the characterization of the course of disease progression and the accrual of quality-adjusted life-years in future economic evaluations of disease-modifying therapies for relapsing–remitting multiple sclerosis

1 Introduction

Multiple sclerosis (MS) is a chronic inflammatory and neurodegenerative disease of the central nervous system that leads to damage of axons and myelin [1]. Multiple sclerosis leads to progressive disability; cognitive impairment; limitations in mobility, vision, and speech; pain; fatigue; spasticity; gastrointestinal and urinary dysfunction; and emotional, psychological, and mental problems, [1,2,3] all of which negatively impact the quality of life of people with MS [4, 5].

Multiple sclerosis can be categorized into two types: relapsing and progressive. Relapsing disease includes clinically isolated syndrome and relapsing–remitting MS (RRMS) [6]. Progressive disease includes primary-progressive MS and secondary-progressive MS (SPMS) [6]. Approximately 85% of people with MS are initially diagnosed with RRMS [7,8,9]. Relapsing–remitting MS generally lasts for 8–20 years before transitioning to SPMS [7,8,9].

There is no cure for MS. Disease-modifying therapies (DMTs) are the best available option for people with RRMS [10]. Disease-modifying therapies have been shown to prevent new brain lesions, reduce the frequency and severity of relapses, and delay the progression of disability [10, 11]. As of September 2020, the US Food and Drug Administration had approved 20 DMTs for RRMS [10].

Over the past three decades, numerous economic evaluations have assessed the cost effectiveness of DMTs for the treatment of people with RRMS to provide decision makers, payers, and stakeholders with the information needed to determine whether those treatments should be adopted and reimbursed [12,13,14,15,16,17,18,19]. The structure of the decision-analytic models used in these economic evaluations has converged over time [12, 14]. The course of disease progression is characterized by changes in a person’s disability, as measured by the Expanded Disability Status Scale (EDSS), and the occurrence of relapses during the relapsing–remitting phase and after progression to SPMS [12]. Quality-adjusted life-years (QALYs) are the primary health outcome in these models, calculated using utility weights based on the EDSS during RRMS and utility decrements due to relapses and progression to SPMS [12].

The EDSS is an ordinal scale with a range from 0 (normal neurological examination) to 10 (death due to MS), in increments of 0.5, used to measure disability in ambulation and eight functional systems: pyramidal, cerebellar, brain stem, sensory, bowel and bladder, visual, cerebral total, and cerebral mentation [20]. Scores of 0–4.5 represent normal ambulation and measure disability based on neurological examination, while scores of 5.0 and above represent progressive loss of walking ability [20]. The EDSS has been the most commonly used endpoint to measure disability progression in randomized clinical trials (RCTs) of RRMS, [21, 22] and it is well understood and accepted by the neurology and regulatory communities [23,24,25,26]. However, the EDSS has several limitations: it has high intra- and inter-observer variability [27]; it is an ordinal scale and the differences between contiguous scores are variable [27, 28]; it is non-linear and the time spent in the middle scores is shortest, with peaks at EDSS 1.0–3.0 and 6.0–7.0 [27]; EDSS levels of 4.0–7.5 are primarily determined by the distance people can walk and the need for an assistive device [20]; and it cannot detect changes in people with severe disability or in various domains relevant in MS (e.g., upper extremity function, cognition) [27, 28].

The MS Functional Composite (MSFC) was proposed as an alternative to address the limitations of the EDSS [29, 30]. The MSFC includes three measures: the Timed 25-Foot Walk (T25FW) test for ambulatory function, the 9-Hole Peg Test (9HPT) for upper-extremity function, and the Paced Auditory Serial Addition Test (PASAT) for cognition [29, 30]. The MSFC has been used as an endpoint in clinical trials of MS, although five times less often than the EDSS [23]. When used, the MSFC has generally been a secondary endpoint along with the EDSS [21, 22, 31, 32]. Although the MSFC covers multiple major MS domains and has been reported to be highly reliable and correlated with the EDSS, health-related quality of life (HRQoL), and other important clinical and economic indicators, its responsiveness is not always better than that of the EDSS, and it also has several limitations [25, 28, 29, 33,34,35,36,37]. To address the individual limitations of the EDSS and the MSFC, endpoints combining the EDSS with the MSFC, or with individual MSFC components, have been proposed and used to assess the efficacy of DMTs in clinical trials of RRMS [24, 38, 39].

Given the limitations of the EDSS, if it is the only disability measure used in the decision-analytic models of economic evaluations of DMTs, the long-term clinical and economic implications of DMTs may not be properly assessed. Furthermore, with the growing interest in evaluating the efficacy of DMTs using multiple disability measures in clinical trials of MS, it may follow that additional disability measures could be more commonly included to supplement the EDSS in future economic evaluations of DMTs. However, as described by Hernandez et al., the use of multiple disability measures can pose additional challenges and will inevitably increase the complexity and data demands of these models [12].

The aim of this study is to assess whether additional disability measures (T25FW, 9HPT, and PASAT) significantly contribute additional information on meaningful outcomes for decision makers, such as utility to calculate QALYs, which would otherwise not be captured by the EDSS and relapses. If additional disability measures significantly predict utility after accounting for the effect of the EDSS and relapses, these additional measures of disability could be considered in future economic evaluations of DMTs in RRMS.

2 Methods

2.1 Data Source

Data from the Multiple Sclerosis Outcome Assessments Consortium (MSOAC) Placebo Database were used in this study [40, 41]. The MSOAC Placebo Database currently includes 2465 individual records of people diagnosed with relapsing–remitting, secondary-progressive, and primary-progressive forms of MS, from the placebo arms of the clinical trials in Table 1 [40, 41]. The data from the placebo arms were contributed by industry members of MSOAC and include data on demographics, MS type (e.g., RRMS, SPMS), medical history, disability measures (e.g., EDSS, T25FW), patient-reported outcome measures (e.g., RAND 36-Item Health Survey 1.0 [RAND-36]), and relapse information [41]. The database does not contain imaging data, treatment data, or standard-of-care or active comparator data. All data are fully anonymized and de-identified, and the individual clinical trials are not identified [40, 41]. The MSOAC Placebo Database is available to researchers who submit, and are approved for, a request for access [41, 42].

Table 1 Clinical trials informing the Multiple Sclerosis Outcome Assessments Consortium (MSOAC) Placebo Database

2.2 Outcome Measure

The MSOAC Placebo Database contains data on the RAND-36, a generic profile instrument designed to yield scores on multiple aspects of HRQoL [54, 55]. The RAND-36 comprises 36 items that assess eight health dimensions: physical functioning (ten items), role limitations caused by physical health problems (four items), role limitations caused by emotional problems (three items), social functioning (two items), emotional well-being (five items), energy/fatigue (four items), pain (two items), general health perceptions (five items), and one item to assess the change in perceived health in the last 12 months [54, 55]. The RAND-36 includes the same items as the 36-Item Short Form Survey (SF-36) and their scoring procedures yield equivalent results, except for the pain and general health dimensions, for which the scoring procedures differ [54,55,56]. The RAND-36 has been widely used in clinical studies and can provide useful descriptive information about the impact of interventions on HRQoL [55, 57]. However, the RAND-36 is not preference based and cannot be used to generate QALYs in economic evaluations [57].

The Short-Form Six-Dimension (SF-6D) was included in this study as the dependent variable. The SF-6D is a preference-based measure of HRQoL, which produces utility scores anchored at 1 for perfect health and 0 for death [57]. The RAND-36 from the MSOAC Placebo Database was converted to SF-6D utility values using preference weights from a sample of the general population in the UK with the Statistical Analysis Software (SAS) program provided by The University of Sheffield under a non-commercial license [57,58,59,60].

2.3 Potential Predictors of Utility

This study focused on four disability measures and their relationship to the SF-6D: EDSS [20], T25FW to assess mobility and leg function [61], 9HPT for the dominant and non-dominant hand to assess upper extremity function [62], and the PASAT to assess cognitive function [63]. The EDSS was modeled as a categorical variable with ten levels: 0, 1–1.5, 2–2.5,…,9–9.5 [20]. The T25FW and 9HPT were modeled as continuous variables measured in seconds with a range of 0–180 and 0–300, respectively [29]. PASAT was modeled as a continuous variable measured as the total number of correct answers out of 60, range 0–60 [29].
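As a minimal illustration of the EDSS coding described above, the sketch below maps a raw EDSS score to one of the ten categorical levels. The function name `edss_level` and the label strings are hypothetical and are not taken from the study's SAS code; rejecting a score of 0.5 (which is not among the listed levels) is an assumption made here.

```python
def edss_level(score: float) -> str:
    """Map a raw EDSS score to one of ten levels: 0, 1-1.5, 2-2.5, ..., 9-9.5.

    Assumptions (not stated in the source): scores of 0.5 and 10 are
    rejected because they are not among the ten levels listed.
    """
    doubled = score * 2
    on_half_step = doubled == int(doubled)  # EDSS moves in steps of 0.5
    if not on_half_step or not (0 <= score <= 9.5) or score == 0.5:
        raise ValueError(f"EDSS score not covered by the ten levels: {score}")
    base = int(score)  # integer part identifies the level
    return "0" if base == 0 else f"{base}-{base}.5"


if __name__ == "__main__":
    for s in (0, 1.5, 6.0, 9.5):
        print(s, "->", edss_level(s))
```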

The occurrence of relapse(s) within the previous 6 months was also considered because it has been shown to be a significant predictor of utility in people with MS [64, 65], relapse rate is the most frequently used primary endpoint in RRMS clinical trials [21, 22], and utility decrements due to relapses are included in economic evaluations of DMTs to calculate QALYs [12]. The occurrence of relapse(s) within the last 6 months was modeled as a categorical variable with two levels: yes and no.

Demographics including baseline age in years, sex, and race were considered as they have been included in previous analyses of predictors of utility in people with MS [64, 66]. Baseline age was modeled as a continuous variable in years, sex as a categorical variable with two levels (male and female), and race as a categorical variable with five levels: White, Black or African American, Asian, Hispanic, and Other. Degree of education, disease duration, and years since diagnosis have also been shown to be significant predictors of utility in people with MS but were not available in the MSOAC Placebo Database [64, 66].

2.4 Data Handling and Statistical Analyses

RAND-36 data in the MSOAC Placebo Database are available for clinical trial visits post-randomization. Therefore, values for the potential predictors of utility available prior to randomization were not included in the analyses. For RAND-36, duplicates of items in the same visit were removed for the conversion of RAND-36 to SF-6D. For T25FW and 9HPT, the maximum possible value was set to 180 and 300 seconds, respectively, and for PASAT to a maximum of 60 correct answers. If a visit included two assessments of T25FW, 9HPT, or PASAT, the average value was used. For relapse(s) within the last 6 months, only confirmed MS relapses were considered in the analyses. Relapses with unknown start and end dates were not included in the analyses.
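The capping and averaging rules above can be sketched as follows. This is an illustration only: the names `CAPS` and `visit_value` are hypothetical, and the order of operations (cap each assessment first, then average within the visit) is an assumption, because the source does not state it.

```python
from statistics import fmean

# Maximum values stated in the text: seconds for T25FW and 9HPT,
# number of correct answers for PASAT.
CAPS = {"T25FW": 180.0, "9HPT": 300.0, "PASAT": 60.0}


def visit_value(measure: str, assessments: list[float]) -> float:
    """Return one value per visit for a given measure.

    Assumed order of operations: each assessment is capped at the
    measure's maximum, then assessments within the visit are averaged.
    """
    if not assessments:
        raise ValueError("at least one assessment is required")
    cap = CAPS[measure]  # KeyError for an unknown measure
    return fmean(min(value, cap) for value in assessments)


if __name__ == "__main__":
    print(visit_value("T25FW", [200.0, 6.0]))
    print(visit_value("PASAT", [50, 54]))
```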

Two repeated-measures mixed-effects models were fitted to estimate the effects of the potential predictors on SF-6D utility, one for people with RRMS and one for people with SPMS. The cut-off for statistical significance was p < 0.05. All analyses were conducted using SAS® Studio Version 3.8 (SAS Institute Inc., Cary, NC, USA). The SAS code is available in the Electronic Supplementary Material (ESM), except for the program to convert RAND-36 to SF-6D utility values, which was licensed from The University of Sheffield [60].
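The text does not report the random-effects or covariance structure of the models. Purely as an illustrative assumption, a repeated-measures model with a person-level random intercept and the predictors described in Section 2.3 could be written as follows, where i indexes people, t indexes visits, k and r run over the non-reference EDSS and race levels, and all symbols are notation introduced here rather than taken from the study:

```latex
\begin{align*}
U_{it} ={}& \beta_0
  + \sum_{k} \beta^{\mathrm{EDSS}}_{k}\,\mathbf{1}[\mathrm{EDSS}_{it}=k]
  + \beta_{1}\,\mathrm{T25FW}_{it}
  + \beta_{2}\,\mathrm{9HPT}^{\mathrm{dom}}_{it}
  + \beta_{3}\,\mathrm{9HPT}^{\mathrm{non}}_{it}
  + \beta_{4}\,\mathrm{PASAT}_{it} \\
 & + \beta_{5}\,\mathrm{Relapse}_{it}
  + \beta_{6}\,\mathrm{Age}_{i}
  + \beta_{7}\,\mathrm{Sex}_{i}
  + \sum_{r} \beta^{\mathrm{Race}}_{r}\,\mathbf{1}[\mathrm{Race}_{i}=r]
  + b_{i} + \varepsilon_{it}, \\
 & b_{i} \sim N(0, \sigma_{b}^{2}), \qquad \varepsilon_{it} \sim N(0, \sigma^{2})
\end{align*}
```

Under this assumed form, the SF-6D utility U for person i at visit t is a linear function of the predictors, and the random intercept b allows for correlation among repeated visits from the same person.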

To validate our findings, we compared the mean SF-6D utility values predicted using the repeated-measures mixed-effects models with the mean SF-6D utility by EDSS observed in the MSOAC Placebo Database and with the SF-6D utility by EDSS for people with RRMS and SPMS in the UK reported in the study by Hawton et al. [65].

3 Results

From the 2465 individual records included in the MSOAC Placebo Database, 1580 were from people diagnosed with RRMS and 555 with SPMS. Data on RAND-36, converted to SF-6D utility values, were available for 274 people with RRMS and 420 with SPMS, which were included in the analyses.

Baseline characteristics are shown in Table 2. People with RRMS were younger than people with SPMS, with a mean age of 36.8 and 49.9 years, respectively. Approximately two thirds of the people were female and more than 94% were White in both groups. People with RRMS had a lower level of disability as measured by the EDSS, T25FW, 9HPT, and PASAT. The baseline utility measured by the SF-6D was higher for people with RRMS.

Table 2 Baseline characteristics

In the repeated-measures mixed-effects models (Table 3), a higher level of EDSS, a longer time to complete the T25FW test, and a relapse in the last 6 months were significant predictors of a lower utility score in people with RRMS and SPMS. The time to complete the 9HPT with the dominant or non-dominant hand, the number of correct answers from the PASAT, baseline age, sex, and race were not significant predictors of utility for people with RRMS or SPMS.

Table 3 Results of repeated-measures mixed-effects models: predictors of health utility in relapsing–remitting multiple sclerosis (RRMS) and secondary-progressive multiple sclerosis (SPMS)

We compared the mean SF-6D utility values calculated using the significant predictors of the repeated-measures mixed-effects models (i.e., intercept, EDSS, T25FW, and a relapse in the previous 6 months; Table 3) with the mean SF-6D utility by EDSS observed in the MSOAC Placebo Database (Fig. 1). For the mixed-effects models, we used the mean baseline T25FW by the EDSS (Table 2) and no relapses in the last 6 months. The results of this comparison show that the observed mean SF-6D utility by the EDSS and the predicted SF-6D using the mixed-effects models are different. This difference suggests that utility values based solely on the EDSS do not fully capture the impact of both EDSS and T25FW, which confirms the findings of the mixed-effects models: after accounting for the impact of EDSS, T25FW has an additional significant effect predicting utility in people with RRMS and SPMS.
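The predicted values in this comparison combine the intercept, the coefficient for the relevant EDSS level, the T25FW coefficient multiplied by the walk time, and, where applicable, the relapse coefficient. A minimal sketch of that arithmetic is shown below. The function name and every numeric value are hypothetical placeholders chosen for illustration; they are not the estimates reported in Table 3.

```python
def predict_utility(
    intercept: float,
    edss_coef: float,
    t25fw_coef: float,
    t25fw_seconds: float,
    relapse_coef: float = 0.0,
    relapse_last_6_months: bool = False,
) -> float:
    """Linear prediction from the significant predictors only."""
    utility = intercept + edss_coef + t25fw_coef * t25fw_seconds
    if relapse_last_6_months:
        utility += relapse_coef
    return utility


if __name__ == "__main__":
    # Placeholder coefficients, NOT values from Table 3.
    u = predict_utility(
        intercept=0.80,
        edss_coef=-0.05,      # coefficient for one EDSS level
        t25fw_coef=-0.002,    # change in utility per second of T25FW
        t25fw_seconds=10.0,   # mean T25FW for that EDSS level
    )
    print(round(u, 3))
```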

Fig. 1 Mean Short-Form Six-Dimension (SF-6D) predicted values compared with observed mean SF-6D utility by the Expanded Disability Status Scale (EDSS) from Multiple Sclerosis Outcome Assessments Consortium (MSOAC) Placebo Database. RRMS relapsing–remitting multiple sclerosis, SPMS secondary-progressive multiple sclerosis

To confirm our findings with another population for which SF-6D utility data are available, we compared the mean SF-6D utility values calculated using the significant predictors of the repeated-measures mixed-effects models (Table 3) with the SF-6D utility by the EDSS for people with RRMS and SPMS in the UK reported in the study by Hawton et al. (Fig. 2) [65]. It should be noted that the study by Hawton et al. did not report SF-6D utility for EDSS 4–4.5 in RRMS because there were fewer than ten observations, and in SPMS SF-6D utility was reported only for EDSS 6–6.5 and 7–7.5 [65]. The pattern of SF-6D values for RRMS and SPMS from this study is broadly consistent with the study by Hawton et al. [65]. We observed the same inconsistency identified in their study, as the SF-6D value for RRMS was higher for EDSS 6–6.5 compared with EDSS 5–5.5 [65]. As discussed in the study by Hawton et al., this inconsistent finding may reflect complexities with the use of the EDSS in clinical practice, the limitations of its psychometric properties, the effect of coping strategies of people at the interfaces of EDSS 5–5.5 and 6–6.5, or these findings may have occurred randomly [65].

Fig. 2 Mean Short-Form Six-Dimension (SF-6D) predicted values compared with mean SF-6D utility by the Expanded Disability Status Scale (EDSS) from a comparable study. RRMS relapsing–remitting multiple sclerosis, SPMS secondary-progressive multiple sclerosis

4 Discussion

In this study, we have demonstrated that there is a significant inverse relationship between the time to complete the T25FW test and utility for people with RRMS, after accounting for the effect of the EDSS and relapses. The time to complete the 9HPT and the number of correct answers from the PASAT were not significant predictors of utility for people with RRMS. These findings are consistent with a recent multicenter 3-year prospective study conducted by Heesen et al. designed to understand perceptions on the value of 13 bodily functions for 171 people with RRMS and for their physicians [67]. In the study by Heesen et al., the 13 bodily functions included wakefulness and alertness, bladder control, normal skin sensations, bowel control, thinking and memory (cognition), mood, lack of pain, power and coordination of hands, sexuality, speech, swallowing, visual function, and walking [67]. For people with RRMS, visual function (23%) followed by cognition (17%), walking ability (16%), and lack of pain (14%) were the most relevant. For physicians, walking ability was the most relevant (38%), followed by cognition (18%); visual function did not gain a high priority for physicians (8%) [67]. Power and coordination of hands was the most relevant for 4% of people with RRMS and for 6% of their physicians [67].

In people with SPMS, there is also a significant inverse relationship between the time to complete the T25FW test and health utility, after accounting for the effect of the EDSS and relapses. The time to complete the 9HPT and the number of correct answers from the PASAT were not significant predictors of utility for people with SPMS. These findings are consistent with those from a study by Heesen et al., in which people diagnosed with MS and a disease course less than 5 years (n = 84) and longer than 15 years (n = 82) were asked about their perceptions on the value of 13 bodily functions, including: walking, power and coordination of hands, normal skin sensations, lack of pain, bladder control, bowel control, visual function, wakefulness and alertness, thinking and memory (cognition), speech, swallowing, mood, and sexuality [68]. In the group of people with MS and a disease course longer than 15 years, walking was the most relevant bodily function (28%) followed by visual function (24%) [68]. Cognition was valued as the most relevant for 15% of the participants, and power and coordination of hands by 3% [68]. As people with RRMS generally transition to SPMS after 8–20 years, [7,8,9] we considered the findings from the group of people with MS and a disease course longer than 15 years in the study by Heesen et al. to be a good proxy of the perceptions of value of bodily functions for people with SPMS [68].

Two systematic literature reviews of modeling approaches used in cost-effectiveness analyses of DMTs for RRMS by Hawton et al. and by Hernandez et al. have recommended the use of additional measures of disability to supplement EDSS and the occurrence of relapses in the characterization of the course of the disease to properly assess the long-term clinical and economic implications of DMTs for RRMS [12, 16]. The findings from this study support the recommendation by Hawton et al. and by Hernandez et al., suggesting the addition of T25FW to characterize the course of disability and the accrual of QALYs in future economic evaluations of DMTs for RRMS.

To include T25FW in future economic evaluations, the efficacy study (e.g., RCT) for the DMT of interest would be used to derive predictive equations or other relevant modeling approaches for disability measured by the EDSS and T25FW, and for the occurrence of relapses for the DMT of interest and its comparator. A key consideration is that changes in EDSS, T25FW, and the occurrence of relapses are interrelated, [34, 69] meaning that changes in one are likely to trigger changes in the others, concurrently or at a later time point [70, 71]. Failure to properly capture these interrelated changes would lead to under- or over-estimates of the treatment effects on EDSS, T25FW, and relapses, resulting in incorrect estimates of incremental cost-effectiveness ratios. For relevant comparator DMTs for which data are not available in the efficacy study informing the economic evaluation, appropriate indirect treatment comparisons (e.g., a network meta-analysis) would be needed for EDSS, T25FW, and relapses, as has been done previously for EDSS and relapses [72,73,74]. Finally, as EDSS and T25FW change over time and relapses occur in the model, QALYs would be accrued using the corresponding utility weights from the repeated-measures mixed-effects models presented in Table 3. Double counting would not occur because the mixed-effects models presented in Table 3 include EDSS, T25FW, and relapses. Therefore, the mixed-effects models capture the significant impact of T25FW in addition to the effect that the EDSS and relapses have on utility.
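The QALY accrual step described above can be sketched as a sum of per-cycle utilities weighted by cycle length. The sketch below is illustrative only: the function name, the structure of the coefficient dictionary, the simulated trajectory, and all numeric values are hypothetical, and discounting is omitted for brevity.

```python
def accrue_qalys(cycles, coef, cycle_years=1.0):
    """Sum utility x time over model cycles.

    cycles: iterable of (edss_level, t25fw_seconds, relapse) tuples,
            one per model cycle.
    coef:   dict with keys "intercept", "edss" (level -> coefficient),
            "t25fw" (per second), and "relapse".
    """
    total = 0.0
    for edss_level, t25fw_seconds, relapse in cycles:
        utility = (
            coef["intercept"]
            + coef["edss"][edss_level]
            + coef["t25fw"] * t25fw_seconds
        )
        if relapse:
            utility += coef["relapse"]
        total += utility * cycle_years
    return total


if __name__ == "__main__":
    # Placeholder coefficients, NOT values from Table 3.
    coef = {
        "intercept": 0.80,
        "edss": {"2-2.5": -0.05, "3-3.5": -0.10},
        "t25fw": -0.002,
        "relapse": -0.03,
    }
    # Hypothetical two-cycle trajectory: disability worsens and a relapse occurs.
    trajectory = [("2-2.5", 5.0, False), ("3-3.5", 10.0, True)]
    print(round(accrue_qalys(trajectory, coef), 3))
```

Because a single utility equation carries the EDSS, T25FW, and relapse terms, each contributes once per cycle, which is the sense in which double counting is avoided.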

Based on the findings from two systematic literature reviews of outcome measures in trials in multiple sclerosis, T25FW has been collected in RCTs for all approved DMTs (including the first approved DMTs, interferons and glatiramer acetate), [21, 22] and will likely continue to be collected in future RCTs of MS, owing in part to the growing interest in evaluating the efficacy of DMTs using multiple disability measures in clinical trials of MS [75]. T25FW has been reported for various DMTs, for example, dimethyl fumarate, [76] fingolimod and interferon beta-1a, [77] natalizumab, [78] and peginterferon [44]. The MSOAC Placebo Database could serve as a source of T25FW, EDSS, and relapse data for other relevant DMTs such as alemtuzumab, glatiramer acetate, and teriflunomide, [41] if data from the treatment arms of the clinical trials in the database are available to external researchers. If T25FW data are not reported or available for a specific comparator DMT, that comparator could not be included in economic evaluation analyses using T25FW, EDSS, and relapses. The availability of data for all external comparators is a general challenge for economic evaluations across disease areas.

Endpoints combining EDSS with the MSFC or its individual components have been proposed and used to assess the efficacy of DMTs in clinical trials of RRMS to overcome the limitations associated with using the EDSS [24, 25, 38, 39, 75]. Unlike the EDSS, endpoints such as the T25FW and other components of the MSFC have recently started to be used in clinical practice to objectively capture disability related to cognition, visual function, dexterity, and ambulation and to monitor disease status and response to DMTs [79, 80]. The introduction of treatment targets in MS, such as ‘no evidence of disease activity’, to guide clinical decision making has highlighted that the EDSS, relapse rates, and magnetic resonance imaging (MRI) markers may not be the exclusive and appropriate factors to systematically monitor people with MS in clinical practice, and that other sensitive tests are needed to measure and track disease activity and progression [80, 81]. The T25FW, 9HPT, Low Contrast Sloan Letter Chart (visual test), and Symbol Digit Modalities Test (SDMT; cognition test) have been proposed to monitor the progression of disability in clinical practice, as part of a multifactorial MS decision model (which also includes the domains of relapse, neuropsychology, and MRI findings), to support early treatment decisions and uncover treatment failure in clinical practice [80]. Advances in technology will also contribute to the relevance of endpoints such as the T25FW, 9HPT, and SDMT outside the clinical trial setting, enabling people with MS to contribute to healthcare outcomes and monitor their disability status remotely [79, 81,82,83]. With a potential increase in clinical trials using multiple disability endpoints and their growing relevance in clinical practice, the inclusion of additional disability endpoints in economic evaluations of DMTs for RRMS may also become more relevant.

4.1 Limitations

First, the RAND-36 was converted to SF-6D using preference weights from a sample of the general population in the UK [60]. However, the UK population may have different preferences to non-UK populations. For researchers interested in using the SF-6D in other countries, The University of Sheffield website provides the contact information of investigators who have conducted valuation surveys in other countries [60]. Second, the findings from this study may be limited in terms of their generalizability beyond the clinical trials included in the MSOAC Placebo Database. The 2465 individual records that form the MSOAC Placebo Database come from the placebo arms of the pivotal clinical trials of DMTs approved for the treatment of RRMS, including fingolimod, natalizumab, peginterferon, and teriflunomide (Table 1). However, data from other relevant pivotal clinical trials of DMTs approved for the treatment of RRMS with a placebo arm (e.g., DEFINE [dimethyl fumarate vs placebo], [84] CONFIRM [dimethyl fumarate vs glatiramer acetate and vs placebo], [85] CLARITY [cladribine vs placebo] [86]) are not included in the MSOAC Placebo Database, and data from relevant RRMS pivotal trials using an active control arm (e.g., CARE-MS 1 and CARE-MS 2 [alemtuzumab vs interferon beta-1a], [87, 88] OPERA I and II [ocrelizumab vs interferon beta-1a] [89]) are not available to the research community to the best of our knowledge. If these data become available to the research community, they can be used to confirm the findings of this study and support their generalizability. Third, the findings from this study are limited by the small sample size, driven by the number of individual records of people with RRMS and SPMS in the MSOAC Placebo Database that include RAND-36. Additional data from clinical trials of DMTs owned by pharmaceutical companies (some of which have several years of follow-up) and registries capturing HRQoL and/or utility data, as well as disability measures and relapses, are needed to confirm the findings of this study. Fourth, degree of education, disease duration, and years since diagnosis have been shown to be significant predictors of utility in people with MS [64, 66]. However, these were not available in the MSOAC Placebo Database. Fifth, there were no data for EDSS 7–7.5, 8–8.5, and 9–9.5 for RRMS and EDSS 8–8.5 and 9–9.5 for SPMS. Therefore, the repeated-measures mixed-effects models in Table 3 cannot be used to predict utility scores at those levels of disability measured by the EDSS. For economic evaluations attempting to use the mixed-effects models presented in this study, one possibility could be to derive the coefficients for the missing EDSS levels by fitting a linear or other appropriate regression to the coefficients of the EDSS levels available in the mixed-effects models.
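The extrapolation suggested for the missing EDSS levels can be sketched with an ordinary least-squares line fitted to the available EDSS coefficients. The sketch below is one possible reading of that suggestion, not a method used in this study: the function names, the choice of the lower bound of each EDSS level as the x-value, and the coefficients are all hypothetical placeholders rather than values from Table 3.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extrapolate(intercept, slope, x):
    """Predicted coefficient for an EDSS level with lower bound x."""
    return intercept + slope * x


if __name__ == "__main__":
    # Lower bound of each available EDSS level and a placeholder coefficient.
    levels = [1, 2, 3, 4, 5, 6]
    coefs = [-0.02, -0.04, -0.06, -0.08, -0.10, -0.12]  # NOT from Table 3
    a, b = fit_line(levels, coefs)
    for missing in (7, 8, 9):
        print(missing, round(extrapolate(a, b, missing), 3))
```

A linear trend is only one option; given the inconsistency observed between EDSS 5–5.5 and 6–6.5, the fit should be checked against the available coefficients before extrapolated values are used.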

4.2 Future Analyses for Primary-Progressive Multiple Sclerosis

The focus of this study was on RRMS and SPMS, as people with these two types of MS are modeled in economic evaluations of DMTs for the treatment of RRMS. The MSOAC Placebo Database currently includes data from one clinical trial conducted in people with primary-progressive MS (Table 1). In the SAS code available in the ESM, we have included lines of code highlighted in yellow that would allow researchers with access to the MSOAC Placebo Database to explore significant predictors of utility in people with primary-progressive MS. However, researchers would need to request the programs to convert RAND-36 to SF-6D from The University of Sheffield, as a license agreement is required for each study that will use the SF-6D algorithm [60].

5 Conclusions

This study suggests that the time to complete the T25FW test for ambulatory function significantly contributes additional information on health utility in people with RRMS and SPMS otherwise not captured by the EDSS and the occurrence of recent relapses. These findings support the use of T25FW as an additional measure of disability to supplement the EDSS and the occurrence of relapses in the characterization of the course of disease progression and accrual of QALYs in future economic evaluations of DMTs for the treatment of RRMS.