Introduction
Myelofibrosis (MF) is a debilitating myeloproliferative neoplasm originating in hematopoietic stem cells [1]. The clinical manifestations of MF are heterogeneous, with most patients presenting with symptoms associated with anemia and splenomegaly [2–4]. MF is rare, with an incidence of 0.1 to 1 per 100,000 individuals per year, but it has a higher prevalence of 6 per 100,000 person-years because of its chronic nature and disabling course [1,3]. The median age at diagnosis is 67 years [3], and the median survival for all patients with MF is approximately 6 years [5,6].
Stem cell transplant is potentially curative but is associated with high rates of morbidity and mortality [1,7]. For most patients, the key treatment goals are alleviating symptoms, reducing clinical complications, and slowing disease progression. Recently, the US Food and Drug Administration (FDA) approved momelotinib for the treatment of adult patients with intermediate- or high-risk MF, including primary or secondary MF, and anemia [8]. In the phase 3 MOMENTUM trial (NCT04173494), momelotinib demonstrated clinically meaningful benefits vs danazol in MF-associated symptoms, anemia measures, and splenomegaly in patients with Janus kinase (JAK) inhibitor–experienced MF who were anemic (hemoglobin < 10 g/dL) and symptomatic (Total Symptom Score [TSS] ≥ 10 at screening) [9]. The primary endpoint of MOMENTUM was the Myelofibrosis Symptom Assessment Form (MFSAF) TSS response rate at week 24, defined as a ≥ 50% reduction from baseline in the mean MFSAF TSS over the 28 days immediately before the end of week 24 [9].
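The week-24 responder definition above reduces to a simple relative-reduction check. The sketch below is illustrative only: it assumes the daily TSS values for the 28-day window and the baseline TSS have already been derived, and because this section does not describe how baseline was defined or how missing diary days were handled, skipping missing days and treating an empty window as non-response are assumptions. The function name is hypothetical.

```python
from statistics import mean


def is_tss_responder(window_daily_tss, baseline_tss, min_reduction=0.50):
    """Hypothetical helper: True if the mean daily TSS over the 28-day window
    is at least `min_reduction` (50%) lower than the baseline TSS.

    Assumptions (not taken from the trial protocol): missing days are passed
    as None and skipped; a window with no observed days counts as non-response.
    """
    if baseline_tss <= 0:
        raise ValueError("baseline TSS must be positive to compute a relative reduction")
    observed = [x for x in window_daily_tss if x is not None]
    if not observed:
        return False  # assumption: no diary data in the window -> non-responder
    reduction = (baseline_tss - mean(observed)) / baseline_tss
    return reduction >= min_reduction


# A patient with baseline TSS 30 whose window mean is 15 sits exactly at the 50% cut-off.
print(is_tss_responder([15.0] * 28, baseline_tss=30.0))  # True
print(is_tss_responder([16.0] * 28, baseline_tss=30.0))  # False
```

Using `>=` keeps a reduction of exactly 50% inside the responder group, matching the "≥ 50%" wording.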
Given the symptom burden of patients with MF, evaluation of symptoms with patient-reported outcome (PRO) measures is critical [10]. The PRO Consortium’s MF Working Group was established to review existing questionnaires and develop a consensus-based PRO questionnaire for future MF trials. The resulting MFSAF version 4.0 (v4.0) comprises 7 symptom items (fatigue, night sweats, pruritus, abdominal discomfort, pain under the left ribs, early satiety, bone pain) and produces a TSS, calculated as the sum of the 7 individual item responses, to assess MF symptom severity [11].
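Because the TSS is the sum of the 7 item responses, each rated on the 0–10 scale, daily scoring is straightforward. A minimal sketch follows; the dictionary keys are hypothetical labels for the 7 items, and returning no score when any item is missing is an assumption rather than a documented MFSAF v4.0 scoring rule.

```python
# Hypothetical keys for the 7 MFSAF v4.0 symptom items.
MFSAF_ITEMS = (
    "fatigue",
    "night_sweats",
    "pruritus",
    "abdominal_discomfort",
    "pain_under_left_ribs",
    "early_satiety",
    "bone_pain",
)


def daily_tss(responses):
    """Sum the 7 item scores (each 0-10) into a daily TSS (0-70).

    Returns None if any item is missing (assumption: no imputation).
    """
    total = 0
    for item in MFSAF_ITEMS:
        score = responses.get(item)
        if score is None:
            return None
        if not 0 <= score <= 10:
            raise ValueError(f"{item} score {score} is outside the 0-10 range")
        total += score
    return total


diary_day = dict.fromkeys(MFSAF_ITEMS, 4)
print(daily_tss(diary_day))  # 28
```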
Investigation of the psychometric properties of the MFSAF TSS v4.0 has been limited. This analysis aimed to provide a preliminary evaluation of these properties and to validate the use of the MFSAF TSS v4.0 as a trial endpoint in the MOMENTUM study, using intent-to-treat (ITT) data from baseline and weeks 4, 8, 12, 16, 20, and 24. The following were examined: instrument completion rates; descriptive statistics; confirmatory factor analysis (CFA) to assess the unidimensionality of the instrument; item-to-item and item-to-total correlations to evaluate the hypothesized relationships within the MFSAF scale; internal consistency reliability to assess the degree to which responses are consistent across the items of the multi-item scale; test-retest reliability to evaluate score reproducibility; construct validity to establish that the MFSAF TSS measures the construct of interest; and sensitivity to change to demonstrate that changes in participants’ health status are reflected in changes in the MFSAF TSS.
Results
PRO completion
A total of 195 patients were enrolled in the study; baseline characteristics are provided in Supplementary Table 1. Completion was high at baseline (100%) and at the key analysis time points (weeks 12 and 24). The overall completion rate ranged from 92.2% at week 4 to 100% at baseline (Supplementary Table 2).
Descriptive analyses
Floor effects were observed during the baseline period for item 2 (night sweats), item 3 (itching), item 5 (pain under ribs [left]), and item 7 (bone pain [not joint or arthritis]): the lowest response option “0” (no symptoms) was endorsed by 21.0%, 27.2%, 19.5%, and 21.0% of patients, respectively. Floor effects for these 4 items became more pronounced at later time points, with endorsement of “0” rising to 28.4–35.8% at week 12 and 26.4–35.7% at week 24. At weeks 12 and 24, item 4 (abdominal discomfort) and item 6 (early satiety) also showed floor effects, with “0” endorsed by 12.3% and 10.5% of patients, respectively, at week 12 and by 17.1% and 15.5%, respectively, at week 24 (Supplementary Table 3).
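Floor effects of this kind are quantified as the percentage of patients endorsing the lowest response option at a given time point. The sketch below computes that percentage per item on toy data (not study data); the input layout and the flagging threshold are assumptions for illustration, because the cut-off used to declare a floor effect is not stated in this section.

```python
def floor_percentages(item_scores):
    """Percentage of non-missing scores equal to 0 (the lowest option) per item.

    item_scores: dict mapping an item label to a list of scores (None = missing).
    """
    result = {}
    for item, scores in item_scores.items():
        observed = [s for s in scores if s is not None]
        if not observed:
            result[item] = float("nan")  # no data at this time point
            continue
        result[item] = 100.0 * sum(1 for s in observed if s == 0) / len(observed)
    return result


def flag_floor_effects(percentages, threshold_pct):
    """Items whose floor percentage meets a user-chosen (hypothetical) cut-off."""
    return sorted(item for item, pct in percentages.items() if pct >= threshold_pct)


toy = {"itching": [0, 0, 3, 5], "fatigue": [6, 7, 5, None]}
pcts = floor_percentages(toy)
print(pcts)  # {'itching': 50.0, 'fatigue': 0.0}
print(flag_floor_effects(pcts, threshold_pct=15.0))  # ['itching']
```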
Table 1 presents summary statistics for the MFSAF items and TSS at baseline and weeks 12 and 24. At baseline, item 3 (itching) (mean = 2.9), item 5 (pain under ribs [left]) (mean = 3.1), and item 7 (bone pain [not joint or arthritis]) (mean = 3.1) had the lowest means (least symptom severity), whereas item 1 (fatigue) (mean = 6.1) and item 6 (early satiety) (mean = 4.5) had the highest means (greatest symptom severity). Overall, item mean scores decreased over time, indicating symptom improvement (range: 2.9–6.1 at baseline, 1.8–4.5 at week 12, 1.7–4.4 at week 24). The mean MFSAF TSS was 27.2 at baseline and decreased to 19.1 at week 12 and 18.7 at week 24. At baseline, skewness was > 0.5 for item 3 (itching), item 5 (pain under ribs [left]), and item 7 (bone pain [not joint or arthritis]), indicating moderately skewed data. At subsequent time points, skewness exceeded 0.5 for all items except item 1 (fatigue). Supplementary Fig. 1 depicts item response distributions over time.
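The statistics reported in Table 1 can be computed for any item (or the TSS) with the Python standard library. This is a sketch under stated assumptions: it uses the sample SD, the "inclusive" quartile method, and the adjusted Fisher-Pearson skewness coefficient (G1); the software defaults used in the study may differ, and the input values are toy data.

```python
import math
import statistics


def describe(scores):
    """Summary statistics in the layout of Table 1 for one item or the TSS.

    Assumptions: sample SD (n - 1 denominator), 'inclusive' quartile method,
    and the adjusted Fisher-Pearson skewness coefficient G1.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("at least 3 observations are needed for skewness")
    m = statistics.mean(scores)
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    m2 = sum((x - m) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in scores) / n  # third central moment
    g1 = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment
    return {
        "mean": m,
        "sd": statistics.stdev(scores),
        "median": statistics.median(scores),
        "q1": q1,
        "q3": q3,
        "min": min(scores),
        "max": max(scores),
        "skewness": skewness,
    }


stats = describe([1, 2, 3, 4, 10])
print(round(stats["skewness"], 2))  # 1.7: positive, i.e., right-skewed
```

A positive value, as for most items here, reflects a long right tail caused by many low-severity responses; the text above reads values > 0.5 as moderately skewed.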
Table 1
Summary statistics for MFSAF items and TSS at BL and weeks 12 and 24 – ITT population
MFSAF item or score | Mean (SD) | Median | Q1/Q3 | Min/Max | Skewness |
Time point: BL (N = 195)a |
1: Worst fatigue | 6.1 (2.13) | 6.2 | 4.6/7.8 | 0.9/10.0 | −0.27 |
2: Worst night sweats | 3.3 (2.72) | 2.8 | 0.9/5.4 | 0.0/10.0 | 0.44 |
3: Worst itching | 2.9 (2.71) | 2.3 | 0.4/5.0 | 0.0/10.0 | 0.73 |
4: Worst abdominal discomfort | 4.3 (2.59) | 4.3 | 2.3/6.3 | 0.0/10.0 | 0.05 |
5: Worst pain under ribs (left) | 3.1 (2.62) | 2.7 | 0.8/4.9 | 0.0/9.9 | 0.60 |
6: Worst early satiety | 4.5 (2.45) | 4.6 | 2.4/6.3 | 0.0/9.7 | 0.13 |
7: Worst bone pain (not joint or arthritis) | 3.1 (2.56) | 2.8 | 1.0/4.9 | 0.0/9.3 | 0.53 |
MFSAF TSS | 27.2 (13.51) | 25.4 | 16.3/36.9 | 4.9/67.7 | 0.49 |
Time point: week 12 (n = 162)b |
1: Worst fatigue | 4.5 (2.32) | 4.4 | 2.9/6.1 | 0.0/10.0 | 0.23 |
2: Worst night sweats | 2.0 (2.06) | 1.5 | 0.1/3.3 | 0.0/9.1 | 1.03 |
3: Worst itching | 1.8 (1.77) | 1.3 | 0.0/2.9 | 0.0/8.3 | 0.95 |
4: Worst abdominal discomfort | 3.0 (2.18) | 2.8 | 1.2/4.6 | 0.0/10.0 | 0.60 |
5: Worst pain under ribs (left) | 2.0 (2.14) | 1.4 | 0.1/3.1 | 0.0/8.4 | 1.05 |
6: Worst early satiety | 3.2 (2.25) | 2.8 | 1.3/4.7 | 0.0/10.0 | 0.65 |
7: Worst bone pain (not joint or arthritis) | 2.6 (2.46) | 2.1 | 0.0/4.1 | 0.0/10.0 | 0.77 |
MFSAF TSS | 19.1 (11.81) | 16.3 | 10.9/26.5 | 0.0/56.5 | 0.84 |
Time point: week 24 (n = 129)b |
1: Worst fatigue | 4.4 (2.36) | 4.0 | 3.0/6.1 | 0.0/10.0 | 0.33 |
2: Worst night sweats | 2.1 (2.28) | 1.3 | 0.0/3.2 | 0.0/9.2 | 1.16 |
3: Worst itching | 1.7 (1.89) | 1.1 | 0.0/2.6 | 0.0/9.3 | 1.45 |
4: Worst abdominal discomfort | 2.8 (2.16) | 2.6 | 1.1/4.3 | 0.0/10.0 | 0.71 |
5: Worst pain under ribs (left) | 1.9 (2.10) | 1.2 | 0.1/3.0 | 0.0/7.6 | 1.03 |
6: Worst early satiety | 2.9 (2.21) | 2.4 | 1.3/4.7 | 0.0/10.0 | 0.71 |
7: Worst bone pain (not joint or arthritis) | 2.8 (2.55) | 2.4 | 0.3/4.4 | 0.0/10.0 | 0.77 |
MFSAF TSS | 18.7 (12.40) | 16.2 | 9.2/24.7 | 0.1/57.9 | 1.06 |
Structural validity (CFA, item-to-item and item-to-total correlation)
The CFA results showed that the comparative fit index (CFI) and the standardized root mean square residual (SRMR) were acceptable for the assumed unidimensional model (SRMR = 0.05; CFI close to 0.95), and the root mean square error of approximation (RMSEA) was 0.11 (Supplementary Table 4 and Supplementary Fig. 2).
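For readers less familiar with these indices, CFI and RMSEA can both be computed from the model chi-square, its degrees of freedom, the sample size, and (for CFI) the chi-square of the baseline independence model. The sketch below uses the standard textbook formulas; it is not the software the authors used, some programs use N rather than N − 1 in the RMSEA denominator, and the example chi-square values are hypothetical. A simple one-factor model for 7 items with free loadings and residual variances has 28 − 14 = 14 degrees of freedom, and the corresponding independence model has 21.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_baseline == 0 else 1.0 - d_model / d_baseline


# Hypothetical chi-square values for a 7-item one-factor model (df = 14), N = 200.
print(round(rmsea(chi2=30.0, df=14, n=200), 3))
print(round(cfi(chi2_model=30.0, df_model=14, chi2_baseline=500.0, df_baseline=21), 3))
```

SRMR is not shown because it needs the full observed and model-implied correlation matrices rather than summary chi-square values.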
At baseline, Spearman item-to-item correlation coefficients were mostly moderate to strong, ranging from 0.384 to 0.772. The exceptions were item 1 (fatigue) with item 3 (itching) (r = .289) and item 3 (itching) with item 6 (early satiety) (r = .298). Week 24 correlations were generally higher, and all but one exceeded the 0.40 threshold (range: 0.391–0.829) (Table 2).
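Spearman's coefficient is the Pearson correlation of the two variables' ranks, with tied values assigned their average rank. A dependency-free sketch follows (`scipy.stats.spearmanr` computes the same coefficient); the function names and toy scores are illustrative, and the 0.40 check mirrors the threshold cited above.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either variable is constant


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(items):
    """Lower-triangular item-to-item Spearman matrix for a list of item score lists."""
    return [[spearman(items[i], items[j]) for j in range(i + 1)] for i in range(len(items))]


fatigue = [6, 7, 5, 8, 4]
itching = [2, 3, 3, 1, 0]
r = spearman(fatigue, itching)
print(round(r, 3), "below 0.40" if abs(r) < 0.40 else "at or above 0.40")  # 0.205 below 0.40
```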
Table 2
Item-to-item and item-to-total correlations of MFSAF items at BL and week 24 – ITT populationa
MFSAF item | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 | Item 7 | Item-to-total |
Time point: BL (N = 195)c |
1: Worst fatigue | 1.000 | – | – | – | – | – | – | 0.574 |
2: Worst night sweats | 0.399 | 1.000 | – | – | – | – | – | 0.618 |
3: Worst itching | 0.289 | 0.436 | 1.000 | – | – | – | – | 0.473 |
4: Worst abdominal discomfort | 0.509 | 0.526 | 0.384 | 1.000 | – | – | – | 0.779 |
5: Worst pain under ribs (left) | 0.405 | 0.534 | 0.398 | 0.752 | 1.000 | – | – | 0.734 |
6: Worst early satiety | 0.604 | 0.460 | 0.298 | 0.772 | 0.600 | 1.000 | – | 0.716 |
7: Worst bone pain (not joint or arthritis) | 0.415 | 0.511 | 0.444 | 0.508 | 0.565 | 0.478 | 1.000 | 0.625 |
Time point: week 24 (n = 129)d |
1: Worst fatigue | 1.000 | – | – | – | – | – | – | 0.618 |
2: Worst night sweats | 0.439 | 1.000 | – | – | – | – | – | 0.612 |
3: Worst itching | 0.424 | 0.473 | 1.000 | – | – | – | – | 0.579 |
4: Worst abdominal discomfort | 0.538 | 0.542 | 0.428 | 1.000 | – | – | – | 0.763 |
5: Worst pain under ribs (left) | 0.418 | 0.501 | 0.490 | 0.706 | 1.000 | – | – | 0.706 |
6: Worst early satiety | 0.517 | 0.435 | 0.479 | 0.829 | 0.625 | 1.000 | – | 0.716 |
7: Worst bone pain (not joint or arthritis) | 0.410 | 0.496 | 0.468 | 0.430 | 0.564 | 0.391 | 1.000 | 0.570 |
Correlations between each item and the remainder of the total score were moderate to high (Table 2). All item-to-total correlations exceeded the 0.50 threshold except that for item 3 (itching) at baseline. The item-to-total correlations ranged from 0.473 to 0.779 at baseline and from 0.570 to 0.763 at week 24.
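The corrected item-to-total correlation pairs each item with the sum of the remaining items, so that an item is never correlated with itself. The sketch below assumes Pearson correlations (the coefficient used for this analysis is not stated in this section) and uses toy data; the helper names are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items):
    """items: list of k item score lists, each of length n (one entry per patient)."""
    totals = [sum(scores) for scores in zip(*items)]  # per-patient total score
    # Correlate each item with the total minus that item (the "remainder").
    return [pearson(item, [t - s for t, s in zip(totals, item)]) for item in items]


items = [
    [1, 2, 3, 4, 5],
    [1, 3, 2, 5, 4],
    [2, 1, 4, 3, 5],
]
for k, r in enumerate(corrected_item_total(items), start=1):
    print(f"item {k}: r = {r:.3f}", "(below 0.50)" if r < 0.50 else "")
```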
Test-retest reliability
Moderate test-retest reliability of the MFSAF TSS (ICC = 0.645) was observed across screening and baseline in all patients (N = 195). When stable patients between baseline and week 4 were identified using the PGIS (N = 82 for PGIS-F and N = 76 for PGIS-S), reliability was good (ICC = 0.845 for PGIS-F and ICC = 0.829 for PGIS-S). Excellent test-retest reliability was demonstrated in patients classified as stable by the PGIS between weeks 4 and 8 (ICC = 0.911 for PGIS-F [N = 89] and ICC = 0.915 for PGIS-S [N = 83]) (Table 3).
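Test-retest ICCs can be obtained from a two-way ANOVA decomposition of the patient-by-occasion score matrix. Which ICC form was used is not stated in this section; the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form (ICC(2,1) in Shrout and Fleiss's notation, ICC(A,1) in McGraw and Wong's), does not compute confidence intervals, and uses made-up scores.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: one row per patient, one column per occasion (e.g., [screening, baseline]).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical TSS values for 4 stable patients at two occasions.
tss = [[27.0, 25.5], [40.2, 41.0], [15.3, 18.1], [33.0, 30.4]]
print(round(icc_2_1(tss), 3))
```

The absolute-agreement form penalizes a systematic shift between occasions (the `ms_cols` term), which a consistency-type ICC would ignore.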
Table 3
Test-retest reliability of MFSAF TSS – ITT population
Interval | Stability anchor | N | ICC | 95% CI |
Screening to BL | None (all patients) | 195 | 0.645 | 0.555–0.720 |
BL to week 4 | PGIS-S | 76 | 0.829 | 0.638–0.909 |
| PGIS-F | 82 | 0.845 | 0.685–0.915 |
Weeks 4 to 8 | PGIS-S | 83 | 0.915 | 0.846–0.951 |
| PGIS-F | 89 | 0.911 | 0.821–0.951 |
Internal consistency reliability
Cronbach’s α was 0.877 at baseline and 0.903 at week 24 (Table 4). Deleting any single item did not increase Cronbach’s α, with the exception of MFSAF item 3 (itching) at baseline, for which the increase was minimal (from 0.877 to 0.883). McDonald’s omega (ω) coefficient was 0.875 at baseline and 0.899 at week 24, further supporting the reliability of the instrument.
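Cronbach's α compares the sum of the item variances with the variance of the total score, and "α if item deleted" recomputes α without one item. McDonald's ω can be computed from the standardized loadings of a one-factor model. The sketch below is a minimal illustration on toy data: the loadings are hypothetical inputs (they would come from a fitted CFA), and the ω shown is the common ω-total for a unidimensional model with uncorrelated residuals, which may differ from the estimator the authors used.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k item score lists (same patients, same order); k >= 2."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed without each item in turn (needs at least 3 items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


def omega_total(std_loadings):
    """McDonald's omega from standardized one-factor loadings (residual = 1 - loading^2)."""
    common = sum(std_loadings) ** 2
    unique = sum(1 - lam ** 2 for lam in std_loadings)
    return common / (common + unique)


items = [[3, 5, 6, 2, 7], [4, 5, 7, 1, 6], [2, 6, 5, 3, 8]]
print(round(cronbach_alpha(items), 3))
print([round(a, 3) for a in alpha_if_item_deleted(items)])
print(round(omega_total([0.8] * 7), 3))  # 0.926
```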
Table 4
Internal consistency of MFSAF TSS – ITT population
Statistic | Item | Value (95% CI) |
Time point: BL (N = 195)a |
Cronbach’s α | Total | 0.877 (0.849–0.902) |
Cronbach’s α if item deleted | 1: Worst fatigue | 0.867 (0.836–0.893) |
| 2: Worst night sweats | 0.860 (0.828–0.888) |
| 3: Worst itching | 0.883 (0.856–0.906) |
| 4: Worst abdominal discomfort | 0.843 (0.807–0.875) |
| 5: Worst pain under ribs (left) | 0.850 (0.816–0.880) |
| 6: Worst early satiety | 0.850 (0.816–0.880) |
| 7: Worst bone pain (not joint or arthritis) | 0.859 (0.826–0.887) |
Time point: week 24 (n = 129)b |
Cronbach’s α | Total | 0.903 (0.875–0.926) |
Cronbach’s α if item deleted | 1: Worst fatigue | 0.898 (0.868–0.922) |
| 2: Worst night sweats | 0.888 (0.856–0.915) |
| 3: Worst itching | 0.893 (0.863–0.919) |
| 4: Worst abdominal discomfort | 0.878 (0.843–0.908) |
| 5: Worst pain under ribs (left) | 0.882 (0.848–0.910) |
| 6: Worst early satiety | 0.882 (0.849–0.911) |
| 7: Worst bone pain (not joint or arthritis) | 0.897 (0.868–0.922) |
Construct validity: convergent and divergent
Table 5 presents correlations at baseline and week 24 between the MFSAF TSS and measures hypothesized to be related to it: EORTC QLQ-C30 scale scores, the PROMIS Physical Function Short Form 10b Total Score, 4 additional PROMIS Physical Function item scores, and EQ-5D-5L item scores. EORTC QLQ-C30 Pain and EQ-5D-5L Pain/Discomfort met the predefined threshold (correlation > 0.5) and showed the strongest associations with the MFSAF TSS. At baseline, correlations between the MFSAF TSS and the EORTC QLQ-C30 Cognitive and Emotional Functioning scales were below 0.5 in absolute value, consistent with our initial hypothesis.
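The threshold comparison can be made explicit with the coefficients reported in Table 5. The sketch below assumes the 0.5 criterion is applied to the absolute value of r, since several scales are keyed so that higher scores mean better functioning and therefore correlate negatively with the TSS; the dictionary labels are shortened names for the Table 5 rows.

```python
# Correlations with MFSAF TSS from Table 5: (baseline r, week 24 r).
TABLE5 = {
    "QLQ-C30 Global health status/QOL": (-0.365, -0.393),
    "QLQ-C30 Physical Functioning": (-0.407, -0.435),
    "QLQ-C30 Role Functioning": (-0.357, -0.345),
    "QLQ-C30 Emotional Functioning": (-0.354, -0.514),
    "QLQ-C30 Cognitive Functioning": (-0.244, -0.324),
    "QLQ-C30 Fatigue": (0.366, 0.437),
    "QLQ-C30 Pain": (0.597, 0.525),
    "QLQ-C30 Insomnia": (0.332, 0.415),
    "PROMIS PF item: climb several flights of stairs": (-0.338, -0.252),
    "PROMIS PF item: lifting or carrying groceries": (-0.371, -0.382),
    "PROMIS PF item: short walk": (-0.373, -0.298),
    "PROMIS PF item: daily physical activities": (-0.265, -0.400),
    "PROMIS PF Short Form 10b Total Score": (-0.350, -0.380),
    "EQ-5D-5L Usual Activities": (0.429, 0.390),
    "EQ-5D-5L Pain/Discomfort": (0.579, 0.520),
}


def above_threshold(time_index, threshold=0.5):
    """Measures whose |r| with MFSAF TSS exceeds the threshold (0 = baseline, 1 = week 24)."""
    return [name for name, rs in TABLE5.items() if abs(rs[time_index]) > threshold]


print("Baseline:", above_threshold(0))
print("Week 24:", above_threshold(1))
```

At baseline only the two pain measures pass; at week 24 EORTC QLQ-C30 Emotional Functioning (r = −0.514) also passes in absolute value.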
Table 5
Construct validity: convergent and divergent validity – ITT population
Measure | Correlation type | BL, r (n) | Week 24, r (n) |
EORTC QLQ-C30 Global health status/QOL | Pearson | −0.365 (194) | −0.393 (120) |
EORTC QLQ-C30 Physical Functioning | Pearson | −0.407 (194) | −0.435 (120) |
EORTC QLQ-C30 Role Functioning | Pearson | −0.357 (194) | −0.345 (120) |
EORTC QLQ-C30 Emotional Functioning | Pearson | −0.354 (194) | −0.514 (120) |
EORTC QLQ-C30 Cognitive Functioning | Pearson | −0.244 (194) | −0.324 (120) |
EORTC QLQ-C30 Fatigue | Pearson | 0.366 (194) | 0.437 (120) |
EORTC QLQ-C30 Pain | Pearson | 0.597 (194) | 0.525 (120) |
EORTC QLQ-C30 Insomnia | Pearson | 0.332 (194) | 0.415 (119) |
PROMIS Physical Function Item: “Are you able to climb several flights of stairs?” | Polyserial | −0.338 (193) | −0.252 (120) |
PROMIS Physical Function Item: “Does your health now limit you in lifting or carrying groceries?” | Polyserial | −0.371 (193) | −0.382 (120) |
PROMIS Physical Function Item: “Does your health now limit you in going for a short walk (less than 15 minutes)?” | Polyserial | −0.373 (193) | −0.298 (120) |
PROMIS Physical Function Item: “How much difficulty do you have doing your daily physical activities, because of your health?” | Polyserial | −0.265 (193) | −0.400 (120) |
PROMIS Physical Function Short Form 10b Total Score | Pearson | −0.350 (193) | −0.380 (120) |
EQ-5D-5L Usual Activities | Polyserial | 0.429 (193) | 0.390 (120) |
EQ-5D-5L Pain/Discomfort | Polyserial | 0.579 (193) | 0.520 (120) |
Construct validity: known-groups
Results from the ANOVA comparing mean MFSAF TSS between consecutive groups defined by PGIS and ECOG PS are presented in Table 6; similar analyses using DIPSS and DIPSS-plus are presented in Supplementary Table 5.
Table 6
Construct validity: known-group validity – ITT population
Anchor | Category | n | Mean (SE) | 95% CI | P value | ES vs preceding category |
Time point: BLc |
PGIS-S categories | None or mild | 35 | 18.42 (1.97) | 14.53–22.30 | < 0.001 | – |
| Moderate | 112 | 25.31 (1.10) | 23.14–27.48 | – | 0.62 |
| Severe | 47 | 38.61 (1.70) | 35.26–41.96 | – | 1.13 |
PGIS-F categories | None or mild | 26 | 18.58 (2.45) | 13.74–23.42 | < 0.001 | – |
| Moderate | 99 | 25.10 (1.26) | 22.62–27.58 | – | 0.57 |
| Severe | 69 | 33.70 (1.51) | 30.73–36.67 | – | 0.68 |
ECOG PS | 0 (fully active) | 31 | 21.91 (2.33) | 17.31–26.51 | < 0.001 | – |
| 1 (restricted in strenuous activity) | 117 | 25.99 (1.20) | 23.62–28.35 | – | 0.33 |
| 2 (ambulatory and capable of self-care) | 47 | 33.73 (1.89) | 30.00–37.47 | – | 0.59 |
| 3 (capable of limited self-care) | 0 | – | – | – | – |
| 4 (completely disabled) | 0 | – | – | – | – |
| 5 (dead) | 0 | – | – | – | – |
Time point: week 24d |
PGIS-S categories | None or mild | 51 | 12.00 (1.59) | 8.85–15.16 | < 0.001 | – |
| Moderate or severe | 70 | 23.48 (1.36) | 20.78–26.17 | – | 1.01 |
PGIS-F categories | None or mild | 36 | 10.83 (1.90) | 7.07–14.59 | < 0.001 | – |
| Moderate | 64 | 20.10 (1.42) | 17.28–22.92 | – | 0.91 |
| Severe | 21 | 27.59 (2.48) | 22.67–32.51 | – | 0.60 |
ECOG PS | 0 (fully active) | 34 | 15.38 (2.14) | 11.14–19.61 | 0.164 | – |
| 1 (restricted in strenuous activity) | 68 | 19.16 (1.51) | 16.17–22.15 | – | 0.31 |
| 2 (ambulatory and capable of self-care) | 22 | 22.55 (2.66) | 17.29–27.81 | – | 0.26 |
| 3 (capable of limited self-care) | 0 | – | – | – | – |
| 4 (completely disabled) | 0 | – | – | – | – |
| 5 (dead) | 0 | – | – | – | – |
As expected, lower (i.e., better) mean MFSAF TSS scores were observed for patients with better PGIS ratings and better ECOG PS. These patterns were clearest among PGIS levels at baseline and week 24 and among ECOG PS levels at baseline. At baseline, the mean MFSAF TSS by PGIS-S category was 18.42 for the “none or mild” group, 25.31 for the “moderate” group, and 38.61 for the “severe” group. Effect sizes (ES) between consecutive PGIS categories were moderate (threshold: 0.5 ≤ d < 0.8) to large (threshold: d ≥ 0.8) at baseline (absolute range across PGIS-F and PGIS-S: 0.57–1.13) and at week 24 (absolute range: 0.60–1.01), providing further evidence of known-groups validity.
Statistically significant differences among PGIS categories were observed at both baseline and week 24 (P < .001 for both PGIS-F and PGIS-S), and among ECOG PS groups at baseline (P < .001) but not at week 24 (P = .164) (Table 6).
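The known-groups analysis combines an omnibus test across categories with an effect size for each pair of consecutive categories. The sketch below computes a one-way ANOVA F statistic and Cohen's d with a pooled SD, and labels d using the thresholds quoted above; treating 0.2 ≤ |d| < 0.5 as small follows Cohen's usual convention and is an assumption, as are the toy data. The exact model behind Table 6 is not described in this section, so this simple version is not expected to reproduce it, and a P value would additionally require the F distribution (e.g., `scipy.stats.f.sf`).

```python
import math
from statistics import mean, variance


def one_way_anova_f(groups):
    """Omnibus F statistic for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def cohens_d(lower, higher):
    """Standardized mean difference (higher - lower) using the pooled SD."""
    n1, n2 = len(lower), len(higher)
    pooled_var = ((n1 - 1) * variance(lower) + (n2 - 1) * variance(higher)) / (n1 + n2 - 2)
    return (mean(higher) - mean(lower)) / math.sqrt(pooled_var)


def magnitude(d):
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "moderate"
    if d >= 0.2:
        return "small"  # assumption: Cohen's conventional lower bound
    return "negligible"


# Toy TSS values for ordered PGIS-S categories (not study data).
none_mild = [12, 18, 20, 15, 22]
moderate = [24, 20, 30, 27, 25]
severe = [38, 35, 41, 33, 40]
groups = [none_mild, moderate, severe]
print(round(one_way_anova_f(groups), 2))
for lo, hi in zip(groups, groups[1:]):
    d = cohens_d(lo, hi)
    print(round(d, 2), magnitude(d))
```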
Sensitivity to change
The sensitivity to change analysis (Supplementary Table 6) showed that patients who reported symptom improvement on the PGIS-S/PGIC-S, or fatigue improvement on the PGIS-F/PGIC-F, had larger absolute mean changes from baseline in MFSAF TSS than patients who reported no change or worsening. For the 3 collapsed PGIS-S categories, the mean change in MFSAF TSS from baseline to week 12 was −13.85 (improved), −5.49 (no change), and −1.66 (worsening), and from baseline to week 24 was −16.06 (improved), −5.30 (no change), and −1.68 (worsening). PGIC anchors showed trends similar to those observed with PGIS anchors; for example, the mean changes in MFSAF TSS from baseline to week 12 for the 3 collapsed PGIC-F categories were −10.32 (improved), −6.82 (no change), and −5.35 (worsening).
Within-group ES for the “improved” groups (collapsed categories) was moderate (threshold: 0.5 ≤ d < 0.8) to large (threshold: d ≥ 0.8) in absolute value across all anchors, indicating a greater change in MFSAF TSS from baseline to week 12 (absolute range: 0.69–1.08) and an even greater change from baseline to week 24 (absolute range: 0.79–1.16) (Supplementary Table 6). The same pattern was observed for uncollapsed PGIC categories, with the “much improved” group showing large within-group ES vs other groups (absolute within-group ES = 1.12 for PGIC-S and 1.14 for PGIC-F at week 12). Between-group ES also clearly differentiated the “improved” group from the “no change” group, with PGIS anchors yielding higher ES than PGIC anchors.
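Within-group ES for change scores can be defined in more than one way, and the definition behind Supplementary Table 6 is not stated in this section. The sketch below assumes mean change divided by the SD of baseline scores, with the standardized response mean (mean change divided by the SD of the change) as an option; the group labels and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def within_group_es(records, use_change_sd=False):
    """records: iterable of (anchor_group, baseline_tss, follow_up_tss).

    Returns {group: (n, mean_change, es)}. By default es = mean change / SD(baseline);
    with use_change_sd=True it is the standardized response mean (mean change / SD(change)).
    Each group needs at least 2 patients for an SD.
    """
    by_group = defaultdict(list)
    for group, baseline, follow_up in records:
        by_group[group].append((baseline, follow_up))
    out = {}
    for group, pairs in by_group.items():
        changes = [f - b for b, f in pairs]  # negative = symptom improvement
        sd = stdev(changes) if use_change_sd else stdev([b for b, _ in pairs])
        out[group] = (len(pairs), mean(changes), mean(changes) / sd)
    return out


toy = [
    ("improved", 30, 15), ("improved", 20, 10), ("improved", 25, 10),
    ("no change", 22, 21), ("no change", 18, 18),
]
for group, (n, change, es) in within_group_es(toy).items():
    print(group, n, round(change, 2), round(es, 2))
```

Because a lower TSS means fewer symptoms, improvement yields negative changes and ES values; the text above therefore reports ES in absolute value.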
Discussion
Our analyses summarize the psychometric properties of the MFSAF v4.0 and provide preliminary evidence of its validity and appropriateness for use in MF clinical trials. The MFSAF TSS v4.0 is a diary-based measure that assesses the most important MF symptoms affecting patients’ QOL. Data from qualitative interviews showed that its 11-point rating scale is relevant and understandable to patients [27], and this format has been recommended in other conditions where capturing patients’ perspectives is crucial [28]. The MFSAF v4.0 may be used as a daily diary or as a weekly assessment; our study used the daily diary to capture MF symptoms, but future studies may benefit from selecting a recall period appropriate to their goals [11].
A total of 195 patients were enrolled in this trial, which is close to the recommended sample size for establishing PRO validity [29]. The high compliance rate achieved with the electronic diary was expected, as paper diaries have shown lower compliance rates [30]. Summary statistics revealed moderate skewness at baseline and later time points, which can be attributed to frequent endorsement of low-severity response options for item 2 (night sweats), item 3 (itching), item 5 (pain under ribs [left]), and item 7 (bone pain [not joint or arthritis]). Consistent with our study, low severity for some of these items (i.e., night sweats and itching) has been reported in previous studies of the MFSAF [25,31]. Item-to-item and item-to-total analyses revealed moderate to strong correlations at baseline and week 24, supporting the multi-item TSS. In addition, the CFA performed at baseline confirmed the hypothesized unidimensionality of the instrument. Future studies may consider using data from a single day when performing these investigations, as averaging across days may inflate correlations. Research may also examine unidimensionality with a multilevel CFA in which within-person and between-person variance are examined separately.
Consistent with the MFSAF v2.0, which has previously demonstrated reliability [25], test-retest and internal consistency reliability of the MFSAF v4.0 were established in the current study. Convergent and divergent validity also appear to be supported, although not all associations met the hypotheses. Notably, the convergent validity cutoffs used in the present analysis are arbitrary and relatively high, as other PRO-related research has used a lower threshold (e.g., 0.40) [32]. Adequate known-groups validity was also demonstrated, as the MFSAF TSS distinguished among PGIS groups at baseline and week 24 and among ECOG PS groups at baseline. As expected, lower (i.e., better) MFSAF TSS was observed for patients with better PGIS and ECOG PS, with mostly moderate to large ESs, in line with the a priori hypotheses. DIPSS and DIPSS-plus did not provide evidence of known-groups validity for the MFSAF, possibly because no patients were in the low-risk group based on DIPSS or in the low-risk or intermediate-1–risk groups (1–2 points) based on DIPSS-plus, so the remaining groups were not clearly differentiated. With regard to sensitivity to change, a comparison of mean change scores across anchors (PGIS and PGIC) indicated that the MFSAF TSS was sensitive to change at weeks 12 and 24, particularly in the direction of improvement. A potential limitation of this approach is that the PGIC has a lengthy recall period and is therefore prone to recall bias, which could affect patients’ perceptions of changes in symptoms [33]. Nevertheless, the monotonic patterns across the change groups are consistent with those of the previously studied MFSAF v2.0 [25], and the results support the sensitivity to change of the MFSAF TSS.
The literature suggests giving a positive rating for construct validity if 75% of the prespecified hypotheses are met [34]. However, the present analysis evaluated the psychometric properties of the MFSAF TSS in a sample that had not been collected for this purpose. Thus, even though all hypotheses were prespecified on the basis of previous literature, construct validity was determined by inspecting all results collectively rather than by counting how many hypotheses were met. Similarly, expected differences among the known groups and among the anchor groups used in the sensitivity to change analyses were not explicitly defined; instead, ES results were provided to indicate whether the observed differences were of low, moderate, or high magnitude.
As with all psychometric studies undertaken using clinical trial data, it must be acknowledged that MOMENTUM was not designed to assess the psychometric properties of the MFSAF v4.0. However, sufficient data and information were available to derive preliminary and exploratory evidence of the measurement properties of this instrument. Although the analyses conducted in this study are therefore exploratory in nature, our findings provide supportive evidence of structural validity, reliability (test-retest and internal consistency), construct validity, and sensitivity to change. Evaluation of a clinically meaningful change threshold of the MFSAF TSS is an area for future work.
Acknowledgments
We thank all participating patients and their families and all study site staff. Medical writing support was provided by Prasanthi Mandalay, PhD, of Nucleus Global, an Inizio company, and funded by GSK.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.