Introduction

Background

Sarcopenia, defined as “a syndrome characterised by progressive and generalised loss of skeletal muscle mass and strength and with a risk of adverse outcomes such as physical disability, poor quality of life and death” by the European Working Group on Sarcopenia in Older People (EWGSOP), is a growing public health problem [1]. It has recently been recognized as a geriatric condition with an ICD-10-CM code (M62.84) [2]. Sarcopenia has been shown to be associated with negative health outcomes, such as a higher rate of mortality and functional decline, a higher rate of falls, and a higher incidence of hospitalization [3]. Other research has shown an association between sarcopenia and depression [4]. Not much is yet known about the relationship between sarcopenia and quality of life. Although several studies have incorporated quality of life outcomes in their designs, the results are difficult to compare because of the different diagnostic criteria used to establish sarcopenia. Some studies that diagnosed sarcopenia with the EWGSOP criteria have found lower health-related quality of life (HRQoL) scores for sarcopenic subjects in select domains of the Short-Form 36-item (SF-36) questionnaire, but other studies (using other diagnostic criteria) have found no difference in SF-36 scores between sarcopenic and non-sarcopenic subjects [5].

Until recently, researchers only had generic questionnaires, such as the SF-36, available to assess quality of life in sarcopenic patients. These questionnaires are designed for use in broad populations and may thus not be sensitive enough to accurately measure quality of life in sarcopenic populations [6]. To address this problem, Beaudart et al. developed the Sarcopenia Quality of Life (SarQoL®) questionnaire in 2015 [7].

Until now, no study has evaluated the responsiveness, defined as “the ability of an instrument to detect change over time in the construct to be measured”, of the SarQoL® questionnaire [8]. When an instrument is used for evaluative purposes, i.e. when the aim is to detect and measure longitudinal change in subjects or populations, responsiveness is a key psychometric property [9, 10]. This situation is often present in clinical studies aimed at testing the effect of an intervention, where an accurate assessment of HRQoL before and after the intervention is an important outcome. Researchers need to have valid data on the responsiveness of the instrument they wish to use to be certain of the results they obtain.

The psychometric properties of the SarQoL® questionnaire have been evaluated in several cross-sectional studies, but its ability to detect change over time (responsiveness) has not yet been examined [11,12,13,14]. This study aimed to evaluate the responsiveness of the SarQoL® questionnaire in a sample of older, community-dwelling, sarcopenic subjects from the SarcoPhAge (Sarcopenia and Physical impairment with advancing Age) cohort.

Methods

Design

The current article describes an instrument validation study that examined data collected at the 2nd and 4th annual visits of the SarcoPhAge study, an ongoing 5-year prospective, longitudinal, observational cohort study being carried out in Liège (Belgium) [15, 16]. Participants in the SarcoPhAge study all provided written informed consent. The research protocol and its amendments were approved by the Ethics Committee of the University Teaching Hospital of Liège (no. 2012-277).

Participants

Participants from the SarcoPhAge study with valid data from the 2nd (T1) and 4th (T3) study visits (a 2-year interval) who were diagnosed as sarcopenic according to the EWGSOP criteria were included [1]. This 2-year interval was chosen because it spans the first and last available administrations of the questionnaire. Because the SarcoPhAge study is observational, we relied on the natural progression of sarcopenia to cause a change in health status between the two measurements. The details of this study have been reported previously [11, 15, 17, 18].

Sarcopenia was diagnosed according to the EWGSOP algorithm, which demands the presence of low muscle mass in combination with low muscle strength and/or low physical performance [1]. Muscle mass was measured by dual-energy X-ray absorptiometry (DXA) (Hologic Discovery A, USA), which was calibrated daily by scanning a spine phantom. Male subjects with a skeletal muscle mass index (SMI = appendicular lean mass/height²) below 7.26 kg/m² and women with an SMI below 5.5 kg/m² were considered to have low muscle mass. Muscle strength was measured with a hydraulic hand dynamometer (Saehan Corporation, Korea), calibrated at the beginning of the study for 10, 40 and 90 kg. Men with a maximal handgrip strength below 30 kg and women below 20 kg were considered to have low muscle strength. Physical performance was examined with the help of the Short Physical Performance Battery (SPPB), with a value of 8 or less being considered low [15].
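The diagnostic rule described above can be sketched in a few lines of Python. This is only an illustration of the stated cut-offs, not the software used in the SarcoPhAge study; the `Assessment` container and the function names are hypothetical.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text above.
SMI_CUTOFF = {"M": 7.26, "F": 5.5}    # kg/m^2; low muscle mass if below
GRIP_CUTOFF = {"M": 30.0, "F": 20.0}  # kg; low muscle strength if below
SPPB_CUTOFF = 8                       # low physical performance if score <= 8


@dataclass
class Assessment:
    """Hypothetical container for one study visit (illustration only)."""
    sex: str         # "M" or "F"
    alm_kg: float    # appendicular lean mass measured by DXA
    height_m: float
    grip_kg: float   # maximal handgrip strength
    sppb: int        # SPPB total score


def smi(a: Assessment) -> float:
    """Skeletal muscle mass index: appendicular lean mass / height squared."""
    return a.alm_kg / a.height_m ** 2


def is_sarcopenic(a: Assessment) -> bool:
    """Low muscle mass AND (low strength OR low physical performance)."""
    low_mass = smi(a) < SMI_CUTOFF[a.sex]
    low_strength = a.grip_kg < GRIP_CUTOFF[a.sex]
    low_performance = a.sppb <= SPPB_CUTOFF
    return low_mass and (low_strength or low_performance)
```

Low muscle mass is required in every case; low strength or low performance alone does not lead to a diagnosis under this rule.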

Participants were included in the current analysis when diagnosed as sarcopenic at T1 and/or T3 and when both SarQoL® questionnaires (T1 and T3) had less than 20% missing data for the calculation of the Overall score.

Measures

The SarQoL® Questionnaire

The SarQoL® questionnaire is a patient-reported outcome measure (PROM) specific to sarcopenia. It consists of 22 questions incorporating 55 items, which fall into seven domains of HRQoL. These domains are “Physical and Mental Health”, “Locomotion”, “Body Composition”, “Functionality”, “Activities of Daily Living”, “Leisure activities” and “Fears”. Each domain is scored from 0 to 100, and an Overall score is calculated. The questionnaire is self-administered and takes 10 min to complete [7]. The questionnaire is available in 16 languages and can be found on its webpage [19].

Several psychometric properties of the SarQoL® questionnaire have been examined previously. The questionnaire has demonstrated its ability to distinguish between sarcopenic and non-sarcopenic subjects (discriminative power). It has good internal consistency and construct validity, and its test–retest reliability is excellent. Furthermore, it has been demonstrated that there are no floor or ceiling effects for the Overall score [11,12,13,14].

The Short-Form 36-Item (SF-36) Questionnaire

The SF-36 is a multi-item generic health survey that uses 36 questions to measure functional health and wellbeing from the patient’s perspective. It measures eight domains: “Physical Functioning”, “Role limitation due to physical problems”, “Bodily Pain”, “General Health Perceptions”, “Vitality”, “Social Functioning”, “Role limitations due to emotional problems” and “Mental Health”, each of which provides a score between 0 and 100. Additionally, two composite scores can be calculated: the Physical Component Summary (PCS) and the Mental Component Summary (MCS) [20,21,22].

The EuroQol 5-Dimension 3-Level (EQ-5D-3L)

The EQ-5D-3L is a standardized measure of health status developed by the EuroQol Group in 1990. The instrument consists of two pages: the EQ-5D descriptive system, which is composed of five questions encompassing five dimensions of health (mobility, self-care, usual activities, pain/discomfort and anxiety/depression); and the Visual Analogue Scale (EQ-VAS), which records the respondent’s self-rated health on a vertical scale going from best (100) to worst imaginable health (0). The EQ-5D descriptive system is used to calculate an index score, which represents the utility value for current health [23, 24].

Physical Parameters

Parameters related to muscle mass, muscle strength and physical performance were collected. Apart from the SMI, we also determined appendicular lean mass (ALM) and ALM divided by body mass index (ALM/BMI) by DXA. As mentioned previously, muscle strength was determined with a hydraulic hand dynamometer. For physical performance, the patients performed the SPPB test, which also includes the usual gait speed on a 4-m track. The subjects also performed the timed-up-and-go (TUG) test, which uses the time that a subject takes to rise from a chair, walk three metres, turn around, walk back to the chair, and sit down to determine a subject’s mobility. Lastly, the chair stand test (CST) was administered as part of the SPPB. In this test, the subjects are asked to stand up from a chair and sit back down five times as fast as they can.

Methodological Approach

Hypotheses Testing

It is recommended to treat responsiveness as the longitudinal form of construct validity and to evaluate it in much the same way as the construct validity of a questionnaire [25]. Thus, we formulated hypotheses between the changes in the scores of the SarQoL® questionnaire and the changes observed for the SF-36 and the EQ-5D. AG, CB and OB were responsible for the formulation of the hypotheses, on the basis of similarity in the construct of the different domains, and previously found results for the construct validity of the questionnaire. The data used in this analysis were collected before the formulation of the hypotheses, but no statistical manipulations in relation to the evaluation of responsiveness were carried out before the final set of hypotheses was agreed upon.

The hypotheses used for the evaluation of the responsiveness, the expected strength of the correlations and the rationale for their formulation are detailed in Table 1.

Table 1 Hypotheses for the evaluation of responsiveness

We employed the criteria formulated by De Boer et al. to evaluate the results of the hypotheses testing. These state that a questionnaire has high responsiveness when less than 25% of hypotheses are refuted, moderate responsiveness when 25–50% are refuted and poor responsiveness when more than 50% are refuted [26].
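As an illustration, the De Boer criteria can be written as a small classification function. This is a sketch based solely on the thresholds quoted above; the function name is hypothetical.

```python
def de_boer_rating(n_hypotheses: int, n_refuted: int) -> str:
    """Rate responsiveness from the share of refuted hypotheses.

    Thresholds follow the text: <25% refuted is high, 25-50% is moderate,
    >50% is poor.
    """
    if n_hypotheses <= 0 or not 0 <= n_refuted <= n_hypotheses:
        raise ValueError("need 0 <= n_refuted <= n_hypotheses and n_hypotheses > 0")
    pct_refuted = 100.0 * n_refuted / n_hypotheses
    if pct_refuted < 25:
        return "high"
    if pct_refuted <= 50:
        return "moderate"
    return "poor"
```

For example, one refuted hypothesis out of nine (about 11%) falls in the "high" category.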

Standardized Response Means (SRMs)

We also calculated SRMs for the different questionnaires, by dividing the mean difference between T1 and T3 by the standard deviation of the differences between the paired measurements [27]. The SRM reflects the magnitude of the change measured by the different questionnaires. Consequently, when greater SRMs are obtained, this is an indication of better responsiveness. To allow the use of the thresholds for responsiveness formulated by Cohen et al., which are designed for use with the effect size and which categorize an observed change, we applied the correction developed by Middel and Van Sonderen [28, 29]. After correcting the SRMs with the formula [(SRM/√2)/√(1 − r); with r = correlation between baseline and follow-up score], we categorized them as trivial when SRM < 0.20, small when 0.20 ≤ SRM < 0.50, moderate when 0.50 ≤ SRM < 0.80 and large when SRM ≥ 0.80 [29].
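The calculation can be sketched as follows. The correction is implemented exactly as the formula is written in the paragraph above. The sample (n − 1) standard deviation, the use of the absolute value for categorization and the treatment of the bands as contiguous are assumptions of this sketch, and all function names are hypothetical.

```python
import math
import statistics


def srm(t1, t3):
    """Mean of the paired differences (T3 - T1) divided by their SD."""
    diffs = [after - before for before, after in zip(t1, t3)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def pearson_r(x, y):
    """Pearson correlation between baseline and follow-up scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_srm(t1, t3):
    """Correction as written in the text: (SRM / sqrt(2)) / sqrt(1 - r)."""
    r = pearson_r(t1, t3)
    return (srm(t1, t3) / math.sqrt(2)) / math.sqrt(1 - r)


def categorize(value):
    """Label the size of a corrected SRM, ignoring its sign (assumption)."""
    size = abs(value)
    if size < 0.20:
        return "trivial"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "moderate"
    return "large"
```

A negative SRM indicates a decline between T1 and T3; only its magnitude is used for the category.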

A selection of SRMs were compared in pairs to evaluate whether they were significantly different. This was carried out using the modified jack-knife method, which uses linear regression to determine whether a significant difference exists between two SRMs [30]. For this measure, an individual SRM is first calculated for each subject by dividing their change score by the standard deviation of the change scores in the whole sample. Next, a “centred” SRM is calculated for each subject by subtracting the mean SRM score of the sample from the individual SRMs. With these variables, a linear regression is carried out with the difference between the individual SRMs of the two quality-of-life scores of interest as the dependent variable and the “centred” SRM of one of the quality-of-life scores (either one will work) as the independent variable. Because the independent variable is centred, the intercept estimates the mean difference between the two SRMs. A significant difference is demonstrated when the p value of the intercept is at most 0.05 [30, 31].
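A minimal sketch of this comparison is given below. It assumes that the dependent variable is the per-subject difference between the two individual SRMs, which is one reading of the procedure and should be checked against references [30, 31]. The function names are hypothetical, and the sketch returns the intercept, its t statistic and the degrees of freedom rather than a p value, so that it needs only the standard library.

```python
import math
import statistics


def individual_srms(changes):
    """Each subject's change score divided by the SD of all change scores."""
    sd = statistics.stdev(changes)
    return [c / sd for c in changes]


def jackknife_intercept(changes_a, changes_b):
    """Regress (SRM_a - SRM_b) on the centred individual SRMs of score A.

    Returns (intercept, t statistic of the intercept, degrees of freedom).
    """
    srm_a = individual_srms(changes_a)
    srm_b = individual_srms(changes_b)
    n = len(srm_a)
    mean_a = statistics.mean(srm_a)
    x = [s - mean_a for s in srm_a]             # centred SRM of score A
    y = [a - b for a, b in zip(srm_a, srm_b)]   # assumed dependent variable
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar           # equals y_bar up to rounding
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, intercept / se_intercept, n - 2
```

Because the predictor has mean zero, the intercept equals the difference between the two sample SRMs. A two-sided p value follows from the t distribution with the returned degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), dof)` if SciPy is available.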

Correlations Between Physical Parameters and QoL

We investigated the relationship between the evolution of physical parameters linked to sarcopenia and the changes observed by the different questionnaires with the help of correlations. We selected the five summary/total scores available (SarQoL® Overall score, SF-36 PCS and MCS, EQ-5D Utility Index and EQ-VAS) to represent the HRQoL of the subjects and constructed correlations with usual gait speed, handgrip strength, SPPB score, ALM, ALM/BMI, SMI, TUG test and the chair stand test. The strength of the association was judged as excellent when larger than 0.81, very good when between 0.61 and 0.80, good when between 0.41 and 0.60, acceptable when between 0.21 and 0.40 and insufficient when less than 0.20 [32].
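The interpretation bands above can be expressed as a small helper. The quoted bands leave small gaps (for example between 0.80 and 0.81), so this sketch assumes contiguous bands with the boundaries 0.20, 0.40, 0.60 and 0.80 and applies them to the absolute value of the coefficient; the function name is hypothetical.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the bands quoted in the text."""
    size = abs(r)  # the sign only gives the direction of the association
    if size > 0.80:
        return "excellent"
    if size > 0.60:
        return "very good"
    if size > 0.40:
        return "good"
    if size > 0.20:
        return "acceptable"
    return "insufficient"
```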

Statistical Analysis

Data were analysed using IBM SPSS Statistics, version 24.0.0.0 for Windows (Armonk, NY: IBM Corp).

The distribution of variables was determined by examining the histogram, the quantile–quantile plot, the Shapiro–Wilk test and the difference between mean and median. Gaussian variables are reported as the mean ± standard deviation and non-Gaussian variables as median (P25–P75). Nominal variables are reported as absolute (n) and relative frequencies (%). The presence of significant differences between T1 and T3 was examined with the paired samples t test for variables with normal distribution, the Wilcoxon matched-pair signed-rank test for non-Gaussian variables and the chi-squared test for nominal variables. Pearson correlations were calculated when both groups/variables had normal distributions. Spearman correlations were calculated when this was not the case.

Change scores were calculated by subtracting the scores from T1 from those obtained at T3. For quality of life, this means that a positive change score indicates an improvement and a negative change score a decline. The calculation of the SRMs, their correction with the technique from Middel and Van Sonderen and the modified jack-knife method used to detect significant differences between SRMs have been described in the preceding paragraphs.

A post hoc power analysis was conducted on the Pearson and Spearman correlations used in the primary outcome with the G*Power software, version 3.1.9.2 [33]. This analysis computes the achieved power for a bivariate normal model with an α-error of 0.05 and a sample size of 42 subjects.
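For readers without access to G*Power, the achieved power of a correlation test can be approximated with the Fisher z transformation. This is a rough sketch, not G*Power's exact bivariate normal routine, so the values may differ slightly from those reported here. Whether the original analysis was one- or two-tailed is not stated, so the `two_sided` parameter is an assumption, as is the function name.

```python
import math
from statistics import NormalDist


def approx_power_correlation(r, n, alpha=0.05, two_sided=True):
    """Approximate power to detect a true correlation r with n subjects.

    Uses the Fisher z approximation: atanh(r) is treated as normal with
    standard error 1 / sqrt(n - 3).
    """
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = NormalDist()
    shift = math.atanh(abs(r)) * math.sqrt(n - 3)
    if two_sided:
        crit = z.inv_cdf(1 - alpha / 2)
        return z.cdf(shift - crit) + z.cdf(-shift - crit)
    crit = z.inv_cdf(1 - alpha)
    return z.cdf(shift - crit)
```

Calling `approx_power_correlation(0.5, 42)` gives the approximate power for an expected correlation of 0.5 in a sample of 42 subjects.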

Results were considered significant at p ≤ 0.05.

Results

In total, 42 sarcopenic participants from the SarcoPhAge study fulfilled the inclusion criteria, which is a moderate sample size according to the COSMIN checklist [34]. The subjects had a median age of 73 (69–79) years at T1, and 25 out of 42 (59.5%) were women. The median number of drugs taken by the participants increased significantly (p = 0.001) from 6 (5–9) at T1 to 8 (6–10) at T3, as did the proportion of subjects who fell in the year before the study visits, from 8 (19.0%) at T1 to 16 (38.1%) at T3 (p = 0.017). The gait speed of the participants diminished significantly from a median of 1.02 (0.80–1.21) m/s at T1 to 0.89 (0.76–1.09) m/s at T3 (p = 0.032). In the sample as a whole, a slight but significant reduction in handgrip strength was observed, from a median of 19.75 (18.00–28.00) kg at T1 to 19.00 (16.75–22.50) kg at T3 (p = 0.010). This change was attributable to the female subjects (p = 0.030). No significant changes between T1 and T3 were found for BMI (p = 0.393), number of comorbidities (p = 0.763), proportion of subjects who experienced a fracture in the year before the study visits (p = 0.268), independence in activities of daily living as measured by the Katz scale (p = 0.942), SPPB score (p = 0.083), TUG test (p = 0.081), ALM/BMI (p = 0.197) and SMI (p = 0.451). The ALM of the whole sample diminished significantly (p = 0.035), but this effect was lost when the sample was divided into men (p = 0.287) and women (p = 0.072).

The three different questionnaires obtained different results for quality of life. The SarQoL® questionnaire measured a significant reduction for three domains (Body Composition, p = 0.023; Functionality, p = 0.002; Activities of Daily Living, p < 0.001) and the Overall score, which diminished from a median of 61.15 (51.15–71.76) at T1 to 54.56 (42.31–68.44) at T3 (p = 0.002). The SF-36 PCS and MCS, the EQ-5D Utility Index and the EQ-VAS, however, did not detect a significant change (respectively, p = 0.679, p = 0.062, p = 0.231 and p = 0.716). The complete clinical characteristics and the evolution of quality of life can be found in Table 2.

Table 2 Clinical characteristics and quality of life scores for sarcopenic sample (n = 42)

Responsiveness

Of the nine formulated hypotheses, eight (89%) were confirmed. Hypothesis 9 was rejected because the correlation found, r = 0.467, was just under the threshold of r > 0.5. In total, three very good correlations were found, five good correlations and two acceptable correlations. The results of this evaluation as well as of the power analysis are reported in Table 3.

Table 3 Evaluation of responsiveness with hypotheses

According to the criteria by De Boer et al., the SarQoL® questionnaire possesses high responsiveness because fewer than 25% of hypotheses are refuted [26].

Standardized Response Means

The magnitude of change observed in the sample was examined by calculating SRMs. The SarQoL® questionnaire had three domains with SRMs below 0.20, indicating that no change was observed, two domains with an SRM between 0.20 and 0.49 (small change) and three domains with an SRM between 0.50 and 0.79 (moderate change). In contrast, only one domain of the SF-36 had a moderate SRM (Physical Functioning; SRM = − 0.50), and six domains reported an SRM indicating small change. A further three domains of the SF-36 had SRMs indicating no change had occurred. For the EQ-5D, small SRMs were observed for two domains, with the remaining five domains having SRMs indicating no change. All obtained SRMs can be found in Table 4.

Table 4 Standardized response means

The SRM of the SarQoL® Overall score was significantly larger than the SF-36 PCS (p = 0.005), the EQ-5D Utility Index (p < 0.001) and the EQ-VAS (p = 0.003). The SRMs of the SarQoL® Overall score and the SF-36 MCS were not significantly different (p = 0.150). The results of this analysis are reported in Table 5.

Table 5 Exploration of significant differences between SRMs

Correlations Between Physical Parameters and QoL

Good correlations were found between change in the SarQoL® Overall score and change in gait speed (r = 0.50), SPPB score (r = 0.47) and the chair stand test (r = − 0.42). Good correlations were also found between change in ALM/BMI and change on the EQ-VAS (r = − 0.48) as well as between change on the timed up-and-go test and change on the SF-36 PCS (r = − 0.44). Acceptable correlations were found between change in gait speed and change on the SF-36 PCS (r = 0.39), between change on the chair stand test and change on the SF-36 PCS (r = − 0.37) and the SF-36 MCS (r = − 0.36). No other correlations were statistically significant. The full analysis can be found in Table 6.

Table 6 Correlations between changes in physical parameters and evolution of quality of life

Discussion

The aim of this study was to evaluate the responsiveness of the SarQoL® questionnaire in a population of older, community-dwelling, sarcopenic subjects by formulating hypotheses on the correlations between change scores, and by calculating the standardized response means. Additionally, we examined the correlations between changes in physical parameters and the evolution of the quality-of-life scores.

The results from the hypotheses reveal that the SarQoL® questionnaire has high responsiveness according to the criteria of De Boer et al., with only one hypothesis out of nine (11%) refuted [26]. The most notable results are the strong correlations found for the Overall score and domain 4 (Functionality) of the SarQoL® questionnaire, and the Physical Functioning domain of the SF-36. These correlations, respectively r = 0.669 and r = 0.680, were larger than the expected correlation of r = 0.5 but make sense in light of the similarity of their content and the relatively important weight of domain 4 in the calculation of the Overall score of the SarQoL® questionnaire.

The SRMs show that the change measured by the Overall score of the SarQoL® questionnaire was significantly larger than that measured by the SF-36 PCS, the EQ-5D Utility Index and the EQ-VAS, but not the SF-36 MCS. The absence of a significant difference between the SRM of the Overall score and that of the SF-36 MCS may reflect a very large 95% confidence interval for the latter. The SRM obtained for the SarQoL® Overall score is in accordance with the change in physical parameters of the subjects. Participants lost approximately 13% of their original gait speed (from a median of 1.02 m/s to 0.89 m/s), and the female participants lost a median of 2 kg of grip strength in the 2-year interval. It is also interesting to note that the number of participants who fell in the year preceding the study visit doubled from 8 (19.0%) to 16 (38.1%). The SarQoL® Overall score reflects these changes more accurately than the SF-36 and the EQ-5D.

The SarQoL® questionnaire measured an SRM indicating moderate change for domain 4 (Functionality) and domain 5 (Activities of Daily Living), highlighting that the effects of diminished muscle strength and physical performance manifest themselves most in all the physical tasks performed on a regular basis. SRMs indicating small change were reported for domain 1 (Physical and Mental Health) and domain 3 (Body Composition). The smaller SRM for domain 1 may result from the way the questions are formulated, with many more abstract concepts (energy, physical capacity, muscle mass, etc.) instead of the very relatable examples from domains 4 and 5 (climbing a flight of stairs, opening a bottle or jar, etc.). Subjects may have more difficulty finding the right answers for them because these changes are much less perceptible in absolute terms. The SRM for domain 3 (Body Composition) covers an area where drastic change is not necessarily expected given that the median age in the sample is 73 years old and that many of the age-related changes to the way one looks have already manifested themselves. Finally, three domains reported SRMs that indicate no change has occurred. Domain 6 (Leisure Activities) and domain 7 (Fears) are represented by, respectively, two and four items in the questionnaire and may be much less sensitive than domains with more items. For domain 2 (Locomotion), this reasoning does not apply. This domain asks pointed questions connected to walking (length, frequency, difficulties, tiredness, etc.), and given that the usual gait speed has significantly diminished, one would expect to see an effect in this domain. However, the questions in this domain may be affected by the phenomenon of response shift, whereby the internal standards of measurement of the subject are recalibrated.

The SF-36 reported moderate change for the domain Physical Functioning, and small change for the domains Social Functioning, Role limitations due to emotional problems, Mental Health, and General Health, and reported no change for the other domains. These results are in line with our hypothesis that the SarQoL® questionnaire, being specific to sarcopenia, should detect a greater change than generic questionnaires such as the SF-36. The EQ-5D reported a small change for the dimensions Self-care and Usual Activities and no change for all other scores. This should not be surprising given the distance between the response options for the EQ-5D, which means a substantial change needs to occur in real life for it to be registered in the change scores.

Lastly, the correlations between changes in physical parameters and the changes on the different overall/composite scores revealed three good correlations for the SarQoL® Overall score, one good and two acceptable correlations for the SF-36 PCS, one acceptable correlation for the SF-36 MCS, no correlations for the EQ-5D Utility Score and one good correlation for the EQ-VAS. In general, the SarQoL® Overall score correlates well with physical performance, with good correlations for change in gait speed, SPPB and CST. However, these results should be interpreted with caution given the multidimensional nature of sarcopenia, which is unlikely to be covered in a single test.

This study has several strengths. The methodology we adopted supplied us with evidence from different sources and allowed us to show both the quality and quantity of responsiveness. We were able to draw upon the data collected within the SarcoPhAge study, which allowed us to have a moderate sample size (n = 42) despite the relatively low prevalence of sarcopenia. Furthermore, the SarcoPhAge study collected muscle mass data with DXA, which is, in practice, the most reliable method, and collected data on a number of tests for physical performance, which allowed us to compare the changes on several physical parameters [35].

There are, however, several limitations in this study. The SarcoPhAge study was not specifically designed to allow the evaluation of the responsiveness of the SarQoL® questionnaire, lacking both a known intervention and a transition question. A second limitation is that the primary methodology used in this study, the testing of hypotheses, was introduced only a few years ago and that no consensus has yet been reached on several questions about this process, such as how many hypotheses should be tested, what percentage should be confirmed for good responsiveness and how to set the strength of the expected correlations. We have tried to address these issues by using pre-defined, specific and challenging hypotheses but recognize that this methodology should be considered an ongoing process and hope that other studies can re-evaluate our hypotheses and add their own. Lastly, the SF-36 PCS and MCS scores were used in the evaluation of the SRMs but not in the hypotheses. We acknowledge that the PCS and MCS would have made good targets for the formulation of hypotheses, but unfortunately, the choice to calculate these scores was made after the hypotheses were formulated and after the statistical manipulations had started. It was therefore impossible for us to include the PCS and MCS scores in the hypotheses. It is our hope that future responsiveness studies will include the PCS and MCS in their hypotheses.

Conclusions

This study contributed data on the last major psychometric property of the SarQoL® questionnaire not yet studied. The questionnaire has good responsiveness, measured both in an evaluation with hypotheses (8/9 confirmed) and by the strength of its standardized response means. The SarQoL® questionnaire appears to be the optimal tool for the assessment of quality of life in sarcopenic populations. Its use in clinical trials assessing biochemical entities for the management of sarcopenia is recommended, as the inclusion of patient-reported outcomes as co-primary endpoints is encouraged in such studies [36].