Introduction

Cancer anorexia-cachexia syndrome (CACS) is a multifactorial condition characterized by an ongoing loss of skeletal muscle mass that leads to progressive functional impairment and cannot be fully reversed through conventional nutritional support [1, 2]. CACS affects a majority of patients with advanced cancer and is associated with reductions in response to therapy, treatment tolerance, duration of survival, and quality of life [3]. Accordingly, weight loss and decreased muscle mass are conditions that warrant close monitoring in patients with solid tumors, including those with lung cancer.

Lung cancer is the leading cause of cancer-related deaths worldwide with an estimated 1.37 million deaths attributed to lung cancer in 2008 [4]. In the USA, 50 % of newly diagnosed patients with non-small cell lung cancer (NSCLC) present with advanced disease and, in spite of significant advances in treatment, their adjusted 1-year survival ranges from 12 to 27 % [5]. In a prospective study of patients with stages III and IV NSCLC, 58 % of patients reported weight loss and when compared to those who did not report weight loss, those who did had significantly lower response rates, shorter progression-free survival, and shorter overall survival [6].

In addition to having a direct impact on response and survival, maintaining adequate nutrition and minimizing weight loss are concerns for many patients and their families [7]. A number of benefits in addition to survival might be associated with halting the progression of CACS, including enhancing physical ability and improving quality of life [8]. CACS in patients with NSCLC is an important area of study given the prevalence of the disease, the frequency with which it is associated with weight loss, and the prognostic implications of weight loss for patients with advanced NSCLC. Unfortunately, brief, valid, and reliable patient-reported outcomes of cachexia are lacking.

Anamorelin HCl is an orally active ghrelin receptor agonist with appetite-enhancing and anabolic activity. Based on the data available from phase I and phase II clinical trials, anamorelin HCl produced an increase in total body weight and lean body mass in patients with advanced cancer, and specifically in patients with NSCLC, in addition to increasing muscle strength and improving quality of life outcomes [9–13]. Anamorelin is currently being evaluated in three phase III studies in patients with NSCLC cachexia, including ROMANA 2 (NCT01387282), a randomized, double-blind, placebo-controlled, multicenter study to evaluate safety and efficacy [14].

The goal of this project was (1) to identify, blind to treatment assignment, the most responsive items from among self-report measures administered in the ROMANA 2 phase III global study, and (2) to evaluate the validity of the selected subset of questions for the patient population. Identification of a brief subset of items to assess cachexia in the context of NSCLC would thus address an important gap in the research literature and provide a useful assessment tool to facilitate treatment decision-making and enhance patient-centered care.

Methods

Samples and procedures

Patient recruitment for the ROMANA 2 trial took place in 7 countries at approximately 50 research sites. Written consent was obtained from all participants, and the study protocol was approved by each site’s institutional review board. Patients were eligible to participate in the study if they were ≥18 years of age; had been diagnosed with unresectable stage III or IV NSCLC; had involuntary weight loss of ≥5 % body weight within 6 months prior to screening or a screening body mass index (BMI) <20 kg/m²; had a BMI ≤30 kg/m², an ECOG performance status ≤2, an estimated life expectancy of >4 months, and adequate hepatic and renal function; and were able to understand and comply with the protocol tests and procedures. Patients could be receiving maintenance chemotherapy, could be planning to initiate a new chemotherapy and/or radiation therapy regimen (within ±14 days of randomization), or could have completed chemotherapy and/or radiation therapy and/or have no plan to initiate a new regimen within 12 weeks from randomization (at least 14 days had to elapse between completion of the chemotherapy and/or radiation therapy and randomization).

A total of 477 patients were planned to be randomized to receive either anamorelin HCl or placebo in a 2:1 ratio. Via an interactive voice response system (IVRS), central randomization stratified patients by geographic region (North America vs. rest of world), by chemotherapy and/or radiation therapy status (patients who initiated chemotherapy and/or radiation therapy within ±14 days of randomization or patients who were only receiving maintenance chemotherapy vs. patients who had no plan to initiate chemotherapy and/or radiation therapy within 12 weeks from randomization), and by weight loss over the prior 6 months (≤10 % of body weight vs. >10 % of body weight).

Study measures

Demographic information obtained included age, gender, race, ethnicity, and geographic region. Clinical information obtained included Karnofsky performance status (KPS), body weight, BMI, lean body mass (as measured by dual energy X-ray absorptiometry), muscle strength (as measured by handgrip strength), and the Hunger Assessment Scale (HAS), developed specifically for this trial. The HAS was comprised of two items: “I have felt hungry,” and “My family and friends are pleased with my appetite.” The HAS was analogous to the subjective hunger index from the Grand Hunger Scale [15], and participants responded to the items using a 0–4 Likert scale. Quality of life outcome measures included the Functional Assessment of Cancer Therapy-General (FACT-G) [16], Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) Subscale [17], and the Functional Assessment of Anorexia/Cachexia Therapy (FAACT) [18]. The FACIT-F and FAACT have been previously used and validated in clinical research investigations. The instruments were provided using certified translations (as available) and completed by patients in their native language. All assessments were administered at baseline and then at weeks 3, 6, 9, and 12 of the treatment period.

Analytic strategy

This project followed several steps. First, we analyzed the ROMANA 2 phase III trial data on a blinded basis to identify the most responsive items using a preliminary database that included all 203 patients who had reached the week 12 visit at the time of data analysis. Second, the most responsive items were reviewed by the research team at Northwestern University and Helsinn to identify a “best” list of items to analyze further. Third, the “best” items were validated and analyzed to identify the size of a meaningful difference and provide clinical interpretation.

Compliance

We calculated compliance rates for each question on the FACIT-F and FAACT as the proportion of patients who completed that question at each assessment among those who completed any question at that assessment.
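As an illustration only, the compliance calculation described above could be sketched in Python as follows. The data layout (a dictionary of patient responses with `None` for unanswered questions), the function name `item_compliance`, and the example values are assumptions made for this sketch; the study analyses themselves were run in SAS.

```python
def item_compliance(responses):
    """Return per-item compliance for one assessment.

    `responses` maps a (hypothetical) patient ID to {item code: answer or None}.
    The denominator is the number of patients who answered at least one item.
    """
    # Keep only patients who completed any question at this assessment.
    completers = [
        answers for answers in responses.values()
        if any(value is not None for value in answers.values())
    ]
    items = sorted({item for answers in completers for item in answers})
    return {
        item: sum(answers.get(item) is not None for answers in completers)
        / len(completers)
        for item in items
    }


# Hypothetical week 3 data for four patients and two items.
week3 = {
    "p1": {"C6": 3, "GS7": None},
    "p2": {"C6": 2, "GS7": 1},
    "p3": {"C6": None, "GS7": None},  # answered nothing: not in the denominator
    "p4": {"C6": 4, "GS7": None},
}
rates = item_compliance(week3)
```

Here three patients count as completers, so `C6` has full compliance and `GS7` has one response out of three.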

Identification of most responsive items

We conducted a series of analyses to aid in the selection of a brief subset of the most responsive items. First, change scores were calculated for all FACIT-F and FAACT items as the week 12 score minus the baseline score. Then, we formed groups of patients based on whether their KPS improved, worsened, or stayed the same. Analysis of variance (ANOVA) was used to compare item change scores between these groups. The same comparison of item change scores was repeated with patients classified into groups (improved, worsened, or unchanged) according to change in lean body mass, handgrip strength, and body weight.
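The change-score and group-comparison steps can be sketched as below. This is a minimal illustration, not the study code: the helper names, the rule that any nonzero anchor change counts as improved or worsened, and all data values are assumptions. Only the F statistic is computed; obtaining a p value would additionally require the F distribution.

```python
from statistics import mean


def change_scores(baseline, week12):
    """Week 12 score minus baseline score, for patients observed at both visits."""
    return {pid: week12[pid] - baseline[pid] for pid in baseline if pid in week12}


def direction(delta):
    """Classify an anchor change (e.g. change in KPS). Assumed cut-off: zero."""
    if delta < 0:
        return "worsened"
    if delta > 0:
        return "improved"
    return "unchanged"


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of groups of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data: one item score and the KPS change for nine patients.
patients = [f"p{i}" for i in range(1, 10)]
item_baseline = dict(zip(patients, [2] * 9))
item_week12 = dict(zip(patients, [0, 1, 2, 1, 2, 3, 2, 3, 4]))
kps_change = dict(zip(patients, [-10] * 3 + [0] * 3 + [10] * 3))

item_change = change_scores(item_baseline, item_week12)
groups = {"worsened": [], "unchanged": [], "improved": []}
for pid, delta in item_change.items():
    groups[direction(kps_change[pid])].append(delta)

f_stat = one_way_anova_f(list(groups.values()))  # 3.0 for this toy data
```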

Effect sizes were calculated to aid interpretation of the magnitude of change. In the group analyses, items were flagged as “responsive” if they met the following criteria: (1) the effect size for at least one of the change groups (improved or worsened) was >0.30 in magnitude and in the expected direction, (2) the effect size in the other change group was in the expected direction, and (3) the effect size for the unchanged group fell between those of the two change groups. Pearson correlation coefficients were calculated between the item change scores and the changes in KPS, lean body mass, handgrip strength, body weight, and the Hunger Assessment Scale items. In the correlation analyses, items were considered responsive if the correlation coefficient was greater than 0.25. These criteria were defined based on standard guidelines for interpreting effect sizes and correlations while also targeting the identification of a small number of “most responsive” items.
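The flagging rules can be expressed compactly in code. The sketch below is one possible reading: it assumes that the effect size for a change group is the mean change divided by the baseline standard deviation (the text does not specify the denominator) and that higher item scores are better, so the expected direction is negative for the worsened group and positive for the improved group. All function names are hypothetical.

```python
from statistics import mean


def standardized_change(changes, baseline_sd):
    """Assumed effect size: mean change score divided by the baseline SD."""
    return mean(changes) / baseline_sd


def is_responsive(es_worsened, es_unchanged, es_improved, threshold=0.30):
    """Apply the three group-based criteria to one item and one anchor."""
    worsened_ok = es_worsened < 0   # expected direction for worsened patients
    improved_ok = es_improved > 0   # expected direction for improved patients
    # Criterion 1: at least one change group exceeds the threshold in magnitude.
    large_enough = (worsened_ok and abs(es_worsened) > threshold) or (
        improved_ok and abs(es_improved) > threshold
    )
    # Criterion 3: the unchanged group lies between the two change groups.
    between = es_worsened < es_unchanged < es_improved
    # Criterion 2 is covered by requiring both directions to be as expected.
    return worsened_ok and improved_ok and large_enough and between


def correlation_flag(r, threshold=0.25):
    """Correlation-based criterion: coefficient greater than 0.25."""
    return r > threshold


flag_a = is_responsive(-0.45, 0.05, 0.20)   # True: worsened group exceeds 0.30
flag_b = is_responsive(-0.20, 0.00, 0.25)   # False: no group exceeds 0.30
flag_c = is_responsive(-0.45, 0.50, 0.20)   # False: unchanged group not between
```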

Internal consistency reliability

Pearson correlation coefficients were calculated among all pairs of responsive items at each visit. Cronbach’s coefficient alpha was calculated to evaluate the internal consistency reliability of that set of items. We also examined item-total correlations to identify items that correlated poorly with the total score.
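For readers unfamiliar with coefficient alpha, a minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), is given below. The function name, the complete-case data layout, and the example responses are assumptions for illustration; item-total correlations are not shown.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's coefficient alpha.

    `rows` holds one list of item scores per patient (complete cases only).
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five patients to a 4-item scale (0-4 per item).
example_rows = [
    [0, 1, 1, 0],
    [2, 2, 1, 2],
    [3, 2, 3, 3],
    [4, 4, 3, 4],
    [1, 1, 2, 1],
]
alpha = cronbach_alpha(example_rows)
```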

Known-groups validity

We conducted cross-sectional analyses of scores focused on differentiating definable (“known”) groups according to appropriate clinical anchor variables. Groups were formed according to KPS, the questions of the Hunger Assessment Scale, lean body mass, and handgrip strength. We used ANOVA to compare mean scores between these groups. We expected that patients with worse performance/hunger status would have worse mean summary scores. Effect sizes (mean difference/pooled standard deviation) were calculated for group comparisons to provide an indication of the magnitude of these group differences.
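The effect size defined above (mean difference divided by the pooled standard deviation) can be sketched for two adjacent groups as follows. The two-group pooled SD formula and the example scores are assumptions; the text does not state how the SD was pooled when more than two groups were compared.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    numerator = (len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)
    return sqrt(numerator / (len(a) + len(b) - 2))


def effect_size(lower_group, higher_group):
    """Mean difference between adjacent groups divided by the pooled SD."""
    return (mean(higher_group) - mean(lower_group)) / pooled_sd(
        lower_group, higher_group
    )


# Hypothetical scale scores for two adjacent performance status groups.
scores_lower_kps = [8, 10, 12]
scores_higher_kps = [10, 12, 14]
es = effect_size(scores_lower_kps, scores_higher_kps)  # 1.0 for this toy data
```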

Convergent validity

We hypothesized that the questionnaire summary scores would be associated with fatigue, as measured by the FACIT-F scale, anorexia/cachexia, as measured by the FAACT, and general quality of life/functioning, as measured by the FACT-G. We further anticipated that scores would be associated with clinical variables and the items of the Hunger Assessment Scale. The association between the summary scores and these other measures was evaluated using Pearson correlations at each assessment. Moderate correlations in the range of 0.3 to 0.7 were expected.
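A brief sketch of the correlation check follows. The Pearson coefficient is written out by hand so the example has no dependencies; the helper `is_moderate` simply encodes the expected 0.3 to 0.7 range and, as an assumption of this sketch, applies it to the absolute value of the coefficient. The data values are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_moderate(r, low=0.3, high=0.7):
    """True when |r| falls within the range expected for convergent validity."""
    return low <= abs(r) <= high


# Hypothetical brief-scale scores and comparison-measure scores.
brief_scores = [1, 2, 3, 4]
comparison_scores = [1, 3, 2, 4]
r = pearson_r(brief_scores, comparison_scores)  # 0.8 for this toy data
```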

Responsiveness

As in the assessment of convergent validity, we hypothesized that changes in the summary scores would be moderately associated with changes in the FACIT-F, FACT-G, FAACT, lean body mass, handgrip strength, body weight, and Hunger Assessment Scale. To evaluate this, change scores were first calculated for all instruments by subtracting the baseline score from the follow-up score. Pearson correlation coefficients were then calculated between the brief scale change scores and the FACIT-F, FACT-G, FAACT, lean body mass, handgrip strength, body weight, and Hunger Assessment Scale change scores.

We formed groups of patients based on whether their performance status improved, worsened, or stayed the same. ANOVA was used to compare questionnaire change scores between these groups. A similar comparison of change scores was used to compare patients between change groups defined by lean body mass change, handgrip strength change, and body weight change. Effect sizes were calculated.

Important differences and clinical interpretation

Distribution- and anchor-based methods were used to identify important differences for the new scale scores. Based upon other work [19, 20], distribution-based methods include 1/3 standard deviation (SD), 1/2 SD, and one standard error of measurement (SEM). Anchor-based methods [21] utilized cross-sectional and longitudinal data.
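The three distribution-based quantities named above can be computed as in the sketch below. The SEM uses the conventional formula SD × sqrt(1 − reliability); taking Cronbach's alpha as the reliability coefficient is an assumption of this sketch, and the SD of 4.0 is a made-up value rather than a study result.

```python
from math import sqrt


def distribution_based_estimates(sd, reliability):
    """Return 1/3 SD, 1/2 SD, and one standard error of measurement (SEM)."""
    return {
        "one_third_sd": sd / 3,
        "half_sd": sd / 2,
        "sem": sd * sqrt(1 - reliability),  # SEM = SD * sqrt(1 - reliability)
    }


# Hypothetical inputs: scale SD of 4.0 points and reliability of 0.84.
estimates = distribution_based_estimates(sd=4.0, reliability=0.84)
```

With these inputs the three estimates are approximately 1.3, 2.0, and 1.6 points.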

The randomized treatment arms were collapsed for all analyses described above and data remained blinded. There was no imputation of missing data and no adjustment for multiple comparisons. Analyses were conducted with SAS v9.3 (SAS Institute, Cary, NC).

Results

Descriptive characteristics of sample

A preliminary database lock was used for this study, selecting all patients who had reached the 12-week follow-up time when the data were pulled (February 2013). A total of 383 enrolled patients were included at the time of data analysis. Baseline demographic and clinical characteristics of the sample are presented in Table 1. The mean age was 63.3 years (SD = 8.7, range = 33 to 88).

Table 1 Baseline sample characteristics (N = 383)

Compliance

The number of patients remaining on study at each assessment was 325, 274, 233, and 203, for weeks 3, 6, 9, and 12, respectively. Item level compliance for the FACIT-F and FAACT was greater than 95 % for all items except for optional item GS7 (“I am satisfied with my sex life”) which had compliance levels of 34–41 %.

Confirmation of most responsive items

Table 2 summarizes the results of the group and correlation analyses. The items were sorted according to the number of variables to which they were responsive in these two analyses. The top eight items were selected as the subset of the most responsive items: ACT1—The amount I eat is sufficient to meet my needs, ACT9—Family/friends are pressuring me to eat, ACT10—When I eat I seem to get full quickly, C6—I have a good appetite, AN12—I am too tired to eat, GP7—I am forced to spend time in bed, HI7—I feel fatigued, and HI12—I feel weak all over. These eight items fell into two distinct categories with the first four measuring appetite/eating and the last four measuring fatigue/activity. Therefore, two separate scores were created by summing the responses to the four items in each scale. The validity of these two 4-item scale scores was assessed in the remainder of this report.

Table 2 Summary of item responsiveness

Internal consistency reliability

Fatigue/activity items were highly intercorrelated with inter-item correlations of 0.45 or greater at each time. Inter-item correlations among appetite/eating items were generally greater than 0.20 with the exception of ACT9 (family/friends pressuring to eat) with ACT1 (amount eat sufficient to meet needs). Cronbach’s coefficient alpha for the fatigue/activity scale was 0.83, 0.84, 0.84, 0.85, and 0.87 at baseline and weeks 3, 6, 9, and 12, respectively. All item-total correlations were at least 0.55. Cronbach’s coefficient alpha for the appetite/eating scale was 0.71, 0.73, 0.71, 0.73, and 0.76 at baseline and weeks 3, 6, 9, and 12, respectively. All item-total correlations were greater than 0.35.

Scale scores

The brief 4-item appetite/eating and fatigue/activity scores (also referred to as the Simplified Evaluation of Appetite (SEA) and Simplified Evaluation of Fatigue (SEF), respectively) were calculated according to standard FACIT measurement system guidelines. Specifically, negatively worded items were reverse scored and then all item scores were summed to create total scores such that higher scores indicated better quality of life or lower symptom levels. Descriptive statistics for these calculated scores are presented in Table 3.
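The scoring rule can be sketched as follows. Which items are treated as negatively worded is inferred here from the item wording and is an assumption of this sketch, as are the 0 to 4 response range for each item, the function name, and the example responses. Handling of missing responses is not covered.

```python
# Item code -> True when the item is (assumed to be) negatively worded.
APPETITE_EATING_ITEMS = {"ACT1": False, "ACT9": True, "ACT10": True, "C6": False}
FATIGUE_ACTIVITY_ITEMS = {"AN12": True, "GP7": True, "HI7": True, "HI12": True}


def scale_score(responses, items, max_response=4):
    """Sum item responses after reverse scoring negatively worded items.

    Higher totals indicate better quality of life or lower symptom levels.
    """
    total = 0
    for code, reverse in items.items():
        value = responses[code]
        total += (max_response - value) if reverse else value
    return total


# Hypothetical responses from one patient at one assessment.
patient = {
    "ACT1": 3, "ACT9": 1, "ACT10": 2, "C6": 3,
    "AN12": 0, "GP7": 1, "HI7": 2, "HI12": 1,
}
appetite_eating = scale_score(patient, APPETITE_EATING_ITEMS)    # 3 + 3 + 2 + 3 = 11
fatigue_activity = scale_score(patient, FATIGUE_ACTIVITY_ITEMS)  # 4 + 3 + 2 + 3 = 12
```

Under these assumptions each 4-item scale ranges from 0 to 16.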

Table 3 Descriptive statistics for fatigue/activity and appetite/eating scores

Known-groups validity

At baseline, fatigue/activity scores differed between KPS groups with effect sizes for differences between adjacent groups of 0.34 (p = 0.001). A similar relationship was observed at weeks 6 and 12, with slightly larger effect sizes of 0.38 to 0.53 (p < 0.001). Although not previously hypothesized, fatigue/activity scores also differed between groups defined by nondominant handgrip strength and the Hunger Assessment Scale item “My family and friends are pleased with my appetite” at all assessments (p < 0.005).

Baseline appetite/eating scores differed significantly between “I have felt hungry” groups (p < 0.001), but effect sizes for differences between adjacent groups were small to moderate (0.13 to 0.56). For “My family and friends are pleased with my appetite” groups, the effect sizes for differences between adjacent groups were large (0.66 to 0.84, p < 0.001). Again, similar relationships were observed at weeks 6 and 12, with slightly larger effect sizes (0.37 to 1.31, p < 0.001). Although not previously hypothesized, appetite/eating scores also differed between groups defined by nondominant handgrip strength and KPS at all assessments (p < 0.004).

Convergent validity

Pearson correlation coefficients at each assessment between the brief scale scores and the existing measures are presented in Table 4. As hypothesized, the fatigue/activity and appetite/eating scores were associated with fatigue, as measured by the FACIT-F scale, anorexia/cachexia, as measured by the FAACT, and general quality of life/functioning, as measured by the FACT-G. Fatigue/activity scores were also moderately correlated with responses to the HAS item “My family and friends are pleased with my appetite.” Appetite/eating scores were more strongly correlated with that item and moderately correlated with “I have felt hungry.” Correlations with KPS and nondominant handgrip strength were small to moderate for both scales.

Table 4 Correlations between brief scale scores and existing measures (convergent validity)

Responsiveness

Pearson correlation coefficients between the brief scale change scores and the FACIT-F, FACT-G, FAACT, lean body mass, handgrip strength, body weight, and Hunger Assessment Scale change scores are presented in Table 5. As hypothesized, changes in fatigue/activity and appetite/eating scores were associated with changes in fatigue, as measured by the FACIT-F scale, anorexia/cachexia, as measured by the FAACT, and general quality of life/functioning, as measured by the FACT-G. Fatigue/activity change scores were also moderately correlated with changes in responses to the item “My family and friends are pleased with my appetite.” Appetite/eating change scores were more strongly correlated with changes in that item and moderately correlated with changes in “I have felt hungry” and body weight. Correlations with changes in KPS and nondominant handgrip strength were small for both scales.

Table 5 Correlations between brief scale change scores and existing measures’ change scores (responsiveness)

Important differences and clinical interpretation

The distribution-based estimates are summarized in Table 6. Of the cross-sectional anchor-based analyses, many effect sizes for comparison of KPS and “My family and friends are pleased with my appetite” groups met the criteria for consideration and are listed in Table 6. For the longitudinal analyses, however, only the body weight change group comparisons met the criteria and only for the appetite/eating score. The estimated (minimally) important difference for the fatigue/activity scale was 1.3 to 1.9 points; for the appetite/eating scale, it was 1.2 to 2.0 points; that is, roughly 1–2 points for both scales.

Table 6 Distribution-based and anchor-based estimates of important differences

Discussion

This study identified eight of the most responsive items from the FACIT-F and FAACT scales for use in patients undergoing therapy for cancer-related anorexia/cachexia. These eight items were scored as two scales of four items each: fatigue/activity and appetite/eating. Scores differed between groups defined by performance status, nondominant handgrip strength, lean body mass, and items assessing self-perceptions of hunger and others’ satisfaction with patient’s appetite. As expected, fatigue/activity and appetite/eating scores were associated with fatigue, anorexia/cachexia, and general quality of life.

The fatigue/activity and appetite/eating scales demonstrated strong psychometric properties. These two scales had good internal consistency reliability across multiple assessments and when examining change from baseline. In addition, changes in fatigue/activity and appetite/eating scores were not merely limited to associations with changes in fatigue (FACIT-F) and appetite (Hunger Assessment Scale), respectively, but were also associated with changes in anorexia/cachexia, general quality of life, and body weight with those gaining (or losing) the most weight reporting small but significant improvements (or declines) in scores. Similar associations were seen with change in lean body mass. Collectively, these findings underscore the potential utility of these brief subscales for assessing and better understanding cachexia in patients with advanced NSCLC.

Importantly, the brief versions of the FACIT-F and FAACT subscales compared quite favorably to their longer counterparts. Within the current sample, the correlations between the fatigue/activity scale and the FACIT-F and between the appetite/eating scale and the FAACT were strong across all assessments (r = 0.91 to 0.93 and r = 0.68 to 0.77, respectively; Table 4). When compared to the validation samples [18, 22], the fatigue/activity scale had slightly lower internal consistency (range 0.83–0.87 vs. 0.93–0.95), but the appetite/eating scale was equivalent (range 0.71–0.76 vs. 0.76). For known-groups validity, the fatigue/activity scale had a significant relationship to performance status scores in the validation sample as well as in the current sample (both p < 0.05). The responsiveness of the appetite/eating scale was also quite similar when examining the relationship to performance status in the validation sample compared to the current sample (both p < 0.05).

This study has some limitations. First, the predominantly white, non-Hispanic, male sample may limit the generalizability of the findings. Second, although this study intentionally focused on patients with advanced NSCLC only, cachexia is a significant concern among patients with other types of cancer as well. These brief scales should be tested in other relevant clinical populations. Third, in the process of validating the brief scales, some of our outcomes included redundant item content. We retained overlapping items and reported the findings because they provided the most accurate portrayal of association between original and revised (abbreviated) scales. This was important when evaluating the construct validity of our new measures with well-established and/or face-valid measures.

In summary, this multi-site, longitudinal study included a wide range of important clinical and patient-reported outcomes to identify and validate a brief measure of CACS in patients with advanced stage NSCLC. The results indicated that the fatigue/activity and appetite/eating scales derived from the FACIT-F and FAACT were reliable, valid, and responsive measures. The estimated important difference for each scale was approximately 1 to 2 points, suggesting guidelines for evaluation of clinically significant levels of change and highlighting the utility of these assessment tools in clinical trials. As mentioned above, future work should examine the generalizability of these findings to patients with other solid tumors who are prone to CACS. In addition, clinicians and researchers alike may benefit from implementing this tool in clinical practice settings to examine ways in which it may facilitate treatment decision-making, improve patient-centered care, and enhance quality of life outcomes for patients suffering from cancer-related cachexia.