Introduction

Huntington disease (HD) is an autosomal-dominant inherited neurodegenerative condition caused by a cytosine–adenine–guanine (CAG) expansion in the HTT gene.1 Although HD diagnosis is based on motor signs, the disease is accompanied by cognitive and functional decline and behavioral and psychiatric disturbances that progress to severe functional impairment, dementia, and premature death.2 These changes can be detected up to 15 years before motor diagnosis,3 a period referred to as prodromal HD. The changes associated with HD progression have the potential to impact day-to-day function.4, 5, 6, 7

The World Health Organization defines function as the impact of interactions between health conditions, environmental and personal factors on daily activities, participation in society, and bodily functions of an individual.8 The Food and Drug Administration identified day-to-day function as an important outcome variable to include in patient-reported outcome measures for use in clinical trials.9 This emphasizes that treatments for health conditions should result in subjective improvements in daily functioning.

Measures to detect changes in function in prodromal HD are needed for use in clinical trials of treatments to delay or prevent symptoms associated with HD.10 Previous publications using traditional measures of daily functioning in HD have suggested that current measures are insensitive to very early changes and, as a result, fail to document the more subtle alterations in functioning that accompany early cognitive, motor, and behavioral changes due to disease.6, 7, 8, 9, 10, 11 The Total Functional Capacity (TFC) scale12 is recommended by the National Institute of Neurological Disorders and Stroke as a core measure of functional capacity in research with participants with HD.13 However, clinical trials based on this measure may lack sensitivity for the detection of early functional disturbances.6 The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) has abolished the multi-axial system and recommended that the World Health Organization Disability Assessment Schedule (WHODAS) 2.0 be used to provide a measure of daily functioning and disability in neuropsychiatric disorders. The previous measure, the Global Assessment of Functioning scale, was dropped because of its lack of conceptual clarity and questionable psychometrics in routine practice. At the dawn of the DSM-5, the field is challenged to provide data with which to understand and interpret this newly recommended scale. An objective of the current study was to administer the WHODAS to a large group of people with prodromal HD who, by definition, reflect the progression of neuropsychiatric disease from mild to moderate, and to present the findings as a benchmark for other disorders.

The 36-item WHODAS measures day-to-day functioning across health conditions. In contrast to the TFC’s global assessment of function, the WHODAS assesses specific areas of cognitive, behavioral, and physical functioning. The WHODAS has not previously been used in an HD sample and one purpose of this research was to compare the WHODAS with the traditional functional capacity measure used in HD, the TFC scale.

The DSM-5 also recommends administration of the proxy WHODAS when patients have impaired cognitive capacity. There is evidence of impaired awareness of cognitive, behavioral, and motor changes in people with HD before diagnosis.14 Therefore, it may be critical to understand the validity of self-report (versus proxy-report) of outcome measures considered for clinical trials in HD. The aims of the study were (1) to examine cross-sectional and longitudinal participant- and companion-rated measures of daily activity using the 36-item WHODAS in groups of individuals with varying degrees of prodromal disease progression and gene non-expanded controls; (2) to compare longitudinal participant and companion ratings on the WHODAS; and (3) to compare the WHODAS and the TFC in terms of baseline and longitudinal differences.

Materials and methods

Participants

Participants were from the Neurobiological Predictors of HD (PREDICT-HD) study. Participants independently underwent testing for the HD gene mutation and knew their gene status before study participation. Individuals with the CAG repeat expansion (CAG ≥36) served as cases, whereas people at risk for HD who tested negative for the gene expansion (CAG <36) served as controls. The PREDICT-HD study began data collection in 2001 and the WHODAS was added to the battery in 2009. The TFC was collected in its entirety from 2001 to 2008 and then an abbreviated TFC was collected based on an item-by-item analysis of existing data.

The analysis used data from 726 participants with 1413 observations and 630 companions with 1117 observations. Companions were predominantly spouses/partners (74%), followed by friends/neighbors (8%), parents (7%), and siblings (5%), and 74.6% of the companions reported living with the participants. Companions had known the participant for a mean of 20.77 years (SD=13.59). The median number of follow-up visits was 2 (range=1–4), and the length of follow-up was up to 3.11 years (median=1.14 years). We analyzed data from participants who were currently employed outside their home or attending school. Eight percent of observations (not subjects) were omitted because of missing data.

Progression groups

Individuals entered PREDICT-HD with different disease progression levels and were classified accordingly. Baseline refers to the initial visit when the WHODAS was first administered for each participant. Participants were classified into one of three prodromal HD groups based on their CAG-Age Product (CAP) score,15 computed as CAP = (age at baseline) × (CAG − 33.66). The CAP formula was derived from a parametric accelerated failure time model predicting motor diagnosis from age at entry, CAG length, and their interaction. CAP is similar to the ‘disease burden’ score of Penny et al.16 and is presumed to index the cumulative toxicity of mutant huntingtin at the time of study entry. CAP can be converted to a scaled version expressing the 5-year probability of motor diagnosis from study entry (5 years is roughly the average time to diagnosis for the converters in PREDICT-HD). Cutoffs were derived for the best fitting subgroups based on an optimization algorithm using an earlier sample of PREDICT-HD participants.15 Based on gene status and a discretization of the CAP distribution, four groups were defined in this analysis: control, low, medium, and high. The estimated time to motor diagnosis for each CAP group is >12.8 years for the low group, 7.6–12.8 years for the medium group, and <7.6 years for the high group.
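The CAP computation and group assignment described above can be illustrated with a minimal Python sketch. The formula and the CAG threshold of 36 come from the text; the function names are hypothetical, and because the CAP cutoffs separating the low, medium, and high groups are not reported in this section, they are left as required arguments rather than filled in. How values exactly at a cutoff are handled is also an assumption.

```python
def cap_score(age_at_baseline: float, cag_length: int) -> float:
    """CAG-Age Product: (age at baseline) x (CAG - 33.66), as given in the text."""
    return age_at_baseline * (cag_length - 33.66)


def progression_group(age_at_baseline: float, cag_length: int,
                      low_cutoff: float, high_cutoff: float) -> str:
    """Assign control/low/medium/high.

    low_cutoff and high_cutoff are NOT stated in this section; the caller
    must supply them. Boundary handling (< and <=) is an assumption.
    """
    if cag_length < 36:  # gene non-expanded participants serve as controls
        return "control"
    cap = cap_score(age_at_baseline, cag_length)
    if cap < low_cutoff:
        return "low"
    if cap <= high_cutoff:
        return "medium"
    return "high"
```

For example, a 40-year-old with 42 CAG repeats has CAP = 40 × 8.34 = 333.6; which group that falls in depends on the cutoffs supplied.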

Measures

The WHODAS includes 36 items in six areas of day-to-day functioning: understanding and communicating (6 items); getting around (5 items); self-care (4 items); getting along with others (5 items); activities at home, work, and/or school (8 items); and participation in society (8 items). There are five response categories for each item (none=1; mild=2; moderate=3; severe=4; extreme/cannot do=5). The range of scores for the WHODAS is 36–180 with higher scores indicating worse functioning. For participants, the WHODAS was administered by a trained staff member. For companions, the WHODAS was administered by a trained staff member or was self-administered. All respondents were asked to rate how much difficulty the participant had with each item and how much these difficulties interfered with the participant’s life.
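As a minimal illustration of the simple (summed) scoring described above, the following Python sketch adds the 36 item responses. The domain item counts and the 1–5 coding come from the text; the names are hypothetical, and the official missing-data rules are not reproduced here, so the sketch simply rejects incomplete responses.

```python
# Item counts per domain, as listed in the text (they sum to 36).
DOMAIN_ITEM_COUNTS = {
    "understanding and communicating": 6,
    "getting around": 5,
    "self-care": 4,
    "getting along with others": 5,
    "activities at home, work, and/or school": 8,
    "participation in society": 8,
}


def whodas_total(item_scores):
    """Sum 36 responses coded 1 (none) to 5 (extreme/cannot do).

    Missing-data handling is deliberately omitted (an assumption of this
    sketch): incomplete or out-of-range input raises ValueError.
    """
    expected = sum(DOMAIN_ITEM_COUNTS.values())
    if len(item_scores) != expected:
        raise ValueError(f"expected {expected} items, got {len(item_scores)}")
    if any(score not in (1, 2, 3, 4, 5) for score in item_scores):
        raise ValueError("each item must be an integer from 1 to 5")
    return sum(item_scores)
```

A respondent reporting no difficulty on every item scores 36, and one reporting extreme difficulty on every item scores 180, matching the stated range.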

The TFC consists of five questions related to the functioning domains of occupation, finances, domestic chores, activities of daily living, and care level. Each item has either three or four response categories (0 to either 2 or 3) for a total possible range of scores from 0 to 13, with higher scores indicating better functioning. The TFC is rated by a clinician following a 5- to 10-min narrative interview with each participant and their companion together. Rating choices are specific to each functional domain (eg, usual level, full capacity, impaired). In PREDICT-HD 2.0, only two TFC items, occupation and finance, were collected because of a need to reduce the functional battery. The three omitted items were endorsed by fewer than 2% of participants in PREDICT-HD 1.0.6 The two retained items were summed to yield a modified TFC score in this analysis, with a range of 0–6. To facilitate comparison with the WHODAS, the modified TFC was rescaled as TFC loss, computed as 6 − (occupation item score + finance item score). TFC loss scores ranged from 0 to 6, with higher scores indicating worse functioning.
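The TFC loss transformation can be written as a short Python sketch. The function name is hypothetical, and the 0–3 range for each retained item is inferred from the stated 0–6 range of the two-item sum.

```python
def tfc_loss(occupation: int, finance: int) -> int:
    """TFC loss = 6 - (occupation + finance); higher values mean worse function.

    Each retained item is assumed to be scored 0-3, which is what the
    stated 0-6 range of the modified TFC implies.
    """
    for name, value in (("occupation", occupation), ("finance", finance)):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{name} item must be an integer from 0 to 3")
    return 6 - (occupation + finance)
```

A participant at full capacity on both items (3 and 3) has a TFC loss of 0, the minimum possible loss.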

Statistical methods

The WHODAS total score was calculated as the sum of the 36-item scores according to the simple scoring and missing data instructions.17 Baseline characteristics were compared by group using an analysis of variance for continuous variables and a χ2 test for categorical variables. To examine the internal consistency of the WHODAS and TFC, Cronbach’s alpha analyses were performed using baseline WHODAS and modified TFC scores.
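For readers unfamiliar with Cronbach's alpha, the following Python sketch shows the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It is an illustration only, not the analysis code used in the study (which was run in R); the function name and the use of sample variances are assumptions.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator), an assumption of this sketch.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer the same items")
    # Variance of each item across respondents.
    item_variances = [
        statistics.variance([row[j] for row in responses])
        for j in range(n_items)
    ]
    # Variance of the summed score across respondents.
    total_variance = statistics.variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)
```

When all items move together perfectly across respondents, alpha equals 1; weaker agreement among items gives lower values.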

Group comparison

To examine group differences in baseline status and rate of change over time, participant and companion ratings were analyzed separately using linear mixed effects regression (LMER).18 The time metric duration was time since initial WHODAS administration expressed in years. All models included gender, years of education, and age at entry as covariates. Three models were fitted to test for baseline (intercept) and longitudinal (slope) group differences: Model 1 was a null model that included the covariates and duration as predictors, but no group differences; Model 2 added group main effects to examine baseline differences among the groups (baseline effect); Model 3 added the interaction between duration and groups to test differences in the rate of change among the groups (longitudinal effect). Details are provided in Supplementary Appendix A.1.
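One way to write the three nested models is sketched below. The notation is illustrative and is not taken from Supplementary Appendix A.1: the symbols, the use of controls as the reference group, and the random intercept and slope per participant are all assumptions. Here Y_ij is the WHODAS score of participant i at visit j, t_ij is duration in years, x_i holds the covariates (gender, years of education, age at entry), and G_ig indicates membership in gene-expanded group g.

```latex
% Illustrative notation only; the random-effects structure is an assumption.
\begin{align*}
\text{Model 1:}\quad Y_{ij} &= \beta_0 + \beta_1 t_{ij}
  + \mathbf{x}_i^{\top}\boldsymbol{\gamma}
  + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij} \\
\text{Model 2:}\quad Y_{ij} &= \beta_0 + \beta_1 t_{ij}
  + \mathbf{x}_i^{\top}\boldsymbol{\gamma}
  + \sum_{g \in \{\text{low},\,\text{medium},\,\text{high}\}} \alpha_g G_{ig}
  + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij} \\
\text{Model 3:}\quad Y_{ij} &= \beta_0 + \beta_1 t_{ij}
  + \mathbf{x}_i^{\top}\boldsymbol{\gamma}
  + \sum_{g} \alpha_g G_{ig}
  + \sum_{g} \delta_g G_{ig}\, t_{ij}
  + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij}
\end{align*}
```

Under this sketch, α_g is the baseline difference between group g and controls, and δ_g is the corresponding difference in the rate of change.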

The models were evaluated using Akaike’s information criterion (AIC), corrected for small-sample bias (AICc).19, 20 Two scalings of AICc were used: the difference in AICc (dAICc) and the AICc weight (wAICc). The dAICc was computed as the difference in AICc values between each model and the model with the lowest AICc with smaller values indicating better fit (the best fitting model would have dAICc=0 and all other models dAICc >0). The wAICc was a probability scaling (0≤wAICc≤1), with values closer to one indicating better fit. If Model 2 or 3 was the best fitting, baseline and longitudinal differences between controls and each gene-expanded group were reported.
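The AICc scalings described above follow standard formulas, sketched here in Python for illustration. The function names are hypothetical, and the choice of sample size n for a mixed model (observations versus subjects) is not specified in the text, so it is left to the caller.

```python
import math


def aicc(log_likelihood: float, n_params: int, n: int) -> float:
    """AIC with the small-sample correction: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - n_params - 1 <= 0:
        raise ValueError("n must exceed n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n - n_params - 1)


def delta_aicc(aicc_values):
    """Difference from the lowest AICc; the best model has dAICc = 0."""
    best = min(aicc_values)
    return [value - best for value in aicc_values]


def aicc_weights(aicc_values):
    """Akaike weights: exp(-dAICc / 2), normalized to sum to 1."""
    relative = [math.exp(-0.5 * d) for d in delta_aicc(aicc_values)]
    total = sum(relative)
    return [r / total for r in relative]
```

For three candidate models with AICc values of 100, 102, and 110, the dAICc values are 0, 2, and 10, and most of the weight falls on the first model.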

Participant and companion comparison

To test whether participant and companion slopes were equal in each group, participant and companion data were modeled simultaneously using multi-response LMER. Details are provided in Supplementary Appendix A.2. There were various combinations of equal and non-equal slopes among the groups resulting in 16 possible candidate models as shown in the appendices (Supplementary Appendix Table A.2). Model 1 was the simplest having equal slopes in all groups, and Model 16 was the most complex having unequal slopes in all groups. All models included gender, years of education, and age at entry as covariates. The models were assessed by AICc values as described above to examine which groups had statistically reliable slope differences. The relative importance of the slope discrepancy of each group was assessed by the sum of the weights (wAICc) across all models with unequal group slopes (eight models for each group).21 A sum closer to one indicates higher importance. Fitted curves were drawn using model-averaged parameters over all models.21
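The count of 16 candidate models follows from allowing each of the four groups to have either equal or unequal participant and companion slopes (2^4 = 16), so each group has unequal slopes in exactly eight models. The Python sketch below enumerates the combinations and sums the weights per group. The names are hypothetical, and the ordering of the intermediate models is an assumption that need not match Supplementary Appendix Table A.2.

```python
from itertools import product

GROUPS = ("control", "low", "medium", "high")


def candidate_models():
    """All equal/unequal slope combinations; True marks unequal slopes in a group."""
    return [dict(zip(GROUPS, flags))
            for flags in product((False, True), repeat=len(GROUPS))]


def slope_importance(models, weights):
    """For each group, sum the wAICc of the models with unequal slopes in it."""
    if len(models) != len(weights):
        raise ValueError("one weight is required per model")
    return {
        group: sum(w for model, w in zip(models, weights) if model[group])
        for group in GROUPS
    }
```

With equal weight on all 16 models, every group would have an importance of 0.5; values closer to one indicate stronger support for a slope discrepancy in that group.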

WHODAS and TFC comparison

The relative sensitivity of the WHODAS ratings was evaluated by comparison to the TFC. Because TFC loss and WHODAS had different units, both were transformed to a common scale by subtracting the mean and then dividing by the standard deviation. For each scaled measure, the model with baseline and longitudinal group differences was fitted. The effect sizes were based on the baseline and longitudinal differences between the controls and each of the gene-expanded groups. The effect sizes were the t-values of the differences computed as the difference divided by its standard error. In each group, effect sizes were compared between WHODAS and TFC when the effect was significant for at least one measure.
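The rescaling and effect-size computation can be sketched in Python as follows. The function names are hypothetical, and the text does not say whether the sample or population standard deviation was used; the sketch assumes the sample standard deviation.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the standard deviation.

    Uses the sample SD (n - 1 denominator), an assumption of this sketch.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / sd for v in values]


def t_value(difference: float, standard_error: float) -> float:
    """Effect size as described in the text: the difference over its standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return difference / standard_error
```

After standardization, WHODAS and TFC loss are both expressed in standard deviation units, so differences from controls on the two measures can be compared directly.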

All statistical analyses were performed in R 2.15.2 and LMER analyses were performed using the package lme4.22, 23

Results

Group comparison

Participant characteristics at initial WHODAS administration are presented in Table 1. Cronbach’s alpha values were 0.92 for participant WHODAS, 0.91 for companion WHODAS, and 0.87 for modified TFC, indicating a relatively high degree of internal consistency. Table 2 displays the model comparisons for participants and companions (analyzed separately). For participants, the model with baseline differences among groups was best fitting (wAICc=0.80). In contrast, the model with both baseline and longitudinal differences among groups was the best fitting for companions (wAICc=0.95). For the two best fitting models, all the parameter estimates are presented in the Supplementary Appendix Table A.1.

Table 1 Participant characteristics at initial WHODAS administration
Table 2 Separate participant and companion WHODAS analysis results

Table 3 lists the group comparison WHODAS results for the best fitting models from Table 2. For each effect, the focus was on the difference between the controls and each of the gene-expanded groups for self-reported functioning. The upper portion of Table 3 presents baseline group differences for participants. The baseline difference relative to controls increased going from low to high. The difference for the high group was positive and significant (P<0.001). This indicates that self-reported functioning at baseline was significantly worse for the high group than controls. The difference for the medium group was marginally significant (P=0.054).

Table 3 Comparison of WHODAS scores between control group and each of the gene expanded groups for the best fitting models from Table 2

The bottom portion of Table 3 presents both baseline and longitudinal group differences for companion-rated functioning of participants. Similar to the participant results, the baseline difference relative to controls increased from low to high. The baseline difference for the high group was positive and significant (P<0.001), indicating that companions of participants in the high group reported significantly worse participant functioning at baseline than companions of the controls. Unlike the baseline difference, the longitudinal difference relative to controls did not change monotonically with group. For the longitudinal difference, the medium and high group differences were positive and significant (both P=0.02), indicating that companions of persons in these groups reported significantly worse functioning of participants over time relative to companions of the controls.

Participant and companion comparison

The simultaneous analysis of participant and companion WHODAS scores addressed whether longitudinal changes of participant and companion ratings were different in each group. Figure 1 shows the fitted curves using model-averaged coefficients across all models. The participant and companion curves were relatively similar in the control, low, and medium groups, indicating similar rates of change over time in participant and companion ratings. In contrast, there was a divergence between the two curves in the high group, indicating that companions reported worse functional decline over time than participants. The divergence in the high group was much more reliable than in the other groups, as indicated by the sum of weights shown in the Supplementary Appendix Table A.3 (high group weight=0.93). It was notable that in the medium group the slope increased sharply for both participants and companions, indicating much faster functional decline over time as rated by both.

Figure 1 Fitted LMER curves by group for participant and companion WHODAS ratings. All model coefficients were estimated adjusting for gender, age at baseline, and years of education. The plots show the WHODAS total score as a function of duration, informant (participant or companion), and group.

WHODAS and TFC comparison

Companion WHODAS ratings were better at detecting baseline and longitudinal group differences than participant ratings. Therefore, comparison was made between companion WHODAS and TFC. At baseline, 93% of participants had the minimum possible TFC loss and 38% of companion-rated participants had the minimum possible WHODAS. For participants with repeated measures, only 10% had a change in TFC loss over time. In contrast, 75% had a WHODAS rating that changed over time.

We also focused on the baseline and longitudinal differences between the controls and each of the gene-expanded groups. Group comparison results for the companion WHODAS and TFC loss scores are shown in the Supplementary Appendix Table A.4. At baseline, companion WHODAS had a larger difference than TFC loss for the high group (t=3.80 versus 2.33). There was no significant baseline difference for the low and medium groups. For the longitudinal differences, companion WHODAS ratings had a larger difference for the medium group (t=2.33 versus 1.93), whereas TFC loss had a larger difference for the high group (t=3.76 versus 2.3). There were no significant longitudinal differences for the low group. Figure 2 shows fitted curves for the companion WHODAS and TFC loss. The baseline differences relative to controls were larger in all groups for companion WHODAS than TFC loss. The longitudinal difference for the medium group was larger for companion WHODAS, whereas the longitudinal difference for the high group was larger for TFC loss.

Figure 2 Fitted LMER curves by group for scaled companion WHODAS and TFC loss scores. All model coefficients were estimated adjusting for gender, age at baseline, and years of education. The plots show the scaled companion WHODAS and TFC loss scores as a function of duration and group.

Discussion

This is the first known analysis of the WHODAS 2.0 in prodromal HD. The aims of this study were to examine baseline and longitudinal changes in participant and companion ratings of the WHODAS, to compare participant and companion ratings of function in prodromal HD, and to compare the ability of the TFC and the WHODAS to detect baseline and longitudinal functional changes.

The participant and companion WHODAS baseline differences increased relative to controls from low to high groups, indicating that stratification based on CAP scores facilitates identification of cross-sectional differences in day-to-day functioning that may be obscured if prodromal participants are compared with controls without stratification. The ability to identify differences in baseline functioning by estimated proximity to motor diagnosis may be useful for establishing inclusion criteria in clinical trials.

The main finding of the participant and companion comparison is the significant divergence in participant and companion WHODAS scores by progression group. Participant and companion ratings were concordant in the control, low, and medium groups, with both participants and companions reporting declines in functioning over time. The decline in the medium group was sharp, indicating that impairment in day-to-day functioning became most noticeable to both participants and companions 7.6–12.8 years before motor diagnosis. It may be optimal to begin interventions that target preservation of day-to-day functioning in the medium group. In the high group, participant and companion reports diverge: participants report a slower rate of functional decline than companions, who report decline at a rate similar to the medium group with greater functional loss. These results suggest that proxy measures may be more reliable in detecting day-to-day functional changes in later stages of prodromal HD disease progression. The Food and Drug Administration advocates the use of patient-reported outcome measures to evaluate treatment benefit with data coming directly from the patient.9 However, the discrepancy between participant and companion reports in the high group may indicate declining self-awareness of functional changes because of frontal-subcortical dysfunction that occurs in HD. Awareness of loss of functional abilities is impaired in diagnosed HD.24 It is unclear at what point lack of awareness begins in the HD disease process, although awareness of executive dysfunction has been shown to be impaired before motor diagnosis.14 Our results suggest that proxy data may be more helpful than participant data in measuring day-to-day functional decline in participants nearing motor diagnosis.

Measures that show evidence of sensitivity to change over time are crucial for use in clinical trials to delay or prevent symptoms of HD. It has been challenging to find measures that detect functional changes in people estimated to be relatively far from motor diagnosis.25 We compared the ability of the TFC and the companion WHODAS to detect functional changes in prodromal HD. The TFC showed more functional loss in the high group than the companion WHODAS. The WHODAS showed more functional loss in the medium group than the TFC and was also able to detect changes in the high group, although the effect size was not as great as the TFC for the high group. Even so, the WHODAS detected greater variability of change over time across all prodromal groups than the TFC. A majority of companion WHODAS ratings (75%) showed longitudinal changes, whereas only a minority of TFC ratings (10%) showed longitudinal changes. Baseline variability was also greater in the WHODAS (38% with best possible score versus 93% for the TFC). Therefore, the WHODAS has greater sensitivity in detecting change over time in prodromal HD than the TFC.

This study presents WHODAS 2.0 data from the largest sample of prodromal HD participants to date. A limitation of this study is that we relied on only two items from the TFC, which somewhat reduced the variability of the outcome. Although the omitted items were endorsed by fewer than 2% of participants in PREDICT-HD 1.0,6 their potential impact on the variability of the TFC in our sample is unknown. Also, we used LMER modeling in our analysis, which may not be appropriate for change over time in the TFC considering its limited variability. There is no TFC companion measure although companions may be present during the clinician-led interview that is conducted to complete the TFC. Companions may be reluctant to report functional impairments with participants present. It is possible a companion TFC may be useful in detecting functional changes in the medium group. The companion WHODAS was not administered in a consistent manner, with some being administered by trained staff, and others self-administered. We have not analyzed results for differences related to method of administration. However, in both methods, the participant was not present during administration.

The findings from this study suggest that the WHODAS 2.0 is better than the TFC at detecting changes in day-to-day functioning in earlier stages of prodromal HD and is an index of HD disease progression. The WHODAS may be more useful than the TFC to test the ability of treatments and interventions to preserve day-to-day functioning at this stage of disease progression. Treatments should ideally begin before the loss of striatal volume, which declines precipitously 15 years before motor diagnosis.3 Measures that are sensitive to functional changes further out from motor diagnosis are critical for use in clinical trials. The WHODAS is sensitive to change in groups estimated to be 12.8 years or less from motor diagnosis. The companion WHODAS performed better than the participant version in the longitudinal analysis, and there was a discrepancy in reporting of function between participants and companions in the high group. Owing to the concern for lack of awareness of functional impairment in prodromal HD, it is important to consider including companion measures of day-to-day function in studies designed to test the effectiveness of treatments and interventions to improve functional outcomes. Our findings suggest that the WHODAS may be useful in other neuropsychiatric diseases, which supports the DSM-5 recommendation.