Introduction
Chronic migraine (CM) is a common neurological disorder defined as having 15 or more headache days per month for more than 3 months with at least 8 days per month having features of migraine with or without aura [1]. Previous research has shown associations between CM and increased headache impact and disability as well as decreased health-related quality of life (HRQoL) [2–5]. Migraine is associated with increased familial burden and elevated direct and indirect medical costs [6–9], as well as increased occurrence of fatigue, irritability, headache pain severity, and comorbidities [10–12].
Preventive treatments for migraine are intended to decrease the frequency and impact of migraine attacks. A typical endpoint in migraine prevention trials is the mean change in monthly migraine days (MMDs) relative to pre-treatment baseline levels. Over the past two decades, however, the importance of using patient-reported outcome measures (PROMs) as secondary measures to better characterize the patient experience and potential treatment benefits has been recognized.
Many PROMs have been included in migraine prevention studies. One of these, the 6-item Headache Impact Test (HIT-6) [13], recommended by the American Headache Society [14], is intended to measure the impact of headache on daily life, with higher scores reflecting greater migraine impact [16]. The HIT-6 measures headache-related impact with six items: severe headache pain, limitations to usual daily activities, the wish to lie down, fatigue, negative affect, and limitations to concentration. The items of the HIT-6 were selected from a large headache-related item bank [15] developed based on item response theory (IRT) parameters.
A substantial body of literature supports the HIT-6 as a precise and reliable PROM for assessing the impact of headache in the general headache population, as well as in patients with migraine [12, 13, 16–19]. However, much of the previous research has evaluated the broad headache population, and there is limited work specifically focused on use of the HIT-6 in CM, which is a particularly debilitating condition with features distinct from those of other headache and migraine disorders.
The objective of the current research was to expand the existing knowledge base regarding the psychometric properties and evidence for validity of the HIT-6 in the CM population using data from a large clinical trial. Analyses were conducted to examine the model fit and individual item performance of the HIT-6 items using IRT, as well as to examine the internal consistency and test–retest reliability of the HIT-6 summed scores in a CM-specific sample. In addition, we performed analyses to examine convergent and discriminant validity and to evaluate the ability of the HIT-6 total score to distinguish between known groups and to demonstrate change.
Methods
Data source
The PRevention Of Migraine via Intravenous ALD403 Safety and Efficacy‒2 (PROMISE-2) study (ClinicalTrials.gov Identifier: NCT02974153) was a phase 3, randomized, double-blind, placebo-controlled trial evaluating the safety and efficacy of eptinezumab for the prevention of CM [20]. Eligible patients (N = 1072), with a diagnosis of CM per the International Classification of Headache Disorders third edition (beta) [21], were randomized to receive eptinezumab 100 or 300 mg, or placebo, administered by 30-min intravenous infusion once every 12 weeks.
Study approval for PROMISE-2 was provided by the independent ethics committee or institutional review board at each study site. The research was conducted in accordance with current Good Clinical Practice, the principles of the Declaration of Helsinki, and local regulatory requirements. Each enrollee provided written informed consent prior to their participation.
Study measures
The current analyses used all available HIT-6 data from the PROMISE-2 study, pooling active treatment and placebo groups. For the reliability analyses, all data from the screening and baseline visits of those patients who passed screening and were accepted into the trial were evaluated. For the validity analyses, all data on measures and variables of interest at baseline and week 12 time points were evaluated.
The HIT-6 [13] measures the impact and effect of headache on the ability to function normally in daily life, and consists of six questions, each with five verbal response categories. Per the HIT-6 User’s Manual [22], the following values are used to score responses: never = 6, rarely = 8, sometimes = 10, very often = 11, and always = 13; these category weights were selected so that HIT-6 summed scores would correspond as closely as possible to scores from response pattern-based IRT scoring [13]. The total score was obtained by summing the responses to all six items using the item weights just specified. Scores ≥ 60 were indicative of severe life impact, 56–59 of substantial life impact, 50–55 of some life impact, and ≤ 49 of little to no life impact. For the reported item-level analyses (item-level descriptive statistics, latent variable modeling, classical test theory analyses), the ordinal HIT-6 responses were coded as: never = 1, rarely = 2, sometimes = 3, very often = 4, and always = 5.
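The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration and not the study's SAS code: the function names are hypothetical, and returning None for an incomplete response set is an assumption chosen to match a no-imputation approach.

```python
# Category weights and impact bands as stated in the text.
HIT6_WEIGHTS = {
    "never": 6,
    "rarely": 8,
    "sometimes": 10,
    "very often": 11,
    "always": 13,
}


def hit6_total(responses):
    """Sum the weighted responses to the six HIT-6 items.

    `responses` is a sequence of six category labels; None marks a
    missing item. Returns None when any item is missing (assumption:
    no imputation, so no total score is produced).
    """
    if len(responses) != 6:
        raise ValueError("The HIT-6 has exactly six items")
    if any(r is None for r in responses):
        return None
    return sum(HIT6_WEIGHTS[r] for r in responses)


def hit6_impact_category(total):
    """Map a HIT-6 total score to the impact bands given in the text."""
    if total >= 60:
        return "severe"
    if total >= 56:
        return "substantial"
    if total >= 50:
        return "some"
    return "little to no"
```

With these weights the total score can range from 36 (all "never") to 78 (all "always").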
Baseline MMDs and monthly headache days (MHDs) were the number of migraine or headache days, respectively, reported during the 28-day screening period.
The Patient Global Impression of Change (PGIC) [23] was a single question concerning the patient’s impression of the change in their disease status since the start of the study. Verbal responses were scored on a seven-category scale (from “very much improved” to “very much worse”). The Short-Form Health Survey (SF-36 v2.0) [24] is a widely used, 36-question assessment measuring 8 domains of HRQoL (physical functioning, physical role functioning, emotional role functioning, vitality, mental health, social functioning, bodily pain, and general health) over the previous 4 weeks. Domain scores are created from 2 to 10 items, depending on domain, and all have been found to exhibit suitable reliability in a wide variety of populations [25, 26]. The current analyses focused on the domains of bodily pain, physical role functioning, and emotional role functioning, in which higher scores indicate better functioning/health. The EuroQol five-dimension, five-level scale (EQ-5D-5L) [27] consists of five dimensions/items (scored using integer values ranging from 1 = “no problems” to 5 = “extreme problems”) and a visual analog scale (VAS; scored from 0 = “the worst health you can imagine” to 100 = “the best health you can imagine”). The current analyses focused on the individual item responses related to usual activities, pain/discomfort, and mobility dimensions.
Data handling
All analyses were performed by pooling treatment arms and sites using all available data. Data management, descriptive summaries, and statistical tests were conducted using SAS software, version 9.4 (Cary, NC, USA).
No specific rules for missing item-level data on the HIT-6 are contained within the User’s Manual [22]. To be conservative, no imputation for missing data was used in these analyses, meaning that HIT-6 total scores were not to be calculated for any observations with missing item responses. No corrections were made for multiple testing to control Type I errors; the broader purpose of any presented p value was to help describe general patterns of effects regarding the HIT-6 scores.
Analytic plan
Item-level descriptive statistics
Descriptive statistics and an observed frequency table for each of the HIT-6 items at the baseline assessment were examined for floor effect, ceiling effect, and missing data issues. The floor and ceiling effects were evaluated by looking at the percentages of responses in the lower and upper extreme response categories (i.e., “never” and “always”). Prior to IRT analyses, HIT-6 item responses were collapsed, if necessary, to obtain a minimum of five observed responses in each analyzed response category, to ensure sufficient observations in each category for accurate parameter estimation.
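The floor and ceiling check described above amounts to tabulating the share of responses in the two extreme categories. The following Python sketch is illustrative only: the function name and the use of None for a missing response are assumptions, and percentages are computed among non-missing responses.

```python
from collections import Counter


def extreme_category_summary(responses, floor="never", ceiling="always"):
    """Summarize floor/ceiling percentages and missingness for one item.

    `responses` is a list of category labels, with None for missing.
    Percentages use non-missing responses as the denominator (assumption).
    """
    valid = [r for r in responses if r is not None]
    counts = Counter(valid)
    n_valid = len(valid)
    return {
        "floor_pct": 100.0 * counts[floor] / n_valid,
        "ceiling_pct": 100.0 * counts[ceiling] / n_valid,
        "n_missing": len(responses) - n_valid,
        "counts": dict(counts),
    }
```

The returned counts can also be inspected to see whether any response category has fewer than five observations and is therefore a candidate for collapsing before IRT estimation.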
Unidimensional model fit
In consideration of the extensive psychometric work used to develop the HIT item bank and select items for the HIT-6, exploratory latent variable models were deemed unnecessary to assess the degree to which the HIT-6 items conformed to the theoretical model underlying them; however, a unidimensional IRT model was fit to the baseline HIT-6 data in flexMIRT 3.5 [28]. Given the ordered categorical response scale for all items, the IRT item model used was the graded response model [29]. Maximum marginal likelihood via the Bock-Aitkin expectation–maximization algorithm [30] was used to estimate IRT parameters; as this is a full-information estimation method [31], all observations (including those with item-level missing responses) were to be included in the analyses. Standard errors (SEs) were calculated via the supplemented expectation–maximization algorithm [32]. The fit of each model was evaluated using the Tucker-Lewis index (TLI) [33], and the limited information M₂-based root mean square error of approximation (RMSEA) [34, 35], using customary cut-offs for adequate fit of ≥ 0.95 for the TLI [36] and < 0.08 for the RMSEA [37]. Item-level fit was evaluated using the summed-score-based item-fit diagnostic S-X² [38, 39].
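For readers unfamiliar with the graded response model, each category probability is the difference between two adjacent cumulative logistic curves. The Python sketch below illustrates this under the common slope/threshold (a, b) parameterization; the function name is hypothetical, any parameter values passed to it are illustrative rather than PROMISE-2 estimates, and flexMIRT's own parameterization and output may differ.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta      : latent trait value (here, headache impact)
    a          : item slope (discrimination)
    thresholds : ordered thresholds b_1 < ... < b_{K-1} for K categories

    P(X >= k | theta) is a logistic function of a * (theta - b_k);
    P(X = k | theta) is the difference of adjacent cumulative curves.
    """
    thresholds = list(thresholds)
    if thresholds != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities, padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]
```

A five-category HIT-6 item would take four thresholds, and the returned probabilities sum to 1 for any value of theta.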
Internal consistency reliability
To assess internal consistency/reliability, classical test theory analyses (i.e., item-total correlations, coefficient alpha, and alpha with item removed) were computed for the HIT-6 using baseline data, along with the IRT-based reliability plot and the IRT-based marginal reliability estimate. For the interested reader, Edwards [40] provides a general introduction to IRT, including how the concept of reliability in IRT varies from traditional, single-number summary values.
Coefficient alpha was calculated using two methodologies in recognition of the ordinal scale of the HIT-6 item responses: (i) based on Pearson correlations (traditional approach) using SAS software and (ii) based on polychoric correlations (modified approach) using R v3.4.3 [41]. Consistent with the HIT-6 manual [22] and the assumptions underlying coefficient alpha, only HIT-6 observations with complete item responses were used for the internal consistency reliability analyses. A minimum value of 0.70 was taken to indicate satisfactory reliability for both the IRT marginal reliability and coefficient alpha values [42].
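As an illustration of the traditional calculation, the Python sketch below computes raw (covariance-based) coefficient alpha from complete-case item responses coded 1 to 5. It is not the SAS or R code used in the study; the function name is hypothetical, and the polychoric-based variant is not shown because it requires estimating a polychoric correlation matrix first.

```python
from statistics import variance


def coefficient_alpha(rows):
    """Raw coefficient alpha for complete-case item responses.

    `rows` is a list of respondents, each a list of k numeric item
    scores with no missing values (complete cases only, as in the text).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same number of items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha with an item removed can be obtained by dropping that item's column from each row and calling the same function again.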
Test–retest reliability
Test–retest correlations were calculated from screening to baseline for the HIT-6 total summed scores via uncorrected Pearson correlations and intraclass correlations (ICCs) from a two-way mixed-effect model with absolute agreement for single measures [43]. It was expected that patients would be relatively stable as both assessments occurred prior to study treatment; therefore, an anchor variable to define stability was not needed.
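The two-way, absolute-agreement, single-measures ICC (often written ICC(A,1)) can be computed from the mean squares of a two-way ANOVA without replication. The Python sketch below is a minimal illustration, assuming one row per patient and one column per occasion (screening, baseline); the function name is hypothetical and this is not the study's SAS code.

```python
def icc_absolute_single(scores):
    """ICC for absolute agreement, single measures, two-way model.

    `scores` is a list of rows, one per patient, each holding that
    patient's k scores (e.g., [screening, baseline]).
    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-patient mean square
    ms_cols = ss_cols / (k - 1)              # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Unlike the Pearson correlation, this coefficient is reduced by a systematic shift between occasions: if every patient scored exactly 2 points higher at baseline, the Pearson correlation would be 1 but the ICC would fall below 1.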
Convergent and discriminant validity
When correlations were examined between HIT-6 total scores and another continuous variable, Pearson correlations were used. When examined in relationship with a categorical/ordinal variable, Spearman correlations were used. All planned correlations were pre-specified with respect to expected direction and strength/effect size by a team comprising migraine experts and statistical methodologists [44]. With regard to effect size, a correlation of 0.1 indicated a small effect, 0.3 indicated a moderate effect, and 0.5 indicated a large effect size.
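The choice between the two coefficients and the effect-size labels can be illustrated with a short, dependency-free Python sketch. The function names are hypothetical, the "negligible" label for values below 0.1 is an assumption not stated in the text, and the study itself used SAS for these calculations.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_effect_label(r):
    """Label |r| using the 0.1 / 0.3 / 0.5 benchmarks from the text."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumption: label for values below the smallest benchmark
```

Ordinal measures such as the EQ-5D-5L item responses have many tied values, which is why the rank step averages ranks within ties.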
Trial eligibility criteria tend to create a homogeneous sample at the beginning of a study [45], and reduced variability within a sample can artificially lower correlations (range restriction) [46]. Since variability in all measures tends to increase over the course of the trial (due to treatment effects), we also examined a subset of the convergent/discriminant correlations at week 12, when greater heterogeneity in variables was expected and correlations would be less attenuated.
Known-groups validity
Distinct groups were created using the week 12 data. The “improved” group comprised those patients with PGIC item responses of “very much improved” and “much improved”. The “not improved” group contained those patients with responses of “minimally improved”, “no change”, “minimally worse”, and “much worse”. Similar analyses were conducted using groups defined by headache frequency during weeks 9‒12. Patients who reported ≥ 15 headache days during the 4-week period were classified as “chronic” (consistent with clinical practice), while patients with < 15 headache days over the same period were classified as “non-chronic”. All group differences were examined against typical Cohen’s d criteria where 0.2 indicated a small effect, 0.5 a moderate effect, and 0.8 or greater a large effect size [47].
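A known-groups comparison of this kind reduces to a standardized mean difference between two groups. The Python sketch below computes Cohen's d using the pooled standard deviation, which is an assumption because the text does not state which standardizer was used; the function names and the "negligible" label are likewise illustrative.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


def cohens_d_label(d):
    """Label |d| using the 0.2 / 0.5 / 0.8 benchmarks from the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumption: label for values below the smallest benchmark
```

For example, with made-up HIT-6 totals of [62, 64, 66] for a "not improved" group and [58, 60, 62] for an "improved" group, the mean difference is 4 points, the pooled standard deviation is 2, and d is 2.0.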
Sensitivity to change
Change scores for HIT-6 total scores and individual HIT-6 items were correlated with change scores for other validation measures. Change on any variable of interest was defined from baseline to week 12; week 12 MMD and MHD values were defined as the number of migraine or headache days, respectively, reported between weeks 9 and 12 to match the 4-week recall period of the HIT-6 scores.
Discussion
The HIT-6 appears to be a reliable and valid instrument for the assessment of headache impact in patients with CM, based on our analyses using data from PROMISE-2. Although CM shares features with other headache diagnoses, it is important to recognize that these are distinct conditions and, as such, it is critical that headache PROMs are rigorously evaluated for specific use in the CM population. One goal of the current study was to provide a unique psychometric evaluation of the HIT-6 using IRT in a CM sample, and results demonstrated that the HIT-6 was successfully calibrated using a unidimensional IRT model. Correlations of HIT-6 total scores with the reference measures, both cross-sectionally and using longitudinal change scores, conformed to expectations with respect to direction and often conformed to expectations of magnitude. Known-groups analyses and correlation of change scores also supported the contention that the HIT-6 total scores behave in a manner consistent with the assessment of headache impact.
The IRT results demonstrated that all HIT-6 items provided good coverage over the latent construct of headache impact, and each provided valuable information to the total score. These results are similar to a previous study of 1384 patients with CM in which a unidimensional model fitted to the data met the typical cut-values for good fit [19]. Conversely, in a psychometric examination of the HIT-6 in headache clinic patients (N = 309) [16], while most items could differentiate between a wide range of individuals with migraine, there was a lack of unique information provided by the lower response categories for the pain severity and wishing to lie down items, suggesting that these items were unable to separate fine-grained differences. Given the severity of migraine for those living with CM, having less information at the lower end of the headache impact continuum should not be problematic in most settings. However, if one expects large, meaningful, positive changes, it may be worth taking advantage of the full HIT item bank using a computerized adaptive administration to maximize measurement precision and reliability over the range of experience.
Internal consistency estimates demonstrated reliable scores across a wide range of headache impact, with good marginal reliability. Moreover, coefficient alpha estimates were also in the acceptable range, and these results were in line with previous examinations of the reliability of the HIT-6 [12, 16–19]. Test–retest reliability between screening and baseline was slightly lower than would be considered acceptable for continued use. However, this is likely due to the homogeneity of the patient sample prior to treatment due to the trial enrollment criteria; limited variability can artificially reduce/attenuate estimates of correlations [46]. Previously reported test–retest values of the HIT-6 scores, despite differences in methodologies and time points used across studies, were found to be at acceptable levels [13, 19].
The results of the convergent/discriminant correlation analyses were largely in line with the previous literature. When correlations did not conform to expectations, the observed correlation value typically fell only slightly outside the expected range and indicated a stronger association than anticipated; review of the original predictions suggests that relationships may have been underestimated given the CM population. The validity of the HIT-6 scores was also supported in the form of convergent and discriminant validity analyses during its initial validation using an online sample of adults (aged 18–65) who self-reported a headache in the past four weeks not due to illness, injury or hangover [13], where, as expected, HIT-6 scores correlated negatively with all subscale and component scores of the 8-item short-form health survey (SF-8), with magnitudes ranging from small to moderate. HIT-6 summed scores also correlated strongly and positively with scores from an adaptive administration of HIT items and IRT-based scores derived from 34 items of the HIT item bank [13]. Subsequent studies using a variety of headache patient samples found HIT-6 total scores to be associated, as expected, with numerous other migraine-specific PROM scores as well as with general health and HRQoL measures and with objective headache and migraine outcomes [12, 13, 17, 19, 45, 50–53].
Results of the known-groups analyses supported the validity of the HIT-6 total scores, in line with previous evaluations [5, 12, 13, 15, 19]. In the initial HIT-6 publication, individuals reporting more severe pain generally demonstrated significantly higher HIT-6 total scores [13]. Other studies have indicated that mean HIT-6 scores significantly increased according to headache diagnosis (non-migraine < EM < CM) [5, 12]. Moreover, in agreement with our data, previous publications have reported that HIT-6 total scores show sensitivity to change in patients with migraine [45, 54–57]. In a clinical trial investigating erenumab injections in patients with EM, active treatment reduced mean MMDs and days with acute migraine medication use relative to placebo [58]; HIT-6 total score data mirrored these results, with the treatment groups demonstrating statistically larger decreases from baseline relative to placebo [54]. In the PREEMPT clinical trials of onabotulinumtoxinA in patients with CM [55–57], the HIT-6 was employed as a secondary outcome and demonstrated a statistically significant reduction in mean scores from baseline to week 24, favoring active treatment. In the same study’s open-label phase (in which previously placebo-treated patients received active treatment), the HIT-6 total scores retained the demonstrated decrease from baseline but the differences between treatment groups were no longer statistically significant, as would be expected.
The HIT-6 appears to be a valuable tool for measuring headache impact in patients with CM in a clinical setting, and additional studies are warranted to empirically evaluate and develop threshold(s) for clinically meaningful change (responder definitions) in individuals with CM to help facilitate clinical decision making. Psychometric analyses should also be undertaken to test whether the measurement properties of the HIT-6 are equivalent across different headache groups, such as EM. Although the current study had several strengths—including the large sample size, rigorous psychometric modeling, evaluation of item characteristics, and assessment of reliability—there were limitations as well. The most notable is that the data were from a clinical trial, and thus comprised a more homogeneous sample than the general patient population due to enrollment criteria, potentially limiting the generalizability of the data. The impact of this homogeneity was evident in the screening and baseline HIT-6 scores that resulted in what were likely attenuated estimates of test–retest reliability; we recommend that this be re-examined in a prospective observational study to examine the accuracy of this supposition and provide a more complete understanding of the psychometric soundness of the HIT-6 in the CM population.
Acknowledgements
This study was funded by H. Lundbeck A/S (Copenhagen, Denmark). The authors thank Nicole Coolbaugh, CMPP, of The Medicine Group, LLC (New Hope, PA, United States) for providing medical writing and editorial support, which was funded by H. Lundbeck A/S (Copenhagen, Denmark) in accordance with Good Publication Practice guidelines.