Introduction
Celiac disease (CeD) is an autoimmune condition affecting at least 3 million people in the U.S. and 48–300 million worldwide [1, 2]. In CeD, ingestion of gluten, a protein found in wheat and some other grains, prompts an autoimmune response that causes damage to the structure and function of the small intestine. CeD often presents with aversive gastrointestinal symptoms and extraintestinal symptoms that include headache, fatigue, skin manifestations, neurologic conditions, and psychiatric conditions [3]. CeD is more prevalent in women than men worldwide [2, 4] and is diagnosed more often in non-Hispanic Whites than other racial/ethnic groups in the U.S. [5].
The only available treatment for CeD is to consume a strict, lifelong gluten-free diet (GFD), which often requires significant changes to one’s diet, increases the cost of food, and impacts functioning in multiple life domains. Individuals with CeD report high treatment burden [6] and negative impacts to quality of life [7, 8]. Lower quality of life is associated with persisting physical symptoms despite GFD adherence [9], greater depression symptoms [10], and presence of psychiatric, neurologic, and/or gastrointestinal co-morbidities [11, 12]. Lower quality of life is also cross-sectionally and longitudinally related to lower GFD adherence [7, 12, 13]. Findings suggest that increasing GFD adherence may improve quality of life, and conversely, improving quality of life may increase GFD adherence. Thus, quality of life is important to assess in addition to physical symptoms and biomarkers of CeD pathology, and may be key to ensuring the highest treatment adherence and best clinical outcomes for people with CeD.
Generic health-related quality of life measures may lack sensitivity and specificity for identifying treatment needs and capturing response to treatment [14, 15] and may not be psychometrically invariant across conditions [16]. Condition-specific quality of life measures have been developed, including the Celiac Disease Quality of Life Survey (CD-QOL) [17], Celiac Disease Assessment Questionnaire [18, 19], and Celiac Disease Questionnaire [20], among which there is some conceptual overlap (e.g., social and emotional impacts, disease concern, stigma). However, the CD-QOL is unique in that it does not assess physical symptoms, in part because participants in the development samples did not report symptoms as salient concerns, and in part due to empirical findings that quality of life in CeD is more strongly related to psychological and social functioning than symptom burden [10, 21], and changes in quality of life can occur over time despite no change in gastrointestinal symptoms [7]. Further, an estimated 21% of people with CeD are asymptomatic, and may experience negative impacts to quality of life for reasons other than symptoms [4]. Thus, a CeD-specific quality of life instrument that does not confound symptom burden may be highly appropriate for screening and outcomes measurement in clinical settings and behavioral research.
The CD-QOL was developed in the U.S. in the English language. Exploratory factor analysis found that its 20 items yielded four independent factors (“subscales”): (1) functional impact (“limitations”), (2) stigma and mood (“dysphoria”), (3) “health concerns,” and (4) perceptions of “inadequate treatment.” Additionally, developers provided initial evidence of internal consistency reliability, convergent validity, and known-groups validity. However, its four-factor structure and psychometric properties have not been evaluated in a separate U.S. sample as is best practice. Additionally, the developers and subsequent researchers have scored the CD-QOL using a total score, though support for a total score has not been demonstrated through factor analysis. Therefore, research is needed to establish the English CD-QOL as a reliable and valid measure of CeD-specific quality of life in the U.S., and to determine whether it is most appropriately scored as four subscales, a total score, or both. To address these critical gaps, the present study aimed to (1) examine the factor structure of the English CD-QOL using confirmatory factor analysis, and (2) assess psychometric properties of CD-QOL scores, including internal consistency reliability, convergent validity, known-groups validity, and incremental validity.
Methods
Participants and procedures
Participants were recruited through the Celiac Disease Foundation newsletter or website to complete questionnaires as part of the iCureCeliac® patient-powered research network, which is hosted by the Celiac Disease Foundation. Questionnaires were completed at one timepoint, on a voluntary basis, between April 2019 and May 2020. All participants provided informed consent. Questionnaire participants were allowed to select from multiple diagnostic category options, including CeD, other gluten-related disorder, not diagnosed with a gluten-related disorder, and more. For the present analyses, only participants aged 18 years or older who reported a diagnosis of CeD made by biopsy, serology, or genetic testing, and their country of origin as ‘United States’ were selected.
The original dataset included N = 1269 participants, of whom n = 1152 were aged 18 or older, n = 1189 reported a diagnosis of CeD, and n = 1077 reported their country of origin as the U.S. When selecting on these inclusion criteria simultaneously, the resulting database included n = 913. Of those participants, n = 460 did not attempt the CD-QOL. When these cases were removed, the resulting database had n = 453, of which n = 23 were missing some data. This dataset of n = 453 was used for CFA. Of the n = 453 used for CFA, n = 138 did not complete the SF-36, PROMIS measures, CSI, and/or CDAT, leaving n = 315 for additional psychometric analyses.
Statistical analyses
Confirmatory factor analysis (CFA) was conducted in Mplus version 8 [27] using the ‘MLR’ estimator, which provides maximum likelihood parameter estimates with standard errors robust to non-normality and is appropriate when some cases include missing data. CFA and single-item measure analyses were conducted on the total sample (N = 453). Additional psychometric analyses were conducted on the subsample with complete data (n = 315).
Confirmatory factor analysis
CFA was used to examine the absolute fit of four models: (1) the original four-factor structure, (2) a second-order factor structure with four first-order factors and one global factor, (3) a bifactor model with one general factor and four group factors, and (4) a one-factor structure. Model fit was evaluated using the following indices: (a) Chi-square goodness-of-fit (χ²), (b) Comparative Fit Index (CFI > 0.90 acceptable, and > 0.95 desirable [28]), (c) Tucker-Lewis Index (TLI > 0.90 acceptable, and > 0.95 desirable [28]), (d) Root Mean Square Error of Approximation (RMSEA < 0.05 good fit; < 0.08 acceptable fit; > 0.10 poor fit [29, 30]) using a 90% confidence interval, and (e) Standardized Root Mean Square Residual (SRMR < 0.05 good fit, and < 0.08 acceptable fit [28]). Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were also used to compare models, where lower values indicate better model fit. Ancillary measures were calculated to evaluate dimensionality and model-based reliability for the bifactor model only [32]: Explained Common Variance (ECV ≥ 0.85 suggests unidimensionality [33–35]), Percent of Uncontaminated Correlations (PUC > 0.70 suggests unidimensionality when ECV > 0.70 [36]), Average Relative Parameter Bias (ARPB < 10–15% acceptable [34]), McDonald’s omega, OmegaH (> 0.50 acceptable and > 0.75 desirable [35, 37]), and H (> 0.80 desirable [38]).
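As an illustration of how three of the ancillary bifactor indices can be derived from standardized loadings, the following Python sketch computes ECV, PUC, and OmegaH. It is not the code used in the present study. It assumes an orthogonal bifactor solution in which each item loads on the general factor and exactly one group factor, and the function name, argument names, and toy loadings are hypothetical.

```python
def bifactor_indices(general, group, membership):
    """Compute ECV, PUC, and OmegaH for an orthogonal bifactor solution.

    general[i]    : standardized loading of item i on the general factor
    group[i]      : standardized loading of item i on its group factor
    membership[i] : label of the group factor that item i belongs to
    (All names are illustrative assumptions, not taken from the study.)
    """
    n_items = len(general)

    # ECV: share of common variance attributable to the general factor.
    general_sq = sum(l ** 2 for l in general)
    group_sq = sum(l ** 2 for l in group)
    ecv = general_sq / (general_sq + group_sq)

    # PUC: share of item pairs that do NOT share a group factor.
    sizes = {}
    for label in membership:
        sizes[label] = sizes.get(label, 0) + 1
    within_pairs = sum(k * (k - 1) // 2 for k in sizes.values())
    total_pairs = n_items * (n_items - 1) // 2
    puc = 1 - within_pairs / total_pairs

    # OmegaH: share of total-score variance due to the general factor.
    general_part = sum(general) ** 2
    group_part = sum(
        sum(g for g, m in zip(group, membership) if m == label) ** 2
        for label in sizes
    )
    residual_part = sum(1 - a ** 2 - b ** 2 for a, b in zip(general, group))
    omega_h = general_part / (general_part + group_part + residual_part)

    return {"ECV": ecv, "PUC": puc, "omegaH": omega_h}


# Toy example: six items, two group factors of three items each.
result = bifactor_indices([0.7] * 6, [0.3] * 6, ["a"] * 3 + ["b"] * 3)
```

The returned values can then be compared against the cut-offs cited above (for example, ECV ≥ 0.85 or OmegaH > 0.75).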
Reliability
Internal consistency reliability for CD-QOL total and three of four subscale scores was assessed using (a) Cronbach’s alpha (α > 0.80 good, and > 0.70 minimally acceptable [31]) and (b) McDonald’s omega, which assumes neither equivalent factor loadings nor normal distribution among scale items, and is therefore less prone to underestimation of composite reliability [39]. Because the ‘inadequate treatment’ subscale has only two items, alpha may underestimate their true relationship, and a Pearson correlation coefficient was calculated instead.
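For readers unfamiliar with these statistics, a minimal Python sketch of Cronbach’s alpha and of the Pearson correlation used for a two-item subscale is given below. It is illustrative only and does not reproduce the study’s analysis code. The function names and toy responses are hypothetical, and the data are assumed to be complete (no missing item responses).

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy two-item subscale answered by three hypothetical respondents.
responses = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(responses)
r = pearson_r([row[0] for row in responses], [row[1] for row in responses])
```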
Validity
Convergent validity was assessed by computing Spearman’s rho correlation coefficients for CD-QOL total and subscale scores and scores on the SF-36 scales, CDAT (GFD adherence), PROMIS anxiety and depression scales, occupational functioning items, and CSI (physical symptoms). Coefficients |r| = 0.00–0.39 were considered small, |r| = 0.40–0.69 were considered moderate, and |r| = 0.70–1.00 were considered large. Because CD-QOL items assess social limitations, emotional concerns, and cognitive concerns rather than physical aspects, we hypothesized moderate negative correlations between CD-QOL total and SF-36 social functioning, emotional well-being, and general health subscale scores, and small negative correlations between CD-QOL total and SF-36 physical functioning, role limitations due to physical and emotional problems, energy/fatigue, and bodily pain, such that worse CeD-specific quality of life would be related to worse generic quality of life.
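The magnitude bands described above can be applied as in the following Python sketch, which computes Spearman’s rho as the Pearson correlation of average ranks and then labels the absolute value of the coefficient. This is an illustrative sketch rather than the study’s code, and the function names and example scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def magnitude(rho):
    """Label a coefficient using the bands stated in the text."""
    size = abs(rho)
    if size < 0.40:
        return "small"
    if size < 0.70:
        return "moderate"
    return "large"


# Hypothetical scores on two measures for five respondents.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
label = magnitude(rho)
```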
We expected a moderate positive correlation between CD-QOL total and CDAT total, such that worse quality of life would be related to lower GFD adherence. We hypothesized moderate positive correlations between CD-QOL total and PROMIS scale scores, such that worse quality of life would be related to greater anxiety and depression symptoms. We also hypothesized moderate positive correlations between CD-QOL total and occupational functioning, such that worse quality of life would be related to more days missed (i.e., worse functioning). We also expected CD-QOL dysphoria subscale scores to be more strongly related to measures of mental health (PROMIS anxiety and depression, SF-36 emotional well-being) than other CD-QOL subscales. Because prior research suggests that the relationship between CeD-specific quality of life and physical symptoms may be limited, we expected a small, positive correlation between CD-QOL and CSI scores, with worse quality of life related to greater symptom burden.
Known groups validity was assessed by grouping participants according to established cut-off scores for GFD adherence on the CDAT and examining mean group differences in CD-QOL total score using analysis of variance and planned pairwise comparisons with Bonferroni corrections. We expected significantly greater mean CD-QOL scores (i.e., worse quality of life) among those with poor GFD adherence compared to those with good GFD adherence. Known groups validity was further assessed using independent samples t-tests to examine group differences in mean CD-QOL total score between participants (a) reporting any inadvertent gluten exposure and those reporting none, (b) reporting any intentional gluten exposure and those reporting none, and (c) endorsing an ability to follow a GFD when dining outside the home compared to those reporting inability. In each comparison we hypothesized significantly greater CD-QOL scores among the less adherent groups. Finally, mean CD-QOL total scores were compared between participants (d) endorsing persisting symptoms despite GFD adherence and those not endorsing, and (e) reporting significantly improved health since CeD diagnosis (very much, quite a bit) and those who did not (no, a little, somewhat). We hypothesized significantly higher CD-QOL scores among those reporting persisting symptoms and little to no improvement in health.
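The two building blocks of these known-groups comparisons, an independent samples t statistic and a Bonferroni adjustment for multiple pairwise comparisons, can be sketched in Python as follows. The sketch assumes the pooled-variance (Student) form of the t-test, which the text does not specify, and it stops at the test statistic and degrees of freedom because a p-value requires the t distribution, which the standard library does not provide. All function names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's independent samples t statistic and its degrees of freedom.

    Assumes equal variances across groups (an assumption of this sketch).
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    return t_stat, na + nb - 2


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical total scores for two small groups.
t_stat, df = pooled_t([1, 2, 3], [2, 3, 4])

# Hypothetical unadjusted p-values from three pairwise comparisons.
adjusted = bonferroni([0.01, 0.04, 0.5])
```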
Incremental validity of the CD-QOL for predicting concurrent GFD adherence (CDAT scores) over and above a generic health-related quality of life measure (SF-36) was examined using hierarchical linear regression. Select SF-36 scales were entered in step 1 and CD-QOL total score was entered in step 2. Three SF-36 scales were used given their conceptual overlap with domains assessed by the CD-QOL: emotional well-being, social functioning, and general health. Models with and without CD-QOL total score were compared to determine change in total variance explained (R²).
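The change in R² between the two regression steps can be sketched in Python as shown below. This is a minimal illustration using ordinary least squares solved through the normal equations; it is not the software or code used in the study. The function names are hypothetical, and the toy data stand in for step 1 predictors (here a single column) and the added step 2 predictor.

```python
def ols_r2(predictors, outcome):
    """R-squared from an OLS regression of outcome on predictors plus intercept.

    predictors: list of rows, each a list of predictor values.
    """
    rows = [[1.0] + list(row) for row in predictors]
    p = len(rows[0])
    # Build the augmented normal equations [X'X | X'y].
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(p):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    beta = [system[i][p] / system[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    y_mean = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - y_mean) ** 2 for y in outcome)
    return 1 - ss_res / ss_tot


def delta_r2(step1, added, outcome):
    """Return (R2 at step 1, R2 at step 2, change in R2)."""
    r2_step1 = ols_r2(step1, outcome)
    step2 = [list(a) + list(b) for a, b in zip(step1, added)]
    r2_step2 = ols_r2(step2, outcome)
    return r2_step1, r2_step2, r2_step2 - r2_step1


# Toy data: the outcome depends on both the step 1 and the added predictor.
step1 = [[1], [2], [3], [4], [5], [6]]
added = [[1], [0], [1], [0], [0], [1]]
outcome = [1 + 2 * a[0] + 3 * b[0] for a, b in zip(step1, added)]
r2_1, r2_2, change = delta_r2(step1, added, outcome)
```

A positive change in R² indicates that the step 2 predictor explains variance in the outcome beyond the step 1 predictors; whether that change is statistically significant would require an additional F test, which this sketch omits.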
Discussion
The aims of the current study were to confirm the factor structure and examine the psychometric properties of the English language CD-QOL among adults with CeD in the U.S. Previous work using exploratory factor analysis identified a four-factor solution. We extended prior work by examining a second-order hierarchical structure and a bifactor structure to address whether the measure can be appropriately scored with a raw summed total score. Of the various models tested, the bifactor model showed superior model fit. Ancillary bifactor analyses suggested that the CD-QOL can be considered a primarily unidimensional instrument, assessing a general latent factor of CeD-specific quality of life, and that the total score is more reliable than the subscale scores. Though ancillary analyses confirmed some multidimensionality of the instrument and subscale reliability indices were mixed, the subscale scores, especially for limitations, do not appear to narrowly measure the proposed specific factors and do not provide substantial interpretive value beyond what is provided by the total score. It is therefore recommended that researchers and clinicians in the U.S. using the English CD-QOL consult the total score rather than subscale scores to assess CeD-specific quality of life. Bifactor results suggested that ‘limitations’ is the least reliable and robust subscale. Item loadings to that factor were generally weaker or non-existent compared to loadings on the general factor, a situation known as “factor collapse” [41]. This finding suggests that themes among these items may best characterize the general factor, such as social stigma, social exclusion, and fear about or preoccupation with food.
In terms of the psychometric properties of the CD-QOL total score and subscale scores, the total score and three subscale scores demonstrated good to acceptable internal consistency reliability. The two items in the inadequate treatment subscale were moderately correlated and evidenced good factor loadings; however, additional items may be needed to better operationalize this subscale. Convergent and known groups validity were supported. The pattern of relationships between the CD-QOL and SF-36 suggests that the CD-QOL assesses specific aspects of quality of life that it purports to measure, and these constructs are related to but not redundant with generic health-related quality of life constructs. Incremental validity findings suggest that researchers and clinicians might choose to use either the generic SF-36 or CD-QOL to assess quality of life in adults with CeD. Selecting one or both measures may depend on the purpose [42–44]. The CD-QOL assesses aspects of functioning and well-being that may not be captured by generic measures and could be important indicators of treatment needs, such as CeD-specific social concerns, food concerns, health concerns, and affect impairments, which can be targeted with behavioral interventions. The CD-QOL is notably shorter than the SF-36, which may reduce burden on both administrators and respondents.
The present study addressed an important gap in the literature. However, our findings may not generalize to all adults with CeD in the U.S. Participants in the present study were self-selected and represent a population with access to the internet, knowledge of how to find relevant health information, willingness to be part of the research community, and capacity to complete online questionnaires. Additionally, because the iCureCeliac database did not inquire about current location, we have assumed that participants who identified their country of origin as the U.S. were living in the U.S. at the time of survey completion, which we were unable to verify. The original CD-QOL items were developed and refined using feedback from mostly White, mostly female patient groups, and as such, item wording or response options may not represent the construct adequately in other groups. Most participants in the current study also identified as female and White, which generally reflects characteristics of the U.S. CeD patient population [5, 45, 46], but may limit generalizability to other patient groups [47].
Researchers should seek to validate CD-QOL scores among individuals of more diverse gender identifications, socioeconomic resources, and racial and ethnic backgrounds. Cross-cultural research with translated versions of the CD-QOL has identified alternative factor structures and subscale composition [14, 48–50], suggesting that the presently supported factor structure may not generalize to other countries and cultures and should be evaluated carefully. Further, samples in the original validation studies and the present study had been diagnosed for an average of nine and six years, respectively, were mostly diagnosed as adults, and reported relatively high GFD adherence. The validity and utility of the CD-QOL among newly diagnosed individuals or those expressly struggling with GFD adherence should be assessed.
Another possible limitation of the current study is use of self-report measurement, including for CeD diagnostic status. Individuals were invited to participate in the iCureCeliac® registry if they had a “gluten-related condition,” which includes but is not limited to CeD. To reduce demand characteristics to report a diagnosis of CeD if no diagnosis had been made, participants were invited to select from multiple diagnostic category options (e.g., CeD, other gluten-related disorder, not diagnosed with a gluten-related disorder), and were invited to participate in the registry regardless of their response. Only participants who reported a diagnosis of CeD made by biopsy, serology, or genetic testing were included in the present analyses. Notably, the genotype HLA-DQ2 or HLA-DQ8 is a necessary but insufficient condition for diagnosis of CeD. Therefore, our inclusion of n = 8 participants (< 2%) who reported a diagnosis of CeD by genetic testing introduces the possibility that those individuals do not meet criteria for CeD diagnosis, which may impact findings. Finally, our analyses were confined by the cross-sectional nature of the data. Future studies should capture longitudinal data to examine the CD-QOL’s test–retest reliability and sensitivity to change, which will provide information about its utility for screening and outcomes assessment purposes.