Published in: Quality of Life Research 3/2024

Open Access 21-12-2023

Unsupervised item response theory models for assessing sample heterogeneity in patient-reported outcomes measures

Authors: Tolulope T. Sajobi, Ridwan A. Sanusi, Nancy E. Mayo, Richard Sawatzky, Lene Kongsgaard Nielsen, Veronique Sebille, Juxin Liu, Eric Bohm, Oluwagbohunmi Awosoga, Colleen M. Norris, Stephen B. Wilton, Matthew T. James, Lisa M. Lix


Abstract

Purpose

Unsupervised item-response theory (IRT) models such as polytomous IRT based on recursive partitioning (IRTrees) and mixture IRT (MixIRT) models can be used to assess differential item functioning (DIF) in patient-reported outcome measures (PROMs) when the covariates associated with DIF are unknown a priori. This study examines the consistency of results for IRTrees and MixIRT models.

Methods

Data were from 4478 individuals in the Alberta Provincial Project on Outcome Assessment in Coronary Heart Disease registry who received cardiac angiography in Alberta, Canada, and completed the Hospital Anxiety and Depression Scale (HADS) depression subscale items. The partial credit model (PCM) based on recursive partitioning (PCTree) and mixture PCM (MixPCM) were used to identify covariates associated with differential response patterns to HADS depression subscale items. Model covariates included demographic and clinical characteristics.

Results

The median (interquartile range) age was 64.5 (15.7) years, and 3522 (78.7%) patients were male. The PCTree identified 4 terminal nodes (subgroups) defined by smoking status, age, and body mass index. A 3-class MixPCM fit the data well. The MixPCM latent classes were defined by age, disease indication, smoking status, comorbid diabetes, congestive heart failure, and chronic obstructive pulmonary disease.

Conclusion

PCTree and MixPCM were not consistent in detecting covariates associated with differential interpretations of PROM items. Future research will use computer simulations to assess these models’ Type I error and statistical power for identifying covariates associated with DIF.
Notes

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1007/s11136-023-03560-5.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

Patient-reported outcomes measures (PROMs) are multi-item questions that elicit patients’ appraisals of their health status and quality of life [1, 2]. PROMs are useful for evaluating treatment efficacy in clinical trials from a patient perspective and comparing population groups for quality improvement [3–5]. Comparing PROM scores among population subgroups relies on the assumption that the measurement model, which describes the relationship between the observed items and the latent construct being measured, is equivalent across these subgroups [6, 7]. This is generally of interest when PROMs are used in potentially heterogeneous populations where respondents may differ in how they interpret and respond to questions about their health and quality of life, a phenomenon known as differential item functioning (DIF). DIF arises when heterogeneity in interpretation and response to the PROM questions is associated with patient characteristics unrelated to the construct of interest being measured [8]. When DIF is ignored in PROM items, the estimated distribution of the PROM scores across population subgroups is biased. Failure to account for DIF in PROM items could affect inferences about PROM scores and their use for supporting decisions in healthcare [8–10]. For example, if patient subgroups consistently provide lower ratings on items of a depression PROM than other subgroups based on their socio-demographic characteristics, this could result in biased estimates of the between-group difference in PROM scores. Incorrect inferences about the meaning of the PROM scores can arise and affect clinical and health policy decisions. This, in turn, could lead to missed opportunities to address pertinent health issues for patients during routine physician visits and reduced access to mental health services.
Existing methods to test for DIF in PROMs are mainly group-based methods that assume potentially relevant differences in the target populations are known a priori and can be explained by observed variables such as socio-demographics or health status [11–16]. Also, these multigroup methods evaluate DIF in PROM items one observed variable at a time. Applying these methods to test for DIF in PROM items in heterogeneous populations where unknown or multiple interacting variables could explain DIF may become onerous with an increasing number of variables.
Unsupervised item response theory (IRT) [16–21] models, which combine IRT models with unsupervised learning methods (e.g., recursive partitioning or mixture models), are an alternative class of IRT models that overcome this limitation by identifying subgroups of patients with different patterns of DIF when patient characteristics associated with DIF are not known a priori. These models include IRT models based on the recursive partitioning method (IRTree) and mixture IRT (MixIRT) models. The MixIRT model, first proposed by Rost [17], combines latent class models with an IRT modeling framework to identify latent classes across which the IRT parameters are non-invariant. MixIRT models have also been applied to test for DIF [22–25] but can be challenging to implement because of model identification issues [20]. On the other hand, IRTree models such as the Rasch trees [18], polytomous Rasch trees [19], and item-focused trees [20, 21] have been developed to identify DIF items when the variables associated with DIF are not known a priori. With these methods, there is no need to specify variables associated with DIF a priori because they are automatically detected using a data-driven approach.
To date, there has not been any investigation of the comparative performance of IRTree and MixIRT models for detecting DIF in PROMs. The aim of our study was to investigate the consistency of results for these two models. Since these two methods differ in their approach to evaluating measurement invariance (MI), we hypothesize that they will be consistent in detecting the presence of heterogeneity but will differ with respect to the number of homogeneous subgroups identified. The manuscript is organized as follows. The “Methods” section describes these models and compares their statistical properties. The “Numeric example” section applies these models to data from a clinical registry of patients with coronary artery disease who received cardiac angiograms. The “Discussion” section discusses the methodological implications of the study findings, the strengths and limitations of the methods, and opportunities for further research.

Methods

Partial credit model

Consider a partial credit model (PCM) [26], a polytomous model commonly used for modeling ordinal data, including items comprising PROMs. Let \({Y}_{im}\) denote the \(i\) th individual’s response to the \(m\) th item. The PCM is defined as,
$$P\left( {Y_{im} \ge j| \tau_{mj} ,\theta_{i} } \right) = \frac{{e^{{ - \left( {\tau_{mj} - \theta_{i} } \right)}} }}{{1 + e^{{ - \left( {\tau_{mj} - \theta_{i} } \right)}} }},$$
(1)
where \(P\left({Y}_{im}\ge j| {\tau }_{mj},{\theta }_{i}\right)\) is the \(i\)th individual’s probability of response \(j\) (\(j\) = 1,…,\(J\)) on the \(m\)th (\(m\) = 1,2,…,\(M\)) item, \({\tau }_{mj}\) denotes the threshold between the (\(j\)−1)th and \(j\)th category for the \(m\)th item, and \({\theta }_{i}\) is the \(i\)th patient’s latent factor score, which is often assumed to be distributed as \({\theta }_{i} \sim N(0,1)\). While this study considered the PCM, tree-based and mixture models can be generalized to other polytomous IRT models [27].
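As a minimal sketch of Eq. (1), the probability can be evaluated numerically as a logistic function of \(\theta_i - \tau_{mj}\). The function name and the threshold values below are hypothetical illustrations, not part of the study's analysis code.

```python
import numpy as np


def pcm_threshold_probability(theta, tau):
    """Evaluate the right-hand side of Eq. (1) for latent score theta and threshold tau.

    exp(-(tau - theta)) / (1 + exp(-(tau - theta))) is algebraically equal to
    1 / (1 + exp(tau - theta)), which is the form computed here.
    """
    theta = np.asarray(theta, dtype=float)
    tau = np.asarray(tau, dtype=float)
    return 1.0 / (1.0 + np.exp(tau - theta))


# Hypothetical thresholds for one item with four ordered response categories.
example_tau = np.array([-1.0, 0.0, 1.5])
# Probabilities for a respondent whose latent score equals the assumed mean of 0.
example_probs = pcm_threshold_probability(0.0, example_tau)
```

In this sketch, the probability equals 0.5 when the latent score coincides with the threshold and decreases as the threshold rises above the latent score.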

Tree-based partial credit model (PCTree)

The PCTree is an unsupervised latent variable model that combines the PCM and recursive partitioning to identify subgroups for which the PCM parameters differ. That is, the PCTree uses input covariates to repeatedly partition the entire sample into homogenous subgroups with respect to the model parameters. Komboz et al. [20] developed a 4-step approach for implementing a PCTree [18]:
1. In Step 1, the PCM is fitted to the entire sample, and the model parameters are estimated via conditional likelihood estimation.
2. In Step 2, the stability of item threshold parameters is assessed for each covariate by conducting structural change tests. Each structural change test involves ordering the contributions of each study respondent to the joint loglikelihood score function of the PCM by each covariate. DIF is detected, for a covariate, if the ordering of the structural change test statistics for all possible cut-points on that covariate exhibits a systematic change in the individual deviations.
3. In Step 3, among all model covariates, the covariate with the smallest p-value for the structural change test is selected for splitting the entire sample into two subgroups (i.e., child nodes). After a covariate has been selected for splitting, the optimal cut-point on this covariate is determined by maximizing the partitioned loglikelihood (i.e., the sum of the loglikelihoods for two separate models: one for the observations to the left and up to the cut-point, and one for the observations to the right of the cut-point), over all potential (\(r\)–1) cut points, where \(r\) is the number of possible values on a covariate. For categorical values, there are \(r\)–1 cut points.
4. In Step 4, Steps 1–3 are repeated recursively in the child nodes until one of two stopping criteria is reached:
   I. Bonferroni correction criterion: recursive partitioning of the sample stops if no further significant parameter instability exists for any covariate in any subgroup. Because multiple structural change tests could result in an inflated familywise Type I error, a Bonferroni correction is applied to \(\alpha\), such that \({\alpha }^{\prime}=\alpha /m\), where \(m\) is the number of tests conducted.
   II. Minimum terminal node size criterion: this involves pre-specifying a minimum sample size for each terminal node. A recommended simple rule of thumb is to set the minimum node size to be 10 times the average number of parameters per item.
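The splitting logic in Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the psychotree implementation: the p-values are taken as given (the structural change tests themselves are not implemented), \(m\) is assumed to be the number of covariates tested in the node, and `loglik_fn` is a hypothetical callable that returns the maximized loglikelihood of a model fitted to a subset of respondents. The toy loglikelihood at the end is a stand-in for a fitted PCM.

```python
import math


def select_split_covariate(p_values, alpha=0.05):
    """Return the covariate with the smallest p-value if it passes the
    Bonferroni-adjusted level alpha / m, otherwise None (stop splitting).

    p_values: dict mapping covariate name -> structural change test p-value.
    """
    if not p_values:
        return None
    adjusted_alpha = alpha / len(p_values)  # assumption: m = covariates tested
    best = min(p_values, key=p_values.get)
    return best if p_values[best] < adjusted_alpha else None


def minimum_node_size(avg_params_per_item, multiplier=10):
    """Rule of thumb: 10 times the average number of parameters per item."""
    return int(math.ceil(multiplier * avg_params_per_item))


def best_cut_point(values, loglik_fn, min_size):
    """Search the r - 1 candidate cut points of an ordered covariate and return
    (cut, partitioned loglikelihood) for the best admissible split, or None."""
    best = None
    for cut in sorted(set(values))[:-1]:
        left = [i for i, v in enumerate(values) if v <= cut]
        right = [i for i, v in enumerate(values) if v > cut]
        if len(left) < min_size or len(right) < min_size:
            continue  # split would violate the minimum node size criterion
        total = loglik_fn(left) + loglik_fn(right)
        if best is None or total > best[1]:
            best = (cut, total)
    return best


# Toy stand-in for a model loglikelihood: negative within-group sum of squares.
toy_scores = [0, 0, 0, 5, 5, 5]


def toy_loglik(indices):
    mean = sum(toy_scores[i] for i in indices) / len(indices)
    return -sum((toy_scores[i] - mean) ** 2 for i in indices)


demo_cut = best_cut_point([1, 2, 3, 4, 5, 6], toy_loglik, min_size=1)
```

With hypothetical p-values such as `{"smoking": 0.001, "age": 0.02, "bmi": 0.2}`, the adjusted level is 0.05/3 and `select_split_covariate` returns `"smoking"`; when no p-value falls below the adjusted level, it returns `None` and partitioning stops.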

The mixture PCM

The mixture PCM (MixPCM) [17] aims to uncover heterogeneity by allowing model parameters to vary across two or more latent classes [23] such that:
$$P\left( {Y_{im} \ge j| \tau_{mjc} ,C = c,\theta_{i} } \right) = \frac{{e^{{ - \left( {\tau_{mjc} - \theta_{i} } \right)}} }}{{1 + e^{{ - \left( {\tau_{mjc} - \theta_{i} } \right)}} }},$$
(2)
where the unconditional probability of response \(j\) to the \(m\)th item (irrespective of class membership) is
$$\sum\nolimits_{c = 1}^{C} {\pi_{c} P\left( {Y_{im} \ge j| \tau_{mjc} ,C = c,\theta_{i} } \right)} ,$$
(3)
where \({\theta }_{i}\), the latent trait level for the \(i\)th patient, is assumed to follow \(N(0,1)\), \({\tau }_{mjc}\) denotes the threshold between the (\(j\)−1)th and \(j\)th category (\(j\) = 1,…,\(J\)) for the \(m\)th item in the \(c\)th class, and \({\pi }_{c}\) is the mixing proportion that defines the relative sizes of the latent classes, subject to \(\sum_{c=1}^{C}{\pi }_{c}=1\). The mixing proportions can be explained by sample characteristics (e.g., demographic or clinical characteristics).
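Eq. (3) is a weighted average of the class-specific probabilities in Eq. (2), with the mixing proportions as weights. The short sketch below illustrates this for a single threshold; the function name and all numeric values are hypothetical.

```python
import numpy as np


def mixpcm_unconditional_probability(theta, tau_by_class, mixing_proportions):
    """Evaluate Eq. (3): sum over classes of pi_c times the Eq. (2) probability.

    tau_by_class: threshold tau_mjc for one item and category, one value per class.
    mixing_proportions: class sizes pi_c, which must sum to 1.
    """
    tau = np.asarray(tau_by_class, dtype=float)
    pi = np.asarray(mixing_proportions, dtype=float)
    if tau.shape != pi.shape:
        raise ValueError("need one threshold and one mixing proportion per class")
    if not np.isclose(pi.sum(), 1.0):
        raise ValueError("mixing proportions must sum to 1")
    class_probs = 1.0 / (1.0 + np.exp(tau - theta))  # Eq. (2) for each class
    return float(np.sum(pi * class_probs))


# Hypothetical three-class example with class-specific thresholds.
example_value = mixpcm_unconditional_probability(
    theta=0.0,
    tau_by_class=[-0.5, 0.0, 0.8],
    mixing_proportions=[0.36, 0.48, 0.16],
)
```

When all classes share the same threshold, the mixture collapses to the one-class probability in Eq. (1), which is one way to see that the one-class PCM is a special case.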
The MixPCM is implemented using a four-step approach:
1. In Step 1, a one-class PCM, which assumes no heterogeneity, is fit to the data. The tenability of the unidimensionality assumption can be assessed using exploratory factor analysis based on polychoric correlations with GEOMIN rotation [28–30] or parallel analysis [31]. The unidimensionality assumption is considered satisfied if the ratio of the first and second eigenvalues is greater than 3. If unidimensionality is not a tenable assumption, then MixPCM is not appropriate for testing sample heterogeneity in the data. If the assumption of unidimensionality is satisfied, proceed to Step 2.
2. In Step 2, specify MixPCMs with increasing numbers of latent classes by allowing the PCM threshold parameters to vary across the latent classes while the latent factor means and standard deviations are constrained to be equal for identifiability purposes.
3. In Step 3, determine the optimal number of latent classes for the MixPCM using the Bayesian Information Criterion (BIC) [32, 33], Vuong–Lo–Mendell–Rubin likelihood ratio test (VLMR) [3, 34], bootstrap likelihood ratio test, and model entropy. The VLMR is used to compare the goodness of fit of models with k and (k + 1) latent classes; a non-significant VLMR test (p > 0.05) favors the model with the smaller number of classes. Model entropy is used to assess the certainty of class membership (values > 0.8 indicate high confidence in latent class assignment [35]). For the BIC, the optimal model has the smallest BIC value.
4. In Step 4, the association of covariates with the estimated latent class membership is explored via either a one-step approach or a three-step approach [35, 36]. In the one-step approach, the known covariates are incorporated into the mixture IRT model to estimate the posterior probability of latent class membership, conditional on the covariates; the effects of the covariates on class membership are estimated simultaneously with the class-specific item parameters. In the three-step approach, the MixIRT model first estimates the posterior probability of latent class membership based on the item response data. In the second step, each individual is assigned to the class with the highest posterior probability. In the third step, the covariate effects on class membership are estimated using multinomial logistic regression with pseudo draws to account for imperfect classification.
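The model-selection quantities in Step 3 and the modal class assignment in Step 4 can be sketched as below. The BIC uses its usual definition, and the entropy is the commonly used normalized (relative) entropy of the posterior class probabilities; it is an assumption that this matches the entropy statistic reported by the software used in the study. All function names are hypothetical.

```python
import numpy as np


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: -2 * loglikelihood + k * ln(n)."""
    return -2.0 * loglik + n_params * np.log(n_obs)


def relative_entropy(posteriors):
    """Normalized entropy in [0, 1]; values near 1 indicate clear class assignment.

    posteriors: array of shape (n_individuals, n_classes); rows sum to 1.
    """
    p = np.asarray(posteriors, dtype=float)
    n, n_classes = p.shape
    if n_classes < 2:
        raise ValueError("entropy is only defined for two or more classes")
    safe = np.clip(p, 1e-12, 1.0)  # avoids log(0); p * log(safe) is 0 when p is 0
    total_uncertainty = -np.sum(p * np.log(safe))
    return float(1.0 - total_uncertainty / (n * np.log(n_classes)))


def modal_class(posteriors):
    """Assign each individual to the class with the highest posterior probability."""
    return np.argmax(np.asarray(posteriors, dtype=float), axis=1)


def best_model_by_bic(bic_by_classes):
    """Return the number of classes with the smallest BIC.

    bic_by_classes: dict mapping number of classes -> BIC value.
    """
    return min(bic_by_classes, key=bic_by_classes.get)
```

Modal assignment ignores classification uncertainty, which is why the third step of the three-step approach adjusts for imperfect classification rather than treating the assigned classes as known.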

Numeric example

Data source

The consistency between the MixPCM and PCTree was examined by analyzing existing population-based data. Data were from the Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease (APPROACH) registry, a population-based database of all adults who received cardiac catheterization in Alberta, Canada [37]. The APPROACH registry maintains one of the most comprehensive data repositories of individuals with coronary artery disease (CAD). The registry includes detailed data on patients’ demographic and clinical characteristics. This registry was chosen because (1) it is made up of heterogeneous CAD patients with varying degrees of CAD severity, different types of treatments received, different experiences with the healthcare system, and diverse demographic and behavioral characteristics, and (2) collects both generic and cardiac-specific patient-reported HRQOL measures. The Hospital Anxiety and Depression Scale (HADS) was selected as a PROM to be investigated for potential DIF effects. Our choice of the HADS for this study was motivated by the unidimensional nature of the HADS subscales (i.e., anxiety and depression subscales) and its excellent psychometric properties for screening for depression in individuals with CAD [38, 39]. The HADS is a self-administered 14-item generic measure of psychological distress comprising two subscales: depression and anxiety [40]. The response options for the HADS items range from zero to three: higher scores indicate more severe depression and/or anxiety. We limited our attention to the depression subscale items.
The study cohort included all adult Alberta residents who (1) underwent a first cardiac catheterization between January 1, 2002, and December 31, 2017, (2) had at least 1-vessel CAD (Duke Coronary Index between 3 and 13), and (3) completed the HADS two weeks after the procedure. In addition to the HADS, data were collected on demographic characteristics (sex, age), multiple comorbid conditions, disease severity, and coronary angiography results. Ethics approval for this study was obtained from the University of Calgary Conjoint Health Research Ethics Board (REB15-1195).

Statistical analyses

Descriptive statistics were used to summarize the patients’ demographic and clinical characteristics. The assumption of unidimensionality for the depression items of the HADS was evaluated using parallel analysis and several goodness-of-fit statistics [30, 41–45], including the information-weighted fit mean square error statistic (Infit MNSQ), outlier-sensitive fit statistic (Outfit MNSQ), root mean square error of approximation (RMSEA), comparative fit index (CFI), and standardized root mean square residual (SRMSR). An item with Infit MNSQ or Outfit MNSQ outside the 0.5–2.0 range is considered a misfit to the PCM [42].
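For orientation, the item fit statistics can be sketched with the standard Rasch-family definitions: Outfit MNSQ is the mean of squared standardized residuals, and Infit MNSQ is the sum of squared residuals divided by the sum of model variances. It is an assumption that the software used in the study computes them in exactly this way; the function names are hypothetical, and the model-expected scores and variances are taken as given.

```python
import numpy as np


def item_fit_mnsq(observed, expected, variance):
    """Return (infit, outfit) mean-square statistics for one item.

    observed: respondents' item scores.
    expected: model-expected scores for the same respondents.
    variance: model variances of the item score for the same respondents.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    variance = np.asarray(variance, dtype=float)
    squared_residual = (observed - expected) ** 2
    outfit = float(np.mean(squared_residual / variance))  # unweighted, outlier-sensitive
    infit = float(np.sum(squared_residual) / np.sum(variance))  # information-weighted
    return infit, outfit


def is_misfit(infit, outfit, lower=0.5, upper=2.0):
    """Flag an item whose Infit or Outfit MNSQ falls outside the 0.5-2.0 range."""
    return not (lower <= infit <= upper and lower <= outfit <= upper)
```

Values near 1 indicate that the observed variation in responses is close to what the model predicts.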
The PCTree and MixPCM were used to identify subgroups of patients with different patterns of DIF or no DIF. Patients’ socio-demographics [sex and age (< 75 years vs. \(\ge\) 75 years)] and clinical characteristics (procedure indication, smoking status, body mass index (BMI), and comorbid conditions) were selected as covariates. Several studies have examined the presence of DIF in HADS items for patients’ demographic characteristics, such as age and sex [46–48]. In particular, previous studies have reported age differences in quality of life and risk of adverse health outcomes in elderly (\(\ge\) 75 years) heart disease patients compared to younger (< 75 years) patients [49–51]. Although there has been limited investigation of DIF in patient-reported HADS item responses with respect to clinical and disease characteristics, these patient characteristics are known risk factors for depressive symptoms in CAD patients [51–54].
For the PCTree model, the minimum sample size for each terminal node was set at 250 as a stopping criterion for the recursive partitioning, which also allows for a sufficient sample size for within-node parameter estimation [20]. To facilitate comparability of the models, the covariates were simultaneously incorporated into the MixPCM to estimate class-specific model parameters and the effects of the covariates on latent class membership. Finally, for each method, multinomial logistic regression models were used to test the covariates (i.e., patients’ demographic and disease characteristics) associated with the identified subgroups.
The PCTree analysis and other analyses were implemented in R software [55], while the MixPCM was implemented in Mplus v8.1 [56]. Statistical significance for the analyses was set at \(\alpha\)= 0.05, except when stated otherwise.

Results

Table 1 describes the patient characteristics. Of the 4478 patients who completed the HADS, 3522 (78.7%) were male, and 815 (18.2%) were 75 years or older. The majority of patients (69.3%) had acute coronary syndrome as the clinical disease. Hypertension and hyperlipidemia were the most frequent comorbid conditions. About 75% of patients endorsed “often” on the “I can laugh and see the funny side of things” and “I can enjoy a good book or radio or TV program” items. In contrast, less than 5% of the patients endorsed “very seldom” on “I can laugh and see the funny side of things”, “I look forward with enjoyment to things”, or “I can enjoy a good book or radio or TV program” (Online Table A1). Given that there were a number of sparse response categories, those categories endorsed by less than 1.5% of the sample were merged with the adjacent response categories.
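The merging of sparse response categories described above can be sketched as follows. The text does not state which adjacent category received the merged responses, so the rule used here (merge into the next lower category, or into the next higher one when the sparse category is the lowest) is an assumption, as are the function name and the example data.

```python
from collections import Counter


def merge_sparse_categories(responses, threshold=0.015):
    """Collapse ordinal categories endorsed by fewer than `threshold` of
    respondents into an adjacent category, then relabel as 0, 1, 2, ...
    """
    responses = list(responses)
    n = len(responses)
    while True:
        counts = Counter(responses)
        categories = sorted(counts)
        if len(categories) <= 1:
            break
        sparse = [c for c in categories if counts[c] / n < threshold]
        if not sparse:
            break
        category = sparse[0]
        position = categories.index(category)
        # Assumed rule: merge downward, except for the lowest category.
        target = categories[position - 1] if position > 0 else categories[position + 1]
        responses = [target if r == category else r for r in responses]
    relabel = {c: k for k, c in enumerate(sorted(set(responses)))}
    return [relabel[r] for r in responses]


# Hypothetical item with 100 respondents; category 3 is endorsed by only 1%.
example_responses = [0] * 50 + [1] * 30 + [2] * 19 + [3] * 1
merged = merge_sparse_categories(example_responses)
```

Each pass removes one category, so the loop ends after at most as many passes as there are categories.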
Table 1
Characteristics of the study cohort (N = 4478)
Characteristic | N (%)
Sex (male) | 3522 (78.7)
Age (\(\ge\) 75 years) | 815 (18.2)
Body mass index (median, IQR) | 28.1 (6.0)
Procedure indication |
  Acute coronary syndrome | 3102 (69.3)
  Stable angina | 1376 (30.7)
Complex CAD (left main & 3-vessel disease) | 1222 (27.3)
Current smoker | 1062 (23.7)
Comorbid conditions |
  Diabetes | 218 (4.9)
  Prior myocardial infarction | 448 (10.0)
  Chronic obstructive pulmonary disease | 594 (13.3)
  Hypertension | 3136 (70.0)
  Peripheral vascular disease | 311 (6.9)
  Congestive heart failure | 256 (5.7)
  Hyperlipidemia | 3423 (76.4)
  Cerebrovascular disease | 214 (4.8)
IQR interquartile range, CAD coronary artery disease
The conventional one-class PCM provided a good fit for the data. Specifically, the item Infit MNSQ and Outfit MNSQ values were well within the recommended 0.5–2.0 interval (Online Table A2). Additionally, parallel analysis revealed a dominant principal factor (the ratio of the first to the second principal factor was approximately 30.2), and the RMSEA, CFI, and SRMSR values were acceptable, suggesting that the assumption of unidimensionality of the HADS depression items was satisfied (Online Tables A2 & A3).
The PCTree identified four terminal nodes (i.e., subgroups) of patients defined by the interaction among smoking status, age, and BMI (Fig. 1). The entire sample was first partitioned using the smoking status variable, indicating that this was the most important variable that explained sample heterogeneity in the HADS depression subscale items. The first terminal node, which accounted for 23.7% of the sample, consisted of current smokers. The second terminal node (16.9%) included non-smokers older than 75. The third terminal node (20.8%) comprised younger (i.e., \(\le\) 75 years) non-smokers with BMI > 30.4, while the final terminal node (38.5%) consisted of non-smoking patients at most 75 years old with BMI \(\le\) 30.4. The region plots in these terminal nodes of the PCTree model in Fig. 1 show patterns of differences in the HADS items and item response categories for which patients had inconsistent patterns of responses. For example, for item #2 (“I can laugh and see the funny side of things”), the region of the second category, shaded in the second darkest gray color, was largest for patients who are smokers and lowest for non-smoking patients who are < 75 years and with a BMI > 30.422. Similarly, for item 2 (“I feel cheerful”), the region of the second category, shaded in the second darkest gray color, was largest for smokers and lowest for non-smoking patients < 75 years. Results from multinomial logistic regression analysis revealed that the variance inflation factors were all < 5, which indicates the absence of multicollinearity among the covariates. Significant differences exist among the terminal nodes with respect to sex, procedure indication, disease complexity, diabetes, hyperlipidemia, myocardial infarction, cerebrovascular disease, chronic obstructive pulmonary disease (COPD), and hypertension (Table 2).
Table 2
Adjusted odds ratio [95% confidence interval] for PCTree model subgroups and patient characteristics
Patients’ characteristic | Subgroup 2 vs. 1 | Subgroup 3 vs. 1 | Subgroup 4 vs. 1
Sex (female) | 0.77 [0.61, 0.96]* | 1.07 [0.86, 1.33] | 1.27 [1.04, 1.53]*
Age (> 75 years) | | |
Body mass index (median, IQR) | | |
Procedure indication (stable angina) | 0.46 [0.37, 0.58]* | 0.38 [0.31, 0.47]* | 0.38 [0.31, 0.46]*
Current smoker | | |
Complex CAD (left main & 3-vessel disease) | 1.89 [1.53, 2.34]* | 1.06 [0.86, 1.31] | 1.18 [0.98, 1.42]
Comorbid conditions | | |
  Diabetes | 0.84 [0.51, 1.39] | 2.34 [1.56, 3.51]* | 1.17 [0.77, 1.77]
  Prior myocardial infarction | 2.10 [1.55, 2.86]* | 0.97 [0.69, 1.35] | 1.22 [0.92, 1.63]
  Chronic obstructive pulmonary disease | 0.70 [0.53, 0.91]* | 0.73 [0.57, 0.94]* | 0.44 [0.35, 0.56]*
  Hypertension | 1.90 [1.52, 2.37]* | 1.87 [1.52, 2.29]* | 1.05 [0.89, 1.24]
  Peripheral vascular disease | 0.67 [0.47, 0.96]* | 0.75 [0.53, 1.04] | 0.59 [0.44, 0.80]*
  Congestive heart failure | 2.20 [1.52, 3.19]* | 1.11 [0.74, 1.66] | 0.91 [0.63, 1.33]
  Hyperlipidemia | 0.77 [0.62, 0.96]* | 1.37 [1.09, 1.72]* | 0.93 [0.78, 1.12]
  Cerebrovascular disease | 1.90 [1.23, 2.93]* | 1.07 [0.68, 1.70] | 1.35 [0.90, 2.03]
CAD coronary artery disease, PCTree tree-based partial credit model; age, body mass index, and current smoker were excluded as predictors since they were used to define the PCTree nodes
*\(p\) < 0.05
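As a reading aid for Table 2, an odds ratio is flagged as significant at the 0.05 level when its 95% confidence interval excludes 1. The sketch below checks this and recovers the approximate log-odds coefficient and standard error from a reported odds ratio, assuming a Wald interval that is symmetric on the log scale (an assumption about how the intervals were computed). The function names are hypothetical; the two example entries are the sex (female) odds ratios for Subgroups 2 and 3 versus 1.

```python
import math


def ci_excludes_one(lower, upper):
    """True if the confidence interval for an odds ratio does not contain 1."""
    return lower > 1.0 or upper < 1.0


def log_odds_and_se(odds_ratio, lower, upper, z=1.96):
    """Approximate the logistic coefficient and its standard error from an
    odds ratio and its 95% CI, assuming a Wald interval on the log scale."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Sex (female), Subgroup 2 vs. 1: 0.77 [0.61, 0.96]
subgroup2_significant = ci_excludes_one(0.61, 0.96)
# Sex (female), Subgroup 3 vs. 1: 1.07 [0.86, 1.33]
subgroup3_significant = ci_excludes_one(0.86, 1.33)
beta_sex, se_sex = log_odds_and_se(0.77, 0.61, 0.96)
```

Here the first interval excludes 1 and the second does not, which agrees with the asterisks in the table.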
For the MixPCM, we fitted one-, two-, and three-class models to the data; models with more classes could not be fitted to the data due to model identification problems. A three-class model provided an optimal fit to the data based on the BIC and a VLMR test comparing two-class and three-class models (Table 3). The classes consisted of 1609 (36.0%), 2145 (48.0%), and 715 (16.0%) patients, respectively. The multinomial logistic regression models revealed significant differences among the classes on age, sex, smoking status, procedure indication, and comorbid conditions. Patients in class 2 had lower odds of presenting with stable angina, being current smokers, and having comorbid diabetes, prior myocardial infarction, COPD, congestive heart failure, and cerebrovascular disease than patients in class 1. Patients in class 3 had higher odds of being older (> 75 years) but lower odds of being current smokers, having COPD, and having cerebrovascular disease than patients in class 1 (Table 4).
Table 3
Fit statistics for MixPCM with 1 to 3 latent classes for the HADS depression subscale items (N = 4478)
Fit statistics | 1-Class | 2-Class | 3-Class
2 × Loglikelihood | −26,582.9 | −23,967.4 | −22,900.0
Bayesian information criterion | 53,259.9 | 47,525.3 | 46,481.0
Entropy | | 0.86 | 0.79
Vuong–Lo–Mendell–Rubin likelihood ratio test (p-value) | | < 0.01 | < 0.01
Bootstrap likelihood ratio test | | < 0.01 | < 0.01
Class proportion | | |
  Class 1 | 1.00 | 0.64 | 0.36
  Class 2 | | 0.36 | 0.48
  Class 3 | | | 0.16
MixPCM mixture partial credit model, HADS hospital anxiety and depression scale
Table 4
Adjusted odds ratio [95% confidence interval] for three-class MixPCM and patient characteristics
Characteristic | Class 2 vs. 1 | Class 3 vs. 1
Sex (female) | 1.19 [0.98, 1.49] | 1.14 [0.94, 1.45]
Age (> 75 years) | 1.02 [0.80, 1.31] | 1.31 [1.02, 1.69]*
Body mass index (median, IQR) | 0.99 [0.98, 1.00] | 1.00 [0.99, 1.01]
Procedure indication (stable angina) | 1.21 [1.15, 1.70]* | 1.03 [0.88, 1.32]
Current smoker | 0.44 [0.36, 0.53]* | 0.60 [0.49, 0.74]*
Complex CAD (left main) | 0.89 [0.73, 1.08] | 1.01 [0.82, 1.23]
Comorbid conditions | |
  Diabetes | 0.61 [0.41, 0.90]* | 0.90 [0.61, 1.32]
  Prior myocardial infarction | 0.59 [0.44, 0.78]* | 0.78 [0.59, 1.03]
  Chronic obstructive pulmonary disease | 0.70 [0.55, 0.90]* | 0.78 [0.61, 0.99]*
  Hypertension | 1.06 [0.87, 1.29] | 1.12 [0.96, 1.44]
  Peripheral vascular disease | 0.90 [0.64, 1.26] | 1.13 [0.80, 1.59]
  Congestive heart failure | 0.65 [0.46, 0.93]* | 0.78 [0.55, 1.12]
  Hyperlipidemia | 1.06 [0.87, 1.31] | 1.12 [0.90, 1.39]
  Cerebrovascular disease | 0.50 [0.35, 0.71]* | 0.54 [0.37, 0.78]*
CAD coronary artery disease, IQR interquartile range, MixPCM mixture partial credit model
*\(p\) < 0.05

Discussion

This study investigates the extent to which PCTree and MixPCM consistently identify patient covariates associated with different interpretations of HADS Depression items. Our analyses show that both models identified age and smoking status (i.e., whether a patient was a current smoker) as covariates associated with DIF. Overall, the PCTree model identified four subgroups of patients defined by smoking status, age, and BMI. However, MixPCM identified three latent classes defined by age, smoking status, procedure indication, and multiple comorbid conditions.
There are several similarities and notable differences in the properties of these two models and how they are operationalized to evaluate sample heterogeneity (Table 5). Both are similar concerning the underlying assumption of unidimensionality of the data, large sample size requirements, and unsupervised learning approaches for DIF detection. Unlike existing group-based methods designed to detect PROM items that exhibit DIF, these unsupervised latent variable models present a global approach for identifying individuals that exhibit DIF instead of the items that exhibit DIF. These methods are particularly of interest in routine clinical practice where PROMs data help inform clinical decisions (e.g., treatment strategies, goals of care, referral for additional services, and so on) about a patient’s care. Identifying individuals with a propensity for DIF can help clinicians contextualize each patient’s responses to PROMs, support shared decision-making, and inform the delivery of personalized disease management. However, these methods have notable differences. First, these models differ with respect to the evaluation of sample heterogeneity. The PCTree evaluates sample heterogeneity via recursive partitioning of the sample into independent homogeneous subgroups for which the PCM parameters are non-invariant using a set of covariates. MixPCM, on the other hand, evaluates sample heterogeneity by estimating the posterior probability of latent class membership for each individual so that the latent classes are non-invariant for the PCM parameters. Second, selecting the optimal number of latent classes in MixPCM is based on known goodness-of-fit statistics, whereas determining the final subgroups in PCTree depends on the likelihood ratio test (LRT) used in determining the optimal split across known covariates. The LRT is known to be sensitive to study sample size [57].
Third, unlike the tree-based IRT model, which requires specifying a set of covariates as input variables, the MixPCM models can estimate the latent subgroups with and without specifying a set of covariates. Finally, there are notable differences in the computational requirements for implementing tree-based IRT models and mixture IRT models. Estimating latent classes from mixture IRT models can be computationally intensive as it involves sequentially fitting multiple models and assessing model fit until an optimal number of latent classes is identified. In addition, MixIRT model parameters are estimated based on numeric computation, which is prone to model convergence issues depending on the number of starting values specified. Implementing tree-based models requires only a few lines of code that are less computationally intensive.
Table 5 Comparison of mixture item response theory (MixIRT) and polytomous item response theory tree (IRTree) models

Attribute: Description
  MixIRT: This model combines latent class analysis with IRT to identify homogeneous subgroups (i.e., latent classes) from the data. The IRT model parameters are allowed to vary across latent classes. Sample heterogeneity is operationalized as differences in IRT parameters across subgroups (i.e., latent classes) for which the IRT parameters are non-invariant [17]
  IRTree: A tree-based polytomous IRT model in which the sample is recursively partitioned into homogeneous subgroups. At each step, the procedure identifies the most important covariate and the cut point that maximizes the differences in the measurement model (partial credit model) parameters across the resulting subgroups [20]. The process is repeated recursively in the child nodes until a stopping criterion is reached

Attribute: Unidimensionality of the patient-reported outcome measure (PROM)
  MixIRT: Assumes a unidimensional factor structure
  IRTree: Assumes a unidimensional factor structure

Attribute: Characterization of sample heterogeneity
  MixIRT: Can detect sample heterogeneity by incorporating known covariates to derive the posterior probability of latent class membership [17]. Can also detect unobserved sample heterogeneity without known covariates
  IRTree: Handles heterogeneity in partial credit model parameters by splitting the sample using known covariates as input variables [20]. Unobserved sample heterogeneity cannot be determined without known covariates

Attribute: Model overfitting
  MixIRT: Fit statistics such as the bootstrap likelihood ratio test (LRT) internally validate the optimal number of latent classes [29]
  IRTree: Overfitting is avoided by adopting a Bonferroni correction when determining the splitting points

Attribute: Incorporation of multiple covariates
  MixIRT: Can detect sample heterogeneity with and without multiple known covariates [32, 33]
  IRTree: Can detect sample heterogeneity only through covariates supplied as input variables [20]. IRTree models are sensitive to the type and number of variables included as input variables

Attribute: Sample size requirements
  MixIRT: Requires large sample sizes to ensure stable parameter estimation [9, 10]
  IRTree: Requires large sample sizes to ensure stable parameter estimation. A simple recommended rule of thumb is 10 times the average number of model parameters per item for each node

Attribute: Computational efficiency
  MixIRT: Requires fitting multiple models by sequentially increasing the number of latent classes. Multiple fit statistics are often used to determine the optimal number of latent classes. Implementation can be computationally intensive, and the model can exhibit convergence issues even with a large number of starting values
  IRTree: Does not require fitting multiple models before determining the optimal number of subgroups. It requires only a few lines of syntax and is not computationally intensive

Attribute: Model misspecification
  MixIRT: Multiple fit statistics, such as the Bayesian information criterion (BIC), sample size-adjusted BIC, bootstrap likelihood ratio test, and Vuong–Lo–Mendell–Rubin likelihood ratio test, are available for determining the optimal number of latent classes [27–31]
  IRTree: Fit statistics for determining the optimal number of subgroups are not available

Attribute: Software implementation
  MixIRT: Implemented in Mplus [50] and the R package psychomix [50]
  IRTree: Implemented in the R package psychotree [50]

Abbreviations: MixIRT, mixture item response theory model; PCTree, tree-based partial credit model; DIF, differential item functioning
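To make concrete what Table 5 means by measurement model parameters that differ across subgroups, the sketch below computes category probabilities under the partial credit model of Masters [26] for one latent trait value and two sets of item thresholds. It is an illustration only: the function names and both threshold sets are hypothetical, standing in for two latent classes or two terminal nodes, and are not estimates from the study.

```python
import math


def pcm_probs(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    thresholds holds the step parameters delta_1..delta_m; the result has
    m + 1 probabilities for categories 0..m, with
    P(X = k) proportional to exp(sum_{j <= k} (theta - delta_j)).
    """
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    largest = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - largest) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds):
    """Expected item score at a given latent trait value."""
    return sum(k * p for k, p in enumerate(pcm_probs(theta, thresholds)))


# Hypothetical thresholds for the same four-category item in two subgroups.
subgroup_a = [-1.0, 0.0, 1.0]
subgroup_b = [-0.5, 0.5, 1.5]

# Same latent trait level, different expected responses: the kind of
# parameter non-invariance both model families are designed to detect.
print(expected_score(0.0, subgroup_a), expected_score(0.0, subgroup_b))
```

When the thresholds are identical across subgroups, the two expected scores coincide at every trait level, which corresponds to the invariant case.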
Tree-based latent variable models, such as PCTree, are promising methods for identifying sample heterogeneity in PROMs in heterogeneous patient populations defined by multiple interacting variables. Unlike conventional group-based methods for DIF detection, which require a priori specification of the variable associated with DIF, these methods are appealing for handling population heterogeneity in PROM scores. They can be used in exploratory analyses to generate hypotheses about potential DIF variables.
Despite the strengths of these models, they are subject to the inherent limitations of the unsupervised learning and latent variable methods from which they are derived. Specifically, tree-based models are prone to overfitting, which may lead to the detection of spurious subgroups. Bonferroni-corrected structural change tests and pre-specification of a minimum terminal node size are two recommended approaches for preventing overfitting in tree-based models. Furthermore, the accuracy of tree-based IRT models for detecting sample heterogeneity depends, to a large extent, on the variables included as input covariates. For example, the conclusions from the empirical analysis in this study are limited to the available demographic, clinical, and disease characteristics used as input variables. The APPROACH registry does not collect data on history of depression, medical treatment for depression, cognitive impairment, or other important risk factors that may be associated with DIF in patient-reported HADS items. This limits the generalizability of the conclusions from this empirical study. Moreover, changing the type (i.e., ordinal, continuous, or mixed) and the number of covariates included in the model could influence the number and type of homogeneous subgroups (nodes) identified.
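The two safeguards against overfitting named above can be sketched in a few lines. The Python function below is a simplified illustration under stated assumptions, not the psychotree implementation: it applies the textbook Bonferroni adjustment (multiplying each p-value by the number of candidate covariates), and the function name, covariate names, p-values, and node sizes are hypothetical.

```python
def choose_split(p_values, n_node, min_node_size, alpha=0.05):
    """Return the covariate to split on, or None if the node should not split.

    p_values maps each candidate covariate to the unadjusted p-value of its
    parameter instability (structural change) test in the current node.
    """
    # A node can only yield two children of at least min_node_size each
    # if it holds at least twice that many observations.
    if not p_values or n_node < 2 * min_node_size:
        return None
    n_tests = len(p_values)
    # Textbook Bonferroni adjustment, capped at 1.
    adjusted = {name: min(1.0, n_tests * p) for name, p in p_values.items()}
    best = min(adjusted, key=adjusted.get)
    return best if adjusted[best] < alpha else None


# Hypothetical p-values for three candidate covariates (not study results).
print(choose_split({"age": 0.004, "smoking": 0.03, "bmi": 0.2},
                   n_node=1000, min_node_size=100))
```

With three candidate covariates, an unadjusted p-value of 0.03 becomes 0.09 after adjustment and no longer triggers a split, which is how the correction guards against spurious subgroups as more covariates are added.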
Future research could investigate the optimal minimum sample size requirement for the terminal nodes across various data characteristics. In addition, the comparison of PCTree and latent class PCM models was based on a single empirical dataset. Although results from simulation studies reported by Komboz et al. [20] show that PCTree exhibits control of the familywise Type I error comparable to that of the multigroup PCM, the comparison of the Type I error of PCM and MixPCM is yet to be investigated. Future research will use computer simulations to examine the comparative performance of PCTree and MixPCM for detecting DIF in PROM items, with respect to their Type I error and statistical power, under a variety of distributional and data characteristics. Finally, the empirical comparison of these unsupervised learning methods in this study focuses on identifying homogeneous subgroups of individuals with consistent patterns of responses to the HADS items, not on detecting HADS items that exhibit DIF. While mixture IRT models have been extended to detect DIF and estimate DIF effect sizes in PROM items [58, 59], future research will investigate the extension of tree-based IRT models for detecting DIF in PROM items.

Conclusion

In summary, this study revealed that MixPCM and PCTree models are inconsistent in identifying covariates associated with DIF in PROM items. While PCTree is an alternative methodology to the mixture IRT model for examining sample heterogeneity in PROM items, future research, including computer simulations, is needed to evaluate the Type I error and statistical power of these models for DIF detection.

Declarations

Competing interests

None of the authors have financial or non-financial interests to disclose.

Ethical approval

Ethical approval to use de-identified data from the APPROACH registry was obtained from the University of Calgary Conjoint Health Research Ethics Board (REB20-1721).
Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Supplementary Information

Below is the link to the electronic supplementary material.
References
1. Gibbons, E., Black, N., Fallowfield, L., Newhouse, R., & Fitzpatrick, R. (2016). Patient-reported outcome measures and the evaluation of services. In Challenges, solutions and future directions in the evaluation of service innovations in health care and public health. NIHR Journals Library.
2. Alemayehu, D., & Cappelleri, J. C. (2012). Conceptual and analytical considerations toward the use of patient-reported outcomes in personalized medicine. American Health & Drug Benefits, 5(5), 310.
3. Cappelleri, J. C., & Bushmakin, A. G. (2014). Interpretation of patient-reported outcomes. Statistical Methods in Medical Research, 23(5), 460–483.
4. Wu, A. W., Kharrazi, H., Boulware, L. E., & Snyder, C. F. (2013). Measure once, cut twice—adding patient-reported outcome measures to the electronic health record for comparative effectiveness research. Journal of Clinical Epidemiology, 66(8), S12–S20.
5. Øvretveit, J., Zubkoff, L., Nelson, E. C., Frampton, S., Knudsen, J. L., & Zimlichman, E. (2017). Using patient-reported outcome measurement to improve patient care. International Journal for Quality in Health Care, 29(6), 874–879.
6. McHorney, C. A., & Fleishman, J. A. (2006). Assessing and understanding measurement equivalence in health outcome measures: Issues for further quantitative and qualitative inquiry. Medical Care, 44(11), S205–S210.
7. Schmitt, N., & Kuljanin, G. (2008). Measurement invariance: Review of practice and implications. Human Resource Management Review, 18(4), 210–222.
10. Bingham, C. O., III, Noonan, V. K., Auger, C., Feldman, D. E., Ahmed, S., & Bartlett, S. J. (2017). Montreal Accord on patient-reported outcomes (PROs) use series–paper 4: Patient-reported outcomes can inform clinical decision making in chronic care. Journal of Clinical Epidemiology, 89, 136–141.
11. Teresi, J. A., & Fleishman, J. A. (2007). Differential item functioning and health assessment. Quality of Life Research, 16(1), 33–42.
12. Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Resources Research and Evaluation Department of National Defense.
13. Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370.
14. Wu, Q., & Lei, P.-W. (2009). Using multigroup confirmatory factor analysis to detect differential item functioning when tests are multidimensional. In Paper presented at the Annual Meeting of the National Council for Measurement in Education. San Diego.
15. Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Lawrence Erlbaum Associates Inc.
16. Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91(6), 1292.
17. Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14(3), 271–282.
18. Sawatzky, R., Ratner, P. A., Kopec, J. A., & Zumbo, B. D. (2012). Latent variable mixture models: A promising approach for the validation of patient reported outcomes. Quality of Life Research, 21(4), 637–650.
19. Strobl, C., Kopf, J., & Zeileis, A. (2015). Rasch trees: A new method for detecting differential item functioning in the Rasch model. Psychometrika, 80(2), 289–316.
20. Komboz, B., Strobl, C., & Zeileis, A. (2018). Tree-based global model tests for polytomous Rasch models. Educational and Psychological Measurement, 78(1), 128–166.
21. Bollmann, S., Berger, M., & Tutz, G. (2018). Item-focused trees for the detection of differential item functioning in partial credit models. Educational and Psychological Measurement, 78(5), 781–804.
22. Sen, S., & Cohen, A. S. (2019). Applications of mixture IRT models: A literature review. Measurement: Interdisciplinary Research and Perspectives, 17(4), 177–191.
23. Wu, X., Sawatzky, R., Hopman, W., Mayo, N., Sajobi, T. T., Liu, J., Prior, J., Papaioannou, A., Josse, R. G., Towheed, T., & Davison, K. S. (2017). Latent variable mixture models to test for differential item functioning: A population-based analysis. Health and Quality of Life Outcomes, 15(1), 1–13.
24. Sawatzky, R., Russell, L. B., Sajobi, T. T., Lix, L. M., Kopec, J., & Zumbo, B. D. (2018). The use of latent variable mixture models to identify invariant items in test construction. Quality of Life Research, 27(7), 1745–1755.
25. Sajobi, T. T., Josephson, C. B., Sawatzky, R., Wang, M., Lawal, O., Patten, S. B., … Wiebe, S. (2021). Quality of Life in Epilepsy: Same questions, but different meaning to different people. Epilepsia, 62(9), 2094–2102.
26. Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
27. Choi, I.-H., Paek, I., & Cho, S.-J. (2017). The impact of various class-distinction features on model selection in the mixture Rasch model. The Journal of Experimental Education, 85(3), 411–424.
28. Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272–299.
29. Hattie, J. (1984). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 20, 1–14.
30. Slocum-Gori, S. L., & Zumbo, B. D. (2011). Assessing the unidimensionality of psychological scales: Using multiple criteria from factor analysis. Social Indicators Research, 102(3), 443–461.
31. Glorfeld, L. W. (1995). An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55(3), 377–393.
32. Preinerstorfer, D., & Formann, A. K. (2012). Parameter recovery and model selection in mixed Rasch models. British Journal of Mathematical and Statistical Psychology, 65(2), 251–262.
33. Feng, Z. D., & McCullogh, C. E. (1996). Using bootstrap likelihood ratios in finite mixture models. Journal of Royal Statistical Society, 58(3), 609–617.
34. Lubke, G., & Muthén, B. O. (2007). Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters. Structural Equation Modeling: A Multidisciplinary Journal, 14(1), 26–47.
35. Vermunt, J. K. (2010). Latent class modelling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450–469.
36. Asparouhov, T., & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21(3), 329–341.
37. Ghali, W. A., & Knudtson, M. L. (2000). Overview of the Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease. On behalf of the APPROACH investigators. The Canadian Journal of Cardiology, 16(10), 1225–1230.
38. Zigmond, A. S., & Snaith, R. P. (1983). The hospital anxiety and depression scale. Acta Psychiatrica Scandinavica, 67(6), 361–370.
39. Stafford, L., Berk, M., & Jackson, H. J. (2007). Validity of the hospital anxiety and depression scale and patient health questionnaire-9 to screen for depression in patients with coronary artery disease. General Hospital Psychiatry, 29(5), 417–424.
40. De Smedt, D., Clays, E., Doyle, F., Kotseva, K., Prugger, C., Pająk, A., … Group, E. S. (2013). Validity and reliability of three commonly used quality of life measures in a large European population of coronary heart disease patients. International Journal of Cardiology, 167(5), 2294–2299.
41. Smith, R. M., Schumacker, R. E., & Bush, M. J. (1995). Using item mean squares to evaluate fit to the Rasch model. Journal of Outcome Measurement, 2(1), 66–78.
42. Karabatsos, G. (2000). A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1(2), 152–176.
43. Christensen, K. B., & Kreiner, S. (2012). Item fit statistics. In K. B. Christensen, S. Kreiner, & M. Mesbah (Eds.), Rasch Models in Health (pp. 83–104). Wiley.
44. Sharma, S., Mukherjee, S., Kumar, A., & Dillon, W. R. (2005). A simulation study to investigate the use of cutoff values for assessing model fit in covariance structure models. Journal of Business Research, 58, 935–943.
45. Schreiber, J. B., Nora, A., Stage, F. K., Barlow, E. A., & King, J. (2006). Reporting structural equation modeling and confirmatory factor analysis results: A review. Journal of Educational Research, 99, 323–338.
46. Bjorner, J. B., Kreiner, S., Ware, J. E., Damsgaard, M. T., & Bech, P. (1998). Differential item functioning in the Danish translation of the SF-36. Journal of Clinical Epidemiology, 51(11), 1189–1202.
47. Cameron, I. M., Crawford, J. R., Lawton, K., & Reid, I. C. (2013). Differential item functioning of the HADS and PHQ-9: An investigation of age, gender and educational background in a clinical UK primary care sample. Journal of Affective Disorders, 147(1–2), 262–268.
48. Cameron, I. M., Scott, N. W., Adler, M., & Reid, I. C. (2014). A comparison of three methods of assessing differential item functioning (DIF) in the hospital anxiety depression scale: Ordinal logistic regression, Rasch analysis and the Mantel Chi-square procedure. Quality of Life Research, 23, 2883–2888.
49. Shad, B., Ashouri, A., Hasandokht, T., Rajati, F., Salari, A., Naghshbandi, M., & Mirbolouk, F. (2017). Effect of multimorbidity on quality of life in adult with cardiovascular disease: a cross-sectional study. Health and Quality of Life Outcomes, 15(1), 1–8.
50. Xue, C., Bian, L., Xie, Y. S., Yin, Z. F., Xu, Z. J., Chen, Q. Z., … Wang, C. Q. (2017). Impact of smoking on health-related quality of Life after percutaneous coronary intervention treated with drug-eluting stents: a longitudinal observational study. Health and Quality of Life Outcomes, 15(1), 1–9.
51. Sajobi, T. T., Wang, M., Awosoga, O., Santana, M., Southern, D., Liang, Z., et al. (2018). Trajectories of health-related quality of life in coronary artery disease. Circulation: Cardiovascular Quality and Outcomes, 11(3), 1–11.
52. Nadelmann, J., Frishman, W. H., Ooi, W. L., Tepper, D., Greenberg, S., Guzik, H., … Aronson, M. (1990). Prevalence, incidence and prognosis of recognized and unrecognized myocardial infarction in persons aged 75 years or older: the Bronx Aging Study. The American Journal of Cardiology, 66(5), 533–537.
54. Graham, M. M., Norris, C. M., Galbraith, P. D., Knudtson, M. L., & Ghali, W. A. (2006). Quality of life after coronary revascularization in the elderly. European Heart Journal, 27(14), 1690–1698.
55. R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
56. Muthén, L. K., & Muthén, B. O. (2017). Mplus statistical analysis with latent variables (8th ed.). User’s Guide.
57. Babyak, M. A., & Green, S. B. (2010). Confirmatory factor analysis: An introduction for psychosomatic medicine researchers. Psychosomatic Medicine, 72(6), 587–597.
58. Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Directorate of Human Research and Evaluation Department of National Defense.
59. Karadavut, T. (2021). Characterizing the latent classes in a mixture IRT model using DIF. Applied Measurement in Education, 34(4), 301–311.
Metadata
Title
Unsupervised item response theory models for assessing sample heterogeneity in patient-reported outcomes measures
Authors
Tolulope T. Sajobi
Ridwan A. Sanusi
Nancy E. Mayo
Richard Sawatzky
Lene Kongsgaard Nielsen
Veronique Sebille
Juxin Liu
Eric Bohm
Oluwagbohunmi Awosoga
Colleen M. Norris
Stephen B. Wilton
Matthew T. James
Lisa M. Lix
Publication date
21 December 2023
Publisher
Springer International Publishing
Published in
Quality of Life Research / Issue 3/2024
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-023-03560-5
