Top

Quality of Life Research

Gepubliceerd in:

07-04-2022

A comparison of methods to address item non-response when testing for differential item functioning in multidimensional patient-reported outcome measures

Auteurs: Olawale F. Ayilara, Tolulope T. Sajobi, Ruth Barclay, Eric Bohm, Mohammad Jafari Jozani, Lisa M. Lix

Gepubliceerd in: Quality of Life Research | Uitgave 9/2022

Abstract

Purpose

Item non-response (i.e., missing data) may mask the detection of differential item functioning (DIF) in patient-reported outcome measures or result in biased DIF estimates. Non-response can be challenging to address in ordinal data. We investigated an unsupervised machine-learning method for ordinal item-level imputation and compared it with commonly-used item non-response methods when testing for DIF.

Methods

Computer simulation and real-world data were used to assess several item non-response methods using the item response theory likelihood ratio test for DIF. The methods included: (a) list-wise deletion (LD), (b) half-mean imputation (HMI), (c) full information maximum likelihood (FIML), and (d) non-negative matrix factorization (NNMF), which adopts a machine-learning approach to impute missing values. Control of Type I error rates were evaluated using a liberal robustness criterion for α = 0.05 (i.e., 0.025–0.075). Statistical power was assessed with and without adoption of an item non-response method; differences > 10% were considered substantial.

Results

Type I error rates for detecting DIF using LD, FIML and NNMF methods were controlled within the bounds of the robustness criterion for > 95% of simulation conditions, although the NNMF occasionally resulted in inflated rates. The HMI method always resulted in inflated error rates with 50% missing data. Differences in power to detect moderate DIF effects for LD, FIML and NNMF methods were substantial with 50% missing data and otherwise insubstantial.

Conclusion

The NNMF method demonstrated comparable performance to commonly-used non-response methods. This computationally-efficient method represents a promising approach to address item-level non-response when testing for DIF.

vorige artikel Psychometric validity and reliability of the 10- and 2-item Connor–Davidson resilience scales among a national sample of Americans responding to the Covid-19 pandemic: an item response theory analysis

volgende artikel Assessing the construct validity of the Quality-of-Life-Aged Care Consumers (QOL-ACC): an aged care-specific quality-of-life measure

Alleen toegankelijk voor geautoriseerde gebruikers

Johnston, B. C., Patrick, D. L., Thorlund, K., Busse, J. W., da Costa, B. R., Schünemann, H. J., & Guyatt, G. H. (2013). Patient-reported outcomes in meta-analyses –part 2: Methods for improving interpretability for decision-makers. Health and Quality of Life Outcomes, 11(211), 1–9. https://doi.org/10.1186/1477-7525-11-211CrossRef

Guyatt, G. H., Feeny, D. H., & Patrick, D. L. (1993). Measuring health-related quality of life. Annals of Internal Medicine, 118(8), 622–629.CrossRef

Berzon, R., Hays, R. D., & Shumaker, S. A. (1993). International use, application and performance of health-related quality of life instruments. Quality of Life Research, 2(6), 367–368. https://doi.org/10.1007/BF00422214CrossRefPubMed

Bulut, O., & Kim, D. (2021). The use of data imputation when investigating dimensionality in Sparse data from computerized adaptive tests. Journal of Applied Testing Technology, 22(2), 1.

Jia, F., & Wu, W. (2019). Evaluating methods for handling missing ordinal data in structural equation modeling. Behavior Research Methods, 51(5), 2337–2355. https://doi.org/10.3758/s13428-018-1187-4CrossRefPubMed

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley.CrossRef

Bell, M. L., & Fairclough, D. L. (2014). Practical and statistical issues in missing data for longitudinal patient-reported outcomes. Statistical Methods in Medical Research, 23(5), 440–459. https://doi.org/10.1177/0962280213476378CrossRefPubMed

Teresi, J. A., & Fleishman, J. A. (2007). Differential item functioning and health assessment. Quality of Life Research, 16(SUPPL. 1), 33–42. https://doi.org/10.1007/s11136-007-9184-6CrossRefPubMed

Banks, K. (2015). An introduction to missing data in the context of differential item functioning. Practical Assessment, Research and Evaluation, 20(12), 1–10.

10.

Finch, H. (2011). The use of multiple imputation for missing data in uniform DIF analysis: Power and type I error rates. Applied Measurement in Education, 24(4), 281–301. https://doi.org/10.1080/08957347.2011.607054CrossRef

11.

Donneau, A. F., Mauer, M., Molenberghs, G., & Albert, A. (2015). A simulation study comparing multiple imputation methods for incomplete longitudinal ordinal data. Communications in Statistics, 44(5), 1311–1338. https://doi.org/10.1080/03610918.2013.818690CrossRef

12.

Eekhout, I., De Vet, H. C. W., Twisk, J. W. R., Brand, J. P. L., De Boer, M. R., & Heymans, M. W. (2014). Missing data in a multi-item instrument were best handled by multiple imputation at the item score level. Journal of Clinical Epidemiology, 67(3), 335–342. https://doi.org/10.1016/j.jclinepi.2013.09.009CrossRefPubMed

13.

Kombo, A. Y., Mwambi, H., & Molenberghs, G. (2017). Multiple imputation for ordinal longitudinal data with monotone missing data patterns. Journal of Applied Statistics, 44(2), 270–287. https://doi.org/10.1080/02664763.2016.1168370CrossRef

14.

Raghunathan, T. E., Lepkowski, J. M., & Van Hoewyk, J. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27(1), 85–95.

15.

Enders, C. K. (2010). Applied missing data analysis. The Guilford Press.

16.

Liu, Y., Millsap, R. E., West, S. G., Tein, J. Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506.CrossRef

17.

Chen, P. Y., Wu, W., Garnier-Villarreal, M., Kite, B. A., & Jia, F. (2020). Testing measurement invariance with ordinal missing data: A comparison of estimators and missing data techniques. Multivariate Behavioral Research, 55(1), 87–101.CrossRef

18.

Donneau, A. F., Mauer, M., Lambert, P., Molenberghs, G., & Albert, A. (2015). Simulation-based study comparing multiple imputation methods for non-monotone missing ordinal data in longitudinal settings. Journal of Biopharmaceutical Statistics, 25(3), 570–601.CrossRef

19.

Baker, F. B., & Kim, S. H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). CRC Press.CrossRef

20.

Lin, X. E., & Boutros, P. C. (2020). Optimization and expansion of non-negative matrix factorization. BMC Bioinformatics, 21(1), 1–10. https://doi.org/10.1186/s12859-019-3312-5CrossRef

21.

Zhang, S., Wang, W., Ford, J., & Makedon, F. (2006). Learning from incomplete ratings using non-negative matrix factorization. In: Proceedings of the Sixth SIAM International Conference on Data Mining (pp. 549–553). https://doi.org/10.1137/1.9781611972764.58

22.

Mazumder, R., Hastie, T., & Tibshirani, R. (2010). Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11, 2287–2322.PubMed

23.

Wold, H. (1975). Soft modelling by latent variables: The nonlinear iterative partial least squares (NIPALS) approach. Journal of Applied Probability, 12(S1), 117–142.CrossRef

24.

Fairclough, A. D. L., & Cella, D. F. (1996). Functional assessment of cancer therapy (FACT-G): Non-response to individual questions. Quality of Life Research, 5(3), 321–329.CrossRef

25.

Enders, C. K. (2004). The impact of missing data on sample reliability estimates: Implications for reliability reporting practices. Educational and Psychological Measurement, 64(3), 419–436. https://doi.org/10.1177/0013164403261050CrossRef

26.

Collins, L. M., Schafer, J. L., & Kam, C. M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6(4), 330–351.CrossRef

27.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. https://doi.org/10.1037/1082-989X.7.2.147CrossRefPubMed

28.

Ayilara, O. F., Zhang, L., Sajobi, T. T., Sawatzky, R., Bohm, E., & Lix, L. M. (2019). Impact of missing data on bias and precision when estimating change in patient-reported outcomes from a clinical registry. Health and Quality of Life Outcomes, 17(1), 106. https://doi.org/10.1186/s12955-019-1181-2CrossRefPubMedPubMedCentral

29.

Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791. https://doi.org/10.1038/44565CrossRefPubMed

30.

Pauca, V. P., Piper, J., & Plemmons, R. J. (2006). Nonnegative matrix factorization for spectral data analysis. Linear Algebra and Its Applications, 416(1), 29–47. https://doi.org/10.1016/j.laa.2005.06.025CrossRef

31.

Lin, X. E., & Boutros, P. (2019). NNLM: a package for fast and versatile nonnegative matrix factorization.

32.

Forero, C. G., & Maydeu-Olivares, A. (2009). Estimation of IRT graded response models: Limited versus full information methods. Psychological Methods, 14(3), 275–299. https://doi.org/10.1037/a0015825CrossRefPubMed

33.

Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2016.00109CrossRefPubMedPubMedCentral

34.

Olsbjerg, M., & Christensen, K. B. (2015). Modeling local dependence in longitudinal IRT models. Behavior Research Methods, 47(4), 1413–1424. https://doi.org/10.3758/s13428-014-0553-0CrossRefPubMed

35.

De Ayala, R. J. (1994). The influence of multidimensionality on the graded response model. Applied Psychological Measurement, 18(2), 155–170.CrossRef

36.

Bulut, O., & Sunbul, Ö. (2017). Monte Carlo simulation studies in item response theory with the R programming language. Journal of Measurement and Evaluation in Education and Psychology, 8(3), 266–287. https://doi.org/10.21031/epod.305821CrossRef

37.

Finch, H. W. (2011). The impact of missing data on the detection of nonuniform differential item functioning. Educational and Psychological Measurement, 71(4), 663–683.CrossRef

38.

Schouten, R. M., Lugtig, P., & Vink, G. (2018). Generating missing values for simulation purposes: A multivariate amputation procedure. Journal of Statistical Computation and Simulation, 88(15), 2909–2930. https://doi.org/10.1080/00949655.2018.1491577CrossRef

39.

Nassiri, V., Molenberghs, G., Verbeke, G., & Barbosa-Breda, J. (2020). Iterative multiple imputation: A framework to determine the number of imputed datasets. American Statistician, 74(2), 125–136. https://doi.org/10.1080/00031305.2018.1543615CrossRef

40.

Goretzko, D. (2021). Factor retention in exploratory factor analysis with missing data. Educational and Psychological Measurement. https://doi.org/10.1177/00131644211022031CrossRefPubMedPubMedCentral

41.

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03CrossRef

42.

Bulut, O., & Suh, Y. (2017). Detecting multidimensional differential item functioning with the multiple indicators multiple causes model, the item response theory likelihood ratio test, and logistic regression. Frontiers in Education, 2(October), 1–14. https://doi.org/10.3389/feduc.2017.00051CrossRef

43.

Bourion-Bédès, S., Schwan, R., Laprevote, V., Bédès, A., Bonnet, J. L., & Baumann, C. (2015). Differential item functioning (DIF) of SF-12 and Q-LES-Q-SF items among French substance users. Health and Quality of Life Outcomes. https://doi.org/10.1186/s12955-015-0365-7CrossRefPubMedPubMedCentral

44.

Yadegari, I., Bohm, E., Ayilara, O. F., Zhang, L., Sawatzky, R., Sajobi, T. T., & Lix, L. M. (2019). Differential item functioning of the SF-12 in a population-based regional joint replacement registry. Health and Quality of Life Outcomes, 17(1), 1–11. https://doi.org/10.1186/s12955-019-1166-1CrossRef

45.

Lix, L. M., Wu, X., Hopman, W., Mayo, N., Sajobi, T. T., Liu, J., Prior, J. C., Papaioannou, A., Josse, R. G., Towheed, T. E., Davison, K. S., & Sawatzky, R. (2016). Differential item functioning in the SF-36 physical functioning and mental health sub scales: A population-based investigation in the Canadian multicentre osteoporosis study. PLoS ONE, 11(3), 1–13. https://doi.org/10.1371/journal.pone.0151519CrossRef

46.

Kwon, J. Y., & Sawatzky, R. (2017). Examining gender-related differential item functioning of the veterans rand 12-item health survey. Quality of Life Research, 26(10), 2877–2883. https://doi.org/10.1007/s11136-017-1638-xCrossRefPubMed

47.

Stout, W., Li, H. H., Nandakumar, R., & Bolt, D. (1997). MULTISIB: A procedure to investigate DIF when a test is intentionally two-dimensional. Applied Psychological Measurement, 21(3), 195–213. https://doi.org/10.1177/01466216970213001CrossRef

48.

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06CrossRef

49.

Bradley, J. V. (1978). Robustness. British Journal of Mathematical & Statistical Psychology, 31(2), 144–152.CrossRef

50.

Kaplan, D. (1989). A study of the sampling variability and z-values of parameter estimates from misspecified structural equation models. Multivariate Behavioral Research, 24(1), 41–57.CrossRef

51.

Curran, P., & West, S. G. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1(1), 16–29.CrossRef

52.

Zhang, L., Lix, L. M., Ayilara, O., Sawatzky, R., & Bohm, E. R. (2018). The effect of multimorbidity on changes in health-related quality of life following hip and knee arthroplasty. Bone and Joint Journal, 100B(9), 1168–1174. https://doi.org/10.1302/0301-620X.100B9.BJJ-2017-1372.R1CrossRef

53.

Salyers, M., Bosworth, H., Swanson, J., Lamb-Pagone, J., & Osher, F. (2000). Reliability and validity of the SF-12 health survey among people with severe mental illness. Medical Care, 38, 1141–1150.CrossRef

54.

Cernin, P., Cresci, K., Jankowski, T., & Lichtenberg, P. (2010). Reliability and validity testing of the short-form health survey in a sample of community-dwelling African American older adults. Journal of Nursing Measurement, 18, 49–59.CrossRef

55.

Cheak-Zamora, N., Wyrwich, K., & McBride, T. (2009). Reliability and validity of the SF-12v2 in the medical expenditure panel survey. Quality of Life Research, 18, 727–735.CrossRef

56.

Yosef, H. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4), 800–802.CrossRef

57.

Meade, A. W., & Wright, N. A. (2012). Solving the measurement invariance anchor item problem in item response theory. Journal of Applied Psychology, 97(5), 1016–1031. https://doi.org/10.1037/a0027934CrossRefPubMed

58.

Sedivy, S. K., Zhang, B., & Traxel, N. M. (2006). Detection of differential item functioning with polytomous items in the presence of missing data. In: Annual meeting of the National Council on Measurement in Education

59.

Rombach, I., Rivero-Arias, O., Gray, A. M., Jenkinson, C., & Burke, Ó. (2016). The current practice of handling and reporting missing outcome data in eight widely used PROMs in RCT publications: A review of the current literature. Quality of Life Research, 25(7), 1613–1623. https://doi.org/10.1007/s11136-015-1206-1CrossRefPubMedPubMedCentral

60.

Finch, H. (2008). Estimation of item response theory parameters in the presence of missing data. Journal of Educational Measurement, 45(3), 225–245.CrossRef

61.

Finch, W. H. (2010). Imputation methods for missing categorical questionnaire data: A comparison of approaches. Journal of Data Science, 8(3), 361–378. https://doi.org/10.6339/jds.2010.08(3).612CrossRef

Titel: A comparison of methods to address item non-response when testing for differential item functioning in multidimensional patient-reported outcome measures
Auteurs: Olawale F. Ayilara
Tolulope T. Sajobi
Ruth Barclay
Eric Bohm
Mohammad Jafari Jozani
Lisa M. Lix
Publicatiedatum: 07-04-2022
Uitgeverij: Springer International Publishing
Gepubliceerd in: Quality of Life Research / Uitgave 9/2022
Print ISSN: 0962-9343
Elektronisch ISSN: 1573-2649
DOI: https://doi.org/10.1007/s11136-022-03129-8

Bohn Stafleu van Loghum

Deel dit onderdeel of sectie (kopieer de link)

Abstract

Purpose

Methods

Results

Conclusion

Log in om toegang te krijgen

Andere artikelen Uitgave 9/2022

Patient-reported outcome measures of musculoskeletal symptoms and psychosocial factors in musicians: a systematic review of psychometric properties

Establishing content validity of LIMB-Q Kids: a new patient-reported outcome measure for lower limb deformities

Polypharmacy and trajectories of health-related quality of life in older adults: an Australian cohort study

Quality of mobility measures among individuals with acquired brain injury: an umbrella review

Health-related quality of life of breast and colorectal cancer patients undergoing active chemotherapy treatment: Patient-reported outcomes

A threshold explanation for the lack of variation in negative composite time trade-off values