Constructing an Item Bank Using Item Response Theory: The AMC Linear Disability Score Project

Health Services and Outcomes Research Methodology

Abstract

Patient-relevant outcomes measured using questionnaires, such as cognitive functioning and functional status, have become important endpoints in medical studies. Traditionally, responses to individual items are simply summed to obtain a score for each patient. Recently, there has been interest in item response theory (IRT) as an alternative to summed scores. The benefits of IRT are greatest when it is used in conjunction with a calibrated item bank: a collection of items that have been presented to large groups of patients, whose responses are used to estimate the measurement properties of the individual items. This article examines the methodology for using IRT to construct and calibrate an item bank. As an illustration, it uses the AMC Linear Disability Score project, which aims to develop an item bank to measure functional status as expressed by the ability to perform activities of daily life.
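To make the contrast between summed scores and IRT scoring concrete, the following is a minimal sketch and not the authors' method. It assumes the dichotomous Rasch (one-parameter logistic) model, which the abstract does not specify, and it treats item difficulties as already calibrated. The difficulty values, the response patterns, and the function names `rasch_probability`, `summed_score`, and `estimate_ability` are all hypothetical and chosen only for illustration.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a positive response under the Rasch model (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def summed_score(responses):
    """Traditional scoring: add up the 0/1 item responses."""
    return sum(responses)


def estimate_ability(responses, difficulties, iterations=50, tol=1e-10):
    """Maximum-likelihood ability estimate by Newton-Raphson.

    Item difficulties are treated as known, i.e. taken from a calibrated bank.
    """
    total = sum(responses)
    if total == 0 or total == len(responses):
        # The likelihood has no finite maximum for all-0 or all-1 patterns.
        raise ValueError("no finite ML estimate for an extreme response pattern")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = total - sum(probs)                   # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)  # -d2 logL / d theta2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical calibrated difficulties (logits) for two subsets of one item bank.
easy_items = [-2.0, -1.0, 0.0]
hard_items = [0.0, 1.0, 2.0]

# Two hypothetical patients, each with two positive responses out of three items.
responses_a = [1, 1, 0]  # answered the easy subset
responses_b = [1, 1, 0]  # answered the hard subset

score_a = summed_score(responses_a)
score_b = summed_score(responses_b)
theta_a = estimate_ability(responses_a, easy_items)
theta_b = estimate_ability(responses_b, hard_items)

print(f"summed scores: {score_a} vs {score_b}")
print(f"ability estimates (logits): {theta_a:.3f} vs {theta_b:.3f}")
```

Both hypothetical patients have the same summed score, but because the second subset is uniformly two logits harder, the estimated abilities differ by two logits. This is the sense in which a calibrated item bank lets patients who answered different items be placed on a common scale.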

Author information

Corresponding author

Correspondence to Rebecca Holman.

About this article

Cite this article

Holman, R., Lindeboom, R., Glas, C.A. et al. Constructing an Item Bank Using Item Response Theory: The AMC Linear Disability Score Project. Health Services & Outcomes Research Methodology 4, 19–33 (2003). https://doi.org/10.1023/A:1025824810390

  • DOI: https://doi.org/10.1023/A:1025824810390
