Developing tailored instruments: item banking and computerized adaptive assessment

Bjorner, Jakob Bue; Chang, Chih-Hung; Thissen, David; Reeve, Bryce B.

doi:10.1007/s11136-007-9168-6

Developing tailored instruments: item banking and computerized adaptive assessment

Original Paper
Published: 15 February 2007

Volume 16, pages 95–108, (2007)
Cite this article

Quality of Life Research Aims and scope Submit manuscript

Jakob Bue Bjorner^1,2,
Chih-Hung Chang³,
David Thissen⁴ &
…
Bryce B. Reeve⁵

1513 Accesses
151 Citations
4 Altmetric
Explore all metrics

Abstract

Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Online Questionnaires

Qualitative Research and Content Analysis

Research design: qualitative, quantitative, and mixed methods approaches / sixth edition

Article 15 November 2023

James P. Takona

References

Wainer, H., Dorans, N. J., & Eignor, D., et al. (2000). Computerized adaptive testing: A primer. Mahwah, NJ: Lawrence Erlbaum Associates.
Google Scholar
Fischer, G. H., & Molenaar, I. W. (1995). Rasch models—foundations, recent developments, and applications. Berlin: Springer-Verlag.
Google Scholar
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. London: Sage Publications.
Google Scholar
van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory. Berlin: Springer.
Google Scholar
Ware, J. E., Jr., Bjorner, J. B., & Kosinski, M. (2000). Practical implications of item response theory and computerized adaptive testing: A brief summary of ongoing studies of widely used headache impact scales. Medical Care, 38, II73–II82
Article PubMed Google Scholar
Veit, C. L., & Ware, J. E., Jr. (1983). The structure of psychological distress and well-being in general populations. Journal of Consulting and Clinical Psychology, 51, 730–742.
Article PubMed CAS Google Scholar
Bock, R. D. (1997). The nominal categories model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 3–50). Berlin: Springer.
Google Scholar
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.
Google Scholar
Muraki, E. (1997). A Generalized partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 153–164). Berlin: Springer.
Google Scholar
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–173.
Article Google Scholar
Masters, G. N., & Wright, B. D. (1997). The partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 101–122). Berlin: Springer.
Google Scholar
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Article Google Scholar
Samejima, F. (1997). Graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85–100). Berlin: Springer.
Google Scholar
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 34(Suppl 17), 1–97.
Google Scholar
Lord, F. M., & Norvick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
Google Scholar
Mellenbergh, G. J. (1995). Conceptual notes on models for discrete polytomous item responses. Applied Psychological Measurement, 19, 91–100.
Article Google Scholar
Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567–577.
Article Google Scholar
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24, 3–32.
Google Scholar
Maydeu-Olivares, A., Drasgow, F., & Mead, A. D. (1994). Distinguishing among parametric item response models for polychotomous ordered data. Applied Psychological Measurement, 18, 245–256.
Article Google Scholar
Muthen, B. O. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 29, 177–185.
Google Scholar
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408
Article Google Scholar
Muraki, E. (1993). Information functions of the generalized partial credit model. Applied Psychological Measurement, 17, 351–363.
Article Google Scholar
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431–444.
Article Google Scholar
Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 73–140). Mahwah: Lawrence Erlbaum.
Google Scholar
van der Linden, W. J. (2000). Constrained adaptive testing with shadow tests. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing, theory and practice (pp. 27–52). Dordrecht: Kluwer Academic Publishers.
Google Scholar
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.
Article Google Scholar
Tarlov, A. R., Ware, J. E., Jr., Greenfield, S., Nelson, E. C., Perrin, E., & Zubkoff, M. (1989). The medical outcomes study. An application of methods for monitoring the results of medical care. JAMA, 262, 925–930.
Article PubMed CAS Google Scholar
Ware, J. E., Jr., Bayliss, M. S., Rogers, W. H., Kosinski, M., & Tarlov, A. R. (1996). Differences in 4-year health outcomes for elderly and poor, chronically ill patients treated in HMO and fee-for-service systems. Results from the Medical Outcomes Study. JAMA, 276, 1039–1047.
Article PubMed Google Scholar
Ware, J. E., Jr., & Kosinski, M. (2001). SF36 physical and mental health summary scales: A manual for users of version 1. Lincoln RI: QualityMetric Inc.
Google Scholar
Bjorner, J. B., Kosinski, M., & Ware, J. E., Jr. (2003). Calibration of an item pool for assessing the burden of headaches: An application of item response theory to the headache impact test (HIT). Quality of Life Research, 12, 913–933.
Article PubMed Google Scholar
Hill, C. D. (2004). Precisions of parameter estimates for the graded item response model. (Masters Thesis) Chapel Hill: University of North Carolina.
Tsutakawa, R. K., & Johnson, J. C. (1990). The effect of uncertainty of item parameter estimation on ability estimates. Psychometrika, 55, 371–390.
Article Google Scholar
Dillman, D. (2007). Mail and Internet surveys: The tailored design method—2007 update with new Internet, visual, and mixed-mode guide. New York, NY: J. Wiley.
Google Scholar
Bjorner, J. B., Ware, J. E., Jr., & Kosinski, M. (2003). The potential synergy between cognitive models and modern psychometric models. Quality of Life Research, 12, 261–274.
Article PubMed Google Scholar
McHorney, C. A., Kosinski, M., & Ware, J. E., Jr. (1994). Comparisons of the costs and quality of norms for the SF-36 health survey collected by mail versus telephone interview: Results from a national survey. Medical Care, 32, 551–567.
Article PubMed CAS Google Scholar
Cook, A. J., Roberts, D. A., Henderson, M. D., Van Winkle, L. C., Chastain, D. C., & Hamill-Ruth, R. J. (2004). Electronic pain questionnaires: A randomized, crossover comparison with paper questionnaires for chronic pain assessment. Pain, 110, 310–317.
Article PubMed Google Scholar
Ryan, J. M., Corry, J. R., Attewell, R., & Smithson, M. J. (2002). A comparison of an electronic version of the SF-36 general health questionnaire to the standard paper version. Quality of Life Research, 11, 19–26.
Article PubMed Google Scholar
Velikova, G., Wright, E. P., & Smith, A. B., et al. (1999). Automated collection of quality-of-life data: A comparison of paper and computer touch-screen questionnaires. Journal of Clinical Oncology, 17, 998–1007.
PubMed CAS Google Scholar
Muthen, B. O., & Muthen, L. (2001). Mplus user’s guide. Los Angeles: Muthén & Muthén.
Google Scholar
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Educational and Behavioral Statistics, 22, 265–289.
Google Scholar
Christensen, K. B., Bjorner, J. B., Kreiner, S., & Petersen, J. H. (2002). Tests for unidimensionality in polytomous Rasch models. Psychometrika, 67, 563–574.
Article Google Scholar
Muraki, E., & Carlson, J. E. (1995). Full-information factor analysis for polytomous item responses. Applied Psychological Measurement, 19, 73–90.
Article Google Scholar
Stout, W., Habing, B., Douglas, J., Kim, R. H., Roussos, L., & Zhang, J. (2001). Conditional covariance-based nonparametric multidimensionality assessment. Psychological Measurement, 20, 331–354.
Article Google Scholar
Ramsay, J. O. (1995). TestGraf—a program for the graphical analysis of multiple choice test and questionnaire data. Montreal: McGill University.
Google Scholar
van der Linden, W. J., & Hambleton, R. K. (1997). Item response theory: Brief history, common models, and extensions. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 1–28). Berlin: Springer.
Google Scholar
Rasch, G. (1966). An item analysis which takes individual differences into account. The British Journal of Mathematical and Statistical Psychology, 19, 49–57.
PubMed CAS Google Scholar
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.
Google Scholar
Andrich, D. (1988). Rasch models for measurement. Beverly Hills: Sage Publications.
Google Scholar
Andrich, D., & Luo, G.(2003). Conditional pairwise estimation in the Rasch model for ordered response categories using principal components. Journal of Applied Measurement, 4, 205–221.
PubMed Google Scholar
Molenaar, I. W. (1995). Estimation of item parameters. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models—foundations recent developments and applications (pp. 39–52). Berlin: Springer.
Google Scholar
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal and structural equation models. Chapman & Hall, CRC.
Fischer, G. H., & Ponocny, I. (1995). Extended rating scale and partial credit models for assessing change. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models—foundations, recent developments, and applications (pp. 353–370). Berlin: Springer.
Google Scholar
Glas, C. A. W., & Verhelst, N. D. (1995). Tests of fit for polytomous Rasch models. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models—foundations, recent developments, and applications (pp. 325–352). Berlin: Springer.
Google Scholar
Glas, C. A. W., & Verhelst, N. D. (1995). Testing the Rasch model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models—foundations, recent developments, and applications (pp. 69–95). Berlin: Springer.
Google Scholar
Muraki, E., & Bock, R. D. (1996). Parscale—IRT based test scoring and item analysis for graded open-ended exercises and performance tasks. Chicago: Scientific Software Inc.
Google Scholar
Stone, C. A., & Zhang, B. (2003). Assessing goodness of fit of item response theory models: A comparison of traditional and alternative procedures. The Journal of Educational Measurement, 4, 331–352.
Article Google Scholar
Stone, C. A. (2000). Monte Carlo based null distribution for an alternative goodness-of-fit test statistic in IRT models. The Journal of Educational Measurement, 37, 58–75.
Article Google Scholar
Stone, C. A. (2003). Empirical power and type I error rates for an IRT fit statistic that considers the precision of ability estimates. Educational and Psychological Measurement, 63, 566–586.
Article Google Scholar
Glas, C. A. W. (1999). Modification indices for the 2-PL and the nominal response model. Psychometrika, 64, 273–294.
Article Google Scholar
Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50–64.
Article Google Scholar
Bjorner, J. B., Kosinski, M., & Ware, J. E., Jr. (2003). Using item response theory to calibrate the Headache Impact Test (HIT) to the metric of traditional headache scales. Quality of Life Research, 12, 981–1002.
Article PubMed Google Scholar
Kosinski, M., Bayliss, M. S., & Bjorner, J. B., et al. (2003). A six-item short-form survey for measuring headache impact: the HIT-6. Quality of Life Research, 12, 963–974.
Article PubMed CAS Google Scholar
Sands, W. A., Waters, B. K., & McBride, J. R. (1997). Computerized adaptive testing: From inquiry to operation. Washington (DC): American Psychological Association.
Google Scholar
Berwick, D. M., Murphy, J. M., Goldman, P. A., Ware, J. E., Jr., Barsky, A. J., & Weinstein, M. C. (1991). Performance of a five-item mental health screening test. Medical Care, 29, 169–176.
Article PubMed CAS Google Scholar
van der Linden, W. J., & Pashley, P. J. (2000). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing, theory and practice (pp. 1–25). Dordrecht: Kluwer Adacemic Publishers.
Google Scholar
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16, 277–298.
Article Google Scholar
Ware, J. E., Jr., Snow, K. K., Kosinski, M., & Gandek, B.(1993). SF-36 health survey. Manual and interpretation guide. Boston: The Health institute, New England Medical Center.
Google Scholar
Ware, J. E., Jr., Kosinski, M., & Bjorner, J. B., et al. (2003). Applications of computerized adaptive testing (CAT) to the assessment of headache impact. Quality of Life Research, 12, 935–952.
Article PubMed Google Scholar
Bayliss, M. S., Dewey, J. E., & Dunlap, I., et al. (2003). A study of the feasibility of Internet administration of a computerized health survey: The headache impact test (HIT). Quality of Life Research, 12, 953–961.
Article PubMed CAS Google Scholar
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331–354.
Article Google Scholar
Gardner, W., Kelleher, K. J., & Pajer, K. A. (2002). Multidimensional adaptive testing for mental health problems in primary care. Medical Care, 40, 812–823.
Article PubMed Google Scholar

Download references

Acknowledgements

This paper builds upon presentations by the authors at the conference: Advances in Health Outcomes Measurement: Exploring the Current State and the Future of Item Response Theory, Item Banks, and Computer-Adaptive Testing, Bethesda, MD, June, 2004. This work was supported in part by a grant from the Small Business Innovation Research Program of the National Institute of Neurological Disorders and Stroke, under grant title Computerized Adaptive Assessment of Headache Impact (grant no. 1R43NS047763-01) and in part by the National Institutes of Health through the NIH Roadmap for Medical Research Grant (AG015815), PROMIS Project. The authors would like to thank Howard Wainer of the National Board of Medical Examiners and three anonymous reviewers for comments on a previous version of the paper.

Author information

Authors and Affiliations

QualityMetric Incorporated, 640 George Washington Highway, Suite 201, Lincoln, RI, 02865, USA
Jakob Bue Bjorner
Health Assessment Lab, Waltham, MA, USA
Jakob Bue Bjorner
Northwestern University Feinberg School of Medicine, Chicago, IL, USA
Chih-Hung Chang
The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
David Thissen
National Cancer Institute, NIH, Bethesda, MD, USA
Bryce B. Reeve

Authors

Jakob Bue Bjorner
View author publications
You can also search for this author in PubMed Google Scholar
Chih-Hung Chang
View author publications
You can also search for this author in PubMed Google Scholar
David Thissen
View author publications
You can also search for this author in PubMed Google Scholar
Bryce B. Reeve
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jakob Bue Bjorner.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bjorner, J.B., Chang, CH., Thissen, D. et al. Developing tailored instruments: item banking and computerized adaptive assessment. Qual Life Res 16 (Suppl 1), 95–108 (2007). https://doi.org/10.1007/s11136-007-9168-6

Download citation

Received: 25 August 2006
Accepted: 22 December 2006
Published: 15 February 2007
Issue Date: August 2007
DOI: https://doi.org/10.1007/s11136-007-9168-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Developing tailored instruments: item banking and computerized adaptive assessment

Abstract

Access this article

Similar content being viewed by others

Online Questionnaires

Qualitative Research and Content Analysis

Research design: qualitative, quantitative, and mixed methods approaches / sixth edition

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Developing tailored instruments: item banking and computerized adaptive assessment

Abstract

Access this article

Similar content being viewed by others

Online Questionnaires

Qualitative Research and Content Analysis

Research design: qualitative, quantitative, and mixed methods approaches / sixth edition

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation