Published in: Quality of Life Research 7/2018

07-07-2017 | Special Section: Test Construction (by invitation only)

Fit for purpose and modern validity theory in clinical outcomes assessment

Authors: Michael C. Edwards, Ashley Slagle, Jonathan D. Rubright, R. J. Wirth


Abstract

Purpose

The US Food and Drug Administration (FDA), as part of its regulatory mission, is charged with determining whether a clinical outcome assessment (COA) is “fit for purpose” when used in clinical trials to support drug approval and product labeling. In this paper, we review (and offer some commentary on) the current state of affairs in COA development/evaluation/use with a focus on one aspect: How do you know you are measuring the right thing? In the psychometric literature, this concept is referred to broadly as validity, and it has itself evolved over many years of research and application.

Review

After a brief introduction, the first section reviews current ideas about “fit for purpose” and how the concept has been viewed by the FDA. This section also describes some of the unique challenges to COA development/evaluation/use in the clinical trials space. Following this, we provide an overview of modern validity theory as it is currently understood in the psychometric tradition. This overview focuses primarily on the perspective of validity theorists such as Messick and Kane, whose work forms the backbone for the bulk of high-stakes assessment in areas such as education, psychology, and health outcomes.

Conclusions

We situate the concept of fit for purpose within the broader context of validity. By comparing and contrasting the approaches and the situations where they have traditionally been applied, we identify areas of conceptual overlap as well as areas where more discussion and research are needed.
Footnotes
1
What a test measures goes by many names: construct, trait, latent variable, dimension, or domain. We use “construct” throughout the remainder of this document as the generic referent to what tests measure. It is a commonly used term and nicely conveys the core idea that what we are trying to measure is a theoretical construction.
 
2
We use terms like assessment, scale, inventory, and test interchangeably in this paper. While “test” is the dominant term in the educational arena (from which much validity theory has emanated), it is generic with respect to the larger points being made here.
 
References
1.
Thissen, D., & Wainer, H. (2001). Test scoring. Mahwah, NJ: Lawrence Erlbaum Associates.
2.
U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research, Center for Devices and Radiological Health. (2009). Guidance for industry patient-reported outcome measures: Use in medical product development to support labeling claims. Retrieved January 30, 2017, from http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf. Published December 2009.
4.
Patrick, D. L., Burke, L. B., Gwaltney, C. J., Kline Leidy, N., Martin, M. L., Molsen, E., et al. (2011). Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 1—eliciting concepts for a new PRO instrument. Value in Health, 14, 967–977.
5.
Patrick, D. L., Burke, L. B., Gwaltney, C. J., Kline Leidy, N., Martin, M. L., Molsen, E., et al. (2011). Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 2—assessing respondent understanding. Value in Health, 14, 978–988.
8.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
9.
Thorndike, E. L. (1918). The nature, purposes, and general methods of measurements of educational products. In G. M. Whipple (Ed.), The measurement of educational products. Seventeenth yearbook of the National Society for the Study of Education, Part II (pp. 16–24). Bloomington, IL: Public School Publishing Company.
10.
American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin Supplement, 51(2), 1–38.
11.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
12.
Pitoniak, M. J., Sireci, S. G., & Luecht, R. M. (2002). A multitrait-multimethod validity investigation of scores from a professional licensure examination. Educational and Psychological Measurement, 62(3), 498–516.
13.
Ebel, R. L. (1956). Obtaining and reporting evidence on content validity. Educational and Psychological Measurement, 16(3), 269–282.
14.
Sireci, S. G. (1998). The construct of content validity. Social Indicators Research, 45, 83–117.
15.
Messick, S. (1975). The standard program: Meaning and values in measurement and evaluation. American Psychologist, 30, 955–966.
16.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
17.
Messick, S. (1988). The once and future issues of validity. Assessing the meaning and consequences of measurement. In H. Wainer & H. Braun (Eds.), Test validity (pp. 33–45). Hillsdale, NJ: Lawrence Erlbaum.
18.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: American Council on Education and Macmillan.
19.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
20.
Kane, M. T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319–342.
21.
Cronbach, L. J. (1980). Selection theory for a political world. Public Personnel Management, 9(1), 37–50.
22.
House, E. R. (1980). Evaluating with validity. Beverly Hills, CA: Sage.
23.
Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. Braun (Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Lawrence Erlbaum.
24.
Kane, M. T. (1992). An argument-based approach to validation. Psychological Bulletin, 112, 527–535.
25.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
26.
Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education and Praeger.
27.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071.
28.
Hays, R. D., & Hadorn, D. (1992). Responsiveness to change: An aspect of validity, not a separate dimension. Quality of Life Research, 1, 73–75.
29.
Terwee, C. B., Dekker, F. W., Wiersinga, W. M., Prummel, M. F., & Bossuyt, P. M. (2003). On assessing responsiveness of health-related quality of life instruments: Guidelines for instrument evaluation. Quality of Life Research, 12(4), 349–362.
Metadata
Title
Fit for purpose and modern validity theory in clinical outcomes assessment
Authors
Michael C. Edwards
Ashley Slagle
Jonathan D. Rubright
R. J. Wirth
Publication date
07-07-2017
Publisher
Springer International Publishing
Published in
Quality of Life Research / Issue 7/2018
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-017-1644-z
