Published in: Quality of Life Research 6/2014

01-08-2014

Getting serious about test–retest reliability: a critique of retest research and some recommendations

Author: Denise F. Polit

Abstract

Purpose

To focus attention on the need for rigorous and carefully designed test–retest reliability assessments for new patient-reported outcomes and to encourage retest researchers to be thoughtful, ambitious, and creative in their retest efforts.

Methods

The paper outlines key challenges that confront retest researchers, calls attention to some limitations in meeting those challenges, and describes some strategies to improve retest research.

Results

Modest retest coefficients are often reported as acceptable, and many important decisions—such as the retest interval—appear not to be evidence-based. Retest assessments are seldom undertaken before a measure has been finalized, which rules out using retest data to select strong, reproducible items.

Conclusions

Strategies for improving retest research include seeking input from patients or experts regarding the stability of the construct to support decisions about the retest interval; analyzing item-level retest data to identify items to revise or discard; establishing a priori standards of acceptability for reliability coefficients; using large, heterogeneous, and representative retest samples; and collecting follow-up data to better understand consistent and inconsistent responses over time.
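
As an illustration of two of these strategies (item-level retest analysis and an a priori acceptability standard), the following is a minimal sketch, not a method taken from the paper. It assumes the reliability coefficient is the intraclass correlation for absolute agreement, ICC(2,1), computed from two administrations. The function names `icc_agreement` and `flag_items` are hypothetical, and the threshold is left for the researcher to fix in advance because the abstract does not state one.

```python
def _mean(values):
    """Arithmetic mean as a float."""
    return sum(values) / len(values)


def icc_agreement(time1, time2):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    time1 and time2 hold the same respondents' scores, in the same order,
    from the first and second administration.
    """
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("need paired scores from at least two respondents")
    n, k = len(time1), 2
    rows = list(zip(time1, time2))
    grand = _mean(list(time1) + list(time2))
    row_means = [_mean(r) for r in rows]
    col_means = [_mean(time1), _mean(time2)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (x - row_means[i] - col_means[j] + grand) ** 2
        for i, r in enumerate(rows)
        for j, x in enumerate(r)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("no score variance; ICC is undefined")
    return (ms_rows - ms_err) / denom


def flag_items(items_t1, items_t2, threshold):
    """Return names of items whose retest ICC falls below the a priori threshold.

    items_t1 and items_t2 map item names to lists of respondent scores.
    """
    return sorted(
        name
        for name in items_t1
        if icc_agreement(items_t1[name], items_t2[name]) < threshold
    )


if __name__ == "__main__":
    # Made-up scores for illustration only; 0.70 is an arbitrary example
    # threshold, not a standard recommended by the paper.
    t1 = {"q1": [1, 2, 3, 4, 5], "q2": [1, 2, 3, 4, 5]}
    t2 = {"q1": [1, 2, 3, 4, 5], "q2": [5, 4, 3, 2, 1]}
    print(flag_items(t1, t2, threshold=0.70))
```

Because this coefficient measures absolute agreement, a systematic shift between administrations lowers it even when the rank order is preserved: scores of 1 to 5 retested as 2 to 6 give an ICC of 5/6 (about 0.83), whereas a Pearson correlation would be 1.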

Metadata
Title
Getting serious about test–retest reliability: a critique of retest research and some recommendations
Author
Denise F. Polit
Publication date
01-08-2014
Publisher
Springer International Publishing
Published in
Quality of Life Research / Issue 6/2014
Print ISSN: 0962-9343
Electronic ISSN: 1573-2649
DOI
https://doi.org/10.1007/s11136-014-0632-9
