Interobserver Agreement in Behavioral Research: Importance and Calculation

Journal of Behavioral Education

Abstract

Behavioral researchers have developed a sophisticated methodology for evaluating behavioral change, one that depends upon accurate measurement of behavior. Direct observation of behavior has traditionally been the mainstay of behavioral measurement. Consequently, researchers must attend to the psychometric properties of observational measures, such as interobserver agreement, to ensure reliable and valid measurement. Of the many indices of interobserver agreement, percentage of agreement is the most popular. Its use persists despite repeated admonitions and empirical evidence indicating that it is not the most psychometrically sound statistic for determining interobserver agreement, because it fails to take chance agreement into account. Cohen's (1960) kappa has long been proposed as the more psychometrically sound statistic for assessing interobserver agreement. Kappa is described and computational methods are presented.
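
For a concrete sense of why chance matters, recall that kappa is defined as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the proportion of agreement expected by chance, computed from each observer's marginal category proportions (Cohen, 1960). The short Python sketch below is an illustration added to this page rather than the article's own computational method; its interval-recording data are hypothetical.

    # A minimal sketch (not from the article): percentage of agreement vs.
    # Cohen's kappa for two observers assigning nominal codes to the same
    # observation intervals. All data below are hypothetical.
    from collections import Counter

    def percentage_agreement(rater_a, rater_b):
        # Proportion of intervals on which the two observers agree.
        return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

    def cohens_kappa(rater_a, rater_b):
        # kappa = (p_o - p_e) / (1 - p_e), per Cohen (1960).
        n = len(rater_a)
        p_o = percentage_agreement(rater_a, rater_b)
        counts_a, counts_b = Counter(rater_a), Counter(rater_b)
        # Chance agreement: product of the observers' marginal proportions,
        # summed over all categories either observer used.
        p_e = sum((counts_a[c] / n) * (counts_b[c] / n)
                  for c in set(rater_a) | set(rater_b))
        return (p_o - p_e) / (1 - p_e)

    # Ten hypothetical intervals coded "on" (on-task) or "off" (off-task);
    # the observers disagree on intervals 8 and 9 only.
    rater_1 = ["on", "on", "on", "on", "on", "on", "on", "on", "off", "off"]
    rater_2 = ["on", "on", "on", "on", "on", "on", "on", "off", "on", "off"]

    print(percentage_agreement(rater_1, rater_2))        # 0.8
    print(round(cohens_kappa(rater_1, rater_2), 3))      # 0.375

With these hypothetical records the observers agree on 80% of intervals, yet kappa is only .375, a value in the "fair" range of the Landis and Koch (1977) benchmarks. The inflated percentage of agreement is largely an artifact of the dominant "on" category, which is precisely the chance-agreement problem described above.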


References

  • American Psychological Association, American Educational Research Association, and National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

  • Baer, D. M. (1977). Reviewer's comment: Just because it's reliable doesn't mean that you can use it. Journal of Applied Behavior Analysis, 10, 117–119.

  • Berk, R. A. (1979). Generalizability of behavioral observations: A clarification of interobserver agreement and interobserver reliability. American Journal of Mental Deficiency, 83, 460–472.

  • Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284–290.

  • Ciminero, A. R., Calhoun, K. S., & Adams, H. E. (Eds.). (1986). Handbook of behavioral assessment (2nd ed.). New York: Wiley.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

  • Cone, J. D. (1977). The relevance of reliability and validity for behavioral assessment. Behavior Therapy, 8, 411–426.

  • Cone, J. D. (1988). Psychometric considerations and the multiple models of behavioral assessment. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (3rd ed.). New York: Pergamon.

  • Dunn, G., & Everitt, B. (1995). Clinical biostatistics: An introduction to evidence-based medicine. London: Edward Arnold.

  • Everitt, B. S. (1994). Statistical methods for medical investigations (2nd ed.). New York: Halsted Press.

  • Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.

  • Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.

  • Foster, S. L., Bell-Dolan, D. J., & Burge, D. A. (1988). Behavioral observation. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment: A practical handbook (3rd ed.). New York: Pergamon.

  • Gresham, F. M. (1998). Designs for evaluating behavior change. In T. S. Watson & F. M. Gresham (Eds.), Handbook of child behavior therapy. New York: Plenum.

  • Hartmann, D. P. (1977). Considerations in the choice of interobserver reliability estimates. Journal of Applied Behavior Analysis, 10, 103–116.

  • Hoge, R. D. (1985). The validity of direct observation measures of pupil classroom behavior. Review of Educational Research, 55, 469–483.

  • Hops, H., Davis, B., & Longoria, N. (1995). Methodological issues in direct observation: Illustrations with the living in familial environments (LIFE) coding system. Journal of Clinical Child Psychology, 24, 193–203.

  • Johnston, J. M., & Pennypacker, H. S. (1993). Strategies and tactics of human behavioral research (2nd ed.). Hillsdale, NJ: Erlbaum.

  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.

  • Langenbucher, J., Labouvie, E., & Morgenstern, J. (1996). Methodological developments: Measuring diagnostic agreement. Journal of Consulting and Clinical Psychology, 64, 1285–1289.

  • McDermott, P. A. (1988). Agreement among diagnosticians or observers: Its importance and determination. Professional School Psychology, 3, 225–240.

  • Nelson, L. D., & Cicchetti, D. V. (1995). Assessment of emotional functioning in brain-impaired individuals. Psychological Assessment, 7, 404–413.

  • Shrout, P. E., Spitzer, R. L., & Fleiss, J. L. (1987). Comment: Quantification of agreement in psychiatric diagnosis revisited. Archives of General Psychiatry, 44, 172–178.

  • Suen, H. K. (1988). Agreement, reliability, accuracy, and validity: Toward a clarification. Behavioral Assessment, 10, 343–366.

  • Suen, H. K., & Lee, P. S. (1985). Effects of the use of percentage agreement on behavioral observation reliabilities: A reassessment. Journal of Psychopathology and Behavioral Assessment, 7, 221–234.

  • Wasik, B. H., & Loven, M. D. (1980). Classroom observational data: Sources of inaccuracy and proposed solutions. Behavioral Assessment, 2, 211–227.

  • Watkins, M. W. (1988). MacKappa [Computer software]. Pennsylvania State University: Author.

Cite this article

Watkins, M.W., Pacheco, M. Interobserver Agreement in Behavioral Research: Importance and Calculation. Journal of Behavioral Education 10, 205–212 (2000). https://doi.org/10.1023/A:1012295615144
