Abstract
In the reliability analysis literature, little attention has been given to the various possible ways of creating a basis for comparison, which is required to compute observer agreement. This comparison is needed to turn a sequential list of behavioral records into a confusion matrix. We show that the appropriate way to do this depends on the research question one needs to answer. Four methods for creating a basis for comparison in observational data are presented, along with guidelines for computing observer agreement in a way that fits one's goals. Finally, we discuss how these methods have been implemented in The Observer software: Version 4.1 supports all the methods discussed, most of which are not available in any other software package.
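Once the two observers' records have been paired into a basis for comparison, agreement is typically summarized with a chance-corrected index such as Cohen's (1960) kappa, computed from the confusion matrix. The following is a minimal sketch of that final step, assuming the alignment has already been done (e.g., by time sampling), so each observer contributes one code per sample; the function name and code labels are illustrative and are not taken from The Observer.

```python
from collections import Counter

def cohens_kappa(obs1, obs2):
    """Compute Cohen's kappa from two aligned lists of categorical
    codes, one list per observer (same length, paired by position)."""
    assert len(obs1) == len(obs2) and len(obs1) > 0
    n = len(obs1)
    categories = sorted(set(obs1) | set(obs2))
    # The confusion matrix, stored sparsely as counts of (code1, code2) pairs.
    pairs = Counter(zip(obs1, obs2))
    # Observed agreement: proportion of paired samples on the diagonal.
    p_o = sum(pairs[(c, c)] for c in categories) / n
    # Chance agreement: sum over categories of the product of the two
    # observers' marginal proportions.
    m1, m2 = Counter(obs1), Counter(obs2)
    p_e = sum((m1[c] / n) * (m2[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)
```

For example, with observer 1 coding `['a', 'a', 'b', 'b']` and observer 2 coding `['a', 'a', 'b', 'a']`, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5. The hard part the article addresses is what happens before this step: how the sequential records are paired in the first place.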
Additional information
The authors have a commercial interest in the software described in this paper.
Jansen, R.G., Wiertz, L.F., Meyer, E.S. et al. Reliability analysis of observational data: Problems, solutions, and software implementation. Behavior Research Methods, Instruments, & Computers 35, 391–399 (2003). https://doi.org/10.3758/BF03195516