Abstract
Experiments are considered in which each subject in a sample is assigned to one of C categories separately by each member of a fixed or varying group of observers. Building on earlier publications, general procedures are proposed for analyzing agreements and disagreements among observers. For a varying group of observers, it is shown that a constant number of observers per subject need not be demanded. For a fixed group of observers, the problem of missing data is considered.
The procedures are illustrated with two clinical diagnosis examples. In the first example, it is investigated which categories are relatively hard to distinguish from one another; a new theorem is applied that establishes a useful property of the kappa statistic. In the second example, it is investigated whether a subgroup of observers with a significantly higher degree of interobserver agreement can be identified.
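As an illustrative sketch only (not the paper's own estimators), the two classical statistics the abstract builds on can be computed as follows: Cohen's (1960) kappa for two observers, and a Fleiss-style (1971) overall kappa that tolerates a varying number of observers per subject. The function names and the dict-based input format are assumptions made for this example.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two observers on nominal categories:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e the agreement expected from the marginals (Cohen, 1960)."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of subjects on which the two observers agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement from the product of the two marginal distributions.
    marg_a, marg_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(marg_a[c] * marg_b.get(c, 0) for c in marg_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def multirater_kappa(subjects):
    """Fleiss-style overall kappa for many raters, allowing the number of
    raters to vary per subject. Each element of `subjects` is a dict mapping
    category -> number of raters who chose that category for that subject."""
    totals = Counter()
    per_subject = []
    for counts in subjects:
        n_i = sum(counts.values())
        totals.update(counts)
        if n_i > 1:
            # Proportion of agreeing rater pairs among the n_i*(n_i-1) ordered pairs.
            agreeing_pairs = sum(c * (c - 1) for c in counts.values())
            per_subject.append(agreeing_pairs / (n_i * (n_i - 1)))
    n_total = sum(totals.values())
    # Expected agreement from the pooled category proportions.
    p_e = sum((k / n_total) ** 2 for k in totals.values())
    p_bar = sum(per_subject) / len(per_subject)
    return (p_bar - p_e) / (1 - p_e)
```

For example, two observers rating four subjects as `['x','x','y','y']` and `['x','x','y','x']` agree on 3 of 4 subjects with expected agreement 0.5, giving kappa = 0.5. Note that `multirater_kappa` averages the per-subject pair agreement unweighted; weighting schemes for unequal numbers of raters differ between published proposals.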
References
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88, 322–328.
Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Philadelphia: SIAM.
Efron, B., & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician, 37, 36–48.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.
Fleiss, J. L., & Davies, M. (1982). Jackknifing functions of multinomial frequencies, with an application to a measure of concordance. American Journal of Epidemiology, 115, 841–845.
James, I. R. (1983). Analysis of nonagreements among multiple raters. Biometrics, 39, 651–657.
Kraemer, H. C. (1980). Extension of the kappa coefficient. Biometrics, 36, 207–216.
Parr, W. C., & Tolley, H. D. (1982). Jackknifing in categorical data analysis. The Australian Journal of Statistics, 24, 67–79.
Schouten, H. J. A. (1980). Measuring pairwise agreement among many observers. Biometrical Journal, 22, 497–504.
Schouten, H. J. A. (1982a). Measuring pairwise agreement among many observers, II: Some improvements and additions. Biometrical Journal, 24, 431–435.
Schouten, H. J. A. (1982b). Measuring pairwise interobserver agreement when all subjects are judged by the same observers. Statistica Neerlandica, 36, 45–61.
Schouten, H. J. A. (1985). Statistical measurement of interobserver agreement. Unpublished doctoral dissertation, Erasmus University Rotterdam.
Van den Berge, J. H., Schouten, H. J. A., Boomstra, S., van Drunen Littel, S., & Braakman, R. (1979). Interobserver agreement in assessment of ocular signs in coma. Journal of Neurology, Neurosurgery and Psychiatry, 42, 1163–1168.
Additional information
The author gratefully acknowledges the valuable suggestions by W. Molenaar, R. van Strik, R. Popping and the referees.
Cite this article
Schouten, H.J.A. Nominal scale agreement among observers. Psychometrika 51, 453–466 (1986). https://doi.org/10.1007/BF02294066