Abstract
Although intercoder reliability is considered crucial to the validity of a content study, the choice among the available reliability indices remains controversial. This study analyzed all content studies that reported intercoder reliability published in two major communication journals, aiming to examine how scholars conduct intercoder reliability tests. The results revealed that over the past 30 years some intercoder reliability indices have been persistently misused with respect to the level of measurement, the number of coders, and the means of reporting reliability. The implications of such misuse, disuse, and abuse are discussed, and suggestions are offered regarding the proper choice of indices in various situations.