Intercoder reliability indices: disuse, misuse, and abuse


Abstract

Although intercoder reliability has been considered crucial to the validity of a content study, the choice among reliability indices has been controversial. This study analyzed all the content studies that reported intercoder reliability published in the two major communication journals, aiming to find out how scholars conduct intercoder reliability tests. The results revealed that, over the past 30 years, some intercoder reliability indices have been persistently misused with respect to the level of measurement, the number of coders, and the manner of reporting reliability. The implications of disuse, misuse, and abuse are discussed, and suggestions for the proper choice of indices in various situations are offered.


Notes

  1. Coders may also be called annotators, judges, raters, observers, or classifiers, depending on the research field. The terms intercoder and interrater are used interchangeably throughout the paper.

  2. When the reliability value is far lower than the value of percent agreement (e.g., percent agreement is above 0.8 while the reliability index is close to or below 0), this may indicate that the marginal distribution is highly skewed, as in the sketch below.
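A minimal sketch of this paradox (illustrative only; the data are hypothetical and not taken from the study): with a heavily skewed marginal distribution, percent agreement can exceed 0.9 while a chance-corrected index such as Cohen's \(\kappa\) falls near or below zero.

    from collections import Counter

    def percent_agreement(c1, c2):
        """Proportion of units on which two coders assign the same category."""
        return sum(a == b for a, b in zip(c1, c2)) / len(c1)

    def cohens_kappa(c1, c2):
        """Cohen's kappa for two coders and nominal codings."""
        n = len(c1)
        p_o = percent_agreement(c1, c2)
        m1, m2 = Counter(c1), Counter(c2)
        # Chance agreement from the product of each coder's marginal proportions.
        p_e = sum((m1[k] / n) * (m2[k] / n) for k in set(c1) | set(c2))
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical codings: 94 of 100 units coded "0" by both coders,
    # with the few disagreements concentrated in the rare category "1".
    coder1 = ["0"] * 94 + ["1"] * 3 + ["0"] * 3
    coder2 = ["0"] * 94 + ["0"] * 3 + ["1"] * 3
    print(percent_agreement(coder1, coder2))  # 0.94
    print(cohens_kappa(coder1, coder2))       # about -0.03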

  3. It is identical to Bennett et al.'s (1954) \(S\) coefficient.
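For reference, a standard formulation (not quoted from the paper): with \(q\) categories and observed agreement \(p_{o}\), the \(S\) coefficient assumes chance agreement of \(1/q\), i.e., a uniform distribution over the categories:

\[ S = \frac{q\,p_{o} - 1}{q - 1} = \frac{p_{o} - 1/q}{1 - 1/q}. \]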

  4. As Lombard et al. (2002) argued, the proportion of studies using percent agreement is probably underestimated, because most of the studies coded as “NA” likely relied on percent agreement.

  5. These indices have multiple-coder versions proposed by other scholars. For instance, Fleiss (1971) extended \(\pi \) to multiple coders (see the sketch below), while Conger (1980) and Light (1971) proposed multiple-coder versions of \(\kappa \).
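As an illustration (a sketch assuming a constant number of coders per unit; not code from the study), Fleiss' extension averages the pairwise agreement within each unit and, like \(\pi\), estimates chance agreement from the pooled category proportions:

    def fleiss_kappa(ratings):
        """Fleiss' multi-coder kappa; `ratings` is a list of units, each a list
        of the nominal categories assigned by that unit's coders."""
        units = len(ratings)
        n = len(ratings[0])  # coders per unit, assumed constant
        cats = sorted({c for row in ratings for c in row})
        # n_ij: number of coders placing unit i into category j
        counts = [[row.count(c) for c in cats] for row in ratings]
        # Average within-unit pairwise agreement
        p_bar = sum((sum(x * x for x in row) - n) / (n * (n - 1))
                    for row in counts) / units
        # Chance agreement from pooled category proportions (as in Scott's pi)
        p_j = [sum(row[j] for row in counts) / (units * n)
               for j in range(len(cats))]
        p_e = sum(p * p for p in p_j)
        return (p_bar - p_e) / (1 - p_e)

    # Hypothetical data: 5 units, 3 coders, categories "a"/"b".
    data = [["a", "a", "a"], ["a", "a", "b"], ["b", "b", "b"],
            ["a", "b", "b"], ["a", "a", "a"]]
    print(round(fleiss_kappa(data), 3))  # 0.444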

  6. Cohen (1968) later proposed weighted \(\kappa \) for ordinal ratings, and Krippendorff's (2004a) \(\alpha \) can be applied to all levels of measurement. Some indices, such as ICCs, are applicable only to interval ratings, whereas others, such as \(I_{r}\), Brennan and Prediger's (1981) \(\kappa \), and \(\pi \), have no counterparts for higher levels of measurement.
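For ordinal ratings, weighted \(\kappa\) penalizes disagreements in proportion to their distance. In a standard formulation (not reproduced from the paper), with observed cell proportions \(p_{ij}\), marginal proportions \(p_{i\cdot}\) and \(p_{\cdot j}\), and disagreement weights \(w_{ij}\) (e.g., linear \(|i-j|\) or quadratic \((i-j)^{2}\)):

\[ \kappa_{w} = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}{\sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}}. \]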

  7. Although there is a consensus that percent agreement, including Holsti's method, generally overestimates reliability because it makes no allowance for chance agreement, its use for nominal-scale codings is not considered misuse. The rationale is explained below.
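For orientation (a standard presentation, not quoted from the paper), the chance-corrected indices share a common form: they adjust observed agreement \(p_{o}\) by an estimate of chance agreement \(p_{e}\), whereas percent agreement reports \(p_{o}\) alone, implicitly setting \(p_{e}=0\):

\[ \text{reliability} = \frac{p_{o} - p_{e}}{1 - p_{e}}, \qquad p_{e}^{(\kappa)} = \sum_{k} p_{k\cdot}\, p_{\cdot k}, \qquad p_{e}^{(\pi)} = \sum_{k} \bar{p}_{k}^{\,2}, \]

where \(p_{k\cdot}\) and \(p_{\cdot k}\) are the two coders' marginal proportions for category \(k\) and \(\bar{p}_{k}\) is the pooled proportion across coders.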

  8. Whether standard errors should be reported for the obtained reliability value is still debated in the literature. Therefore, not reporting standard errors is not treated as a problem here.

  9. There are many modeling approaches, such as log-linear, IRT (item response theory), latent class, and mixture modeling. In a separate study by the author, the log-linear modeling approach was found to perform no better than most indices.

  10. Although variables with binary outcomes belong to the nominal level of measurement, for most indices binary codings behave more like interval codings than like multi-category nominal ones.


Author information

Correspondence to Guangchao Charles Feng.

Cite this article

Feng, G.C. Intercoder reliability indices: disuse, misuse, and abuse. Qual Quant 48, 1803–1815 (2014). https://doi.org/10.1007/s11135-013-9956-8
