On the Inappropriateness of Using Items to Calculate Total Scale Score Reliability via Coefficient Alpha for Multidimensional Scales
Abstract
Researchers have the implicit option of calculating internal consistency reliability (coefficient α) for total scale scores derived from multidimensional inventories from either the inter-item correlation matrix (item unit-level) or the inter-subscale correlation matrix (subscale unit-level). It is demonstrated that item unit-level and subscale unit-level reliability estimates often diverge substantially in practice. Specifically, the item unit-level reliability estimate is often larger than the corresponding subscale unit-level estimate. It is recommended that, when the underlying model is multidimensional and researchers calculate total scale score reliability at the item unit-level, a model-based approach to the estimation of internal consistency reliability (i.e., omega hierarchical) be applied. If omega hierarchical cannot be applied, it is recommended that total scale score reliabilities be calculated at the subscale unit-level of analysis, not the item unit-level.
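The divergence between the two units of analysis can be illustrated with a small numerical sketch. The values below are hypothetical and are not taken from the article's data: three subscales of four standardized items each, with an assumed within-subscale item correlation of .50 and an assumed between-subscale item correlation of .10. The function names are illustrative only, and coefficient α is computed from a covariance matrix with the standard formula α = k/(k − 1) × (1 − trace/sum).

```python
# Hypothetical illustration: coefficient alpha for the same total score,
# computed at the item unit-level and at the subscale unit-level.

def coefficient_alpha(cov):
    """Coefficient alpha from a square covariance (or correlation) matrix."""
    k = len(cov)
    trace = sum(cov[i][i] for i in range(k))      # sum of unit variances
    total = sum(sum(row) for row in cov)          # variance of the total score
    return (k / (k - 1)) * (1 - trace / total)

def aggregate(cov, groups):
    """Covariance matrix of subscale sum scores implied by the item matrix."""
    return [[sum(cov[i][j] for i in ga for j in gb) for gb in groups]
            for ga in groups]

# Assumed structure: 3 subscales x 4 items, unit item variances.
N_SUBSCALES, ITEMS_PER_SUBSCALE = 3, 4
R_WITHIN, R_BETWEEN = 0.50, 0.10   # assumed correlations, not from the source

n_items = N_SUBSCALES * ITEMS_PER_SUBSCALE
groups = [list(range(s * ITEMS_PER_SUBSCALE, (s + 1) * ITEMS_PER_SUBSCALE))
          for s in range(N_SUBSCALES)]

def item_correlation(i, j):
    if i == j:
        return 1.0
    same_subscale = i // ITEMS_PER_SUBSCALE == j // ITEMS_PER_SUBSCALE
    return R_WITHIN if same_subscale else R_BETWEEN

item_matrix = [[item_correlation(i, j) for j in range(n_items)]
               for i in range(n_items)]
subscale_matrix = aggregate(item_matrix, groups)

alpha_items = coefficient_alpha(item_matrix)          # 12 units
alpha_subscales = coefficient_alpha(subscale_matrix)  # 3 units

print(f"item unit-level alpha:     {alpha_items:.3f}")
print(f"subscale unit-level alpha: {alpha_subscales:.3f}")
```

Under these assumed values the item unit-level estimate is roughly .76 and the subscale unit-level estimate roughly .36, even though both describe the same total score. The item-level figure is inflated because the shared variance within subscales is counted as if it were consistency of the total score.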