Much Ado About Nothing, or Much to Do About Something?
Effects of Scale Shortening on Criterion Validity and Mean Differences
Abstract
Short scales have become widely used in settings where participant time is limited and assessment would otherwise be impossible. Using simulated data at the population level, we investigate whether scale shortening affects the desired invariance of criterion-related validities and the differences between the estimated expected values of populations. We conclude that, under a unidimensional model, decreasing the number of items affects neither criterion validity nor the difference between the expected values of two populations. We discuss, however, that scale shortening can cause problems at the construct level and, of greater practical importance, at the level of individual scores.