Published Online: https://doi.org/10.1027/1015-5759/a000131

The major theories of psychological measurement suggest that the world of assessment is a rather simple one. According to true-score theory as described by Lord and Novick (1968), true scores represent human attributes well provided that the influence of random error can be controlled. In congeneric test theory as found in the work of McDonald (1999) and others – based on the congeneric model of measurement (Jöreskog, 1971) – true scores are associated with a single latent dimension, an association that is treated as an assumption in need of empirical support. The standard models of item-response theory (see Mellenbergh, 1994) likewise assume a latent dimension associated with a human attribute, although the situation is somewhat more complex because the properties of the items are taken into account.
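
In compact notation these three frameworks may be contrasted as follows (a schematic summary for orientation, using standard textbook notation rather than the formulations of the cited works):

  X = T + E, with Cov(T, E) = 0 (true-score theory: an observed score decomposed into a true score and random error),
  X_i = λ_i η + ε_i, i = 1, …, p (congeneric model: each of p observed variables loads on a single latent dimension η with its own loading λ_i),
  P(X_j = 1 | θ) = 1 / (1 + exp[−a_j(θ − b_j)]) (a standard two-parameter logistic item-response model, in which the item parameters a_j and b_j enter alongside the latent dimension θ).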

However, there are indications that the assumed simplicity of the world of assessment is somewhat illusory. Back in 1959, Campbell and Fiske investigated a number of datasets from the perspective of true-score theory and concluded that method effects lead to an overestimation of the quality of a measure: Method effects usually originate in the observational procedures and contribute to the systematic variance that is taken to be true variance. To obtain an estimate of their impact on measurement, the authors proposed the multitrait-multimethod approach, which contrasts correlations among measures of the same trait obtained with different methods with correlations among measures of different traits obtained with the same method.

During the last decade, further evidence has accumulated suggesting an influence of methods on measurement. Statistical methods have repeatedly revealed the presence of the item-wording effect (DiStefano & Motl, 2006; Vautier, Steyer, Jmel, & Raufaste, 2005) as well as the position effect (Hartig, Hölzl, & Moosbrugger, 2007; Schweizer, Schreiner, & Gold, 2009) in the responses to items of psychological measures. The outcomes of these studies indicate that sources other than the attribute to be measured may contribute to true-score variance and may thereby distort indices of the quality of a measure. The position effect has also been identified by methods based on item-response theory (Kubinger, Formann, & Farkas, 1991; Verguts & De Boeck, 2000).

Meanwhile, the multitrait-multimethod approach has been associated with a number of specific confirmatory factor models that enable the separation of trait and method effects (e.g., Eid, 2000; Marsh & Grayson, 1995). The prerequisite for applying these models is the availability of data sampled on the basis of a multitrait-multimethod design. Models of this kind are so-called bifactor models, which have also been chosen to represent complex constructs (Reise, Moore, & Haviland, 2010). Since the latent variables in these models need to be compared, special methods for scaling the variances of latent variables have been considered (Reise et al., 2010; Schweizer, 2011). Such scaling methods may also be useful in assessing the strength of the influence of method effects on assessment.
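
A minimal sketch of such a decomposition may help (generic bifactor notation under the simplifying assumption of uncorrelated trait and method factors; it is not the specific parameterization of Eid, 2000, or of Reise et al., 2010): an observed variable Y_tm obtained for trait t with method m is written as

  Y_tm = λ_tm T_t + γ_tm M_m + ε_tm,

with trait factors T_t, method factors M_m, and measurement error ε_tm. Under this assumption, the share of systematic variance attributable to the method is γ_tm² Var(M_m) relative to λ_tm² Var(T_t) + γ_tm² Var(M_m), which makes clear why the scaling of the variances of T_t and M_m, for example by standardizing loadings (Schweizer, 2011), matters for such comparisons.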

Apparently, it is not advisable to stick with the assumption that the world of assessment is a very simple one. There is reason to take method effects seriously in test construction and to make use of the methodology that has been developed for this purpose. Therefore, studies designed according to the multitrait-multimethod approach should be highly appreciated and should find their way into journals like the European Journal of Psychological Assessment.

However, a check of last year’s issues of the European Journal of Psychological Assessment reveals that studies including a multitrait-multimethod design are in fact rare. The 2011 issues do not include a single paper with a multitrait-multimethod design, although there were six papers reporting investigations of convergent and discriminant validity (Blickle, Momm, Liu, Witzki, & Steinmayr, 2011; Fossati, Borroni, Marchione, & Maffei, 2011; Gorostiaga, Balluerka, Alonso-Arbiol, & Haranburu, 2011; Teubert & Pinquart, 2011; Veirman, Brouwers, & Fontaine, 2011; Zohar & Cloninger, 2011). Furthermore, five studies concentrated solely on convergent validity (Carelli, Wiberg, & Wiberg, 2011; Campos & Gonçalves, 2011; Fernandez, Dufey, & Kramp, 2011; Gorska, 2011; Höfling, Moosbrugger, Schermelleh-Engel, & Heidenreich, 2011).

Most studies emphasize the investigation of structural validity. Although this endeavor is reasonable and much appreciated, it does not reflect the complexity of the world of assessment sufficiently well. It remains for me to express the hope that in the future researchers will give more weight in their work to the investigation of distortions originating in the various sources of method effects.

References

  • Blickle, G., Momm, T., Liu, Y., Witzki, A., & Steinmayr, R. (2011). Construct validation of the Test of Emotional Intelligence (TEMINT): A two-study investigation. European Journal of Psychological Assessment, 27, 282–298.

  • Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

  • Campos, R., & Gonçalves, B. (2011). The Portuguese version of the Beck Depression Inventory-II (BDI-II): Preliminary psychometric data with two nonclinical samples. European Journal of Psychological Assessment, 27, 258–264.

  • Carelli, M. G., Wiberg, B., & Wiberg, M. (2011). Development and construct validation of the Swedish Zimbardo Time Perspective Inventory. European Journal of Psychological Assessment, 27, 220–227.

  • DiStefano, C., & Motl, R. W. (2006). Further investigating method effects associated with negatively worded items on self-report surveys. Structural Equation Modeling, 13, 440–464.

  • Eid, M. (2000). A multitrait-multimethod model with minimal assumptions. Psychometrika, 65, 241–261.

  • Fernandez, A. M., Dufey, M., & Kramp, U. (2011). Testing the psychometric properties of the Interpersonal Reactivity Index (IRI) in Chile: Empathy in a different cultural context. European Journal of Psychological Assessment, 27, 179–185.

  • Fossati, A., Borroni, S., Marchione, D., & Maffei, C. (2011). The Big Five Inventory (BFI): Reliability and validity of its Italian translation in three independent nonclinical samples. European Journal of Psychological Assessment, 27, 50–58.

  • Gorostiaga, A., Balluerka, N., Alonso-Arbiol, I., & Haranburu, M. (2011). Validation of the Basque Revised NEO Personality Inventory (NEO PI-R). European Journal of Psychological Assessment, 27, 193–204.

  • Gorska, M. (2011). Psychometric properties of the Polish version of the Interpersonal Competence Questionnaire (ICQ-R). European Journal of Psychological Assessment, 27, 186–192.

  • Hartig, J., Hölzel, B., & Moosbrugger, H. (2007). A confirmatory analysis of item reliability trends (CAIRT): Differentiating true score and error variance in the analysis of item context effects. Multivariate Behavioral Research, 42, 157–183.

  • Höfling, V., Moosbrugger, H., Schermelleh-Engel, K., & Heidenreich, T. (2011). Mindfulness or mindlessness? A modified version of the Mindful Attention and Awareness Scale (MAAS). European Journal of Psychological Assessment, 27, 59–64.

  • Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.

  • Kubinger, K. D., Formann, A. K., & Farkas, M. G. (1991). Psychometric shortcomings of Raven’s Standard Progressive Matrices (SPM) in particular for computerized testing. European Review of Applied Psychology, 41, 295–300.

  • Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

  • Marsh, H., & Grayson, D. (1995). Latent variable models of multitrait-multimethod data. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 177–187). Thousand Oaks, CA: Sage.

  • McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.

  • Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115, 300–307.

  • Reise, S. P., Moore, T. M., & Haviland, M. G. (2010). Bifactor models and rotations: Exploring the extent to which multidimensional data yield univocal scores. Journal of Personality Assessment, 92, 544–559.

  • Schweizer, K. (2011). Scaling variances of latent variables by standardizing loadings: Applications to working memory and the position effect. Multivariate Behavioral Research. doi:10.1080/00273171.2011.625312

  • Schweizer, K., Schreiner, M., & Gold, A. (2009). The confirmatory investigation of APM items with loadings as a function of the position and easiness of items: A two-dimensional model of APM. Psychology Science Quarterly, 51, 47–64.

  • Teubert, D., & Pinquart, M. (2011). The Coparenting Inventory for Parents and Adolescents (CI-PA): Reliability and validity. European Journal of Psychological Assessment, 27, 206–214.

  • Vautier, S., Steyer, R., Jmel, S., & Raufaste, E. (2005). Imperfect or perfect dynamic bipolarity? The case of antonymous affective judgments. Structural Equation Modeling, 12, 391–410.

  • Veirman, E., Brouwers, S. A., & Fontaine, J. R. J. (2011). The assessment of emotional awareness in children: Validation of the Levels of Emotional Awareness Scale for Children. European Journal of Psychological Assessment, 27, 265–273.

  • Verguts, T., & De Boeck, P. (2000). A Rasch model for detecting learning while solving an intelligence test. Applied Psychological Measurement, 24, 151–162.

  • Zohar, A. H., & Cloninger, C. R. (2011). The psychometric properties of the TCI-140 in Hebrew. European Journal of Psychological Assessment, 27, 73–80.

Karl Schweizer, Department of Psychology, Goethe University Frankfurt, Mertonstr. 17, 60054 Frankfurt a. M., Germany, +49 69 798-22081, +49 69 798-23847