Mokken scale analysis as time goes by: An update for scaling practitioners

https://doi.org/10.1016/j.paid.2010.08.016

Abstract

We explain why invariant item ordering (IIO) is an important property in non-cognitive measurement, and we argue that IIO cannot be easily generalized from dichotomous data to polytomous data, as some authors seem to suggest. We discuss methods to investigate IIO for polytomous items, and an empirical example shows how these methods can be used in practice.

Introduction

Recently, Watson and colleagues (Stewart et al., 2010; Watson et al., 2007, 2008) investigated for different personality inventories whether items measuring the same attribute formed a hierarchical scale. Items form a hierarchical scale when the ordering of the items according to their popularity (or mean score) is the same across different values of the latent trait. This property is also named invariant item ordering (IIO; Sijtsma & Junker, 1996). Meijer (2010) argued that the way Watson and others investigated whether items form a hierarchical scale was not correct. In a reply to his article, Watson and Deary (2010) partly agreed with his criticism, but also referred to an article by Sijtsma, Debets, and Molenaar (1990) that allegedly used the P-matrix to investigate whether items form a hierarchical scale. However, in the Sijtsma et al. (1990) paper this matrix is not used to investigate whether items form a hierarchical scale, but whether item step response functions form a hierarchy. In the present article, we argue that a hierarchy of item step response functions need not imply a hierarchical scale for the items. Hence, the P-matrix is not an adequate tool for assessing whether a scale is hierarchical.

We applaud the use of more sophisticated techniques by Watson and colleagues, but apparently the literature on Mokken scale analysis (MSA) gives rise to some misunderstandings. Therefore, the aim of this paper is to discuss (1) why IIO is an important aspect of personality scales; (2) Mokken’s model for the analysis of polytomous items; and (3) why the results for dichotomous items cannot be easily generalized to polytomous items. This takes us to a second source of confusion surrounding hierarchical scales, which is the assumption made by practitioners that high values of Mokken’s scalability coefficient H support such a hierarchy. We argue that high values of H found in real-data analysis are not adequate for assessing whether a scale is hierarchical. Instead, high H values establish a person ordering, which is exactly what Hemker, Sijtsma, and Molenaar (1995, p. 340) claimed (however, see Watson & Deary, 2010). Hence, we discuss (4) why scalability coefficient H is not an index for IIO, and continue with (5) a method to investigate IIO for polytomous items; (6) an R program that can be used by practitioners to investigate IIO for polytomous items, and (7) an empirical example illustrating the use of the R program.
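The claim that a hierarchy of item step response functions (ISRFs) need not imply a hierarchical scale can be illustrated numerically. The following is a minimal sketch under stated assumptions, not taken from the article: two hypothetical items j and k with scores 0, 1, 2, logistic ISRFs with equal slopes, and arbitrarily chosen step locations. The four ISRFs never intersect, yet the expected item scores cross, so the item ordering by mean score is not invariant across θ.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical step locations (an assumption made for this illustration).
# Item j has a very easy first step and a very hard second step;
# both steps of item k lie in between.
STEP_LOCATIONS = {"j": (-2.0, 2.0), "k": (-0.5, 0.5)}


def isrf(item, step, theta):
    """P(X_item >= step | theta) for step 1 or 2, with unit slope."""
    return logistic(theta - STEP_LOCATIONS[item][step - 1])


def expected_score(item, theta):
    """E(X_item | theta), which equals the sum of the item's ISRFs."""
    return sum(isrf(item, step, theta) for step in (1, 2))


if __name__ == "__main__":
    for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(
            f"theta={theta:+.1f}  "
            f"E(Xj)={expected_score('j', theta):.3f}  "
            f"E(Xk)={expected_score('k', theta):.3f}"
        )
```

At low θ the expected score on item j exceeds that on item k, and at high θ the order reverses, even though the four step curves keep the same order at every θ.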


Why is invariant item ordering important in non-cognitive measurement?

The measurement of psychological traits often assumes, either implicitly or explicitly, that items used in inventories represent different levels of intensity with respect to the attribute of interest. For example, when measuring depression we assume that the item “thoughts of ending your life” represents a higher level of depression than the item “feeling no interest in things”, and when measuring anxiety, the item “spells of terror or panic” has a higher intensity than the item “feeling

Mokken’s models for the analysis of dichotomous and polytomous items

Mokken (1971) proposed two models for dichotomous items, nowadays recognized as item response theory (IRT) models, one of which was meant for ordinal person measurement and the other both for ordinal person and item measurement. We discuss the polytomous-item versions as proposed by Molenaar (1982, 1986, 1991, 1997), of which Mokken’s dichotomous-item models are special cases.

The first model is the monotone homogeneity model (MHM), which is based on the following

Why results for dichotomous items cannot be easily generalized to polytomous items

For dichotomous items (scores 0, 1), the ISRF equals P(Xj ≥ 1|θ) = P(Xj = 1|θ), so that one ISRF, now called item response function (IRF), suffices to describe the item scores. Figure 2A shows two IRFs for dichotomous items j (solid curve) and k (dashed curve). For each value of latent variable θ, the probability of obtaining a 1 score on item j is greater than for item k; hence, they exhibit IIO. Sijtsma and Junker (1996) reviewed several methods for investigating the IIO property in

Is coefficient H an index for IIO?

The second problem refers to coefficient H, which some authors mistakenly use as an index for IIO. The source of this confusion seems to reside with the deterministic Guttman model for dichotomous items, for which all H coefficients are equal to 1. Fig. 3 shows the typical step IRFs for four Guttman items, which do not intersect (although they mostly coincide) and hence exhibit IIO. Data consistent with the deterministic Guttman model are error-free but real data contain much error, and the
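The statement that the H coefficient equals 1 for data consistent with the deterministic Guttman model can be checked with a small computation. The sketch below is an illustration under assumptions, not the implementation in MSP5 or the R package: it computes H for dichotomous items as the sum of the inter-item covariances divided by the sum of the maximum covariances attainable given the item proportions. The function name `coefficient_h` and both toy data sets are hypothetical.

```python
from itertools import combinations


def coefficient_h(data):
    """Scale coefficient H for dichotomous (0/1) item scores.

    data: list of respondents, each a list of 0/1 item scores.
    Assumes at least one item pair has a positive maximum covariance.
    """
    n = len(data)
    n_items = len(data[0])
    # Proportion of 1 scores (popularity) per item.
    p = [sum(row[i] for row in data) / n for i in range(n_items)]
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(n_items), 2):
        p_ij = sum(row[i] * row[j] for row in data) / n
        cov_sum += p_ij - p[i] * p[j]
        # Maximum covariance given the marginals: the joint proportion
        # cannot exceed the smaller of the two item proportions.
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]
    return cov_sum / cov_max_sum


# Perfect Guttman patterns for four items ordered from easy to difficult.
guttman = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

# The same data plus one pattern that contains a Guttman error.
with_error = guttman + [[0, 1, 0, 0]]

if __name__ == "__main__":
    print("H, Guttman data:   ", coefficient_h(guttman))
    print("H, one error added:", coefficient_h(with_error))
```

Adding a single pattern that violates the Guttman ordering already pulls H below 1, which is in line with the remark that real data contain error.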

A method to investigate IIO for polytomous items

Ligtvoet, Van der Ark, Te Marvelde, and Sijtsma (2010) proposed a method to investigate IIO for polytomous items without the assumption of a particular IRT model. First, their method manifest IIO is checked for pairs of items. We define the rest score R as the total score on the J − 2 items excluding the scores on items j and k. For J − 2 items with m + 1 ordered scores each, rest score R theoretically runs from 0 to (J − 2)m. The method checks for each pair of items j, k with item means ordered such
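The core idea of comparing item pairs within rest-score groups can be sketched as follows. This is a simplified illustration and not the published procedure: it assumes that each pair is ordered so that item j has the higher (or equal) overall mean, it flags every rest-score group in which the conditional means are in the reverse order, and it omits the merging of small rest-score groups and any significance testing. The function name `manifest_iio_violations` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def manifest_iio_violations(data):
    """List rest-score groups in which a pair's conditional means reverse.

    data: list of respondents, each a list of integer item scores.
    Returns tuples (j, k, rest_score, mean_j, mean_k), where item j has the
    higher (or equal) overall mean but the lower mean in that group.
    """
    n = len(data)
    n_items = len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(n_items)]
    violations = []
    for a, b in combinations(range(n_items), 2):
        # Assumed convention: j is the item with the higher overall mean.
        j, k = (a, b) if means[a] >= means[b] else (b, a)
        groups = defaultdict(list)
        for row in data:
            # Rest score R: total score on the J - 2 remaining items.
            rest = sum(row) - row[j] - row[k]
            groups[rest].append((row[j], row[k]))
        for rest in sorted(groups):
            pairs = groups[rest]
            mean_j = sum(s[0] for s in pairs) / len(pairs)
            mean_k = sum(s[1] for s in pairs) / len(pairs)
            if mean_j < mean_k:
                violations.append((j, k, rest, mean_j, mean_k))
    return violations


# Toy data (three items, scores 0-2) in which the ordering is preserved.
ordered = [
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [2, 1, 0],
    [2, 1, 1], [2, 2, 1], [2, 2, 2],
]

# Toy data in which item 0 has the highest overall mean but scores lower
# than item 1 among respondents with a high rest score.
reversed_at_top = [
    [2, 0, 0], [2, 0, 0], [2, 1, 0], [0, 1, 2], [1, 2, 2],
]

if __name__ == "__main__":
    print(manifest_iio_violations(ordered))
    for violation in manifest_iio_violations(reversed_at_top):
        print(violation)
```

In applied work, sample reversals of this kind would still have to be evaluated against sampling error before concluding that IIO is violated.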

A computer program to investigate IIO for polytomous items

Method manifest IIO is available in the R package Mokken as method check.iio (Van der Ark, 2007). Furthermore, this package contains different functions (coefH, aisp, check.monotonicity, check.pmatrix, and check.restscore) to investigate different assumptions of the MHM and the DMM. Except for the graphics, the function names and the output in Mokken are similar to function names and output in the package MSP5 for Windows (Molenaar & Sijtsma, 2000). An advantage of the R package Mokken over

Example: analysis of SPPC data

We used R package Mokken to analyze the data from the six subscales of Harter’s (1985) Self-Perception Profile for Children (SPPC) (N = 268, boys; see Meijer, Egberink, Emons, & Sijtsma, 2008). The SPPC measures how children between 8 and 12 years of age judge their own functioning in several specific domains and how they judge their global self-worth. Five of the six subscales represent specific domains of self-concept: Scholastic Competence (SC), Social Acceptance (SA), Athletic Competence (AC),

Recommendations

We recommend that researchers first investigate whether the measurement model fits their data before they interpret the H or HT coefficients. Table 2 summarizes MSA.

For dichotomous-item inventories and the MHM, we recommend first investigating unidimensionality by means of the automated item selection procedure (AISP) in MSP5.0 and Mokken. Monotonicity should be investigated by means of the item-restscore regressions (IR-regr) in both programs. Coefficient Hj gives the strength of the
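The item-restscore regression mentioned above can be sketched in a few lines. This is a bare-bones illustration under assumptions, not the routine in MSP5.0 or the R package: it computes the mean score on one item within each group of respondents sharing the same rest score (the total score on the other items) and checks whether these means are nondecreasing. Merging of small rest-score groups and testing of violations are left out, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def item_restscore_regression(data, item):
    """Mean score on `item` per rest-score group, sorted by rest score.

    data: list of respondents, each a list of item scores.
    Returns a list of (rest_score, mean_item_score) tuples.
    """
    groups = defaultdict(list)
    for row in data:
        # Rest score: total score excluding the item under investigation.
        groups[sum(row) - row[item]].append(row[item])
    return [(rest, sum(v) / len(v)) for rest, v in sorted(groups.items())]


def is_monotone(regression):
    """True if the conditional item means never decrease with rest score."""
    means = [mean for _, mean in regression]
    return all(low <= high for low, high in zip(means, means[1:]))


# Toy dichotomous data following perfect Guttman patterns.
guttman = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

# Toy data in which item 0 is endorsed only by the low scorer.
decreasing = [[1, 0, 0], [0, 1, 1]]

if __name__ == "__main__":
    print(item_restscore_regression(guttman, 0))
    print(is_monotone(item_restscore_regression(guttman, 0)))
    print(is_monotone(item_restscore_regression(decreasing, 0)))
```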

References (28)

  • R.R. Meijer et al. (2008). Detection and validation of unscalable item score patterns using item response theory: An illustration with Harter’s self-perception profile for children. Journal of Personality Assessment.
  • R.J. Mokken (1971). A theory and procedure of scale analysis.
  • R.J. Mokken et al. (1986). Rejoinder to ‘The Mokken scale: A critical discussion’. Applied Psychological Measurement.
  • I.W. Molenaar (1982). Mokken scaling revisited. Kwantitatieve Methoden.