An Exact Method for Partitioning Dichotomous Items Within the Framework of the Monotone Homogeneity Model

Published in Psychometrika.

Abstract

The monotone homogeneity model (MHM—also known as the unidimensional monotone latent variable model) is a nonparametric IRT formulation that provides the underpinning for partitioning a collection of dichotomous items to form scales. Ellis (Psychometrika 79:303–316, 2014, doi:10.1007/s11336-013-9341-5) has recently derived inequalities that are implied by the MHM, yet require only the bivariate (inter-item) correlations. In this paper, we incorporate these inequalities within a mathematical programming formulation for partitioning a set of dichotomous scale items. The objective criterion of the partitioning model is to produce clusters of maximum cardinality. The formulation is a binary integer linear program that can be solved exactly using commercial mathematical programming software. However, we have also developed a standalone branch-and-bound algorithm that produces globally optimal solutions. Simulation results and a numerical example are provided to demonstrate the proposed method.


References

  • Bacci, S., Bartolucci, F., & Gnaldi, M. (2014). A class of multidimensional latent class IRT models for ordinal polytomous item responses. Communications in Statistics, 43, 787–800.

  • Balas, E. (1965). An additive algorithm for solving linear programs with zero-one variables. Operations Research, 13, 517–546.

  • Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141–157.

  • Bartolucci, F., Bacci, S., & Gnaldi, M. (2014). MultiLCIRT: An R package for multidimensional latent class item response models. Computational Statistics and Data Analysis, 71, 971–985.

  • Bartolucci, F., Montanari, G. E., & Pandolfi, S. (2012). Dimensionality of the latent structure and item selection via latent class multidimensional IRT models. Psychometrika, 77, 782–802.

  • Birnbaum, A. (1968). Some latent trait models and their uses in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.

  • Brusco, M. J., & Stahl, S. (2005). Branch-and-bound applications in combinatorial data analysis. New York, NY: Springer.

  • Brusco, M. J., & Stahl, S. (2007). An algorithm for extracting maximum cardinality subsets with perfect dominance or anti-Robinson structures. British Journal of Mathematical and Statistical Psychology, 60, 333–351.

  • Clapham, C. (1996). The concise Oxford dictionary of mathematics. New York, NY: Oxford University Press.

  • D’Angelo, G. M., Luo, J., & Xiong, C. (2012). Missing data methods for partial correlations. Journal of Biometrics & Biostatistics, 3(8). doi:10.4172/2155-6180.1000155.

  • Dean, N., & Raftery, A. E. (2010). Latent class analysis variable selection. Annals of the Institute of Statistical Mathematics, 62, 11–35.

  • Ellis, J. (2014). An inequality for correlations in unidimensional monotone latent variable models for binary variables. Psychometrika, 79, 303–316.

  • Fisher, R. A. (1921). On the probable error of a coefficient of correlation deduced from a small sample. Metron, 1, 3–32.

  • Fisher, R. A. (1924). The distribution of the partial correlation coefficient. Metron, 3, 329–332.

  • Hessen, D. J. (2005). Constant latent odds-ratios models and the Mantel–Haenszel null hypothesis. Psychometrika, 70, 497–516.

  • Junker, B. W., & Sijtsma, K. (2001). Nonparametric item response theory in action: An overview of the special issue. Applied Psychological Measurement, 25, 211–220.

  • Klein, G., & Aronson, J. E. (1991). Optimal clustering: A model and method. Naval Research Logistics, 38, 447–461.

  • Land, A. H., & Doig, A. (1960). An automatic method of solving discrete programming problems. Econometrica, 28, 497–520.

  • Loevinger, J. (1948). The technique of homogeneous tests compared with some aspects of "scale analysis" and factor analysis. Psychological Bulletin, 45, 507–530.

  • Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague/Berlin: Mouton/De Gruyter.

  • Molenaar, I. W., & Sijtsma, K. (2000). User’s manual for MSP5 for Windows. Groningen: iec ProGAMMA.

  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Nielsen & Lydiche.

  • Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.

  • Roussos, L. A., Stout, W. F., & Marden, J. I. (1998). Using new proximity measures with hierarchical cluster analysis to detect multidimensionality. Journal of Educational Measurement, 35, 1–30.

  • Samejima, F. (1969). Estimation of latent trait ability using a response pattern of graded scores. Psychometrika Monograph No. 17.

  • Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory. Thousand Oaks, CA: Sage.

  • Steinley, D., & Brusco, M. J. (2008). Selection of variables in cluster analysis: An empirical comparison of eight procedures. Psychometrika, 73, 125–144.

  • Straat, J. H., Van der Ark, L. A., & Sijtsma, K. (2013). Comparing optimization algorithms for item selection in Mokken scale analysis. Journal of Classification, 30, 75–99.

  • Van Abswoude, A. A. H., Van der Ark, L. A., & Sijtsma, K. (2004). A comparative study of test data dimensionality assessment procedures under nonparametric IRT models. Applied Psychological Measurement, 28, 3–24.

  • Van der Ark, L. A. (2007). Mokken scale analysis in R (version 2.4). Journal of Statistical Software, 20, 1–19.

  • Van der Ark, L. A. (2012). New developments in Mokken scale analysis in R. Journal of Statistical Software, 48, 1–27.

  • Van der Ark, L. A., Croon, M. A., & Sijtsma, K. (2008). Mokken scale analysis for dichotomous items using marginal models. Psychometrika, 73, 183–208.

  • Van der Ark, L. A., & Sijtsma, K. (2005). The effect of missing data imputation on Mokken scale analysis. In L. A. Van der Ark, M. A. Croon, & K. Sijtsma (Eds.), New developments in categorical data analysis for the social and behavioral sciences (pp. 147–166). Mahwah, NJ: Lawrence Erlbaum.

  • Verweij, A. C., Sijtsma, K., & Koops, W. (1996). A Mokken scale for transitive reasoning suited for longitudinal research. International Journal of Behavioral Development, 23, 241–264.

  • Zhang, J., & Stout, W. F. (1999). The theoretical DETECT index of dimensionality and its application to simple structure. Psychometrika, 64, 213–249.

Author information

Correspondence to Michael J. Brusco.

Appendices

Appendix 1

The problem of partitioning a set of binary scale items based on Ellis’ (2014) inequality can be formulated as a BILP. The formulation uses the parameters described in Section 3. The decision variables are denoted \(y_{jk}\), where \(y_{jk} = 1\) if item \(j\) is assigned to cluster \(k\) and \(y_{jk} = 0\) otherwise, \(\forall \, 1 \le j \le J\) and \(1 \le k \le K\). The partitioning problem can be formulated as a BILP as follows:

$$\begin{aligned} \text {Max: }&f=\sum _{j=1}^J {\sum _{k=1}^K {\eta _k y_{jk} } } \end{aligned}$$
(7)
$$\begin{aligned} \text {subject to:}&\sum _{k=1}^K {y_{jk} \le 1} \,\forall \,1 \le j \le J; \end{aligned}$$
(8)
$$\begin{aligned}&y_{ik}+y_{jk} \le 1, \quad \forall \,1 \le i < j \le J : \Gamma _{ij} = 1, \forall \, 1 \le k \le K; \end{aligned}$$
(9)
$$\begin{aligned}&y_{gk}+y_{ik}+y_{jk} \le 2, \quad \forall \, 1 \le g < i < j \le J : \Phi _{gij} = 1, \forall \, 1 \le k \le K;\quad \quad \end{aligned}$$
(10)
$$\begin{aligned}&y_{jk} \in \{0, 1\}, \quad \forall \, 1 \le j \le J, \forall \, 1 \le k \le K. \end{aligned}$$
(11)

The objective function (7) of the BILP is to maximize the sum of the weighted cluster cardinalities, which is denoted as \(f\). Recall from Section 3 that the \(\eta _{k}=J^{(-k)}\) are the objective function weights afforded to the assignment of an item to cluster \(k, \forall \, 1 \le k \le K\).

Constraint set (8) guarantees that each item is assigned to no more than one cluster. Constraint set (9) prohibits two items, \(i\) and \(j\), from being placed in the same cluster if their bivariate correlation is negative. Constraint set (10) enforces the restrictions on the partial correlations, as described by Ellis (2014). Constraint (11) requires that the assignment variables are binary.
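To make the formulation concrete, the sketch below evaluates objective (7) under constraint sets (8) and (9) by brute-force enumeration on a small hypothetical correlation matrix. The item data and weights are illustrative only, constraint set (10) is assumed vacuous for brevity, and a real application would instead pass the BILP to an ILP solver as described above.

```python
from itertools import combinations, product

# Hypothetical toy data: J = 4 items, K = 2 clusters. Items 0-2 intercorrelate
# positively; item 3 correlates negatively with item 0.
J, K = 4, 2
R = [[ 1.0, 0.5, 0.4, -0.2],
     [ 0.5, 1.0, 0.6,  0.1],
     [ 0.4, 0.6, 1.0,  0.2],
     [-0.2, 0.1, 0.2,  1.0]]

# Gamma[i][j] = 1 flags a forbidden pair (negative bivariate correlation),
# mirroring constraint set (9). Constraint set (10) is assumed vacuous here.
Gamma = [[1 if R[i][j] < 0 else 0 for j in range(J)] for i in range(J)]

# Decreasing weights so that earlier clusters dominate the objective, in the
# spirit of the eta_k weights of Section 3 (here simply J^(K-1-k) = [4, 1]).
eta = [J ** (K - 1 - k) for k in range(K)]

best_f, best_a = -1, None
for a in product(range(K + 1), repeat=J):        # label K means "unassigned" (8)
    ok = True
    for k in range(K):
        members = [j for j in range(J) if a[j] == k]
        if any(Gamma[i][j] for i, j in combinations(members, 2)):
            ok = False                           # violates constraint set (9)
            break
    if ok:
        f = sum(eta[a[j]] for j in range(J) if a[j] < K)   # objective (7)
        if f > best_f:
            best_f, best_a = f, a

print(best_f, best_a)
```

With these toy correlations the maximum is \(f = 13\): a three-item cluster plus a singleton, because the negative correlation between items 0 and 3 bars them from sharing a cluster.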

Appendix 2

The current cluster assignment variables for the branch-and-bound algorithm are denoted \(a(j)\), where \(a(j)=k\) indicates that item \(j\) is assigned to cluster \(k, \forall \, 1 \le j \le J\). During the search process, the best-found cluster assignments (i.e., those associated with the current lower bound, \(f_{\mathrm{LB}}\)) are stored as variables \(a^*(j), \forall \, 1 \le j \le J\). The algorithm also makes use of the following auxiliary variables: \(q(k)\) is the number of items assigned to cluster \(k, \forall \, 1 \le k \le K; \xi (k)\) is a computed upper bound on the number of items that could possibly be assigned to cluster \(k, \forall \, 1 \le k \le K\); \(\omega (k)\) is a variable that facilitates the computation of \(\xi (k), \forall \, 1 \le k \le K\); and \(f_{\mathrm{UB}}\) is a computed upper bound on the objective function. The precise steps of the branch-and-bound algorithm are as follows:

  1. Step 0.

    INITIALIZATION. Set \(j = 0, f = 0, f_{\mathrm{LB}} = -\infty , a(j) = 0\, \forall \,1 \le j \le J, a^*(j) = 0\, \forall \,1 \le j \le J\) and \(q(k) = 0 \,\forall \,1 \le k \le K\).

  2. Step 1.

    SET CONSTRAINT PARAMETERS. Compute \(\Phi _{gij}\) using (2), \(\forall \, 1 \le g < i < j \le J\).

  3. Step 2.

    STEP FORWARD. Set \(j=j + 1, k = 1, a(j)=k\), and \(q(k)=q(k) + 1\). If \(k=K\), then go to Step 4; otherwise, proceed to Step 3.

  4. Step 3.

    FEASIBILITY TESTS. Use the pseudo code in Figure 1 to perform the feasibility tests with respect to pairs formed by item \(j\) with previously assigned items (\(1 \le i \le j - 1\)), as well as triples formed by item \(j\) with previously assigned items (\(1 \le g < i \le j - 1\)). If the result from the pseudo code is “pass” after both tests, then go to Step 4; otherwise, go to Step 6.

  5. Step 4.

    BOUND TEST. Use the pseudo code in Figure 2 to perform the bound test. If “pass,” then go to Step 5; otherwise, go to Step 6.

  6. Step 5.

    NEW BEST-FOUND SOLUTION. If \(j < J\), then proceed to Step 2; otherwise, set \(a^*(i)=a(i)\) for \(1 \le i \le J\), set \(f_{\mathrm{LB}}=f_{\mathrm{UB}}\), and proceed to Step 6.

  7. Step 6.

    LAST CLUSTER? If \(k=K\), then go to Step 8; otherwise, go to Step 7.

  8. Step 7.

    BRANCH RIGHT. Set \(q(k)=q(k) - 1, k=k + 1, a(j)=k\), and \(q(k)=q(k) + 1\), and return to Step 3.

  9. Step 8.

    RETRACTION. Set \(a(j) = 0\), \(q(k)=q(k) - 1\), and \(j=j - 1\). If \(j = 0\), then return \(a^*(i)\) for \(1 \le i \le J\) and \(f_{\mathrm{LB}}\), and STOP; otherwise, identify \(k : a(j)=k\) and go to Step 6.

Fig. 1: Pseudo code for the FEASIBILITY TEST (Step 3) of the branch-and-bound algorithm.

Fig. 2: Pseudo code for the BOUND TEST (Step 4) of the branch-and-bound algorithm.

The algorithm begins in Step 0 with the initialization of all parameters and arrays. The \(\Phi _{gij}\) parameters are computed using (2) in Step 1. Step 2 increments the item pointer index (\(j\)), sets the cluster index to one (\(k = 1\)), assigns item \(j\) to cluster \(k\,(a(j)=k)\), and increments the counter for the number of items in cluster \(k\,(q(k)=q(k) + 1)\). Step 3 uses the pseudo code in Figure 1 to ensure that item \(j\) is positively correlated with the other items assigned to cluster \(k\), and to determine whether Ellis’ (2014) partial correlation condition is satisfied if item \(j\) is assigned to cluster \(k\). This latter component of the feasibility test is accomplished by examining all partial correlations associated with item \(j\) and all possible pairs of items (\(g, i\)) previously assigned to cluster \(k\). If any triple \((g, i, j)\) violates the condition, then the result of the test is “fail”; otherwise, the result is “pass.” Note that Step 3 is not performed if \(k=K\) because that cluster is reserved for unscaled items, which do not have to satisfy any conditions.

Step 4 uses the pseudo code in Figure 2 to perform a test to determine whether completion of the current partial solution can possibly produce an objective function value that exceeds the current lower bound on the objective function value. Part of the upper bound (\(f_{\mathrm{UB}}\)) is computed from the direct assignments of the first \(j\) items, whereas the remainder of the bound stems from the potential contributions of the yet unassigned items. The second component of the bound is determined based on the feasibility of assigning the yet unassigned items to each cluster. If the feasibility and bound tests in Steps 3 and 4 are “passed,” then a check is made in Step 5 to determine if the assignment of items is complete. If not all of the \(J\) items have been assigned to clusters (i.e., \(j < J\)), then control returns to Step 2 to assign the next item. However, if \(j=J\), then the current assignment is installed as the best-found assignment (\(a^*(j)\), for \(1 \le j \le J\)) and the lower bound is set equal to the upper bound (\(f_{\mathrm{LB}}=f_{\mathrm{UB}}\)).

The current cluster assignment for item \(j\) determines the processing at Step 6. If \(k < K\), then a branch right operation occurs at Step 7 whereby item \(j\) is moved from cluster \(k\) to cluster \(k+1\), and control is subsequently passed to Step 3 for the feasibility test. However, if \(k=K\) at Step 6, then it is not possible to branch right because there is no cluster index \(K+1\) and, therefore, retraction occurs in Step 8. This occurs by removing the assignment for item \(j\) and then reducing the item index pointer (\(j=j - 1\)). The algorithm terminates when \(j = 0\), which indicates that all partitions have been either implicitly or explicitly evaluated. The incumbent best-found assignment is returned as the optimal partition, and \(f_{\mathrm{LB}}\) is the optimal value of the objective function (\(f^*\)).
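The depth-first search of Steps 0–8 can be sketched as a recursive procedure. The sketch below is a deliberate simplification under stated assumptions: it applies only the pairwise feasibility test of Step 3 (the partial-correlation test and the bound test of Figures 1 and 2 are omitted, so it enumerates rather than prunes), and the correlation matrix and weights are hypothetical toy data.

```python
def partition_items(R, K):
    """Recursive depth-first sketch of the search in Steps 0-8, using only the
    pairwise feasibility test of Step 3; the partial-correlation and bound
    tests of Figures 1-2 are omitted, so every pairwise-feasible partition
    is enumerated explicitly."""
    J = len(R)
    neg = [[R[i][j] < 0 for j in range(J)] for i in range(J)]
    # Clusters 0..K-2 are "real"; cluster K-1 collects the unscaled items and
    # contributes nothing to the objective (hypothetical decreasing weights).
    eta = [J ** (K - 2 - k) for k in range(K - 1)] + [0]
    a = [None] * J                     # a[j] = cluster of item j (Step 0)
    best = {"f": -1, "a": None}

    def feasible(j, k):
        if k == K - 1:                 # Step 3 is skipped for the last cluster
            return True
        return all(not neg[i][j] for i in range(j) if a[i] == k)

    def branch(j):
        if j == J:                     # Step 5: complete assignment found
            f = sum(eta[a[i]] for i in range(J))
            if f > best["f"]:
                best["f"], best["a"] = f, a.copy()
            return
        for k in range(K):             # Steps 2 and 7: try clusters in order
            if feasible(j, k):
                a[j] = k
                branch(j + 1)
        a[j] = None                    # Step 8: retraction

    branch(0)
    return best["f"], best["a"]

# Toy data: items 0-2 intercorrelate positively; item 3 conflicts with item 0.
R = [[ 1.0, 0.5, 0.4, -0.2],
     [ 0.5, 1.0, 0.6,  0.1],
     [ 0.4, 0.6, 1.0,  0.2],
     [-0.2, 0.1, 0.2,  1.0]]
f, assignment = partition_items(R, K=3)
print(f, assignment)
```

On this toy problem the search returns a three-item cluster plus a singleton, with the negative pair (0, 3) kept apart; the full algorithm differs only in pruning subtrees via the bound test and rejecting triples that violate the partial-correlation condition.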

Given that cluster \(K\) is reserved for unscaled items, it is helpful to determine what occurs when too many or too few clusters are selected. Suppose we have \(J = 10\) items, and there is one four-item scale, one three-item scale, and three items that cannot be scaled. If we specify two clusters, only the four-item scale is found and the other six items are placed in the cluster of unscaled items. If we specify three clusters, then cluster sizes are {4, 3, 3}, with the last cluster containing the three unscaled items. If we specify too many clusters, say 4, 5, or 6, then the sizes of the clusters would be {4, 3, 1, 2}, {4, 3, 1, 1, 1} and {4, 3, 1, 1, 1, 0}, respectively. Succinctly, the last cluster always contains unscaled items, and any item that is the only item in a cluster is also an unscaled item.


Cite this article

Brusco, M.J., Köhn, HF. & Steinley, D. An Exact Method for Partitioning Dichotomous Items Within the Framework of the Monotone Homogeneity Model. Psychometrika 80, 949–967 (2015). https://doi.org/10.1007/s11336-015-9459-8
