Elsevier

Behaviour Research and Therapy

Volume 98, November 2017, Pages 91-102
Fitting latent variable mixture models

https://doi.org/10.1016/j.brat.2017.04.003

Highlights

  • Latent variable mixture models (LVMMs) are models for multivariate observed data from a potentially heterogeneous population.

  • The observed responses are thought to be driven by one or more latent continuous factors and/or latent categorical variables.

  • The first part of this paper provides the theoretical background of LVMMs, emphasizing their exploratory character.

  • The second part provides a growth mixture modeling example with simulated data and covers practical issues when fitting LVMMs.

Abstract

Latent variable mixture models (LVMMs) are models for multivariate observed data from a potentially heterogeneous population. The responses on the observed variables are thought to be driven by one or more latent continuous factors (e.g., severity of a disorder) and/or latent categorical variables (e.g., subtypes of a disorder). Decomposing the observed covariances in the data into the effects of categorical group membership and the effects of continuous trait differences is not trivial, and requires the consideration of a number of different aspects of LVMMs. The first part of this paper provides the theoretical background of LVMMs and emphasizes their exploratory character, outlines the general framework together with assumptions and necessary constraints, highlights the difference between models with and without covariates, and discusses the interrelation between the number of classes and the complexity of the within-class model as well as the relevance of measurement invariance. The second part provides a growth mixture modeling example with simulated data and covers several practical issues when fitting LVMMs.

Introduction

Latent variable mixture models (LVMMs) combine latent class analysis models and factor models or more complex structural equation models (Muthén, 2001). LVMMs are most commonly used to investigate population heterogeneity, which refers to the presence of subgroups in the population. LVMMs can serve to analyse data from heterogeneous populations without knowing beforehand which individual belongs to which of the subgroups.

The simplest types of mixture models are latent class analysis (LCA) models. These models are designed for multiple observed variables (e.g., symptom endorsements or questionnaire items), and have a single latent class variable that groups the individuals in a sample into a user-specified number of latent groups (Lazarsfeld and Henry, 1968; McCutcheon, 1987). LCA models do not have factors within class, and the covariances between the observed variables within class are constrained to zero. This is a very stringent assumption. Suppose we have 5 observed items measuring some disorder. Not allowing these items to covary within class means that there are no systematic severity differences between participants within a class in LCA models. The covariances between observed variables in the total sample only deviate from zero due to mean differences between the classes.
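The last point can be checked with a small simulation. The sketch below is only an illustration under assumed values: the class sizes, the item means, and the use of continuous, normally distributed items are hypothetical choices, not taken from the paper. Within each class the 5 items are generated independently, so any covariance in the pooled sample can only come from the mean difference between the classes. With class proportions 0.75 and 0.25 and a mean difference of 2 on every item, the expected pooled covariance between any two items is 0.75 × 0.25 × 2² = 0.75.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items = 5
n_per_class = [3000, 1000]             # hypothetical class sizes
item_means = [np.full(n_items, 0.0),   # hypothetical low-severity class
              np.full(n_items, 2.0)]   # hypothetical high-severity class

# Within each class the items are drawn independently with unit variance,
# i.e. the within-class covariances are zero by construction.
classes = [rng.normal(loc=m, scale=1.0, size=(n, n_items))
           for m, n in zip(item_means, n_per_class)]
pooled = np.vstack(classes)


def max_abs_offdiag(cov):
    """Largest absolute off-diagonal element of a covariance matrix."""
    off = cov - np.diag(np.diag(cov))
    return np.abs(off).max()


# Sample covariances within each class: off-diagonals are close to zero.
within = [max_abs_offdiag(np.cov(c, rowvar=False)) for c in classes]

# Sample covariance in the pooled data: off-diagonals are clearly positive.
pooled_cov = np.cov(pooled, rowvar=False)
min_pooled_offdiag = pooled_cov[~np.eye(n_items, dtype=bool)].min()

print("max |within-class covariance|:", within)
print("min pooled covariance:", min_pooled_offdiag)
```

The within-class off-diagonal elements differ from zero only by sampling error, whereas all pooled off-diagonal elements are substantially positive.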

Factor models on the other hand are models for a single homogeneous population (i.e., no differences between subtypes), and observed variables in the sample are assumed to covary due to systematic differences along the underlying continuous latent factors (Bollen, 1989).

LVMMs can have one or more latent class variables, and permit the specification of factor models, growth models, or even more complex models within each class. If the within-class model is a factor model, the resulting LVMM is often called a factor mixture model. Covariances between observed variables in the total sample are attributed partially to mean differences between classes, and partially to continuous latent factors within each class. For example, consider data collected on several questionnaire items that measure anger. Suppose the population consists of two groups, a majority group of participants with very low levels of anger and a smaller group characterized by high scores on most of the items. The observed anger items in the total sample covary because of the mean differences between the two groups. In addition, the items can also covary if there are differences in the severity of anger within each group. These two sources of covariance are modeled in LVMMs by using latent categorical and latent continuous variables.
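The two sources of covariance can be made explicit with the standard decomposition of the covariance matrix of a finite mixture. The notation is assumed here for illustration and is not taken from the paper: \pi_k is the proportion of class k, \mu_k and \Sigma_k are the within-class mean vector and covariance matrix of the observed items, and \Lambda_k, \Psi_k, and \Theta_k are the within-class factor loadings, factor covariance matrix, and residual covariance matrix.

```latex
\begin{align}
\Sigma_{\text{total}}
  &= \underbrace{\sum_{k=1}^{K} \pi_k \, \Sigma_k}_{\text{within classes}}
   + \underbrace{\sum_{k=1}^{K} \pi_k \,
       (\mu_k - \bar{\mu})(\mu_k - \bar{\mu})^{\top}}_{\text{between classes}},
  \qquad \bar{\mu} = \sum_{k=1}^{K} \pi_k \, \mu_k, \\
\Sigma_k &= \Lambda_k \Psi_k \Lambda_k^{\top} + \Theta_k .
\end{align}
```

In the anger example, the second term of the first line captures the covariance induced by the mean difference between the low-anger and the high-anger group, while the first term captures the covariance due to severity differences within each group.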

Latent class models are a special case of the LVMM where factor variances (or, alternatively, factor loadings) are zero. In the anger example this would mean that all participants within the low-scoring class do not differ in the severity of anger (i.e., zero anger factor variance within group). The same holds for the high-scoring group: the latent class model assumes no variability of anger within group, because systematic anger differences within class would cause the items to covary. The observed covariances between the anger items in this model are modeled to be entirely due to mean differences between the groups. Factor models for a homogeneous population are also a special case: they are LVMMs with a single latent class. In the anger example this would boil down to neglecting the presence of two subgroups, and attributing all covariances to one underlying anger factor within a single homogeneous population.
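Both special cases can be verified numerically. The sketch below computes the model-implied within-class and between-class parts of the total covariance for a one-factor mixture model with class-invariant loadings and residual variances. The function name `mixture_cov` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def mixture_cov(weights, means, loadings, factor_vars, residual_vars):
    """Within- and between-class parts of the implied total covariance.

    Assumes a one-factor model in every class, with loadings and residual
    variances that are the same in all classes (an illustrative choice).
    """
    weights = np.asarray(weights, dtype=float)   # class proportions, (K,)
    means = np.asarray(means, dtype=float)       # class item means, (K, p)
    lam = np.asarray(loadings, dtype=float)      # factor loadings, (p,)
    theta = np.diag(np.asarray(residual_vars, dtype=float))

    # Weighted average of the within-class covariance matrices.
    within = sum(w * (psi * np.outer(lam, lam) + theta)
                 for w, psi in zip(weights, factor_vars))

    # Covariance induced by class mean differences.
    centered = means - weights @ means
    between = (centered * weights[:, None]).T @ centered
    return within, between


# Special case 1: latent class model (factor variance zero in both classes).
lca_within, lca_between = mixture_cov(
    weights=[0.75, 0.25], means=[[0.0] * 5, [2.0] * 5],
    loadings=[1.0] * 5, factor_vars=[0.0, 0.0], residual_vars=[1.0] * 5)

# Special case 2: factor model for a homogeneous population (one class).
fa_within, fa_between = mixture_cov(
    weights=[1.0], means=[[1.0] * 5],
    loadings=[1.0] * 5, factor_vars=[0.5], residual_vars=[1.0] * 5)
```

In the latent class case `lca_within` is diagonal, so all covariance between items sits in `lca_between`. In the single-class case `fa_between` is a zero matrix, so all covariance between items is attributed to the factor.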

The LVMM framework is extremely flexible, and permits the specification of different types of mixture models. Models such as path models, factor models, survival models, growth curve models, and more general structural equation models can all be specified for multiple subgroups instead of for a single homogeneous population (see for instance Arminger et al., 1999; Dolan and van der Maas, 1998; Jedidi et al., 1997; Muthén and Shedden, 1999; Muthén and Muthén, 2000; Ram and Grimm, 2009; Varriale and Vermunt, 2012; Vermunt, 2008; Yung, 1997). The flexibility comes at a price. The framework is built on a set of assumptions that should be realistic for the data. Further, in order to estimate a model, all relations between observed variables, between observed variables and latent variables, and between latent variables have to be specified. It is therefore necessary to decide whether within-class model parameters are class specific, or are the same for all classes (i.e., class invariant). As will be discussed in this paper, the interpretation of the model depends on these decisions. It is important to note that different within-class parameterizations can influence how many classes best fit the data (Lubke & Neale, 2008). However, comparing a set of carefully parameterized mixture models can provide great insight into the processes and interrelations between variables when the assumption of population homogeneity is unrealistic.

The paper is organized into two main parts. The first part provides the theoretical background. After discussing the generally exploratory character of mixture analyses, the modeling framework is presented together with some of the necessary assumptions and constraints. The first part concludes with the discussion of issues that deserve consideration prior to fitting models to data, such as the interrelation between number of classes and within-class model complexity, measurement invariance, and models with and without covariates. The second part consists of a growth mixture analysis with covariates, and illustrates some of the practical issues discussed in the first part of the paper.

Section snippets

Exploration of heterogeneity using mixture models

Latent variable mixture models (LVMMs) make it possible to detect groups of subjects in a sample, and to investigate the differences between the groups. LVMMs differ from other techniques to detect groups in data, such as taxometrics and cluster analysis, in that they require the user to specify all relations between observed and latent variables in the model (Lubke and Miller, 2014; Meehl, 1995). LVMMs are therefore prone to misspecifications. However, if there is sufficient a priori

Illustration of growth mixture modeling using simulated data

This illustration concerns a longitudinal analysis using growth mixture models, and assumes the reader is familiar with linear and quadratic growth curve models for a single homogeneous population. For this illustration data were generated for an intermediate sample size (N = 1200) with 5 measurement occasions.
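To give a concrete picture of such data, the following sketch generates a sample of the stated size (N = 1200, 5 occasions) from a two-class quadratic growth model. Only the sample size and the number of occasions come from the text. The number of classes used for generation, the class proportions, the growth parameters, the equally spaced time scores, and the choice of a random intercept with fixed linear and quadratic slopes are hypothetical values chosen for illustration and are not the generating values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

N, T = 1200, 5                      # sample size and occasions from the text
time = np.arange(T, dtype=float)    # assumed equally spaced time scores 0..4

# Everything below is hypothetical, not the paper's generating values.
class_probs = np.array([0.7, 0.3])
growth_means = np.array([[1.0, 0.2, 0.00],    # class 0: intercept, linear, quadratic
                         [3.0, 0.8, -0.15]])  # class 1: intercept, linear, quadratic
intercept_sd = 0.5                  # between-person SD of the intercept
residual_sd = 0.5                   # occasion-specific residual SD

# Latent class membership for each individual.
z = rng.choice(2, size=N, p=class_probs)

# Class-specific mean trajectories evaluated at the 5 occasions.
design = np.column_stack([np.ones(T), time, time ** 2])   # shape (T, 3)
mean_traj = growth_means[z] @ design.T                    # shape (N, T)

# Random intercept per person; slopes are fixed within class.
intercept_dev = rng.normal(0.0, intercept_sd, size=(N, 1))
y = mean_traj + intercept_dev + rng.normal(0.0, residual_sd, size=(N, T))

print(y.shape)
```

The resulting array `y` has one row per individual and one column per measurement occasion, which is the wide data layout that growth mixture software typically expects.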

Part II summary

The overall results address the main research questions highlighted at the outset of the data analysis. There are two subgroups in the population that have meaningfully different trajectories. The intercept factor, or baseline level, varies across individuals, but fixing the linear and quadratic slopes is adequate for these data. Including the covariates in the mixture model drastically changes the interpretation of the data structure. Similar conditional and unconditional models differed in

Conclusion

The mixture modeling framework is largely an exploratory device. A number of assumptions and constraints are necessary to fit mixture models to data. These assumptions need to be realistic for a given data set, and should correspond to existing knowledge about the data. Apart from assumptions that are inherited from structural equation modeling (e.g., distributional assumptions, linear relations), the assumption that each mixture component corresponds to a meaningful group in the sample (i.e.,

References (50)

  • G.J. Mellenbergh

    Item bias and item response theory

    International Journal of Educational Research

    (1989)
  • A. Agresti

    Categorical data analysis

    (2002)
  • G. Arminger et al.

    Mixtures of conditional mean- and covariance-structure models

    Psychometrika

    (1999)
  • T. Asparouhov et al.

    Auxiliary variables in mixture modeling: Three-step approaches using Mplus

    Structural Equation Modeling: A Multidisciplinary Journal

    (2014)
  • J.T. Behrens

    Principles and procedures of exploratory data analysis

    Psychological Methods

    (1997)
  • K.A. Bollen

    Structural equations with latent variables

    (1989)
  • C.V. Dolan et al.

    Fitting multivariate normal finite mixtures subject to structural equation modeling

    Psychometrika

    (1998)
  • K.J. Grimm et al.

    Model selection in finite mixture models: A k-fold cross-validation approach

    Structural Equation Modeling: A Multidisciplinary Journal

    (2017)
  • C. Hurvich et al.

    The impact of model selection on inference in linear regression

    The American Statistician

    (1990)
  • K. Jedidi et al.

    STEMM: A general finite mixture structural equation model

    Journal of Classification

    (1997)
  • N.O. Jeffries

    A note on ‘Testing the number of components in a normal mixture’

    Biometrika

    (2003)
  • M. Kim et al.

    Modeling predictors of latent classes in regression mixture models

    Structural Equation Modeling: A Multidisciplinary Journal

    (2016)
  • P.F. Lazarsfeld et al.

    Latent structure analysis

    (1968)
  • L. Li et al.

    On inclusion of covariates for class enumeration of growth mixture models

    Multivariate Behavioral Research

    (2011)
  • G.H. Lubke et al.

    Assessing model selection uncertainty using a bootstrap approach: An update

    Structural Equation Modeling: A Multidisciplinary Journal

    (2017)
  • G. Lubke et al.

    Inference based on the best-fitting model can contribute to the replication crisis: Assessing model selection uncertainty using a bootstrap approach

    Structural Equation Modeling

    (2016)
  • G.H. Lubke et al.

    Does nature have joints worth carving? A discussion of taxometrics, model-based clustering and latent variable mixture modeling

    Psychological Medicine

    (2014)
  • G.H. Lubke et al.

    Performance of factor mixture models as a function of model size, covariate effects, and class-specific parameters

    Structural Equation Modeling: A Multidisciplinary Journal

    (2007)
  • G.H. Lubke et al.

    Distinguishing between latent classes and continuous factors: Resolution by maximum likelihood

    Multivariate Behavioral Research

    (2006)
  • G.H. Lubke et al.

    Distinguishing between latent classes and continuous factors with categorical outcomes: Class invariance of parameters of factor mixture models

    Multivariate Behavioral Research

    (2008)
  • G.H. Lubke et al.

    Choosing a “correct” factor mixture model: Power, limitations, and graphical data exploration

  • G.H. Lubke et al.

    Latent class detection and class assignment: A comparison of the MAXEIG Taxometric procedure and factor mixture modeling approaches

    Structural Equation Modeling: A Multidisciplinary Journal

    (2010)
  • Z.L. Lu et al.

    Bayesian inference for growth mixture models with latent class dependent missing data

    Multivariate Behavioral Research

    (2011)
  • K.E. Masyn

    Measurement invariance and differential item functioning in latent class analysis with stepwise multiple indicator multiple cause modeling

    Structural Equation Modeling: A Multidisciplinary Journal

    (2017)
  • A.L. McCutcheon
    (1987)