Elsevier

Pattern Recognition

Volume 80, August 2018, Pages 83-93

Finite mixtures of skewed matrix variate distributions

https://doi.org/10.1016/j.patcog.2018.02.025

Highlights

  • The first use of skewed matrix variate distributions for clustering and for semi-supervised classification.

  • The first use of mixtures of skewed matrix variate distributions.

  • In addition to component skewness, component concentration (tail weight) is also parameterized.

  • Helps lay the foundations for what is sure to be a rich body of work on matrix variate approaches to clustering.

Abstract

Clustering is the process of finding underlying group structures in data. Although mixture model-based clustering is firmly established in the multivariate case, there is a relative paucity of work on matrix variate distributions and none for clustering with mixtures of skewed matrix variate distributions. Four finite mixtures of skewed matrix variate distributions are considered. Parameter estimation is carried out using an expectation-conditional maximization algorithm, and both simulated and real data are used for illustration.

Introduction

Over the years, interest in applications involving three-way (matrix variate) data has increased. Although there are countless examples of clustering for multivariate distributions using finite mixture models, as discussed in Section 2, there is very little work for matrix variate distributions. Moreover, the examples in the literature deal exclusively with symmetric (non-skewed) matrix variate distributions such as the matrix variate normal and the matrix variate t distributions.

There are many different areas of application for matrix variate distributions. One area is multivariate longitudinal data, where multiple variables are measured over time (e.g., [2]). In this case, each row of a matrix corresponds to a time point and each column represents one of the variables. Furthermore, the two scale matrices, a defining characteristic of matrix variate distributions, allow for simultaneous modelling of the inter-variable covariances as well as the temporal covariances. A second application, considered herein, is image recognition. In this case, an image is analyzed as an n × p pixel intensity matrix. Herein, finite mixtures of four different skewed matrix variate distributions are considered: the matrix variate skew-t, generalized hyperbolic, variance-gamma, and normal inverse Gaussian (NIG) distributions. These mixture models are illustrated for both clustering (unsupervised classification) and semi-supervised classification using both simulated and real data.
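To make the role of the two scale matrices concrete, the standard matrix variate normal density is written out here as a worked equation. This is the symmetric baseline mentioned above, not one of the four skewed densities, and the labelling of $\Sigma$ as the $n \times n$ row scale matrix and $\Psi$ as the $p \times p$ column scale matrix is an assumption about notation.

```latex
f(X \mid M, \Sigma, \Psi)
  = \frac{1}{(2\pi)^{np/2}\, |\Sigma|^{p/2}\, |\Psi|^{n/2}}
    \exp\left\{ -\tfrac{1}{2} \operatorname{tr}\left( \Sigma^{-1} (X - M)\, \Psi^{-1} (X - M)^{\top} \right) \right\},
\qquad
\operatorname{vec}(X) \sim N_{np}\left( \operatorname{vec}(M),\ \Psi \otimes \Sigma \right).
```

The Kronecker structure $\Psi \otimes \Sigma$ needs $n(n+1)/2 + p(p+1)/2$ free scale parameters (one fewer once the scaling indeterminacy between $\Sigma$ and $\Psi$ is resolved), compared with $np(np+1)/2$ for an unstructured covariance on the vectorized data.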

Section snippets

Model-based clustering and mixture models

Clustering and classification look at finding and analyzing underlying group structures in data. One common method used for clustering is model-based, and generally makes use of a G-component finite mixture model. A multivariate random variable X from a finite mixture model has density $f(x \mid \vartheta) = \sum_{g=1}^{G} \pi_g f_g(x \mid \theta_g)$, where $\vartheta = (\pi_1, \pi_2, \ldots, \pi_G, \theta_1, \theta_2, \ldots, \theta_G)$, $f_g(\cdot)$ is the gth component density, and $\pi_g > 0$ is the gth mixing proportion such that $\sum_{g=1}^{G} \pi_g = 1$. McNicholas [37] traces the association between clustering
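The mixture density above can be illustrated with a minimal Python sketch. It assumes univariate normal components purely as a stand-in for the component densities (the paper's components are skewed matrix variate densities), and the weights, means, and standard deviations are hypothetical values chosen for illustration. The sketch also computes the posterior component probabilities that model-based clustering typically uses to assign labels.

```python
import math


def normal_pdf(x, mean, sd):
    """Univariate normal density, used here only as a stand-in component."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, params):
    """Evaluate f(x) = sum_g pi_g * f_g(x | theta_g)."""
    return sum(w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, params))


def posterior_probs(x, weights, params):
    """Posterior probability that x arose from each component."""
    joint = [w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, params)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture: mixing proportions sum to one.
weights = [0.4, 0.6]
params = [(-2.0, 1.0), (3.0, 1.5)]  # (mean, standard deviation) per component

density = mixture_density(-1.0, weights, params)
probs = posterior_probs(-1.0, weights, params)
# Maximum a posteriori label: the component with the largest posterior probability.
label = max(range(len(probs)), key=probs.__getitem__)
```

Swapping `normal_pdf` for any other component density leaves `mixture_density` and `posterior_probs` unchanged, which is the sense in which the mixture framework is agnostic to the component family.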

Likelihoods

In the mixture model context, X is assumed to come from a population with G subgroups each distributed according to the same one of the four skewed matrix variate distributions discussed previously. Now suppose N n × p matrices $X_1, X_2, \ldots, X_N$ are observed, then the observed-data likelihood is $L(\vartheta) = \prod_{i=1}^{N} \sum_{g=1}^{G} \pi_g f(X_i \mid M_g, A_g, \Sigma_g, \Psi_g, \theta_g)$, where $\theta_g$ are the parameters associated with the distribution of $W_{ig}$. For the purposes of parameter estimation, we proceed as if the observed data is incomplete. In
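As a hedged illustration of the observed-data likelihood, the following Python sketch evaluates its logarithm, a sum over observations of the log of a weighted sum over components, using the log-sum-exp trick for numerical stability. The component log-density is passed in as a function; the one used in the demonstration is a matrix variate normal with identity scale matrices, which is an assumption made to keep the example short and is not one of the skewed densities considered in the paper. All data and parameter values are hypothetical.

```python
import math


def log_sum_exp(values):
    """Stable log(sum(exp(v))) computed by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def observed_log_likelihood(data, weights, component_params, log_density):
    """log L = sum_i log( sum_g pi_g * f(X_i | theta_g) )."""
    total = 0.0
    for X in data:
        terms = [math.log(w) + log_density(X, theta)
                 for w, theta in zip(weights, component_params)]
        total += log_sum_exp(terms)
    return total


def log_matrix_normal_identity(X, M):
    """Stand-in log-density: matrix variate normal with identity scale matrices."""
    n, p = len(X), len(X[0])
    sq = sum((X[i][j] - M[i][j]) ** 2 for i in range(n) for j in range(p))
    return -0.5 * n * p * math.log(2.0 * math.pi) - 0.5 * sq


# Hypothetical data: two 2 x 2 observations and two component location matrices.
data = [[[0.1, -0.2], [0.0, 0.3]], [[2.9, 3.1], [3.0, 2.8]]]
weights = [0.5, 0.5]
means = [[[0.0, 0.0], [0.0, 0.0]], [[3.0, 3.0], [3.0, 3.0]]]

loglik = observed_log_likelihood(data, weights, means, log_matrix_normal_identity)
```

Working on the log scale matters in practice because the product over N observations underflows quickly when each density value is small.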

Overview

Two simulations are performed, where the first simulation has two groups and the second has three. The chosen parameters have no intrinsic meaning; however, they can be viewed as representations of multivariate longitudinal data, and the parameters introduced by the distribution of $W_{ig}$ are meant to illustrate the flexibility in concentration. Simulation 1 considers 3 × 4 data and Simulation 2 considers 4 × 3 data. In the first simulation, $\Sigma_g$ and $\Psi_g$ are set to $\Sigma_1 = \begin{pmatrix} 1 & 0.5 & 0.1 \\ 0.5 & 1 & 0.5 \\ 0.1 & 0.5 & 1 \end{pmatrix}$, $\Sigma_2 = (1 \;\; 0.1 \;\; 0.1$
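A rough Python sketch of how one 3 × 4 observation might be simulated is given below. It assumes the normal variance-mean mixture representation X = M + W A + sqrt(W) V, with V matrix variate normal and, for the skew-t case, W inverse-gamma distributed with shape and rate ν/2; this representation is an assumption here because the snippet above does not state it. Only `Sigma1` is taken from the text; `M`, `A`, `Psi`, and `nu` are hypothetical placeholders, since the remaining simulation parameters are truncated in the source.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S; raises if S is not positive definite."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def sample_skew_t(M, A, Sigma, Psi, nu, rng):
    """Draw X = M + W*A + sqrt(W)*V under the assumed representation."""
    n, p = len(M), len(M[0])
    Z = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # V = L_Sigma Z L_Psi^T has row scale Sigma and column scale Psi.
    V = matmul(matmul(cholesky(Sigma), Z), transpose(cholesky(Psi)))
    # 1/W ~ Gamma(shape nu/2, scale 2/nu), so W ~ inverse gamma(nu/2, nu/2).
    W = 1.0 / rng.gammavariate(nu / 2.0, 2.0 / nu)
    return [[M[i][j] + W * A[i][j] + math.sqrt(W) * V[i][j] for j in range(p)]
            for i in range(n)]


Sigma1 = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.5], [0.1, 0.5, 1.0]]  # from the text
Psi = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # hypothetical
M = [[0.0] * 4 for _ in range(3)]  # hypothetical location matrix
A = [[1.0] * 4 for _ in range(3)]  # hypothetical skewness matrix

X = sample_skew_t(M, A, Sigma1, Psi, nu=5.0, rng=random.Random(1))
```

The Cholesky step doubles as a check that `Sigma1` is a valid (positive definite) scale matrix.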

Image recognition example

We now apply the matrix variate mixture models introduced herein to image recognition with the MNIST handwriting dataset [29]. The original dataset consists of 60,000 training images of handwritten digits 0 to 9, which can be represented as 28 × 28 pixel matrices with greyscale intensities ranging from 0 to 255. However, these dimensions resulted in an infinite calculation for the Bessel function and its derivative with respect to λ. Moreover, because two unstructured 28 × 28 dimensional

Discussion

Four matrix variate mixture distributions, with component densities that parameterize skewness, have been used for model-based clustering, and its semi-supervised analogue, of three-way data. Specifically, we considered MVST, MVGH, MVVG, and MVNIG mixtures, and an ECM algorithm was used for parameter estimation in each case. Simulated and real data were used for illustration. In the first simulation, there was good separation between the two groups and, in the second, we

Michael P.B. Gallaugher is a Ph.D. student in statistics at McMaster University in Hamilton, Ontario, Canada. He holds a prestigious Vanier Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada. His research focuses on clustering, with some emphasis on matrix variate distributions; he has recently published work on a matrix variate skew-t distribution in the International Statistical Institute journal Stat.

References (61)

  • P.M. Murray et al. Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. J. Multivar. Anal. (2017)
  • I. Vrbik et al. Analytic calculations for the EM algorithm for multivariate skew-t mixture models. Stat. Probab. Lett. (2012)
  • I. Vrbik et al. Parsimonious skew mixture models for model-based clustering and classification. Comput. Stat. Data Anal. (2014)
  • A.C. Aitken. A series formula for the roots of algebraic and transcendental equations. Proc. R. Soc. Edinb. (1926)
  • L. Anderlucci et al. Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data. Ann. Appl. Stat. (2015)
  • J.L. Andrews et al. Extending mixtures of multivariate t-factor analyzers. Stat. Comput. (2011)
  • J.L. Andrews et al. Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions: the tEIGEN family. Stat. Comput. (2012)
  • A. Baricz. Turán type inequalities for some probability density functions. Studia Scientiarum Mathematicarum Hungarica (2010)
  • L.E. Baum et al. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. (1970)
  • C. Biernacki et al. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • D. Böhning et al. The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann. Inst. Stat. Math. (1994)
  • R.P. Browne et al. A mixture of generalized hyperbolic distributions. Can. J. Stat. (2015)
  • G. Celeux et al. Computational and inferential difficulties with mixture posterior distributions. J. Am. Stat. Assoc. (2000)
  • J.T. Chen et al. Matrix variate skew normal distributions. Statistics (Ber) (2005)
  • U.J. Dang et al. Mixtures of multivariate power exponential distributions. Biometrics (2015)
  • F.Z. Doğru et al. Finite mixtures of matrix variate t distributions. Gazi Univ. J. Sci. (2016)
  • J.A. Domínguez-Molina et al. A matrix variate closed skew-normal distribution with applications to stochastic frontier analysis. Commun. Stat. Theory Methods (2007)
  • B.C. Franczak et al. Mixtures of shifted asymmetric Laplace distributions. IEEE Trans. Pattern Anal. Mach. Intell. (2014)
  • M.P.B. Gallaugher et al. A matrix variate skew-t distribution. Statistics (2017)
  • M.P.B. Gallaugher, P.D. McNicholas, Three skewed matrix variate distributions, 2017b,...

Paul D. McNicholas is the Canada Research Chair in Computational Statistics at McMaster University, where he is a Professor and University Scholar in the Department of Mathematics and Statistics as well as Director of the MacDATA Institute. He has published extensively in computational statistics, with the vast majority of his over 80 journal articles focusing on mixture model-based clustering. He is one of the leaders in this field and recently published a monograph devoted to the topic (Mixture Model-Based Classification; Chapman and Hall/CRC Press, 2016). He is a Senior Member of the IEEE and a member of the College of the Royal Society of Canada.

The authors gratefully acknowledge the very helpful comments of two anonymous reviewers as well as the support of a Vanier Canada Graduate Scholarship (Gallaugher) and the Canada Research Chairs program (McNicholas).
