Finite mixtures of skewed matrix variate distributions☆
Introduction
In recent years, there has been increased interest in applications involving three-way (matrix variate) data. Although there are countless examples of clustering for multivariate distributions using finite mixture models, as discussed in Section 2, there has been very little work for matrix variate distributions. Moreover, the examples in the literature deal exclusively with symmetric (non-skewed) matrix variate distributions such as the matrix variate normal and the matrix variate t distributions.
There are many different areas of application for matrix variate distributions. One area is multivariate longitudinal data, where multiple variables are measured over time [e.g., [2]]. In this case, each row of a matrix corresponds to a time point and each column to one of the variables. Furthermore, the two scale matrices, a defining characteristic of matrix variate distributions, allow for simultaneous modelling of the inter-variable covariances as well as the temporal covariances. A second application, considered herein, is image recognition. In this case, an image is analyzed as an n × p pixel intensity matrix. Herein, finite mixtures of four skewed matrix variate distributions are considered: the matrix variate skew-t, generalized hyperbolic, variance-gamma, and normal inverse Gaussian (NIG) distributions. These mixture models are illustrated for both clustering (unsupervised classification) and semi-supervised classification using both simulated and real data.
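As a rough illustration of the role of the two scale matrices, the sketch below assumes the usual matrix variate normal convention: for an n × p matrix X with an n × n row scale matrix (here called `sigma`, temporal) and a p × p column scale matrix (here called `psi`, inter-variable), the covariance between entries X[i][j] and X[k][l] is `sigma[i][k] * psi[j][l]`, which is the Kronecker product of `psi` and `sigma` when X is stacked column by column. The function names and the numeric values are hypothetical and are not taken from the paper.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def entry_covariance(sigma, psi, i, j, k, l):
    """Covariance between X[i][j] and X[k][l] under the assumed convention."""
    return sigma[i][k] * psi[j][l]


# Hypothetical scale matrices: 3 time points (rows) and 2 variables (columns).
sigma = [[1.00, 0.50, 0.25],
         [0.50, 1.00, 0.50],
         [0.25, 0.50, 1.00]]
psi = [[2.0, 0.3],
       [0.3, 1.0]]
n, p = len(sigma), len(psi)

# Covariance of the column-stacked matrix: an (n*p) x (n*p) matrix.
full_cov = kron(psi, sigma)

# Entry X[i][j] sits at position j*n + i in the column-stacked vector.
cov_t0v1_t2v0 = full_cov[1 * n + 0][0 * n + 2]
```

The two small matrices carry n(n+1)/2 + p(p+1)/2 free entries, whereas an unstructured covariance for the stacked vector would carry np(np+1)/2, which is one reason the separable form is attractive for longitudinal and image data.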
Model-based clustering and mixture models
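Clustering and classification aim to find and analyze underlying group structure in data. One common approach to clustering is model-based and generally makes use of a G-component finite mixture model. A multivariate random variable X from a finite mixture model has density $f(\mathbf{x} \mid \boldsymbol{\vartheta}) = \sum_{g=1}^{G} \pi_g f_g(\mathbf{x} \mid \boldsymbol{\theta}_g)$, where $\boldsymbol{\vartheta} = (\pi_1, \ldots, \pi_G, \boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_G)$, $f_g(\cdot)$ is the gth component density, and $\pi_g > 0$ is the gth mixing proportion such that $\sum_{g=1}^{G} \pi_g = 1$. McNicholas [37] traces the association between clustering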
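To make the mixture density concrete, here is a minimal sketch that evaluates a G-component mixture and the corresponding posterior membership probabilities used to assign an observation to a cluster. Univariate Gaussian components are used purely as a stand-in; they are an assumption of this example and not the skewed matrix variate densities considered in the paper. All function names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Univariate normal density, used here only as a stand-in component."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, components):
    """Weighted sum of the component densities evaluated at x."""
    return sum(w * f(x) for w, f in zip(weights, components))


def membership_probabilities(x, weights, components):
    """Posterior probability that x arose from each component."""
    joint = [w * f(x) for w, f in zip(weights, components)]
    total = sum(joint)
    return [value / total for value in joint]


# Hypothetical two-component mixture; the weights are positive and sum to one.
weights = [0.3, 0.7]
components = [
    lambda x: normal_pdf(x, -2.0, 1.0),
    lambda x: normal_pdf(x, 3.0, 1.5),
]

density_at_zero = mixture_density(0.0, weights, components)
probs_at_zero = membership_probabilities(0.0, weights, components)

# Hard (maximum a posteriori) cluster label for the observation x = 0.
label_at_zero = max(range(len(weights)), key=lambda g: probs_at_zero[g])
```

In model-based clustering the membership probabilities are typically computed at the fitted parameter values, and each observation is assigned to the component with the largest probability.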
Likelihoods
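In the mixture model context, the random matrix $\mathcal{X}$ is assumed to come from a population with G subgroups, each distributed according to the same one of the four skewed matrix variate distributions discussed previously. Now suppose N n × p matrices $\mathbf{X}_1, \ldots, \mathbf{X}_N$ are observed; then the observed-data likelihood is $L(\boldsymbol{\vartheta}) = \prod_{i=1}^{N} \sum_{g=1}^{G} \pi_g f_g(\mathbf{X}_i \mid \boldsymbol{\Theta}_g)$, where $\boldsymbol{\Theta}_g$ collects the parameters of the gth component, including $\boldsymbol{\theta}_g$, the parameters associated with the distribution of $W_{ig}$. For the purposes of parameter estimation, we proceed as if the observed data are incomplete. In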
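The observed-data likelihood of a mixture is a product over observations of sums over components, so in practice it is usually evaluated on the log scale with the log-sum-exp trick. The sketch below shows one way this might be done. The component log-density is a matrix variate normal with identity scale matrices, chosen only because it is easy to write down; it is an assumption of this example and a stand-in for the skewed densities in the paper. All names and values are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    largest = max(values)
    if largest == -math.inf:
        return -math.inf
    return largest + math.log(sum(math.exp(v - largest) for v in values))


def identity_scale_matrix_normal_logpdf(mean):
    """Log-density of an n x p matrix normal with identity scale matrices."""
    n, p = len(mean), len(mean[0])
    const = -0.5 * n * p * math.log(2.0 * math.pi)

    def log_f(x):
        squared = sum(
            (x[i][j] - mean[i][j]) ** 2 for i in range(n) for j in range(p)
        )
        return const - 0.5 * squared

    return log_f


def observed_log_likelihood(data, weights, log_densities):
    """Sum over observations of the log of the weighted component densities."""
    total = 0.0
    for x in data:
        terms = [math.log(w) + log_f(x) for w, log_f in zip(weights, log_densities)]
        total += log_sum_exp(terms)
    return total


# Hypothetical 2 x 2 observations and a two-component mixture.
data = [[[0.1, -0.2], [0.0, 0.3]], [[1.2, 0.9], [1.1, 0.8]]]
weights = [0.5, 0.5]
log_densities = [
    identity_scale_matrix_normal_logpdf([[0.0, 0.0], [0.0, 0.0]]),
    identity_scale_matrix_normal_logpdf([[1.0, 1.0], [1.0, 1.0]]),
]
loglik = observed_log_likelihood(data, weights, log_densities)

# An observation far from both means underflows on the raw density scale
# but remains finite on the log scale.
far_away = [[[100.0, 100.0], [100.0, 100.0]]]
loglik_far = observed_log_likelihood(far_away, weights, log_densities)
```

Working on the log scale matters for image-sized matrices, where individual density values are routinely too small to represent as floating-point numbers.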
Overview
Two simulations are performed, where the first simulation has two groups and the second has three. The chosen parameters have no intrinsic meaning; however, they can be viewed as representations of multivariate longitudinal data, and the parameters introduced by the distribution of $W_{ig}$ are meant to illustrate the flexibility in concentration. Simulation 1 considers 3 × 4 data and Simulation 2 considers 4 × 3 data. In the first simulation, $\Sigma_g$ and $\Psi_g$ are set to
Image recognition example
We now apply the matrix variate mixture models introduced herein to image recognition with the MNIST handwriting dataset [29]. The original dataset consists of 60,000 training images of handwritten digits 0 to 9, which can be represented as 28 × 28 pixel matrices with greyscale intensities ranging from 0 to 255. However, these dimensions resulted in an infinite calculation for the Bessel function and its derivative with respect to λ. Moreover, because two unstructured 28 × 28 dimensional
Discussion
Four matrix variate mixture distributions, with component densities that parameterize skewness, have been used for model-based clustering, and its semi-supervised analogue, of three-way data. Specifically, we considered MVST, MVGH, MVVG, and MVNIG mixtures, and an ECM algorithm was used for parameter estimation in each case. Simulated and real data were used for illustration. In the first simulation, there was good separation between the two groups and, in the second, we
Michael P.B. Gallaugher is a Ph.D. student in statistics at McMaster University in Hamilton, Ontario, Canada. He holds a prestigious Vanier Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada. His research focuses on clustering, with some emphasis on matrix variate distributions; he has recently published work on a matrix variate skew-t distribution in the International Statistical Institute journal Stat.
References (61)
- et al. Discrete data clustering using finite mixture models. Pattern Recognit. (2009)
- et al. Gaussian parsimonious clustering models. Pattern Recognit. (1995)
- et al. Simultaneous high-dimensional clustering and feature selection using asymmetric Gaussian mixture models. Image Vis. Comput. (2015)
- et al. Unsupervised learning via mixtures of skewed distributions with hypercube contours. Pattern Recognit. Lett. (2015)
- et al. Finite mixtures of multivariate Poisson distributions with application. J. Stat. Plan. Inference (2007)
- et al. Capturing patterns via parsimonious t mixture models. Stat. Probab. Lett. (2014)
- Model-based classification using latent Gaussian mixture models. J. Stat. Plan. Inference (2010)
- et al. Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput. Stat. Data Anal. (2010)
- et al. Dimension reduction for model-based clustering via mixtures of shifted asymmetric Laplace distributions. Stat. Probab. Lett. (2013)
- et al. Mixtures of skew-t factor analyzers. Comput. Stat. Data Anal. (2014)
- Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. J. Multivar. Anal.
- Analytic calculations for the EM algorithm for multivariate skew-t mixture models. Stat. Probab. Lett.
- Parsimonious skew mixture models for model-based clustering and classification. Comput. Stat. Data Anal.
- A series formula for the roots of algebraic and transcendental equations. Proc. R. Soc. Edinb.
- Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data. Ann. Appl. Stat.
- Extending mixtures of multivariate t-factor analyzers. Stat. Comput.
- Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions: the tEIGEN family. Stat. Comput.
- Turán type inequalities for some probability density functions. Studia Scientiarum Mathematicarum Hungarica
- A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat.
- Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell.
- The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann. Inst. Stat. Math.
- A mixture of generalized hyperbolic distributions. Can. J. Stat.
- Computational and inferential difficulties with mixture posterior distributions. J. Am. Stat. Assoc.
- Matrix variate skew normal distributions. Statistics (Ber)
- Mixtures of multivariate power exponential distributions. Biometrics
- Finite mixtures of matrix variate t distributions. Gazi Univ. J. Sci.
- A matrix variate closed skew-normal distribution with applications to stochastic frontier analysis. Commun. Stat. Theory Methods
- Mixtures of shifted asymmetric Laplace distributions. IEEE Trans. Pattern Anal. Mach. Intell.
- A matrix variate skew-t distribution. Statistics
Paul D. McNicholas is the Canada Research Chair in Computational Statistics at McMaster University, where he is a Professor and University Scholar in the Department of Mathematics and Statistics as well as Director of the MacDATA Institute. He has published extensively in computational statistics, with the vast majority of his over 80 journal articles focusing on mixture model-based clustering. He is one of the leaders in this field and recently published a monograph devoted to the topic (Mixture Model-Based Classification; Chapman and Hall/CRC Press, 2016). He is a Senior Member of the IEEE and a member of the College of the Royal Society of Canada.
☆ The authors gratefully acknowledge the very helpful comments of two anonymous reviewers as well as the support of a Vanier Canada Graduate Scholarship (Gallaugher) and the Canada Research Chairs program (McNicholas).