Parametric Cluster Analysis and Mixture Regression

Mair, Patrick

doi:10.1007/978-3-319-93177-7_12

Part of the book series: Use R! (USE R)

Abstract

This chapter covers advanced parametric clustering techniques based on the concept of mixture distributions. The first section introduces mixture distributions from a general perspective, followed by two popular applications in clustering: normal mixture models (latent profile analysis) for metric input variables and multinomial mixture models (latent class analysis) for categorical variables. These ideas are then extended to input variables with mixed scale levels. The following section embeds the mixture distribution concept in a regression framework: in mixture regression models, clustering and estimation of the regression parameters are performed simultaneously. Dirichlet process regression adds another layer of complexity by letting an algorithm determine the optimal number of clusters. Finally, the focus shifts to latent Dirichlet allocation, a topic model for clustering text data.


Notes

1. Note that in mclust a maximum-BIC strategy is used; that is, the higher the BIC, the better the fit.
2. In practice, the user should again try out different numbers of clusters and pick the one with the lowest BIC.
3. For an overview see the corresponding task view on CRAN (URL: https://cran.r-project.org/web/views/NaturalLanguageProcessing.html).
4. Other packages for topic modeling in R are mallet (Mimno, 2013) and lda (Chang, 2015).
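
Notes 1 and 2 rely on two sign conventions for the BIC. The conventional definition is -2 log L + k log n, where smaller is better; mclust reports the sign-flipped quantity 2 log L - k log n, where larger is better. The sketch below, written in Python purely for illustration, shows that both conventions select the same number of clusters. The log-likelihoods, parameter counts, and sample size are hypothetical values, not results from the chapter.

```python
import math


def bic_conventional(loglik, n_params, n_obs):
    """Usual BIC: smaller values indicate a better model."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def bic_mclust_style(loglik, n_params, n_obs):
    """Sign-flipped BIC as reported by mclust: larger values are better."""
    return 2.0 * loglik - n_params * math.log(n_obs)


# Hypothetical fits: number of clusters -> (log-likelihood, free parameters).
fits = {1: (-520.0, 2), 2: (-480.0, 5), 3: (-478.0, 8)}
n_obs = 200  # hypothetical sample size

conventional = {k: bic_conventional(ll, p, n_obs) for k, (ll, p) in fits.items()}
flipped = {k: bic_mclust_style(ll, p, n_obs) for k, (ll, p) in fits.items()}

# Minimizing one criterion is the same as maximizing the other.
best_by_min = min(conventional, key=conventional.get)
best_by_max = max(flipped, key=flipped.get)
print(best_by_min, best_by_max)
```

Because the two quantities differ only in sign, the ranking of candidate models is identical; only the direction of "better" changes, which is worth checking whenever BIC values from different packages are compared.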

References

Betebenner, D. W. (2017). randomNames: Function for generating random names and a dataset. R package version 0.4-0. https://cran.r-project.org/package=randomNames
Chang, J. (2015). lda: Collapsed Gibbs sampling methods for topic models. R package version 1.4.2. https://CRAN.R-project.org/package=lda
Charrad, M., Ghazzali, N., Boiteau, V., & Niknafs, A. (2014). NbClust: An R package for determining the relevant number of clusters in a dataset. Journal of Statistical Software, 61(6), 1–36. http://www.jstatsoft.org/v61/i06/
da Silva, A. F. (2012). dpmixsim: Dirichlet process mixture model simulation for clustering and image segmentation. R package version 0.0-8. https://CRAN.R-project.org/package=dpmixsim
Ellis, P. (2017). Cross-validation of topic modelling. R-bloggers. http://www.r-bloggers.com/cross-validation-of-topic-modelling/
Everitt, B. S. (2011). Cluster analysis (5th ed.). New York: Wiley.
Everitt, B., & Hothorn, T. (2011). An introduction to applied multivariate analysis with R. New York: Springer.
Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25(5), 1–54. http://www.jstatsoft.org/v25/i05/
Fellows, I. (2014). wordcloud: Word clouds. R package version 2.5. https://CRAN.R-project.org/package=wordcloud
Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association, 97, 611–631.
Frick, H., Strobl, C., Leisch, F., & Zeileis, A. (2012). Flexible Rasch mixture models with package psychomix. Journal of Statistical Software, 48(7), 1–25. http://www.jstatsoft.org/v48/i07/
Gandrud, C. (2017). A link between topicmodels LDA and LDAvis. R-bloggers. https://www.r-bloggers.com/a-link-between-topicmodels-lda-and-ldavis/
Gershman, S. J., & Blei, D. M. (2012). A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56, 1–12.
Grün, B., & Hornik, K. (2011). topicmodels: An R package for fitting topic models. Journal of Statistical Software, 40(13), 1–30. https://doi.org/10.18637/jss.v040.i13
Grün, B., & Leisch, F. (2008). FlexMix version 2: Finite mixtures with concomitant variables and varying and constant parameters. Journal of Statistical Software, 28(4), 1–35. http://www.jstatsoft.org/v28/i04/
Haegeli, P., Gunn, M., & Haider, W. (2012). Identifying a high-risk cohort in a complex and dynamic risk environment: Out-of-bounds skiing—An example from avalanche safety. Prevention Science, 13, 562–573.
Hornik, K., Meyer, D., & Buchta, C. (2016). slam: Sparse lightweight arrays and matrices. R package version 0.1-40. https://CRAN.R-project.org/package=slam
Koller, I., & Alexandrowicz, R. W. (2010). Eine psychometrische Analyse der ZAREKI-R mittels Rasch-Modellen [A psychometric analysis of the ZAREKI-R using Rasch models]. Diagnostica, 56, 57–67.
Kwartler, T. (2017). Text mining in practice with R. New York: Wiley.
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.
Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 1–18. https://www.jstatsoft.org/v011/i08
Linzer, D. A., & Lewis, J. B. (2011). poLCA: An R package for polytomous variable latent class analysis. Journal of Statistical Software, 42(10), 1–29. http://www.jstatsoft.org/v42/i10/
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., & Hornik, K. (2017). cluster: Cluster analysis basics and extensions. R package version 2.0.6.
Mair, P., Rusch, T., & Hornik, K. (2014). The grand old party: A party of values? SpringerPlus, 3(697), 1–10.
Mair, P., Hofmann, E., Gruber, K., Zeileis, A., & Hornik, K. (2015). Motivation, values, and work design as drivers of participation in the R open source project for statistical computing. Proceedings of the National Academy of Sciences of the United States of America, 112, 14788–14792.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Meyer, D., Zeileis, A., & Hornik, K. (2006). The strucplot framework: Visualizing multi-way contingency tables with vcd. Journal of Statistical Software, 17(3), 1–48. http://www.jstatsoft.org/v17/i03/
Mimno, D. (2013). mallet: A wrapper around the Java machine learning tool MALLET. R package version 1.0. https://CRAN.R-project.org/package=mallet
Murzintcev, N. (2016). ldatuning: Tuning of the latent Dirichlet allocation models parameters. R package version 0.2.0. https://CRAN.R-project.org/package=ldatuning
Savitsky, T. D., & Paddock, S. M. (2014). Bayesian semi- and non-parametric models for longitudinal data with multiple membership effects in R. Journal of Statistical Software, 57(3), 1–35. http://www.jstatsoft.org/v57/i03/
Shotwell, M. S. (2013). profdpm: An R package for MAP estimation in a class of conjugate product partition models. Journal of Statistical Software, 53(8), 1–18. http://www.jstatsoft.org/v53/i08/
Sievert, C., & Shirley, K. (2015). LDAvis: Interactive visualization of topic models. R package version 0.3.2. https://CRAN.R-project.org/package=LDAvis
Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach. Sebastopol: O’Reilly Media.
Winter, B., & Grawunder, S. (2012). The phonetic profile of Korean formality. Journal of Phonetics, 40, 808–815.
Wood, S. N. (2017). Generalized additive models: An introduction with R (2nd ed.). Boca Raton: CRC Press.

Author information

Authors and Affiliations

Department of Psychology, Harvard University, Cambridge, MA, USA
Patrick Mair

About this chapter

Cite this chapter

Mair, P. (2018). Parametric Cluster Analysis and Mixture Regression. In: Modern Psychometrics with R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-93177-7_12

DOI: https://doi.org/10.1007/978-3-319-93177-7_12
Published: 21 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93175-3
Online ISBN: 978-3-319-93177-7
eBook Packages: Mathematics and Statistics (R0)
