Bayesian Mixed Membership Models for Soft Clustering and Classification

  • Conference paper
Classification — the Ubiquitous Challenge

Abstract

This paper describes and applies a fully Bayesian approach to soft clustering and classification using mixed membership models. The model structure rests on assumptions at four levels: population, subject, latent variable, and sampling scheme. Population-level assumptions describe the general structure of the population that is common to all subjects. Subject-level assumptions specify the distribution of observable responses given individual membership scores. Membership scores are usually unknown, so we can also view them as latent variables and treat them as either fixed or random in the model. Finally, the sampling-scheme assumptions specify the number of distinct observed characteristics and the number of replications for each characteristic. We illustrate the flexibility and utility of the general model through two applications, using data from (i) the National Long Term Care Survey, where we explore types of disability, and (ii) abstracts and bibliographies from articles published in the Proceedings of the National Academy of Sciences. In the first application we use a Markov chain Monte Carlo implementation to sample from the posterior distribution. In the second application, because of the size and complexity of the database, we use a variational approximation to the posterior. We also include a guide to other applications of mixed membership modeling.
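
As a rough illustration of how the four levels fit together, the sketch below simulates one subject from a simple mixed membership model. It is not the authors' implementation: the binary responses, the Dirichlet distribution on membership scores, and the names `profiles`, `alpha`, and `sample_subject` are all assumptions made for this example, as are the numeric values.

```python
import random


def sample_membership(alpha, rng):
    """Latent-variable level: draw membership scores g ~ Dirichlet(alpha).

    A Dirichlet draw is obtained by normalizing independent Gamma draws.
    (The Dirichlet choice is an assumption of this sketch.)
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_subject(profiles, alpha, replications, rng):
    """Simulate one subject's membership scores and binary responses.

    profiles[k][j]: probability of a positive response on characteristic j
        for a subject who belongs fully to profile k (population level).
    replications: number of draws per characteristic (sampling scheme).
    """
    g = sample_membership(alpha, rng)
    n_characteristics = len(profiles[0])
    responses = []
    for j in range(n_characteristics):
        # Subject level: the response probability is the membership-weighted
        # average of the profile-specific probabilities.
        p_j = sum(g_k * profile[j] for g_k, profile in zip(g, profiles))
        responses.append(
            [1 if rng.random() < p_j else 0 for _ in range(replications)]
        )
    return g, responses


# Hypothetical setup: two profiles, three binary characteristics.
rng = random.Random(0)
profiles = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
alpha = [0.5, 0.5]
g, x = sample_subject(profiles, alpha, replications=1, rng=rng)
print("membership scores:", g)
print("responses:", x)
```

Setting `replications` to 1 with many characteristics resembles a survey setting with one answer per item, while a larger value with few characteristics resembles repeated draws such as words in a document; fitting the model by posterior sampling or variational approximation is outside the scope of this sketch.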

Copyright information

© 2005 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Erosheva, E.A., Fienberg, S.E. (2005). Bayesian Mixed Membership Models for Soft Clustering and Classification. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_2
