Abstract
Qualitative methods can add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sample sizes sufficient for sophisticated quantitative analysis. Mixed-methods research currently lacks methods that more fully integrate qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies, for example by helping to reveal participants' motives for their actions and the reasons behind counterintuitive findings. By grouping participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data of the kind produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.
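The general idea of the first study, clustering binary code profiles and checking how well known group membership is recovered, can be illustrated with a minimal sketch. This is not the authors' simulation procedure: the number of codes (10), the endorsement probabilities (0.85 and 0.15), the two-cluster design, the random seed, and all function names are assumptions chosen for illustration, and only a plain K-means with random restarts is shown (hierarchical clustering and latent class analysis are omitted). The total sample size of 50 mirrors the smallest sample mentioned above.

```python
import random

def simulate_codes(n_per_cluster, probs_by_cluster, rng):
    """Draw 0/1 code profiles; each cluster has its own per-code probabilities."""
    data, labels = [], []
    for label, probs in enumerate(probs_by_cluster):
        for _ in range(n_per_cluster):
            data.append([1 if rng.random() < p else 0 for p in probs])
            labels.append(label)
    return data, labels

def sq_dist(row, center):
    """Squared Euclidean distance between a profile and a cluster center."""
    return sum((x - m) ** 2 for x, m in zip(row, center))

def kmeans_once(data, k, rng, n_iter=100):
    """One run of Lloyd's K-means from randomly chosen starting profiles."""
    centers = [[float(x) for x in row] for row in rng.sample(data, k)]
    assign = []
    for _ in range(n_iter):
        new_assign = [min(range(k), key=lambda c: sq_dist(row, centers[c]))
                      for row in data]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [row for row, a in zip(data, assign) if a == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(sq_dist(row, centers[a]) for row, a in zip(data, assign))
    return assign, wcss

def kmeans_binary(data, k, rng, n_starts=10):
    """Keep the restart with the smallest within-cluster sum of squares."""
    runs = (kmeans_once(data, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[1])[0]

def two_cluster_accuracy(true, pred):
    """Share of correct assignments, allowing for swapped cluster labels."""
    agree = sum(t == p for t, p in zip(true, pred)) / len(true)
    return max(agree, 1.0 - agree)

rng = random.Random(2015)            # arbitrary seed for reproducibility
n_codes = 10                         # assumed number of qualitative codes
probs = [[0.85] * n_codes, [0.15] * n_codes]  # assumed cluster profiles
data, truth = simulate_codes(25, probs, rng)  # 25 per cluster, n = 50
pred = kmeans_binary(data, 2, rng)
acc = two_cluster_accuracy(truth, pred)
print(f"n = {len(data)}, accuracy = {acc:.2f}")
```

Each row of `data` stands for one participant and each column for the presence or absence of one code. With less sharply separated profiles, or with more clusters, recovery would be expected to be harder, and the accuracy function would need a general label-matching step rather than the two-cluster shortcut used here.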
Acknowledgments
The authors gratefully acknowledge the contributions of Mary Murray, M.A., who performed the cluster analyses for the original DCP study, Debra Strickland, Former Executive Director of the Developing Communities Project, and the community leaders who participated in the interviews for Study 2. Partial support for this study was provided by grant number R13 DA030834 (PI, Ching Fok, Ph.D.) for the conference, “Advancing Science with Culturally Distinct Communities.”
Conflict of Interest
The authors declare that they have no conflicts of interest.
Cite this article
Henry, D., Dymnicki, A.B., Mohatt, N. et al. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples. Prev Sci 16, 1007–1016 (2015). https://doi.org/10.1007/s11121-015-0561-z
DOI: https://doi.org/10.1007/s11121-015-0561-z