DOI: 10.1145/2531602.2531653

Crowd synthesis: extracting categories and clusters from complex data

Published: 15 February 2014

ABSTRACT

Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming and cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help by harnessing flexible and powerful human cognition on demand, but it introduces other challenges, including workers' limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker sees only a small portion of the data. To address these challenges, we present an empirical study of a two-stage approach that enables crowds to create an accurate and useful overview of a dataset: A) we draw on cognitive theory to assess how re-representing data can shorten it and focus it on salient dimensions; and B) we introduce an iterative clustering approach that provides workers a global overview of the data. We demonstrate that a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
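
The abstract only outlines the two-stage approach, but its overall structure — re-represent each item first, then cluster iteratively while showing each worker the current global set of categories — can be sketched in code. The Python sketch below is a hypothetical illustration under those assumptions, not the authors' implementation; `re_represent`, `simulated_worker`, and `crowd_synthesize` are invented names, and the keyword heuristic merely stands in for a real crowd task so the example runs end to end.

```python
# Hypothetical sketch of a two-stage crowd synthesis pipeline (not the paper's actual system).
# Stage A: re-represent each raw item as a shorter description focused on salient content.
# Stage B: iteratively send small batches to workers along with the current global
#          category list, so later workers label against a shared, growing set of categories.

from collections import defaultdict

def re_represent(item: str, max_words: int = 20) -> str:
    """Stage A placeholder: shorten an item to focus attention on salient content."""
    return " ".join(item.split()[:max_words])

def simulated_worker(batch, current_categories):
    """Stand-in for a crowd task: assign each item to an existing category or propose one.

    Uses a trivial keyword heuristic purely so the sketch runs without real workers."""
    labels = {}
    for item in batch:
        chosen = next((c for c in current_categories if c.lower() in item.lower()), None)
        labels[item] = chosen or item.split()[0].capitalize()  # propose a new category
    return labels

def crowd_synthesize(raw_items, batch_size=3):
    items = [re_represent(x) for x in raw_items]       # Stage A: re-representation
    categories = defaultdict(list)                     # global overview shared across iterations
    for start in range(0, len(items), batch_size):     # Stage B: iterative batches
        batch = items[start:start + batch_size]
        labels = simulated_worker(batch, list(categories))
        for item, category in labels.items():
            categories[category].append(item)
    return dict(categories)

if __name__ == "__main__":
    notes = [
        "Battery drains quickly after the latest update",
        "Battery life is much worse since updating",
        "Screen flickers when brightness is low",
        "Love the new camera features",
    ]
    for category, members in crowd_synthesize(notes).items():
        print(category, "->", members)
```

In the actual study the labeling step is performed by crowd workers, and the effects of re-representation and of the global overview are what the experiments evaluate; the heuristic worker above exists only to make the sketch executable.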


Published in

CSCW '14: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing
February 2014
1600 pages
ISBN: 9781450325400
DOI: 10.1145/2531602

      Copyright © 2014 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article

      Acceptance Rates

CSCW '14 paper acceptance rate: 134 of 497 submissions, 27%. Overall acceptance rate: 1,986 of 7,449 submissions, 27%.

