DOI: 10.1145/2531602.2531653

Crowd synthesis: extracting categories and clusters from complex data

Published: 15 February 2014

ABSTRACT

Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming and cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help by harnessing flexible and powerful human cognition on demand, but it introduces other challenges, including workers' limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker sees only a small portion of the data. To address these challenges, we present an empirical study of a two-stage approach that enables crowds to create an accurate and useful overview of a dataset: A) we draw on cognitive theory to assess how re-representing data can shorten it and focus it on salient dimensions; and B) we introduce an iterative clustering approach that provides workers a global overview of the data. We demonstrate that a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
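
The abstract only outlines the two-stage approach, but its overall structure — re-represent each item first, then cluster iteratively while showing each worker the current global set of categories — can be sketched in code. The Python sketch below is a hypothetical illustration under those assumptions, not the authors' implementation; `re_represent`, `simulated_worker`, and `crowd_synthesize` are invented names, and the keyword heuristic merely stands in for a real crowd task so the example runs end to end.

```python
# Hypothetical sketch of a two-stage crowd synthesis pipeline (not the paper's actual system).
# Stage A: re-represent each raw item as a shorter description focused on salient content.
# Stage B: iteratively send small batches to workers along with the current global
#          category list, so later workers label against a shared, growing set of categories.

from collections import defaultdict

def re_represent(item: str, max_words: int = 20) -> str:
    """Stage A placeholder: shorten an item to focus attention on salient content."""
    return " ".join(item.split()[:max_words])

def simulated_worker(batch, current_categories):
    """Stand-in for a crowd task: assign each item to an existing category or propose one.

    Uses a trivial keyword heuristic purely so the sketch runs without real workers."""
    labels = {}
    for item in batch:
        chosen = next((c for c in current_categories if c.lower() in item.lower()), None)
        labels[item] = chosen or item.split()[0].capitalize()  # propose a new category
    return labels

def crowd_synthesize(raw_items, batch_size=3):
    items = [re_represent(x) for x in raw_items]       # Stage A: re-representation
    categories = defaultdict(list)                     # global overview shared across iterations
    for start in range(0, len(items), batch_size):     # Stage B: iterative batches
        batch = items[start:start + batch_size]
        labels = simulated_worker(batch, list(categories))
        for item, category in labels.items():
            categories[category].append(item)
    return dict(categories)

if __name__ == "__main__":
    notes = [
        "Battery drains quickly after the latest update",
        "Battery life is much worse since updating",
        "Screen flickers when brightness is low",
        "Love the new camera features",
    ]
    for category, members in crowd_synthesize(notes).items():
        print(category, "->", members)
```

In the actual study the labeling step is performed by crowd workers, and the effects of re-representation and of the global overview are what the experiments evaluate; the heuristic worker above exists only to make the sketch executable.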


Published in

CSCW '14: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing
February 2014
1600 pages
ISBN: 9781450325400
DOI: 10.1145/2531602

      Copyright © 2014 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article

      Acceptance Rates

CSCW '14 paper acceptance rate: 134 of 497 submissions, 27%. Overall acceptance rate: 1,986 of 7,449 submissions, 27%.

