ABSTRACT
Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming and cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help by harnessing flexible and powerful human cognition on demand, but it introduces other challenges, including limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker sees only a small portion of the data. To address these challenges, we present an empirical study of a two-stage approach that enables crowds to create an accurate and useful overview of a dataset: (A) drawing on cognitive theory, we assess how re-representing data can shorten items and focus workers on salient dimensions; and (B) we introduce an iterative clustering approach that gives workers a global overview of the data. We demonstrate that a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
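The iterative, context-sharing idea described above can be sketched in code. This is a minimal simulation, not the authors' actual workflow: `simulate_worker` is a hypothetical stand-in for human judgment (a keyword match), and the key assumption illustrated is that each worker processes only a small batch of items but sees the running global set of category labels, so later workers inherit context from earlier ones.

```python
def simulate_worker(items, existing_labels):
    """Hypothetical worker: assign each item to an existing category
    or propose a new one. Keyword matching stands in for the human
    judgment a real crowd worker would apply."""
    assignments = {}
    for item in items:
        match = next((lab for lab in existing_labels if lab in item.lower()), None)
        if match is None:
            # No existing category fits; propose a new one from the item
            match = item.split()[0].lower()
        assignments[item] = match
    return assignments

def iterative_crowd_cluster(data, batch_size=3):
    """Each simulated worker sees one small batch plus the global
    label set accumulated so far (the 'context')."""
    labels = set()
    clustered = {}
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        result = simulate_worker(batch, labels)
        for item, lab in result.items():
            labels.add(lab)
            clustered.setdefault(lab, []).append(item)
    return clustered

data = ["apple pie recipe", "banana bread tips", "apple cider notes",
        "banana smoothie", "cherry tart idea", "apple storage advice"]
clusters = iterative_crowd_cluster(data)
```

Because the label set persists across batches, the second simulated worker files "banana smoothie" under the "banana" category proposed by the first, rather than inventing a duplicate; this is the consistency benefit the paper attributes to giving workers a global overview.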
Index Terms
- Crowd synthesis: extracting categories and clusters from complex data