ABSTRACT
Crowdsourcing can provide a platform for evaluating software engineering research. In this paper, we explore the characteristics of the worker population on Amazon's Mechanical Turk, a popular microtask crowdsourcing environment, and measure the percentage of workers who are potentially qualified to perform software- or computer science-related tasks. Through a baseline survey and two replications, we measure the consistency of workers' answers as well as the consistency of sample characteristics. In total, we deployed 1,200 surveys, which were completed by 1,064 unique workers. Our results show that 24% of the study participants have a computer science or IT background and that most workers are payment-driven when choosing tasks. Sample characteristics can vary significantly, even across large samples of 300 participants. Additionally, among workers who completed two surveys, approximately 30% answered at least one question inconsistently between their two submissions. This implies a need for replication and quality controls in crowdsourced experiments.
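The answer-consistency measure described above can be approximated with a simple check over duplicate submissions. The following is a minimal sketch, not the authors' analysis code; the function name, field names, and record layout are assumptions for illustration only. It counts the fraction of repeat workers who gave a different answer to at least one shared question across their two submissions.

```python
# Hypothetical sketch of the consistency check described in the abstract.
# Data layout (worker_id, submission, answers) is assumed, not taken from the paper.
from collections import defaultdict

def inconsistent_worker_rate(responses):
    """responses: list of dicts, e.g.
    {"worker_id": "w1", "submission": 1, "answers": {"q1": "CS", "q2": "payment"}}"""
    by_worker = defaultdict(list)
    for r in responses:
        by_worker[r["worker_id"]].append(r["answers"])

    # Only workers with at least two submissions can be checked for consistency.
    repeat_workers = {w: subs for w, subs in by_worker.items() if len(subs) >= 2}
    if not repeat_workers:
        return 0.0

    inconsistent = 0
    for subs in repeat_workers.values():
        first, second = subs[0], subs[1]
        shared_questions = set(first) & set(second)
        if any(first[q] != second[q] for q in shared_questions):
            inconsistent += 1
    return inconsistent / len(repeat_workers)

# Usage example: one of two repeat workers flips an answer -> rate = 0.5
demo = [
    {"worker_id": "w1", "submission": 1, "answers": {"q1": "CS", "q2": "payment"}},
    {"worker_id": "w1", "submission": 2, "answers": {"q1": "CS", "q2": "interest"}},
    {"worker_id": "w2", "submission": 1, "answers": {"q1": "IT", "q2": "payment"}},
    {"worker_id": "w2", "submission": 2, "answers": {"q1": "IT", "q2": "payment"}},
]
print(inconsistent_worker_rate(demo))  # 0.5
```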