
Amazon Mechanical Turk in Organizational Psychology: An Evaluation and Practical Recommendations

  • Original Paper
Journal of Business and Psychology

Abstract

Purpose

Amazon Mechanical Turk (MTurk) is an increasingly popular data source in the organizational psychology research community. This paper evaluates MTurk and provides a set of practical recommendations for researchers who use it.

Design/Methodology/Approach

We evaluate methodological concerns related to the use of MTurk and the potential threats they pose to validity inferences. Based on this evaluation, we provide a set of recommendations for strengthening validity inferences drawn from MTurk samples.

Findings

Although MTurk samples can overcome some important validity concerns, they have other limitations that researchers must weigh against their research objectives. Researchers should carefully evaluate the appropriateness and quality of MTurk samples in light of the issues we discuss in our evaluation.

Implications

There is no one-size-fits-all answer to whether MTurk is appropriate for a given study. The answer depends on the research questions and on the data collection and analytic procedures adopted. Data quality is not determined by the data source per se, but rather by the decisions researchers make during study design, data collection, and data analysis.

Originality/Value

The current paper extends the literature by evaluating MTurk more comprehensively than prior reviews. Past review papers focused primarily on internal and external validity and paid less attention to statistical conclusion validity and construct validity, which are equally important for making accurate inferences about research findings. This paper also provides a set of practical recommendations for addressing validity concerns when using MTurk.

Notes

  1. The ten methodological concerns are not presented in order of importance or prevalence.

  2. In our own data collections, we have allowed MTurk participants a maximum of two attempts, and participants have responded positively to being offered a second chance.

Author information

Correspondence to Janelle H. Cheung.

Electronic supplementary material

Supplementary material 1 (DOCX 48 kb)

About this article

Cite this article

Cheung, J.H., Burns, D.K., Sinclair, R.R. et al. Amazon Mechanical Turk in Organizational Psychology: An Evaluation and Practical Recommendations. J Bus Psychol 32, 347–361 (2017). https://doi.org/10.1007/s10869-016-9458-5

  • DOI: https://doi.org/10.1007/s10869-016-9458-5
