ABSTRACT
Attention check questions have become a commonly used mechanism in online surveys on popular crowdsourcing platforms to filter out inattentive respondents and improve data quality. However, little research has examined the vulnerabilities of this quality-control mechanism, which can allow attackers, including irresponsible and malicious respondents, to answer attention check questions automatically and thus achieve their goals more efficiently. In this paper, we present the first study of such vulnerabilities and demonstrate that attackers can leverage deep learning techniques to pass attention check questions automatically. We propose AC-EasyPass, an attack framework with a concrete model that combines a convolutional neural network with weighted feature reconstruction to easily pass attention check questions. We construct the first attention check question dataset, consisting of both original and augmented questions, and use it to demonstrate the effectiveness of AC-EasyPass. We explore two simple defense methods, adding adversarial sentences and adding typos, that survey designers could use to mitigate the risks posed by AC-EasyPass; however, both methods are fragile due to technical and usability limitations, underscoring how challenging defense is. We hope our work will draw the research community's attention to the need for more robust attention check mechanisms. More broadly, our work is intended to prompt the research community to seriously consider the emerging risks that the malicious use of machine learning poses to the quality, validity, and trustworthiness of crowdsourcing and social computing.
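To make the attack setting concrete, the sketch below shows one way a CNN-based answer-selection model of the kind the abstract describes could score candidate answers to an attention check question. This is a minimal, hypothetical illustration, not the authors' AC-EasyPass implementation: the class name `ACMatcher`, the hyperparameters, and the per-dimension `feature_weight` stand-in for "weighted feature reconstruction" are all assumptions, since the abstract does not specify the architecture.

```python
# Hypothetical sketch of a CNN-based answer-selection model for attention
# check questions. Question and candidate answers are embedded, passed
# through a shared 1-D convolution, max-pooled into fixed-size vectors,
# re-weighted, and scored by cosine similarity; the attacker would pick
# the highest-scoring candidate. Not the paper's actual model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ACMatcher(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, n_filters=128, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel, padding=kernel // 2)
        # crude stand-in for the paper's weighted feature reconstruction
        self.feature_weight = nn.Parameter(torch.ones(n_filters))

    def encode(self, token_ids):                      # token_ids: (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)       # (batch, emb_dim, seq_len)
        x = F.relu(self.conv(x))                      # (batch, n_filters, seq_len)
        x = x.max(dim=2).values                       # max-pool over time
        return x * self.feature_weight                # re-weight pooled features

    def score(self, question_ids, candidate_ids):
        q = self.encode(question_ids)
        a = self.encode(candidate_ids)
        return F.cosine_similarity(q, a, dim=1)       # one score per pair

# Usage: choose the candidate answer most similar to the question.
model = ACMatcher()
question = torch.randint(1, 10000, (1, 20))           # toy token ids
candidates = torch.randint(1, 10000, (4, 20))         # four answer options
scores = model.score(question.expand(4, -1), candidates)
best = scores.argmax().item()
```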