Attack under Disguise: An Intelligent Data Poisoning Attack Mechanism in Crowdsourcing

Authors:
Chenglin Miao, State University of New York at Buffalo, Buffalo, NY, USA
Qi Li, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Lu Su, State University of New York at Buffalo, Buffalo, NY, USA
Mengdi Huai, State University of New York at Buffalo, Buffalo, NY, USA
Wenjun Jiang, State University of New York at Buffalo, Buffalo, NY, USA
Jing Gao, State University of New York at Buffalo, Buffalo, NY, USA

WWW '18: Proceedings of the 2018 World Wide Web Conference, April 2018, Pages 13–22, https://doi.org/10.1145/3178876.3186032

Published: 23 April 2018

ABSTRACT

As an effective way to solicit useful information from the crowd, crowdsourcing has emerged as a popular paradigm to solve challenging tasks. However, the data provided by the participating workers are not always trustworthy. In the real world, there may exist malicious workers in crowdsourcing systems who conduct data poisoning attacks for the purpose of sabotage or financial rewards. Although data aggregation methods such as majority voting are applied to workers' labels in order to improve data quality, they are vulnerable to such attacks because they treat all workers equally. In order to capture the variety in the reliability of workers, the Dawid-Skene model, a sophisticated data aggregation method, has been widely adopted in practice. By conducting maximum likelihood estimation (MLE) using the expectation maximization (EM) algorithm, the Dawid-Skene model can jointly estimate each worker's reliability and conduct weighted aggregation, and thus can tolerate data poisoning attacks to some degree. However, the Dawid-Skene model still has weaknesses. In this paper, we study data poisoning attacks against crowdsourcing systems empowered by the Dawid-Skene model. We design an intelligent attack mechanism, based on which the attacker can not only achieve maximum attack utility but also disguise the attacking behaviors. Extensive experiments based on real-world crowdsourcing datasets are conducted to verify the desirable properties of the proposed mechanism.
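
To make the two aggregation schemes named in the abstract concrete, the following is a minimal sketch and not the authors' implementation or their attack mechanism. It assumes binary labels and a simplified "one-coin" variant of the Dawid-Skene model in which each worker has a single accuracy value rather than a full confusion matrix. The function names `majority_vote` and `one_coin_dawid_skene`, the parameters `n_iter` and `init_acc`, and the input layout are hypothetical choices made for illustration.

```python
import math

EPS = 1e-6  # keeps probabilities away from 0 and 1 so math.log stays finite


def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def majority_vote(labels):
    """labels: list of per-task dicts {worker_id: 0 or 1}; ties resolve to 1."""
    return [1 if 2 * sum(task.values()) >= len(task) else 0 for task in labels]


def one_coin_dawid_skene(labels, n_iter=50, init_acc=0.7):
    """Simplified binary Dawid-Skene EM (assumption: one accuracy per worker).

    Returns (estimated labels, estimated accuracy per worker).
    """
    workers = sorted({w for task in labels for w in task})
    acc = {w: init_acc for w in workers}  # initial reliability guess
    prior = 0.5                           # P(true label = 1)
    post = []
    for _ in range(n_iter):
        # E-step: posterior P(true label = 1) for each task, in log space.
        post = []
        for task in labels:
            log1, log0 = math.log(prior), math.log(1.0 - prior)
            for w, y in task.items():
                log1 += math.log(acc[w] if y == 1 else 1.0 - acc[w])
                log0 += math.log(acc[w] if y == 0 else 1.0 - acc[w])
            m = max(log1, log0)
            e1, e0 = math.exp(log1 - m), math.exp(log0 - m)
            post.append(e1 / (e1 + e0))
        # M-step: re-estimate the class prior and each worker's accuracy
        # as the expected fraction of that worker's labels that are correct.
        prior = _clip(sum(post) / len(post))
        for w in workers:
            hits = [p if task[w] == 1 else 1.0 - p
                    for task, p in zip(labels, post) if w in task]
            acc[w] = _clip(sum(hits) / len(hits))
    return [1 if p >= 0.5 else 0 for p in post], acc


# Toy input: w1 and w2 always agree, w3 always reports the opposite label.
labels = [
    {"w1": 1, "w2": 1, "w3": 0},
    {"w1": 0, "w2": 0, "w3": 1},
    {"w1": 1, "w2": 1, "w3": 0},
    {"w1": 0, "w2": 0, "w3": 1},
]
estimated, accuracy = one_coin_dawid_skene(labels)
```

On this toy input both aggregators return the same labels, but the EM variant additionally assigns the label-flipping worker `w3` a low accuracy estimate, which is the kind of reliability weighting the abstract describes. Majority voting has no such per-worker estimate and counts every vote equally.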

Index Terms

Attack under Disguise: An Intelligent Data Poisoning Attack Mechanism in Crowdsourcing
1. Human-centered computing
  1. Collaborative and social computing
2. Security and privacy
  1. Systems security

Published in
WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398
General Chairs: Pierre-Antoine Champin (Université Claude Bernard Lyon 1, France), Fabien Gandon (Inria, Université Côte d'Azur, CNRS, I3S, France), Lionel Médini (Université Claude Bernard Lyon 1, France)
Program Chairs: Mounia Lalmas (Spotify, UK), Panagiotis G. Ipeirotis (New York University, USA)
Copyright © 2018 ACM
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
crowdsourcing
data poisoning
expectation maximization
Qualifiers
- research-article
Acceptance Rates
WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

Article Metrics
- Total Citations: 38
- Total Downloads: 1,912
- Downloads (Last 12 months): 154
- Downloads (Last 6 weeks): 12