DOI: 10.1145/3178876.3186032
research-article

Attack under Disguise: An Intelligent Data Poisoning Attack Mechanism in Crowdsourcing

Published: 23 April 2018

ABSTRACT

As an effective way to solicit useful information from the crowd, crowdsourcing has emerged as a popular paradigm for solving challenging tasks. However, the data provided by participating workers are not always trustworthy. In the real world, crowdsourcing systems may contain malicious workers who conduct data poisoning attacks for the purpose of sabotage or financial reward. Although data aggregation methods such as majority voting are applied to workers' labels to improve data quality, they are vulnerable to such attacks because they treat all workers equally. To capture the variation in worker reliability, the Dawid-Skene model, a sophisticated data aggregation method, has been widely adopted in practice. By conducting maximum likelihood estimation (MLE) using the expectation maximization (EM) algorithm, the Dawid-Skene model jointly estimates each worker's reliability and performs weighted aggregation, and can thus tolerate data poisoning attacks to some degree. However, the Dawid-Skene model still has weaknesses. In this paper, we study data poisoning attacks against crowdsourcing systems that employ the Dawid-Skene model. We design an intelligent attack mechanism with which the attacker can not only achieve maximum attack utility but also disguise the attacking behavior. Extensive experiments on real-world crowdsourcing datasets verify the desirable properties of the proposed mechanism.
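
To make the contrast between the two aggregation methods named in the abstract concrete, the following is a minimal sketch of majority voting and of a textbook Dawid-Skene EM loop. It is not the authors' implementation and does not model the attack mechanism. The function names, the `(worker, task, label)` triple format, the `smooth` constant, and the fixed iteration count are all assumptions made for illustration.

```python
def majority_vote(labels, num_tasks, num_classes):
    """Unweighted aggregation: every worker's vote counts equally.

    labels: list of (worker, task, label) triples with 0-based integer ids.
    Ties are broken in favour of the smallest class index.
    """
    votes = [[0] * num_classes for _ in range(num_tasks)]
    for _, task, label in labels:
        votes[task][label] += 1
    return [max(range(num_classes), key=lambda k: row[k]) for row in votes]


def dawid_skene(labels, num_workers, num_tasks, num_classes,
                iters=50, smooth=0.01):
    """Illustrative Dawid-Skene EM with one confusion matrix per worker.

    Assumes every task has at least one label. `smooth` is a small
    pseudo-count (an assumption of this sketch) that avoids zero rows.
    Returns (estimated labels, confusion matrices), where
    conf[w][k][l] estimates P(worker w answers l | true class is k).
    """
    # Initialise task posteriors with vote fractions (soft majority vote).
    post = [[0.0] * num_classes for _ in range(num_tasks)]
    for _, task, label in labels:
        post[task][label] += 1.0
    post = [[v / sum(row) for v in row] for row in post]

    conf = []
    for _ in range(iters):
        # M-step: re-estimate class priors and worker confusion matrices.
        prior = [sum(post[t][k] for t in range(num_tasks)) / num_tasks
                 for k in range(num_classes)]
        conf = [[[smooth] * num_classes for _ in range(num_classes)]
                for _ in range(num_workers)]
        for worker, task, label in labels:
            for k in range(num_classes):
                conf[worker][k][label] += post[task][k]
        for w in range(num_workers):
            for k in range(num_classes):
                total = sum(conf[w][k])
                conf[w][k] = [v / total for v in conf[w][k]]

        # E-step: posterior over each task's true class given all labels.
        new_post = [list(prior) for _ in range(num_tasks)]
        for worker, task, label in labels:
            for k in range(num_classes):
                new_post[task][k] *= conf[worker][k][label]
        post = [[v / sum(row) for v in row] for row in new_post]

    truths = [max(range(num_classes), key=lambda k: row[k]) for row in post]
    return truths, conf


if __name__ == "__main__":
    # Hypothetical toy data: workers 0 and 1 agree, worker 2 always disagrees.
    answers = {0: [0, 1, 0, 1], 1: [0, 1, 0, 1], 2: [1, 0, 1, 0]}
    triples = [(w, t, l) for w, row in answers.items()
               for t, l in enumerate(row)]
    print(majority_vote(triples, num_tasks=4, num_classes=2))
    print(dawid_skene(triples, num_workers=3, num_tasks=4, num_classes=2)[0])
```

In this sketch the learned confusion matrices act as the per-worker weights: a worker whose answers consistently disagree with the inferred truth ends up with an off-diagonal-heavy matrix, so their labels no longer count as plain votes. The product in the E-step can underflow on large datasets; a practical version would work in log space.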

Published in

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland


Acceptance Rates

WWW '18 paper acceptance rate: 170 of 1,155 submissions, 15%
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
