Efficient human computation: the distributed labeling problem

Research article
Published: 28 June 2009
DOI: 10.1145/1600150.1600174

ABSTRACT

Collecting large labeled data sets is laborious and expensive, and scaling it up requires dividing the labeling workload among many teachers. When the number of classes is large, mismatches between the labels given by different teachers are likely to occur and, in the extreme case, may reach total inconsistency. In this study we describe how globally consistent labels can be obtained despite the absence of teacher coordination, and discuss the possible efficiency of this process in terms of human labor. We define a notion of label efficiency: the ratio between the number of globally consistent labels obtained and the number of labels provided by the distributed teachers. We show that the efficiency depends critically on the ratio α between the number of data instances seen by a single teacher and the number of classes. We suggest several algorithms for the distributed labeling problem and analyze their efficiency as a function of α. In addition, we provide an upper bound on label efficiency for the case of completely uncoordinated teachers, and show that efficiency approaches 0 as the ratio between the number of labels each teacher provides and the number of classes drops (i.e., α → 0).
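To make the two ratios in the abstract concrete, here is a minimal Python sketch. It is not one of the paper's algorithms: the function names (`merge_teacher_labels`, `label_efficiency`, `alpha`), the toy data, and the merging strategy (union-find over instances labeled by more than one teacher) are all assumptions chosen for illustration. It also assumes error-free teachers whose private label names are linked only through overlapping instances.

```python
def merge_teacher_labels(labelings):
    """Map each instance to a global class id (illustrative sketch only).

    labelings: dict teacher -> dict instance -> private label name.
    Two private labels are treated as the same class whenever different
    teachers applied them to the same instance (assumes no labeling errors).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # instance -> first (teacher, label) node that labeled it
    for teacher, labels in labelings.items():
        for instance, label in labels.items():
            node = (teacher, label)
            find(node)  # register the node
            if instance in first_seen:
                union(first_seen[instance], node)
            else:
                first_seen[instance] = node

    class_ids = {}  # union-find root -> small integer id
    return {
        instance: class_ids.setdefault(find(node), len(class_ids))
        for instance, node in first_seen.items()
    }


def label_efficiency(n_consistent, n_provided):
    """Globally consistent labels obtained / labels provided by teachers."""
    return n_consistent / n_provided


def alpha(instances_per_teacher, n_classes):
    """Instances seen by a single teacher / number of classes."""
    return instances_per_teacher / n_classes


# Hypothetical toy data: two classes, two teachers with unrelated label names.
labelings = {
    "A": {"x1": "a", "x2": "b", "x3": "a"},
    "B": {"x2": "q", "x3": "p", "x4": "p", "x5": "q"},
}
global_labels = merge_teacher_labels(labelings)
n_provided = sum(len(labels) for labels in labelings.values())
efficiency = label_efficiency(len(global_labels), n_provided)
```

In this toy case the shared instances x2 and x3 link every private label, so all five distinct instances receive a consistent class from seven provided labels. The two overlapping labels are the cost of coordination, which is the kind of overhead the efficiency ratio is meant to capture. When overlaps do not link all private labels, counting every distinct instance would overstate the number of globally consistent labels.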

Published in

HCOMP '09: Proceedings of the ACM SIGKDD Workshop on Human Computation
June 2009
87 pages
ISBN: 9781605586724
DOI: 10.1145/1600150

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States
