research-article

Efficient human computation: the distributed labeling problem

Authors:
Ran Gilad-Bachrach, Microsoft Corporation, Redmond, WA
Aharon Bar-Hillel, Advanced Tech. Center, Israel
Liat Ein-Dor, Intel Israel, Haifa, Israel

HCOMP '09: Proceedings of the ACM SIGKDD Workshop on Human Computation, June 2009, Pages 70–76, https://doi.org/10.1145/1600150.1600174

Published: 28 June 2009


ABSTRACT

Collecting large labeled data sets is a laborious and expensive task, whose scaling up requires dividing the labeling workload among many teachers. When the number of classes is large, miscorrespondences between the labels given by different teachers are likely to occur, and in the extreme case the labels may be totally inconsistent. In this study we describe how globally consistent labels can be obtained despite the absence of teacher coordination, and discuss the possible efficiency of this process in terms of human labor. We define a notion of label efficiency, measuring the ratio between the number of globally consistent labels obtained and the number of labels provided by distributed teachers. We show that the efficiency depends critically on the ratio α between the number of data instances seen by a single teacher and the number of classes. We suggest several algorithms for the distributed labeling problem and analyze their efficiency as a function of α. In addition, we provide an upper bound on label efficiency for the case of completely uncoordinated teachers, and show that efficiency approaches 0 as the ratio between the number of labels each teacher provides and the number of classes drops (i.e., α → 0).
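The abstract defines label efficiency as the number of globally consistent labels obtained divided by the number of labels the teachers provide, and α as the number of instances a single teacher sees divided by the number of classes. The abstract does not describe the paper's algorithms, so the following is only a minimal sketch under stated assumptions, not the authors' method: each teacher partitions its own batch correctly but uses private, arbitrary label names, and batches overlap on some shared instances. Treating every (teacher, local label) pair as a node and merging two nodes whenever both teachers labeled the same instance (a union-find over equivalence constraints) recovers a shared label space. The function names merge_local_labels, label_efficiency and is_consistent, and the toy data, are hypothetical.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Create a singleton set the first time a node is seen.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


def merge_local_labels(batches):
    """Return {instance: global label} from per-teacher {instance: local label} dicts.

    Hypothetical helper. Assumes each teacher partitions its own batch
    correctly; local label names are only meaningful inside one batch.
    """
    uf = UnionFind()
    first_node = {}  # instance -> first (teacher, local label) node that labeled it
    for teacher, batch in enumerate(batches):
        for instance, local in batch.items():
            node = (teacher, local)
            if instance in first_node:
                # Two teachers labeled the same instance, so their local
                # labels must denote the same class: merge the two nodes.
                uf.union(first_node[instance], node)
            else:
                first_node[instance] = node
    component_id = {}  # root node -> compact integer global label
    return {
        instance: component_id.setdefault(uf.find(node), len(component_id))
        for instance, node in first_node.items()
    }


def label_efficiency(batches, global_labels):
    """Global labels obtained per label provided by the teachers.

    Counts as 'globally consistent' only when is_consistent(...) holds.
    """
    provided = sum(len(batch) for batch in batches)
    return len(global_labels) / provided if provided else 0.0


def is_consistent(global_labels, truth):
    """True iff global labels and true classes correspond one-to-one."""
    g_to_c, c_to_g = {}, {}
    for instance, g in global_labels.items():
        c = truth[instance]
        # Fail if a global label mixes two classes or a class got two labels.
        if g_to_c.setdefault(g, c) != c or c_to_g.setdefault(c, g) != g:
            return False
    return True


if __name__ == "__main__":
    # Two teachers, three classes; instances 1 and 2 are seen by both.
    batches = [
        {0: "a", 1: "a", 2: "b", 3: "b"},
        {2: "x", 4: "y", 5: "y", 1: "z"},
    ]
    truth = {0: "cat", 1: "cat", 2: "dog", 3: "dog", 4: "cow", 5: "cow"}
    labels = merge_local_labels(batches)
    print(labels)  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
    print(is_consistent(labels, truth), label_efficiency(batches, labels))  # True 0.75
```

In this toy run each teacher labels four instances and there are three classes (α = 4/3). The two shared instances are labeled twice, so six global labels cost eight teacher labels, an efficiency of 0.75. If the batches share no instance of some class (for example {0: "a", 2: "b"} and {1: "p", 3: "q"}, with instances 0 and 1 both cats), that class splits into two global labels and is_consistent returns False, even though the raw ratio rises to 1.0. This only illustrates the trade-off between overlap and redundant labels; it is not the paper's analysis.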



Published in
HCOMP '09: Proceedings of the ACM SIGKDD Workshop on Human Computation
June 2009
87 pages
ISBN:9781605586724
DOI:10.1145/1600150
Editors:
Paul Bennett, Microsoft Research
Raman Chandrasekar, Microsoft Research
Max Chickering, Microsoft Corporation
Panos Ipeirotis, New York University
Edith Law, Carnegie Mellon University
Anton Mityagin, Microsoft Corporation
Foster Provost, New York University
Luis von Ahn, Carnegie Mellon University
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
