ABSTRACT
Ontology evaluation has proven to be one of the more difficult problems in ontology engineering. Researchers have proposed numerous methods to evaluate the logical correctness of an ontology, its structure, or its coverage of a domain represented by a corpus. However, evaluating whether ontology assertions correspond to the real world remains a manual and time-consuming task. In this paper, we explore the feasibility of using microtask crowdsourcing through Amazon Mechanical Turk to evaluate ontologies. Specifically, we look at the task of verifying the subclass-superclass hierarchy in ontologies. We demonstrate that the performance of Amazon Mechanical Turk workers (turkers) on this task is comparable to the performance of undergraduate students in a formal study. We explore the effects of the type of ontology on the performance of turkers and demonstrate that turkers can achieve accuracy as high as 90% on verifying hierarchy statements from common-sense ontologies such as WordNet. Finally, we compare the performance of turkers to the performance of domain experts on verifying statements from an ontology in the biomedical domain. We report on lessons learned about designing ontology-evaluation experiments on Amazon Mechanical Turk. Our results demonstrate that microtask crowdsourcing can become a scalable and efficient component in ontology-engineering workflows.
Index Terms
- Mechanical Turk as an ontology engineer?: using microtasks as a component of an ontology-engineering workflow