DOI: 10.1145/2858036.2858535
research-article
Best Paper

Empath: Understanding Topic Signals in Large-Scale Text

Published: 07 May 2016

ABSTRACT

Human language is colored by a broad range of topics, but existing text analysis tools focus on only a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r=0.906) with similar categories in LIWC.
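The seed-expansion step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy three-dimensional vectors stand in for the learned embedding, and the names `TOY_EMBEDDING`, `expand_category`, and `count_category` are hypothetical rather than part of Empath. The sketch ranks candidate words by cosine similarity to the average of the seed vectors, and it does not model the crowd-powered validation filter.

```python
import math

# Hypothetical toy embedding: hand-picked 3-d vectors for illustration only.
# These are NOT vectors from Empath's model, which is trained on fiction text.
TOY_EMBEDDING = {
    "bleed":  (0.90, 0.10, 0.00),
    "punch":  (0.80, 0.20, 0.10),
    "stab":   (0.85, 0.15, 0.05),
    "wound":  (0.90, 0.05, 0.10),
    "senate": (0.00, 0.90, 0.10),
    "vote":   (0.10, 0.80, 0.20),
    "tweet":  (0.05, 0.20, 0.90),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_category(seeds, embedding, top_n=2):
    """Return the top_n non-seed words closest to the mean of the seed vectors."""
    dims = len(next(iter(embedding.values())))
    centroid = tuple(
        sum(embedding[s][i] for s in seeds) / len(seeds) for i in range(dims)
    )
    candidates = [w for w in embedding if w not in seeds]
    ranked = sorted(
        candidates, key=lambda w: cosine(centroid, embedding[w]), reverse=True
    )
    return ranked[:top_n]


def count_category(text, terms):
    """Count whitespace-separated tokens in text that belong to the category."""
    return sum(1 for token in text.lower().split() if token in terms)


seeds = ["bleed", "punch"]
violence = set(seeds) | set(expand_category(seeds, TOY_EMBEDDING))
print(sorted(violence))
print(count_category("they punch and stab before the vote", violence))
```

In this toy setting the expanded category picks up the words whose vectors lie near the seeds, and counting category terms in a passage gives a simple per-category signal. A real system would add a human validation pass over the candidate terms before using them.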

Published in

CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems
May 2016
6108 pages
ISBN: 9781450333627
DOI: 10.1145/2858036

Copyright © 2016 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CHI '16 paper acceptance rate: 565 of 2,435 submissions, 23%. Overall acceptance rate: 6,199 of 26,314 submissions, 24%.