Empath: Understanding Topic Signals in Large-Scale Text

Authors:
Ethan Fast, Stanford University, Stanford, CA, USA
Binbin Chen, Stanford University, Stanford, CA, USA
Michael S. Bernstein, Stanford University, Palo Alto, CA, USA

CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, May 2016, Pages 4647–4657. https://doi.org/10.1145/2858036.2858535

Published: 07 May 2016

ABSTRACT

Human language is colored by a broad range of topics, but existing text analysis tools focus on only a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r=0.906) with similar categories in LIWC.
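The seed-term expansion described in the abstract can be illustrated with a minimal sketch. It is not Empath's implementation. The three-dimensional vectors are invented for illustration, the names `TOY_EMBEDDING`, `expand_category`, and `category_score` are hypothetical, and ranking by cosine similarity to the seed centroid is an assumed stand-in for however Empath queries its embedding. The crowd-powered validation step is not modeled.

```python
import math

# Toy vectors, invented for illustration only. A real embedding would be
# learned from a large corpus and have many more dimensions.
TOY_EMBEDDING = {
    "bleed":  [0.90, 0.10, 0.00],
    "punch":  [0.80, 0.20, 0.10],
    "stab":   [0.85, 0.15, 0.05],
    "wound":  [0.90, 0.05, 0.10],
    "senate": [0.05, 0.90, 0.10],
    "vote":   [0.10, 0.85, 0.20],
    "tweet":  [0.10, 0.20, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_category(seeds, embedding, top_n=2):
    """Return the top_n non-seed words closest to the centroid of the seeds."""
    known = [embedding[s] for s in seeds if s in embedding]
    if not known:
        return []
    dims = len(known[0])
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dims)]
    ranked = sorted(
        ((cosine(centroid, vec), word)
         for word, vec in embedding.items() if word not in seeds),
        reverse=True,
    )
    return [word for _, word in ranked[:top_n]]


def category_score(text, category_terms):
    """Fraction of whitespace-separated tokens that belong to the category."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in category_terms) / len(tokens)


seeds = ["bleed", "punch"]
violence = set(seeds) | set(expand_category(seeds, TOY_EMBEDDING))
print(sorted(violence))
print(category_score("they punch and stab", violence))
```

In this toy setup the seeds pull in the nearby terms "stab" and "wound", and the resulting word list is then used as a simple lexicon to score a piece of text. In the workflow the abstract describes, a human filtering pass would sit between those two steps.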

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)

Published in
CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems
May 2016
6108 pages
ISBN: 9781450333627
DOI: 10.1145/2858036
General Chairs: Jofish Kaye (Yahoo), Allison Druin (University of Maryland / National Park Service)
Program Chairs: Cliff Lampe (University of Michigan), Dan Morris (Microsoft), Juan Pablo Hourcade (University of Iowa)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Paper
Author Tags
NLP
computational social science
fiction
social computing
Qualifiers
- research-article
Acceptance Rates
CHI '16 Paper Acceptance Rate: 565 of 2,435 submissions, 23%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%