Empath: Understanding Topic Signals in Large-Scale Text

ABSTRACT
Human language is colored by a broad range of topics, but existing text analysis tools focus on only a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (for example, "bleed" and "punch" to generate the category "violence"). Empath draws connotations between words and phrases by learning a neural embedding from more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories that we generated from common topics in our web dataset, such as neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r = 0.906) with similar categories in LIWC.
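The category-generation loop described in the abstract (seed terms, then embedding-based expansion, then crowd filtering, then text analysis) can be illustrated with a minimal, self-contained Python sketch. Every detail here is an assumption made for illustration. The three-dimensional `TOY_EMBEDDING` stands in for an embedding trained on fiction. Candidates are ranked by cosine similarity to the centroid of the seed vectors. The crowd step is modeled as a strict-majority vote over hypothetical worker judgments. `analyze()` simply counts lexicon hits per token. All function names are hypothetical and are not Empath's actual implementation or API.

```python
import math
import re

# Hypothetical 3-d vectors standing in for a word embedding trained on fiction.
# Real embeddings have hundreds of dimensions and a far larger vocabulary.
TOY_EMBEDDING = {
    "bleed": (0.90, 0.10, 0.00),
    "punch": (0.80, 0.30, 0.10),
    "stab": (0.85, 0.20, 0.05),
    "kick": (0.75, 0.35, 0.10),
    "hug": (0.10, 0.90, 0.20),
    "smile": (0.05, 0.95, 0.10),
    "tweet": (0.00, 0.20, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_category(seeds, embedding, k):
    """Return the k non-seed words most similar to the centroid of the seed vectors."""
    dims = len(embedding[seeds[0]])
    centroid = [sum(embedding[s][i] for s in seeds) / len(seeds) for i in range(dims)]
    candidates = [w for w in embedding if w not in seeds]
    candidates.sort(key=lambda w: cosine(embedding[w], centroid), reverse=True)
    return candidates[:k]


def crowd_filter(terms, votes):
    """Keep a term only if a strict majority of its (hypothetical) worker votes say it fits."""
    kept = []
    for term in terms:
        ballots = votes.get(term, [])
        # sum() counts True as 1; a term with no votes is dropped.
        if ballots and sum(ballots) > len(ballots) / 2:
            kept.append(term)
    return kept


def analyze(text, categories, normalize=True):
    """Count category words in text; optionally divide by the number of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {}
    for name, lexicon in categories.items():
        hits = sum(1 for t in tokens if t in lexicon)
        scores[name] = hits / len(tokens) if normalize and tokens else hits
    return scores


if __name__ == "__main__":
    seeds = ["bleed", "punch"]
    candidates = expand_category(seeds, TOY_EMBEDDING, k=3)
    print(candidates)  # ['stab', 'kick', 'hug']
    # Hypothetical crowd judgments: True means "this word belongs to the category".
    votes = {"stab": [True, True, True], "kick": [True, True, False], "hug": [False, False, True]}
    violence = set(seeds) | set(crowd_filter(candidates, votes))
    print(sorted(violence))  # ['bleed', 'kick', 'punch', 'stab']
    text = "He tried to punch me, so I had to kick him until he would bleed."
    print(analyze(text, {"violence": violence}))  # {'violence': 0.2}
```

The crowd step does real work in this toy run. The embedding ranks "hug" among the top three candidates, and the majority vote removes it before the category is used for analysis.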
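The abstract summarizes agreement with LIWC as a Pearson correlation coefficient. The sketch below shows what that coefficient measures. It assumes the comparison correlates per-document scores from one matched pair of categories. `pearson_r` is a hypothetical helper, and both score lists are invented for illustration; they are not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized because the n terms cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical normalized scores for five documents from a matched pair of
# categories, one list per tool. These are NOT results from the paper.
empath_scores = [0.01, 0.02, 0.03, 0.04, 0.05]
liwc_scores = [0.02, 0.04, 0.05, 0.04, 0.05]
print(round(pearson_r(empath_scores, liwc_scores), 3))  # 0.775
```

Because r ignores scale and offset, two tools can agree strongly even when their raw scores differ in magnitude. The sketch only asks whether the scores rise and fall together across documents.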