ABSTRACT
In developing an Information Extraction (IE) system for a new class of events or relations, one of the major tasks is identifying the many ways in which these events or relations may be expressed in text. This has generally required manual analysis and, in some cases, annotation of large quantities of text describing these events. This paper presents an alternative approach based on an automatic discovery procedure, EXDISCO, which identifies a set of relevant documents and a set of event patterns from unannotated text, starting from a small set of "seed patterns." We evaluate EXDISCO by comparing the performance of the discovered patterns against that of manually constructed systems on actual extraction tasks.
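The discovery loop described above, in which seed patterns select relevant documents and relevant documents in turn promote new patterns, can be sketched as follows. This is a minimal Python sketch under several assumptions that the abstract does not state. Each document is assumed to be already reduced to a set of hypothetical (subject, verb, object) clause patterns. A document counts as relevant if any accepted pattern matches it. Candidates are ranked with a Riloff-style score: the fraction of a pattern's matching documents that are relevant, multiplied by the log of the number of relevant matches. The names `discover`, `score`, and `relevant_docs`, and the toy corpus, are illustrative only and are not EXDISCO's actual interface or data.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical clause-level pattern: (subject class, verb, object class).
Pattern = Tuple[str, str, str]
Corpus = Dict[str, FrozenSet[Pattern]]


def matching_docs(pattern: Pattern, docs: Corpus) -> Set[str]:
    """Ids of documents that contain the pattern."""
    return {doc_id for doc_id, pats in docs.items() if pattern in pats}


def score(pattern: Pattern, docs: Corpus, relevant: Set[str]) -> float:
    """Assumed Riloff-style score: precision on relevant docs times log of relevant support."""
    hits = matching_docs(pattern, docs)
    rel_hits = hits & relevant
    if not rel_hits:
        return 0.0
    return len(rel_hits) / len(hits) * math.log(len(rel_hits))


def relevant_docs(accepted: Set[Pattern], docs: Corpus) -> Set[str]:
    """Binary relevance (an assumption): relevant if any accepted pattern matches."""
    return {doc_id for doc_id, pats in docs.items() if pats & accepted}


def discover(docs: Corpus, seeds: Set[Pattern],
             max_iters: int = 10) -> Tuple[List[Pattern], Set[str]]:
    """Grow the pattern set from the seeds, accepting one pattern per iteration."""
    accepted = set(seeds)
    learned: List[Pattern] = []
    for _ in range(max_iters):
        relevant = relevant_docs(accepted, docs)
        candidates = set().union(*docs.values()) - accepted
        scored = [(score(p, docs, relevant), p) for p in candidates]
        scored = [sp for sp in scored if sp[0] > 0.0]
        if not scored:
            break  # no remaining candidate is positively tied to the relevant set
        _, best = max(scored)  # ties broken by pattern order, for determinism
        accepted.add(best)
        learned.append(best)
    return learned, relevant_docs(accepted, docs)


# Hypothetical toy corpus: each document is already reduced to its clause patterns.
A = ("company", "appoint", "person")
R = ("person", "resign", "post")
S = ("person", "succeed", "person")
N = ("company", "name", "person")
P = ("company", "report", "profit")

docs: Corpus = {
    "d1": frozenset({A, S}),
    "d2": frozenset({R, S}),
    "d3": frozenset({S, N}),
    "d4": frozenset({N, A}),
    "d5": frozenset({P}),
    "d6": frozenset({P}),
}

learned, relevant = discover(docs, seeds={A, R})
print(learned)           # [('person', 'succeed', 'person'), ('company', 'name', 'person')]
print(sorted(relevant))  # ['d1', 'd2', 'd3', 'd4']
```

Starting from the two seeds, the loop first accepts the "succeed" pattern, most of whose documents already match a seed, and then the "name" pattern, whose documents have become relevant in the meantime. The "report" pattern never co-occurs with a relevant document and is never accepted. This mutual reinforcement between documents and patterns is the idea the sketch is meant to convey; the paper's actual scoring and relevance weighting may differ.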
Automatic acquisition of domain knowledge for Information Extraction