ABSTRACT
Information Extraction (IE) systems are commonly based on pattern matching. Adapting an IE system to a new scenario entails constructing a new pattern base, a time-consuming and expensive process. We have implemented a system that discovers patterns automatically from un-annotated text. Starting from a small set of seed patterns proposed by the user, the system applies an incremental discovery procedure to identify new patterns. We present experimental evaluations showing that the resulting patterns exhibit high precision and recall.
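The incremental discovery procedure summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: documents are modeled as sets of already-extracted pattern strings, the function name `discover_patterns` is hypothetical, and the scoring rule (precision among currently relevant documents, weighted by the log of the support count) is one plausible choice, not necessarily the authors' metric.

```python
import math


def discover_patterns(docs, seeds, max_iters=10, min_score=0.0):
    """Grow a pattern set from seeds, accepting one new pattern per iteration.

    docs  -- iterable of collections of pattern strings (hypothetical representation)
    seeds -- initial patterns supplied by the user
    """
    docs = [set(d) for d in docs]
    accepted = set(seeds)
    for _ in range(max_iters):
        # A document counts as relevant if it matches any accepted pattern.
        relevant = [d for d in docs if d & accepted]
        candidates = set().union(*docs) - accepted
        best, best_score = None, min_score
        for pattern in sorted(candidates):  # sorted for deterministic tie-breaking
            in_relevant = sum(1 for d in relevant if pattern in d)
            if in_relevant == 0:
                continue
            total = sum(1 for d in docs if pattern in d)
            # Assumed score: precision times log of support in relevant documents.
            score = (in_relevant / total) * math.log(1 + in_relevant)
            if score > best_score:
                best, best_score = pattern, score
        if best is None:
            break  # no candidate co-occurs with the accepted patterns
        accepted.add(best)
    return accepted


# Toy corpus: two management-succession documents and two sports documents.
docs = [
    {"company appoint person", "person resign"},
    {"person resign", "person succeed person"},
    {"team win game", "player score goal"},
    {"team win game"},
]
found = discover_patterns(docs, seeds={"company appoint person"})
print(sorted(found))
```

In this toy corpus the single seed pulls in the two other succession patterns over successive iterations, while the sports patterns never co-occur with an accepted pattern and are therefore never scored.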
- Unsupervised discovery of scenario-level patterns for Information Extraction