Abstract
With the continuing digitisation of medical knowledge, information extraction tools are becoming increasingly important for medical practitioners. In this paper we tackle the extraction of semantic relations from medical texts, focusing on the relations that may hold between diseases and treatments. We propose an approach that relies on two techniques to extract the target relations: (i) relation patterns based on human expertise and (ii) machine learning based on SVM classification. The approach takes advantage of both techniques, relying more on manual patterns when few examples of a relation are available and more on feature values when a sufficient number of examples is available. It obtains an overall F-measure of 94.07% for the extraction of cure, prevent and side effect relations.
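The abstract does not spell out how the two techniques are combined, so the following is only a minimal sketch of one way such a weighting could look. The regular expressions, the `classifier_weight` formula, its `half_point` parameter and the function names are all hypothetical illustrations, not the paper's patterns or weighting scheme. The classifier scores are passed in as plain numbers so that the sketch does not depend on any particular SVM library.

```python
import re

# Hypothetical hand-written patterns, for illustration only
# (these are NOT the patterns used in the paper).
PATTERNS = {
    "cure": [re.compile(r"\b\w+ (?:cures|treats|is effective against) \w+", re.I)],
    "prevent": [re.compile(r"\b\w+ (?:prevents|protects against) \w+", re.I)],
    "side_effect": [re.compile(r"\b\w+ (?:may cause|can cause|induces) \w+", re.I)],
}


def pattern_scores(sentence):
    """Score 1.0 for each relation with at least one matching pattern, else 0.0."""
    return {
        rel: float(any(p.search(sentence) for p in pats))
        for rel, pats in PATTERNS.items()
    }


def classifier_weight(n_examples, half_point=100):
    """Assumed weighting: grows from 0 towards 1 as training examples increase.

    half_point is a made-up parameter: the number of examples at which the
    classifier and the patterns are trusted equally.
    """
    return n_examples / (n_examples + half_point)


def hybrid_label(sentence, classifier_scores, n_examples_per_relation):
    """Blend pattern and classifier scores per relation and pick the best one."""
    p_scores = pattern_scores(sentence)
    combined = {}
    for rel in PATTERNS:
        w = classifier_weight(n_examples_per_relation.get(rel, 0))
        combined[rel] = w * classifier_scores.get(rel, 0.0) + (1 - w) * p_scores[rel]
    return max(combined, key=combined.get), combined


if __name__ == "__main__":
    # Classifier scores would come from an SVM in practice; here they are invented.
    label, scores = hybrid_label(
        "Aspirin prevents stroke",
        classifier_scores={"cure": 0.6, "prevent": 0.3, "side_effect": 0.1},
        n_examples_per_relation={"cure": 800, "prevent": 20, "side_effect": 20},
    )
    print(label, scores)
```

In this invented example the relation with few training examples ("prevent") is decided mostly by its pattern match, while the relation with many examples ("cure") is decided mostly by the classifier score, which mirrors the behaviour the abstract describes.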
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ben Abacha, A., Zweigenbaum, P. (2011). A Hybrid Approach for the Extraction of Semantic Relations from MEDLINE Abstracts. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_11
DOI: https://doi.org/10.1007/978-3-642-19437-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science (R0)