Abstract
We apply hierarchical word clustering techniques to group the occurrences of the lemma forma in the works of Thomas Aquinas, so that occurrences showing similar contextual behaviour fall into the same or closely related groups. Our results are meant to support the lexicographers of a new data-driven lexicon of Thomas Aquinas in writing the lexical entry for forma. We use two datasets: the Index Thomisticus (IT), a corpus containing the opera omnia of Thomas Aquinas, and the Index Thomisticus Treebank (IT-TB), a syntactically annotated subset of the IT.
Results are evaluated against a manually labeled subset of the occurrences of forma.
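As a rough illustration of the kind of procedure the abstract describes, the following minimal sketch groups occurrences of a word by the similarity of their contexts using agglomerative clustering. It is not the authors' implementation: the average-linkage criterion, the cosine distance, the bag-of-lemmas representation and the four toy contexts are all assumptions made here for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(vectors, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance
    until only n_clusters remain (assumed linkage criterion)."""
    dist = {
        (i, j): cosine_distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical context lemmas for four occurrences of "forma" (toy data,
# not taken from the Index Thomisticus).
contexts = [
    ["materia", "corpus", "anima"],
    ["materia", "corpus", "substantia"],
    ["syllogismus", "figura", "modus"],
    ["syllogismus", "figura", "propositio"],
]
vectors = [Counter(c) for c in contexts]
clusters = average_linkage(vectors, n_clusters=2)
print(clusters)
```

On this toy input the first two and the last two occurrences end up in separate groups, because each pair shares two of its three context lemmas. In practice the context features and the number of groups would have to be chosen with the lexicographic task in mind.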
Notes
1. The IT was lemmatised manually. Participles were always reduced to verbs unless they have a separate lexical entry in the Latin dictionary by Forcellini (1771; extended in 1896 by R. Klotz, G. Freund & L. Doderlein); for instance, the word falsus is always lemmatised as a form of the adjective falsus and not of the verb fallo. Homograph disambiguation is only partly available in the IT and is completed in the Index Thomisticus Treebank.
2. Sentences in the IT-TB were split automatically at strong punctuation marks (period, colon, semicolon, question mark, exclamation mark). In some cases, annotators manually corrected the automatic sentence splitting.
References
Busa, R. (1974–1980). Index Thomisticus. Stuttgart-Bad Cannstatt: Frommann-Holzboog.
Deferrari, R. J., & Barry, M. I. (1948–1949). A Lexicon of St. Thomas Aquinas: based on the Summa Theologica and selected passages of his other works. Washington, DC: Catholic University of America Press.
Firth, J. R. (1957). Papers in linguistics 1934–1951. London: London University Press.
Forcellini, A. (1771). Totius Latinitatis lexicon, consilio et cura Jacobi Facciolati opera et studio Aegidii Forcellini, lucubratum, typis Seminarii, Patavii.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: an introduction to cluster analysis. New York: Wiley.
Maechler, M., Rousseeuw, P. J., Struyf, A., Hubert, M., & Hornik, K. (2012). Cluster: Cluster analysis basics and extensions. R package version 1.14.3. http://CRAN.R-project.org/package=cluster.
McGillivray, B., Passarotti, M., & Ruffolo, P. (2009). The Index Thomisticus treebank project: Annotation, parsing and valency lexicon. Traitement Automatique des Langues, 50(2), 103–127.
Minozzi, S. (2008). La costruzione di una base di conoscenza lessicale per la lingua latina: Latinwordnet. In G. Sandrini (Ed.), Studi in onore di Gilberto Lonardi (pp. 243–258). Verona: Fiorini.
Pedersen, T. (2006). Unsupervised corpus-based methods for WSD. In E. Agirre & P. Edmonds (Eds.), Word sense disambiguation: algorithms and applications (pp. 133–166). New York: Springer.
R Core Team (2012). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. ISBN: 3-900051-07-0. http://www.R-project.org/.
Sokal, R. R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin, 38, 1409–1438.
Van Rijsbergen, C. J. (1979). Information retrieval. London: Butterworths.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Cantaluppi, G., Passarotti, M. (2014). The Meaning of forma in Thomas Aquinas: Hierarchical Clustering from the Index Thomisticus Treebank. In: Vicari, D., Okada, A., Ragozini, G., Weihs, C. (eds) Analysis and Modeling of Complex Data in Behavioral and Social Sciences. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-06692-9_10
DOI: https://doi.org/10.1007/978-3-319-06692-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06691-2
Online ISBN: 978-3-319-06692-9
eBook Packages: Mathematics and Statistics (R0)