The Meaning of forma in Thomas Aquinas: Hierarchical Clustering from the Index Thomisticus Treebank

Cantaluppi, Gabriele; Passarotti, Marco

doi:10.1007/978-3-319-06692-9_10

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

We apply hierarchical clustering techniques to the occurrences of the lemma forma in the works of Thomas Aquinas, so that occurrences showing similar contextual behaviour are collected into the same or closely related groups. Our results will support the lexicographers of a new data-driven lexicon of Thomas Aquinas in writing the lexical entry for forma. We use two datasets: the Index Thomisticus (IT), a corpus containing the opera omnia of Thomas Aquinas, and the Index Thomisticus Treebank (IT-TB), a syntactically annotated subset of the IT.

Results are evaluated against a manually labeled subset of the occurrences of forma.
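
The grouping idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the four lemmatised contexts are invented toy data rather than IT sentences, the features are simple bag-of-words counts in a two-word window, the dissimilarity is cosine distance, and the linkage is average linkage. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def context_vector(lemmas, target="forma", window=2):
    """Count the lemmas within `window` positions of the first `target`."""
    i = lemmas.index(target)
    return Counter(lemmas[max(0, i - window):i] + lemmas[i + 1:i + 1 + window])


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until `n_clusters` remain."""
    dist = {(i, j): cosine_distance(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]


# Invented toy contexts (lemma sequences), NOT drawn from the Index Thomisticus.
contexts = [
    ["anima", "esse", "forma", "corpus"],
    ["anima", "forma", "corpus", "humanus"],
    ["verbum", "esse", "forma", "sacramentum"],
    ["verbum", "forma", "sacramentum", "baptismus"],
]
vectors = [context_vector(c) for c in contexts]
groups = average_linkage(vectors, n_clusters=2)
print(groups)
```

On this toy input the two "anima/corpus" contexts end up in one group and the two "verbum/sacramentum" contexts in the other. A real analysis would use far richer features and could cut the cluster tree at several heights instead of fixing the number of groups in advance.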

Notes

1. The IT was lemmatised manually. Participles were always reduced to verbs unless they have a separate lexical entry in the Latin dictionary by Forcellini (1771; extended in 1896 by R. Klotz, G. Freund & L. Doderlein); for instance, the word falsus is always lemmatised as a form of the adjective falsus and not of the verb fallo. Disambiguation of homographs is only partly available in the IT; it is completed in the Index Thomisticus Treebank.
2. Sentences in the IT-TB were split automatically at strong punctuation marks (period, colon, semicolon, question mark, exclamation mark). In some cases, annotators manually corrected the automatic sentence splitting.
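
The automatic splitting rule in note 2 can be sketched as follows. This is only an illustration under stated assumptions: the note names the five punctuation marks, but the regular expression, the function name `split_sentences`, and the toy input are hypothetical, and the manual corrections made by annotators are not modelled.

```python
import re

# The five "strong" punctuation marks listed in note 2.
STRONG_PUNCTUATION = ".:;?!"


def split_sentences(text):
    """Split `text` after every strong punctuation mark.

    Each piece keeps its closing mark; surrounding whitespace is trimmed
    and empty pieces are dropped.
    """
    marks = re.escape(STRONG_PUNCTUATION)
    pattern = "[^" + marks + "]+[" + marks + "]?"
    pieces = (m.group().strip() for m in re.finditer(pattern, text))
    return [p for p in pieces if p]


# Toy input for illustration only.
sample = "Forma dat esse rei: anima est forma corporis. Quid est forma?"
for sentence in split_sentences(sample):
    print(sentence)
```

Because the colon and semicolon count as sentence boundaries, this rule yields shorter units than a split on full stops alone, which is one reason why manual adjustment may sometimes be needed.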

References

Busa, R. (1974–1980). Index Thomisticus. Stuttgart-Bad Cannstatt: Frommann-Holzboog.
Deferrari, R. J., & Barry, M. I. (1948–1949). A Lexicon of St. Thomas Aquinas: based on the Summa Theologica and selected passages of his other works. Washington, DC: Catholic University of America Press.
Firth, J. R. (1957). Papers in linguistics 1934–1951. London: London University Press.
Forcellini, A. (1771). Totius Latinitatis lexicon, consilio et cura Jacobi Facciolati opera et studio Aegidii Forcellini, lucubratum, typis Seminarii, Patavii.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: an introduction to cluster analysis. New York: Wiley.
Maechler, M., Rousseeuw, P. J., Struyf, A., Hubert, M., & Hornik, K. (2012). Cluster: Cluster analysis basics and extensions. R package version 1.14.3. http://CRAN.R-project.org/package=cluster.
McGillivray, B., Passarotti, M., & Ruffolo, P. (2009). The Index Thomisticus treebank project: Annotation, parsing and valency lexicon. Traitement Automatique des Langues, 50(2), 103–127.
Minozzi, S. (2008). La costruzione di una base di conoscenza lessicale per la lingua latina: Latinwordnet. In G. Sandrini (Ed.), Studi in onore di Gilberto Lonardi (pp. 243–258). Verona: Fiorini.
Pedersen, T. (2006). Unsupervised corpus-based methods for WSD. In E. Agirre & P. Edmonds (Eds.), Word sense disambiguation: algorithms and applications (pp. 133–166). New York: Springer.
R Core Team (2012). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. ISBN: 3-900051-07-0. http://www.R-project.org/.
Sokal, R. R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin, 38, 1409–1438.
Van Rijsbergen, C. J. (1979). Information retrieval. London: Butterworths.

Author information

Authors and Affiliations

Dipartimento di Scienze Statistiche, Università Cattolica del Sacro Cuore, Milano, Italy
Gabriele Cantaluppi
Centro Interdisciplinare di Ricerche per la Computerizzazione dei Segni dell’Espressione, Università Cattolica del Sacro Cuore, Milano, Italy
Marco Passarotti

Corresponding author

Correspondence to Gabriele Cantaluppi.

Editor information

Editors and Affiliations

Department of Statistical Science, University of Rome "La Sapienza", Rome, Italy
Donatella Vicari
and Information Sciences, Tama University Graduate School of Management, Tokyo, Japan
Akinori Okada
Department of Political Science, University of Naples "Federico II", Naples, Italy
Giancarlo Ragozini
Fakultät Statistik, Technische Universität Dortmund, Dortmund, Germany
Claus Weihs

About this paper

Cite this paper

Cantaluppi, G., Passarotti, M. (2014). The Meaning of forma in Thomas Aquinas: Hierarchical Clustering from the Index Thomisticus Treebank. In: Vicari, D., Okada, A., Ragozini, G., Weihs, C. (eds) Analysis and Modeling of Complex Data in Behavioral and Social Sciences. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-06692-9_10

DOI: https://doi.org/10.1007/978-3-319-06692-9_10
Published: 17 June 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06691-2
Online ISBN: 978-3-319-06692-9
eBook Packages: Mathematics and Statistics (R0)