Abstract
We present a new database of Dutch word frequencies based on film and television subtitles, and we validate it with a lexical decision study involving 14,000 monosyllabic and disyllabic Dutch words. The new SUBTLEX frequencies explain up to 10% more variance in accuracies and reaction times (RTs) of the lexical decision task than the existing CELEX word frequency norms, which are based largely on edited texts. As is the case for English, an accessibility measure based on contextual diversity explains more of the variance in accuracy and RT than does the raw frequency of occurrence counts. The database is freely available for research purposes and may be downloaded from the authors’ university site at http://crr.ugent.be/subtlex-nl or from http://brm.psychonomic-journals.org/content/supplemental.
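To make the distinction between raw frequency and contextual diversity concrete, the following is a minimal sketch in Python. It assumes each subtitle file has already been tokenized into a list of lowercase words; the function name `frequency_and_diversity` and the toy data are hypothetical illustrations and are not taken from the SUBTLEX-NL processing pipeline.

```python
from collections import Counter


def frequency_and_diversity(documents):
    """Return (raw_counts, doc_counts, total_tokens) for a list of token lists.

    raw_counts: total number of occurrences of each word across all documents.
    doc_counts: number of documents in which each word occurs at least once
                (a simple contextual diversity count).
    """
    raw_counts = Counter()
    doc_counts = Counter()
    for tokens in documents:
        raw_counts.update(tokens)        # every occurrence counts
        doc_counts.update(set(tokens))   # each document counts at most once
    total_tokens = sum(raw_counts.values())
    return raw_counts, doc_counts, total_tokens


# Hypothetical toy "subtitle files", already tokenized and lowercased.
toy_subtitles = [
    ["ik", "zie", "de", "hond", "de", "hond", "blaft"],
    ["de", "kat", "slaapt"],
    ["ik", "zie", "de", "kat"],
]

raw, docs, total = frequency_and_diversity(toy_subtitles)
for word in ("de", "hond", "kat"):
    per_million = raw[word] / total * 1_000_000
    diversity = docs[word] / len(toy_subtitles)
    print(f"{word}: count={raw[word]}, per million={per_million:.0f}, CD={diversity:.2f}")
```

In this toy corpus, "hond" and "kat" have the same raw count, but "kat" occurs in two files and "hond" in only one, so the two measures rank them differently.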
References
Adelman, J. S., Brown, G. D. A., & Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science, 17, 814–823. doi:10.1111/j.1467-9280.2006.01787.x
Baayen, R. H., Feldman, L. B., & Schreuder, R. (2006). Morphological influences on the recognition of monosyllabic monomorphemic words. Journal of Memory & Language, 55, 290–313.
Baayen, R. H., Piepenbrock, R., & van Rijn, H. (1993). The CELEX Lexical Database [CD-ROM]. Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Balota, D. A., Cortese, M. J., & Pilotti, M. (1999). Item-level analyses of lexical decision performance: Results from a mega-study. Abstracts of the 40th Annual Meeting of the Psychonomic Society, 4, 44.
Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133, 283–316.
Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., et al. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459.
Bontrager, T. (1991). The development of word frequency lists prior to the 1944 Thorndike-Lorge list. Reading Psychology, 12, 91–116. doi:10.1080/0270271910120201
Brants, T., & Franz, A. (2006). Web 1T 5-Gram Corpus (Version 1). Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41, 977–990. doi:10.3758/BRM.41.4.977
Burgess, C., & Livesay, K. (1998). The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kučera and Francis. Behavior Research Methods, Instruments, & Computers, 30, 272–277.
Cassel, D. (2007, May 17). Police raid Polish subtitle site [Online article]. Retrieved from http://tech.blorge.com/Structure:%20/2007/05/17/police-raid-polish-subtitle-site/.
Cortese, M. J., & Khanna, M. M. (2007). Age of acquisition predicts naming and lexical-decision performance above and beyond 22 other predictor variables: An analysis of 2,342 words. Quarterly Journal of Experimental Psychology, 60, 1072–1082.
Enigmax (2009, February 5). Hackers hit anti-pirates to avenge sub-site takedown [Online article]. Retrieved from http://torrentfreak.com/hackers-hit-anti-pirates-to-avenge-sub-site-takedown-090205/.
Ghyselinck, M., Lewis, M. B., & Brysbaert, M. (2004). Age of acquisition and the cumulative-frequency hypothesis: A review of the literature and a new multi-task investigation. Acta Psychologica, 115, 43–67.
Johnston, R. A., & Barry, C. (2006). Age of acquisition and lexical processing. Visual Cognition, 13, 789–845.
Juhasz, B. J. (2005). Age-of-acquisition effects in word and picture identification. Psychological Bulletin, 131, 684–712.
Keuleers, E., & Brysbaert, M. (2010). Wuggy: A multilingual pseudoword generator. Behavior Research Methods, 42, 627–633.
Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
New, B., Brysbaert, M., Veronis, J., & Pallier, C. (2007). The use of film subtitles to estimate word frequencies. Applied Psycholinguistics, 28, 661–677.
Shaoul, C., & Westbury, C. (2009). A USENET corpus (2005–2009). Edmonton: University of Alberta. Retrieved from www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html.
Stevens, M., Lammertyn, J., Verbruggen, F., & Vandierendonck, A. (2006). Tscope: A C library for programming cognitive experiments on the MS Windows platform. Behavior Research Methods, 38, 280–286.
Thorndike, E. L., & Lorge, I. (1944). The teacher’s word book of 30,000 words. New York: Columbia University, Teachers College.
Uit den Boogaart, P. C. (Ed.) (1975). Woordfrequenties in geschreven en gesproken Nederlands. Utrecht: Oosthoek, Scheltema & Holkema.
van Berckel, J., Brandt Corstius, H., Mokken, R., & van Wijngaarden, A. (1965). Formal properties of newspaper Dutch. Amsterdam: Mathematisch Centrum Amsterdam.
van den Bosch, A., Busser, B., Canisius, S., & Daelemans, W. (2007). An efficient memory-based morpho-syntactic tagger and parser for Dutch. In P. Dirix, I. Schuurman, V. Vandeghinste, & F. Van Eynde (Eds.), Computational linguistics in the Netherlands: Selected papers from the Seventeenth CLIN Meeting (pp. 99–114). Leuven.
Yap, M. J., & Balota, D. A. (2009). Visual word recognition of multisyllabic words. Journal of Memory & Language, 60, 502–529. doi:10.1016/j.jml.2009.02.001
Yarkoni, T., Balota, D., & Yap, M. (2008). Moving beyond Coltheart’s N: A new measure of orthographic similarity. Psychonomic Bulletin & Review, 15, 971–979.
Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator’s word frequency guide. Brewster, NY: Touchstone Applied Science Associates.
Zevin, J. D., & Seidenberg, M. S. (2002). Age of acquisition effects in word reading and other tasks. Journal of Memory & Language, 47, 1–29. doi:10.1006/jmla.2001.2834
Cite this article
Keuleers, E., Brysbaert, M. & New, B. SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles. Behavior Research Methods 42, 643–650 (2010). https://doi.org/10.3758/BRM.42.3.643
DOI: https://doi.org/10.3758/BRM.42.3.643