The Word Frequency Effect
A Review of Recent Developments and Implications for the Choice of Frequency Estimates in German
Abstract
We review recent evidence indicating that researchers in experimental psychology may have used suboptimal estimates of word frequency. Word frequency measures should be based on a corpus of at least 20 million words that contains the kind of language participants in psychology experiments are likely to have been exposed to. In addition, the quality of word frequency measures should be ascertained by correlating them with behavioral word processing data. When we apply these criteria to the word frequency measures available for German, we find that the commonly used Celex frequencies are the weakest predictors of lexical decision times. Better results are obtained with the Leipzig frequencies, the dlexDB frequencies, and the Google Books 2000–2009 frequencies. However, as in other languages, the best performance is observed with subtitle-based word frequencies. The SUBTLEX-DE word frequencies collected for the present manuscript are made available in easy-to-use files and are free for educational purposes.
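The validation criterion described above, correlating a frequency measure with behavioral data, can be illustrated with a minimal sketch. It assumes raw corpus counts are converted to log10 frequency per million (with +1 added so that unseen words stay defined) and that the squared Pearson correlation with mean lexical decision times is used as the index of quality. The function names, the transformation, and the toy numbers are illustrative assumptions, not the procedure or data of the article.

```python
import math


def log_frequency(count, corpus_size_millions):
    """Log10 of frequency per million words; +1 keeps zero counts defined."""
    per_million = count / corpus_size_millions
    return math.log10(per_million + 1)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(counts, corpus_size_millions, reaction_times):
    """Squared correlation between log frequency and reaction times."""
    logs = [log_frequency(c, corpus_size_millions) for c in counts]
    r = pearson_r(logs, reaction_times)
    return r * r


if __name__ == "__main__":
    # Hypothetical toy values for five words (NOT data from the article).
    rts_ms = [520.0, 560.0, 610.0, 655.0, 700.0]
    counts_measure_a = [9000, 2500, 400, 60, 5]    # hypothetical 25M-word corpus
    counts_measure_b = [3000, 4000, 100, 300, 20]  # hypothetical 6M-word corpus
    print("measure A:", variance_explained(counts_measure_a, 25, rts_ms))
    print("measure B:", variance_explained(counts_measure_b, 6, rts_ms))
```

Under these assumptions, the measure with the larger squared correlation would count as the better predictor. A fuller analysis would typically also control for variables such as word length.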