Intelligence

Volume 50, May–June 2015, Pages 93–99

A general factor of intelligence fails to account for changes in tests’ scores after cognitive practice: A longitudinal multi-group latent-variable study

https://doi.org/10.1016/j.intell.2015.02.004

Highlights

  • 477 participants completed a set of standardized intelligence tests twice.

  • Participants were randomly assigned to three groups.

  • The three groups showed remarkable improvements in test scores.

  • Longitudinal multi-group latent-variable analyses were conducted.

  • The common latent factor tapped by the tests fails to account for the improvements.

Abstract

As a general rule, the repeated administration of tests measuring a given cognitive ability to the same participants reveals increased scores. This gives rise to the well-known practice effect, which must be taken into account in research aimed at the proper assessment of changes after the completion of cognitive training programs. Here we focus on one specific research question: Are changes in test scores accounted for by the underlying cognitive construct/factor the tests tap? Answering this question requires evaluating the factor of interest with several measures. 477 university students completed a battery of four heterogeneous standardized intelligence tests twice, four weeks apart. Between the pre-test and post-test sessions, some participants completed eighteen practice sessions based on memory span tasks, others completed eighteen practice sessions based on processing speed tasks, and a third group did nothing. The three groups showed remarkable changes in test scores from the pre-test to the post-test intelligence session. However, multi-group longitudinal latent-variable analyses revealed that the identified latent factor tapped by the specific intelligence measures fails to account for the observed changes.

Introduction

Practice effects are broadly acknowledged in the cognitive abilities literature (Anastasi, 1934; Colom et al., 2010; Hunt, 2011; Jensen, 1980; Reeve and Lam, 2005). When the same individuals complete the same (or parallel) standardized tests, their scores show remarkable improvements. However, as discussed by Jensen (1998) among others (Colom et al., 2002; Colom et al., 2006; te Nijenhuis et al., 2007), specific measures tap cognitive abilities at three levels: general ability (such as the general factor of intelligence, or g), group abilities (such as verbal or spatial ability), and concrete skills required by the measure (such as vocabulary or mental rotation of 2D objects).

Within this general framework, recent research aimed at testing changes after the completion of cognitive training programs has produced heated discussions regarding the nature of the changes observed in the measures administered before and after the training regime (Buschkuehl and Jaeggi, 2010; Conway and Getz, 2010; Haier, 2014; Moody, 2009; Shipstead et al., 2010; Shipstead et al., 2012; Tidwell et al., 2013). The changes may or may not be accounted for by the underlying construct of interest. Thus, for instance, the pioneering work by Jaeggi, Buschkuehl, Jonides, and Perrig (2008) observed changes in fluid intelligence measures after completion of a challenging cognitive training program based on the dual n-back task. This report stimulated a number of investigations aimed at replicating the finding (Buschkuehl et al., 2014; Colom et al., 2013; Chooi and Thompson, 2012; Harrison et al., 2013; Jaušovec and Jaušovec, 2012; Redick et al., 2012; Rudebeck et al., 2012; Shipstead et al., 2012; Stephenson and Halpern, 2013; von Bastian and Oberauer, 2013).

The meta-analysis published by Melby-Lervåg and Hulme (2012) concluded that short-term cognitive training fails to improve performance on far-transfer measures. Nevertheless, their results support the conclusion that the completed programs might improve performance on near-transfer measures, meaning that specific cognitive skills seem sensitive to training. The meta-analysis by Au et al. (2014), focused on reports analyzing the effect of cognitive training programs based on the n-back task, showed a positive, albeit small, impact on fluid intelligence measures. The weighted average effect size was .24, which is equivalent to 3.6 IQ points. The authors suggested that even these small increments might impact performance in real-life settings (see also Herrnstein & Murray, 1994 for a similar argument).
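For reference, the IQ-point figure follows from the conventional IQ metric (SD = 15); this is the standard transformation of a standardized effect size rather than an additional calculation by the authors:

$$\Delta \mathrm{IQ} = d \times SD_{\mathrm{IQ}} = 0.24 \times 15 = 3.6$$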

Importantly, it has been proposed that the latter statement may be relevant if and only if observed increments in test scores are accounted for by the latent factor representing the construct of interest. In this regard, te Nijenhuis et al. (2007) reported a meta-analysis of sixty-four studies using a test-retest design, finding a perfect negative correlation between the vector of test score changes and the vector of the tests’ g loadings. Their main conclusion was that observed improvements in scores were test-specific and unrelated to the general factor of intelligence (g). However, this study was based on the method of correlated vectors, which has been questioned on several grounds (Ashton & Lee, 2005). Latent-variable analyses are more robust and, therefore, may provide better answers to the question of interest (Dolan and Hamaker, 2001; Haier, 2014).
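To make the contrast between the two approaches concrete, the method of correlated vectors simply correlates, across subtests, the vector of pre-to-post score gains with the vector of g loadings. The sketch below uses hypothetical numbers for four subtests and is not the authors' analysis code:

    import numpy as np

    # Hypothetical values for illustration only: four subtests, their g loadings
    # (from a factor analysis of the battery), and their standardized
    # pre-to-post score gains after practice (in SD units).
    g_loadings = np.array([0.85, 0.78, 0.70, 0.62])
    score_gains = np.array([0.10, 0.18, 0.25, 0.33])

    # Method of correlated vectors: Pearson correlation between the two vectors.
    # A strongly negative r (as reported by te Nijenhuis et al., 2007) indicates
    # that gains concentrate on the least g-loaded, most test-specific measures.
    r = np.corrcoef(g_loadings, score_gains)[0, 1]
    print(f"Correlation between g loadings and score gains: r = {r:.2f}")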

These latent-variable analyses require appropriate sample sizes, yet published reports analyzing changes after completion of cognitive training programs regularly rely on small samples. Thus, for instance, the meta-analysis by Au et al. (2014) is based on reports with sample sizes ranging from 3 to 30 participants (see their Table 1), which precludes the application of the recommended latent-variable analyses.

To help fill this gap, here we report a study with a large number of participants (477). All participants completed on-screen versions of four heterogeneous standardized intelligence tests on two occasions, separated by four weeks. Participants were randomly assigned to three groups comprising more than one hundred participants each. Between the pre-test and post-test sessions, the first group completed 18 practice sessions based on memory span tasks, the second group completed 18 practice sessions based on processing speed tasks, and the third group did nothing. These three groups were systematically compared using multi-group longitudinal latent-variable analyses in order to examine the main research question, namely: are changes in test scores from a pre-test to a post-test intelligence session accounted for by the tapped latent trait?
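In general terms (the authors' exact specification may differ), such a model expresses each observed test score as a function of a common factor at each occasion and in each group,

$$x_{it}^{(k)} = \tau_i^{(k)} + \lambda_i^{(k)}\,\eta_t^{(k)} + \varepsilon_{it}^{(k)}, \qquad t \in \{\text{pre}, \text{post}\},\; k = 1, 2, 3,$$

where $x_{it}^{(k)}$ is the score on test $i$ at occasion $t$ in group $k$, $\eta_t^{(k)}$ is the common factor, and $\tau_i^{(k)}$ and $\lambda_i^{(k)}$ are test-specific intercepts and loadings. If measurement invariance holds (equal $\lambda_i$ and $\tau_i$ across occasions and groups), a gain driven by the common factor must appear as a positive latent change, $\kappa^{(k)} = E[\eta_{\text{post}}^{(k)}] - E[\eta_{\text{pre}}^{(k)}]$; gains that are test-specific instead force shifts in the intercepts or residuals.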

Section snippets

Participants

477 psychology undergraduates took part in the study (82% female). The mean age was 20.13 years (SD = 3.74). They participated to fulfill a course requirement. Participants were randomly assigned to three groups: the first group (memory span) comprised 170 students, the second group (processing speed) comprised 114 students, and the third group (passive control) comprised 193 students.

Results

Table 1 summarizes the constraints across all models. When the parameters are fixed, the values 0 (factor means) and 1 (DAT-SR loadings) are shown. When they are freely estimated (i.e., same value across groups), the parameter estimates are shown.
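In invariance testing of this kind, each more constrained model is typically evaluated against the preceding, less constrained one; a common decision rule (not necessarily the authors' exact criteria) combines a chi-square difference test with the change in approximate fit indices such as the CFI (Bentler, 1990):

$$\Delta\chi^2 = \chi^2_{\text{constrained}} - \chi^2_{\text{free}}, \qquad \Delta df = df_{\text{constrained}} - df_{\text{free}},$$

with a non-significant $\Delta\chi^2$ (or a drop in CFI of no more than about .01) taken as support for the added equality constraints.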

The differences between the five models are determined by the parameters constrained to be equal for the three groups in each model. Those parameters are shaded in Table 1. The increased restrictions entail greater invariance at each step. None of the

Discussion

In this report, we examined whether changes across testing sessions in a set of four standardized intelligence measures can be accounted for by a common latent factor representing general intelligence. This was evaluated across three groups of participants who completed cognitive practice sessions (or did nothing, as in the passive control group) between the pre-test and post-test intelligence sessions. The three groups showed generalized improvements in test scores (see Appendix 2), but

References (50)

  • D.E. Moody

    Can intelligence be increased by training on a task of working memory?

    Intelligence

    (2009)
  • C.L. Reeve et al.

    The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes

    Intelligence

    (2005)
  • C.L. Stephenson et al.

    Improved matrix reasoning is limited to training on tasks with a visuospatial component

    Intelligence

    (2013)
  • J. te Nijenhuis et al.

    Score gains on g-loaded tests: No g

    Intelligence

    (2007)
  • J.M. Wicherts et al.

    Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect

    Intelligence

    (2004)
  • A. Anastasi

    Practice and variability

    Psychological Monographs

    (1934)
  • J. Au et al.

    Improving fluid intelligence with training on working memory: A meta-analysis

    Psychonomic Bulletin & Review

    (2014)
  • G.K. Bennett et al.

    Differential Aptitude Test

    (1990)
  • P.M. Bentler

    Comparative fit indexes in structural models

    Psychological Bulletin

    (1990)
  • M.W. Browne et al.

    Alternative ways of assessing model fit

  • M. Buschkuehl et al.

    Neural effects of short-term training on working memory

    Cognitive, Affective, and Behavioral Neuroscience

    (2014)
  • M. Buschkuehl et al.

    Improving intelligence: A literature review

    Swiss Medical Weekly

    (2010)
  • A.R.A. Conway et al.

    Cognitive ability: Does working memory training enhance intelligence?

    Current Biology

    (2010)
  • C.V. Dolan et al.

    Investigating black–white differences in psychometric IQ: Multi-group confirmatory factor analyses of the WISC-R and K-ABC and a critique of the method of correlated vectors

    (2001)
  • J.R. Flynn

    What is intelligence? Beyond the Flynn effect

    (2007)
Cited by (31)

  • The role of strategy use in working memory training outcomes

    Journal of Memory and Language, 2020. Excerpt:

    “Another major strength of our study is its large sample size, including altogether 258 participants. Thus, together with only a few other studies (De Simoni & von Bastian, 2018; Estrada, Ferrer, Abad, Román, & Colom, 2015; Guye & von Bastian, 2017; Sprenger et al., 2013; Strobach & Huestegge, 2017), it can be listed amongst the most rigorously conducted WM training trials. Another strength was the addition of the intermediate test that helped to unveil the early, gradual development of task-specific near transfer.”

  • Measurement and structural invariance of cognitive ability tests after computer-based training

    Computers in Human Behavior, 2019. Excerpt:

    “However, as a result of different criteria for defining the groups, these are not directly comparable to the present study. For example, previous studies analyzed invariance across classes of training prior to admission testing (Arendasy et al., 2016), practice with different cognitive tasks between repeated intelligence measurements (Estrada, Ferrer, Abad, Román, & Colom, 2015), different methods used to construct alternate test forms (Arendasy & Sommer, 2013), or authors analyzed invariance across different measurement occasions (Arendasy & Sommer, 2017; Freund & Holling, 2011; Lievens et al., 2007; Reeve & Lam, 2005; Sommer, Arendasy, & Schützhofer, 2017). To our knowledge, the present study is the first that analyzes measurement and structural invariance across different amounts of test-specific training.”
