Executive function on the Psychology Experiment Building Language tests

Piper, Brian J.; Li, Victoria; Eiwaz, Massarra A.; Kobel, Yuliyana V.; Benice, Ted S.; Chu, Alex M.; Olsen, Reid H. J.; Rice, Douglas Z.; Gray, Hilary M.; Mueller, Shane T.

doi:10.3758/s13428-011-0096-6

Published: 02 May 2011

Volume 44, pages 110–123, (2012)

Behavior Research Methods

Brian J. Piper¹,
Victoria Li¹,
Massarra A. Eiwaz¹,
Yuliyana V. Kobel¹,
Ted S. Benice¹,
Alex M. Chu¹,
Reid H. J. Olsen¹,
Douglas Z. Rice¹,
Hilary M. Gray¹ &
…
Shane T. Mueller²

An Erratum to this article was published on 19 November 2011

Abstract

The measurement of executive function has a long history in clinical and experimental neuropsychology. The goal of the present report was to determine the profile of behavior across the lifespan on four computerized measures of executive function contained in the recently developed Psychology Experiment Building Language (PEBL) test battery (http://pebl.sourceforge.net/) and to evaluate whether this pattern is comparable to data previously obtained with the non-PEBL versions of these tests. Participants (N = 1,223; ages, 5–89 years) completed the PEBL Trail Making Test (pTMT), the Wisconsin Card Sort Test (pWCST; Berg, Journal of General Psychology, 39, 15–22, 1948; Grant & Berg, Journal of Experimental Psychology, 38, 404–411, 1948), the Tower of London (pToL), or a time estimation task (Time-Wall). Age-related effects were found on all four tests, especially as age increased from young childhood through adulthood. For several tests and measures (including pToL and pTMT), age-related slowing was found as age increased in adulthood. Together, these findings indicate that the PEBL tests provide valid and versatile new research tools for measuring executive functions.

The personal computer has become a ubiquitous tool in the neurobehavioral sciences over the last three decades. Test administration to human participants has particularly benefited from this technological advance because the instructions can readily be presented in a standardized fashion, potentially in multiple languages, large volumes of objective data can be efficiently collected with lower probabilities of experimenter error, and tests can be administered by various individuals after brief training. Relative to their paper-and-pencil counterparts, computerized tests can also be more easily adapted to a neuroimaging environment (Kubo et al., 2009) and require less manual effort for scoring and data analysis. The measurement of executive functions has especially benefited from computerized test developments (Conners, 1985; Greenberg & Waldman, 1993; Gur et al., 2010; Wild, Howieson, Webbe, Seelye, & Kaye, 2008). Executive functions are adaptive, goal-directed actions that allow an individual to override automatic or established behaviors. Tasks that involve set shifting, response inhibition, and working memory, especially those that require solving a novel problem, are thought to provide indices of executive function (Garon, Bryson, & Smith, 2008).

However, two challenges continue to face those interested in adopting computerized behavioral testing. First, although the price for individual tests may be reasonable, many researchers prefer to measure a broad array of sensory, motor, and higher-order cognitive functions using a battery approach (Piper, Acevedo, Craytor, Murray, & Raber, 2010; Wild et al., 2008). Some of the better studied batteries (e.g., the Cambridge Neuropsychological Test Automated Battery [CANTAB]; Fray & Robbins, 1996) not only charge for the initial setup with specialized equipment, but also have substantial annual or per-use license fees to keep the software operational. While this price is partially understandable as a way to offset the resources needed for test development, excessive costs may prevent smaller laboratories or investigators in developing nations from being full participants in the research process. Second, the computer code that underlies the commercial tests either may not be readily available or may be insufficiently documented for other researchers to interpret the operations or verify their correctness.

The Psychology Experiment Building Language (PEBL) was developed to overcome both of these limitations. The software is freely downloadable (http://pebl.sourceforge.net/) and has been documented so that others with moderate programming skills can modify the individual tests to meet their experimental needs (Mueller, 2010b). The current PEBL battery (version 0.6) consists of approximately 50 tests, including many classic ones in experimental psychology and behavioral neurology (Mueller, 2010a, 2010c). The objective of the present study was to report on findings from a subset of PEBL battery tests that assess several core capacities of executive function across the lifespan, including attention, planning, decision making, and cognitive flexibility.

Characterization of age-related performance profiles can be useful for several purposes, including providing fundamental data to help determine the etiology of these changes, enabling diagnostic tests of specific impairments to be developed, and, eventually, identifying interventions for both younger and older individuals that may optimize neurocognitive functioning. Many neurobehavioral capacities improve in young children as they gain maturity, including fine motor function, reaction time, sustained attention, and working memory (Kail & Salthouse, 1994; Piper, 2011). As age progresses, there is often an age-related loss in processing speed and the emergence of generalized cognitive reduction, with deficits in visuospatial skills, working memory, and executive function (see Mahncke, Bronstone, & Merzenich, 2006; Park, 2000). In the present study, we sought to establish whether these general aging effects would occur across a set of four distinct and complementary behavioral tests. Three of these tests are versions of some of the most widely used measures in clinically oriented behavioral neuropsychological research, while the fourth has seen limited prior use. Each of the four tests employed is described below, beginning with its historical antecedents.

The Trail Making Test (TMT) was originally part of the Army Individual Test Battery (1944) but was later adopted into the Halstead–Reitan Neuropsychological Test Battery (Reitan, 1955). Traditionally, the experimenter used a stopwatch to record how long participants took to connect dots on paper that were either numbered (Trails Part A: 1–2–3–4–5) or alternated between numbers and letters (Trails Part B: 1–A–2–B–3). The TMT is simple to administer and is used as an index of visual attention (Trails A) and cognitive flexibility (Trails B). The TMT has also been employed as a sensitive indicator of brain damage (Davidson, Gao, Mason, Winocur, & Anderson, 2008; Periáñez et al., 2007; Reitan, 1955, 1958; Stuss et al., 2001). In addition, the TMT has been employed with a wide variety of other populations, including those with age-associated memory impairment (Hänninen et al., 1997), alcoholics (O’Leary, Radford, Chaney, & Schau, 1977), and children with temporal lobe epilepsy (Guimarães et al., 2007).

The Wisconsin Card Sorting Test (WCST) is another prevalent measure of executive function in contemporary neuropsychological practice and research. This test was originally conceptualized by Berg (1948; Grant & Berg, 1948). The original design of the task involved physically placing cards in one of four piles on the basis of the characteristics of the stimuli. The rule for correctly sorting the stimuli changes regularly, and the ability to switch strategies based on the shape, color, or number of stimuli is recorded. A response in which the earlier rule is incorrectly employed is a perseverative error. Like the TMT, the WCST has been used with various clinical populations, including individuals with schizophrenia (Shad, Tamminga, Cullum, Haas, & Keshavan, 2006) and children with attention deficit hyperactivity disorder (Tsuchiya, Oki, Yahara, & Fujieda, 2005).

The Tower of London (ToL) is a test of planning in which colored disks or balls on pegs are moved individually from an initial state to match a goal state. Optimal performance involves forming, retaining, and implementing a plan to make as few moves as possible. This test was originally developed as a simplified version of the Tower of Hanoi by Shallice (1982). The cognitive and neurophysiological substrates of ToL performance have been frequently and thoroughly examined (Phillips, Wynn, Gilhooly, Della Sala, & Logie, 1999; Ward & Allport, 1997). Both lesion and neuroimaging findings have identified the prefrontal cortex as integral in performing the ToL, as well as the TMT and WCST (Davidson et al., 2008; Kubo et al., 2009; Schall et al., 2003; Zakzanis, Mraz, & Graham, 2005).

The “Time-Wall” task is a recently developed nonverbal decision-making test modeled after a task originally included in the Unified Tri-Services Cognitive Performance Assessment Battery, which was used by the military of the United States for personnel testing (Perez, Masline, Ramsey, & Urban, 1987; Snyder & Rice, 1990). The objective of the original Time-Wall task was to assess the ability to estimate the time at which a target, moving vertically at a fixed rate, will have traveled a specified distance. Thus, it draws on skills relating to both motion perception (Sekuler, Watamaniuk, & Blake, 2002) and prediction. An interesting aspect of performance on Time-Wall type tasks is that this skill appears to be a stable characteristic that does not improve (Jerison, Crannell, & Pownall, 1957), even with extensive practice (Perez et al., 1987).

Although factor-analytic studies indicate that executive function is not a simple, unitary process (Fisk & Sharp, 2004; Miyake et al., 2000), performance across the lifespan on the aforementioned tests generally exhibits a U-shaped association with age. Table 1 provides an overall framework for the present endeavor by outlining the contributions of age to executive function, using the non-PEBL test versions. The Reitan Neuropsychological Laboratory, the originator and distributor of the Reitan TMT, has recognized this developmental profile and has constructed different versions of the test for preadolescent (ages, 9–14 years) versus older (ages, 15+) participants. Among adults, the time to complete Part B increases with age (Yeudall, Reddon, Gill, & Stefanyk, 1987; Tombaugh, 2004). Similarly, on the WCST, the number of perseverative errors decreases with age in children, is stable from ages 20 to 60, and is elevated at senescence (Chelune & Baer, 1986; Fisk & Sharp, 2004; Hartman, Bolton, & Fehnel, 2001; Huizinga, Dolan, & van der Molen, 2006; Strauss, Sherman, & Spreen, 2006). Young adults were most efficient at solving the CANTAB ToL, relative to either children or older adults (DeLuca et al., 2003).

Table 1 Performance across the lifespan on measures of executive function

To demonstrate the validity of the PEBL executive function tests, we set out to determine whether the tests demonstrate the expected U-shaped relationship between age and behavior, with a progression during childhood, optimal performance (i.e., lowest scores/errors) during late-adolescence/early-adulthood, and then a regression during senescence.

Method

Attendees of the Oregon Museum of Science and Industry were first asked their age and handedness and then were asked to complete a short computer-administered task that typically lasted from 3 to 12 min. The tests were implemented using the PEBL platform and were typically identical to, or slightly adapted from, the versions distributed in version 0.5 of the PEBL Test Battery (exact methods are described below, and all scripts are available as supplemental materials). Testing was performed on one of ten personal computers running the Microsoft Windows operating system. The minimum age (5–7 years) to participate was based on the complexity of each test and preliminary testing. Children participating in an experiment were tested without the assistance of their parents or guardians, who instead were encouraged to take part in the study on a separate workstation while their children completed the test. Testing was completed in a semiprivate area with partitions between computer stations to limit any visual distractions or viewing of the monitors at the adjacent station. The instructions were displayed and read by the experimenter to each participant. All procedures were approved by the Institutional Review Board of Oregon Health and Science University. Each participant completed only one test, so we have no direct correlational measures between performance on different tests. During the data collection period (May–September 2010), data were obtained using eight different PEBL tests, of which four are reported here and two are available elsewhere (Berteau-Pavy & Piper, 2011; Piper & Miller, 2011).

PEBL Trail Making Test (pTMT)

The participants (N = 384; ages, 5–76 years; 51.3% female; 13.0% left-handed) completed a computerized version modeled generically after the Halstead–Reitan Trail Making Test. The PEBL version uses an automated algorithm to generate the problems, rather than using the specific set of layouts in the Halstead–Reitan test. The test administered was slightly modified from the one contained in version 0.5 of the PEBL battery, so that the same five test forms were used for all participants (instead of being generated randomly). The instructions below were displayed prior to testing:

In this experiment, your goal is to click on each circle, in sequence, as quickly as you can. When you click on the correct circle, its number will change to boldface, and a line will be drawn from the previous circle to the new circle. On some trials, the circles will be numbered from 1 to 25, and you should click on them in numerical order (1–2–3–4). On other trials, the circles will have both numbers (1 to 13) and letters (A through L), and you should click on them in an alternating order (1–A–2–B–3–C). If you click the wrong circle, no line will be drawn. The trial will continue until you have successfully clicked on all of the circles in the correct order. After the display appears, you can examine the circles as long as you want. Timing will not begin until you click on the first circle, which is labeled '1' on every trial.

The pTMT contained ten trials and alternated between five trials with only numbers (Part A) and five trials with alternating numbers and letters (Part B; see Fig. 1a). Each Part A trial had a corresponding Part B trial (an isomorphic problem, mirrored along the vertical axis) with an equal distance to connect all the items. The primary dependent measure was the total time to complete each part. The B:A ratio and accuracy, defined as the minimum number of clicks necessary to complete each trial divided by the number made, were also calculated.
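The pTMT measures described above can be illustrated with a short sketch. This is not the PEBL script: the function name, argument names, and example values are hypothetical, and the sketch assumes that the B:A ratio is the total Part B time divided by the total Part A time and that per-trial accuracy is averaged across trials.

```python
def tmt_scores(times_a, times_b, min_clicks, clicks_made):
    """Summarize one participant's Trail Making data (hypothetical helper).

    times_a, times_b: completion times (s) for the Part A and Part B trials.
    min_clicks, clicks_made: per-trial minimum and actual click counts.
    """
    total_a = sum(times_a)
    total_b = sum(times_b)
    # Accuracy per trial: minimum clicks needed / clicks made (1.0 = no errors).
    accuracy = [m / c for m, c in zip(min_clicks, clicks_made)]
    return {
        "total_a": total_a,
        "total_b": total_b,
        # Assumption: the ratio is taken on the summed times for each part.
        "ratio_b_a": total_b / total_a,
        # Assumption: accuracy is summarized as the mean over all trials.
        "mean_accuracy": sum(accuracy) / len(accuracy),
    }


# Invented example: 25 targets per trial, one trial with 25 extra clicks.
scores = tmt_scores(
    times_a=[20.0, 22.0, 18.0, 21.0, 19.0],
    times_b=[40.0, 44.0, 36.0, 42.0, 38.0],
    min_clicks=[25] * 10,
    clicks_made=[25, 25, 50, 25, 25, 25, 25, 25, 25, 25],
)
```

With these invented values, Part B takes twice as long as Part A in total, and the single error-laden trial lowers mean accuracy below 1.0.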

PEBL Wisconsin (Berg) Card Sorting Test (pWCST)

Participants (N = 246; ages, 7–89 years; 45.5% female; 10.7% left-handed) completed a card sorting task (Fig. 2a) modeled after Berg (1948) and described more fully elsewhere (Lyvers & Tobias-Webb, 2010). The instructions were as follows:

You are about to take part in an experiment in which you need to categorize cards based on the pictures appearing on them. To begin, you will see four piles. Each pile has a different number, color, and shape. You will see a series of cards and need to determine which pile each belongs to.... The correct answer depends upon a rule, but you will not know what the rule is. But, we will tell you on each trial whether or not you were correct. Finally, the rule may change during the task, so when it does, you should figure out what the rule is as quickly as possible and change with it. Press any key to begin.

Fig. 2 Wisconsin Card Sorting (Berg, 1948) performance in children (ages, 7–12 years; N = 71), adolescents (ages, 13–19 years; N = 63), early adults (ages, 20–49 years; N = 81), and late adults (ages, 50–89 years; N = 30). a Screen shot: The lower card is placed into one of the four piles on the basis of similarity of shape, color, or number. b Percentages of perseverative errors by age (^l p < .05 vs. late adults). c Response times on correct and incorrect trials by age (*p < .0005 vs. correct; ^c p < .05 vs. children; ^l p < .05 vs. late adults)

After each trial, feedback of “correct!” or “incorrect” was displayed for 500 ms. The maximum number of trials was 128 (i.e., two decks of 64 cards) but could be shorter (100) on the basis of optimal category completions. The rule (color, shape, or number) could switch as quickly as every tenth trial. The primary dependent measure was the percentage of the total number of trials with perseverative errors. A perseverative error was defined as an incorrect response to a shifted or new category that would have been correct for the immediately preceding category. Response time was also obtained for correct and incorrect decisions for each participant, although excessively short (<100 ms) or long (>10 s) trial times were excluded prior to calculating the mean for each participant.
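The scoring rules in the preceding paragraph can be sketched as follows. The trial representation (dictionaries with `rule`, `prev_rule`, and `matched` keys) and both function names are hypothetical and are not taken from the PEBL script; only the definition of a perseverative error and the 100 ms and 10 s response time cutoffs come from the text.

```python
def percent_perseverative(trials):
    """Percentage of all trials that are perseverative errors.

    Each trial is a dict (hypothetical layout) with:
      'rule'      - the currently correct sorting dimension,
      'prev_rule' - the immediately preceding dimension (None before the first shift),
      'matched'   - set of dimensions on which the chosen pile matches the card.
    """
    persev = 0
    for t in trials:
        incorrect = t["rule"] not in t["matched"]
        # Perseverative: wrong now, but would have been right under the previous rule.
        if incorrect and t["prev_rule"] in t["matched"]:
            persev += 1
    return 100.0 * persev / len(trials)


def trimmed_mean_rt(rts_ms, low=100, high=10_000):
    """Mean response time after dropping trials < 100 ms or > 10 s."""
    kept = [rt for rt in rts_ms if low <= rt <= high]
    return sum(kept) / len(kept) if kept else None


# Invented example data.
example_trials = [
    {"rule": "color", "prev_rule": None, "matched": {"color"}},     # correct
    {"rule": "shape", "prev_rule": "color", "matched": {"color"}},  # perseverative error
    {"rule": "shape", "prev_rule": "color", "matched": {"number"}}, # other error
    {"rule": "shape", "prev_rule": "color", "matched": {"shape"}},  # correct
]
pct = percent_perseverative(example_trials)
mean_rt = trimmed_mean_rt([50, 800, 1200, 15000])
```

In this example one of four trials is perseverative, and the 50 ms and 15,000 ms responses are excluded before averaging.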

PEBL Tower of London (pToL)

The participants (N = 325; ages, 6–82 years; 44.0% female; 12.3% left-handed) completed eight trials with stimuli based on set A from Phillips et al. (1999). The instructions were as follows:

You are about to perform a task called the 'Tower of London'. Your goal is to move a pile of disks from their original configuration to the configuration shown on the top of the screen. You can only move one disk at a time, and you cannot move a disk onto a pile that has no more room (indicated by the size of the grey rectangle). To move a disk, click on the pile you want to move a disk off of, and it will move up into the hand. Then, click on another pile, and the disk will move down to that pile.

Notably, unlike some versions of the ToL, the version we tested placed no restrictions on the height of the pegs or the number of moves allowed to solve the problem. The primary dependent measure was the total number of extra moves across the seven trials (moves made minus 48, the minimum necessary to solve the problems), although the total time was also recorded (Fig. 3).
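As a minimal sketch of the extra-moves measure, the function below subtracts the reported minimum of 48 moves from the total number of moves made. The function name and the example move counts are hypothetical; the per-problem minimums are not given in the text, so only the total is used.

```python
MIN_TOTAL_MOVES = 48  # minimum total reported in the text for the scored problems


def extra_moves(moves_per_trial, minimum_total=MIN_TOTAL_MOVES):
    """Total moves made across the scored trials minus the minimum required."""
    total = sum(moves_per_trial)
    if total < minimum_total:
        # Fewer moves than the stated minimum indicates incomplete or invalid data.
        raise ValueError("total moves is below the stated minimum")
    return total - minimum_total


# Invented move counts for one participant.
participant_extra = extra_moves([4, 5, 6, 8, 9, 10, 13])
```

A participant who solved every problem optimally would score 0 on this measure.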

Time-Wall

The participants (N = 268; ages, 5–79 years; 47.8% female; 12.6% left-handed) completed a time estimation task based on Perez et al. (1987). The females in this sample were older (25.2 ± 1.8 years) than the males (19.9 ± 1.3 years), t(236.6) = 2.42, p < .05, so sex differences were evaluated with the sample stratified into age categories. The instructions, slightly modified from Snyder and Rice (1990), were as follows:

This is an experiment to see how well you can estimate the speed of a moving square target. The target will always start at the top of the screen and descend at a constant rate toward the bottom. After the target is two-thirds of the way down, it will pass behind a wall and become invisible. Your task is to press a button at the exact moment the moving target would pass through the notch marked at the very bottom of the display. In making this judgement, you are not to count or use any other rhythm method to facilitate your judgement. Instead, follow the target with your eyes and imagine it continuing straight down behind the wall to the notch. After you have pressed the button, you will receive feedback as to where the target actually was and whether you over or underestimated the time interval. When you are ready, press a key on the keyboard and the next target shall emerge from the top.

The participants underwent a brief practice, followed by 18 trials on which the correct completion time ranged from 2,000 to 9,200 ms (M = 5,822.4 ms, SEM = 558.4). Figure 4a shows a screenshot from Time-Wall. The primary dependent measure was inaccuracy, defined as the absolute value of the participant’s response time minus the correct time, divided by the correct time for that trial. Since the vast majority of responses on tests of this type are too early (Jerison & Arginteanu, 1958; Jerison et al., 1957), the percentage of trials on which the response time was greater than the correct time was also determined. These two values map roughly onto precision and bias: optimal Time-Wall performance would result in smaller values for inaccuracy (ideally, 0.00), and unbiased performance would result in a proportion of late responses close to 50%.
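The two Time-Wall measures can be sketched in a few lines. The function name and example values are hypothetical, and averaging the per-trial inaccuracy across trials is an assumption; the text defines inaccuracy only at the level of a single trial.

```python
def time_wall_scores(responses_ms, correct_ms):
    """Return (mean inaccuracy, percentage of late responses) for one participant."""
    pairs = list(zip(responses_ms, correct_ms))
    # Inaccuracy per trial: |response time - correct time| / correct time.
    inaccuracy = [abs(r - c) / c for r, c in pairs]
    # A response is "late" when it comes after the correct time.
    n_late = sum(1 for r, c in pairs if r > c)
    # Assumption: inaccuracy is summarized as the mean over trials.
    mean_inaccuracy = sum(inaccuracy) / len(inaccuracy)
    percent_late = 100.0 * n_late / len(pairs)
    return mean_inaccuracy, percent_late


# Invented example: two early responses, one late, one exactly on time.
mean_inacc, pct_late = time_wall_scores(
    responses_ms=[1800, 4400, 7200, 5000],
    correct_ms=[2000, 4000, 8000, 5000],
)
```

Dividing by the correct time makes errors comparable across short and long intervals, which is why a 200 ms error on a 2,000 ms trial counts the same as an 800 ms error on an 8,000 ms trial.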

Statistical analysis

All analyses were conducted using SPSS, version 16.0 (SPSS Inc., Chicago, IL). Mixed ANOVAs were completed where applicable, with age divided into four groups: children (from the minimum age for each test, 5–7 years, to 12 years), adolescents (13–19 years), early adults (20–49 years), and late adults (50+ years). If Mauchly’s sphericity test was significant on repeated measures ANOVAs, results with the Greenhouse–Geisser correction were reported, with the corresponding reduction in the degrees of freedom. Pearson correlation coefficients were computed among measures on the same tests. Mean data are presented with the SEM, and nonlinear regressions depict the 95% confidence intervals. Effect sizes are expressed as partial η² for ANOVAs and Cohen’s d for two-group comparisons.
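For readers working outside SPSS, the age grouping and the two effect sizes can be sketched in Python. This is an illustration only, not the analysis code used in the study: the function names are hypothetical, Cohen's d is computed with the pooled standard deviation (the text does not state which variant was used), and partial η² uses the standard definition SS_effect / (SS_effect + SS_error).

```python
from math import sqrt
from statistics import mean, variance


def age_group(age):
    """Assign an age in years to one of the four groups used in the analyses."""
    if age <= 12:
        return "children"
    if age <= 19:
        return "adolescents"
    if age <= 49:
        return "early adults"
    return "late adults"


def cohens_d(x, y):
    """Cohen's d for two independent groups, using the pooled SD (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


# Invented example values.
d = cohens_d([2, 4, 6], [1, 3, 5])
eta_p2 = partial_eta_squared(20.0, 80.0)
groups = [age_group(a) for a in (12, 13, 49, 50)]
```

The sums of squares passed to `partial_eta_squared` would come from an ANOVA table produced elsewhere; the sketch does not fit the ANOVA itself.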