Introduction

Reaction time has been used as a psychological task since the mid-19th century. The technique originated in astronomers’ noticing that observers differed in their recordings of star transit times, and Donders (1868, 1969) was early in introducing it to scientific psychology. Thereafter, it became a staple of scientific study in psychology. Famously, James McKeen Cattell (1890) included reaction time among the ‘mental tests’ that he introduced. This received endorsement from Francis Galton (1890), who used reaction time to test thousands of subjects (see Johnson et al., 1985). The use of reaction time grew and has persisted throughout the 20th century and into the 21st century (for example, as described in Aufdembrinke, Hindmarch and Ott 1988; Deary, 2000; Jensen, 2006). There are many different reaction time devices, and reaction times are taken in response to many psychological and other manipulations. However, two common and useful procedures are to measure simple reaction time and choice reaction time (here, we shall concentrate on four-choice reaction time). Simple reaction time involves making a response as quickly as possible in response to a single stimulus. Choice reaction time is complicated by requiring the subject to make the appropriate response to one of a number of stimuli. The experimental variables that are most commonly derived from both of these are some measure of the central tendency (mean or median usually), and a measure of intraindividual variability, typically the raw standard deviation of a number of trials or the coefficient of variation (Hultsch, MacDonald, & Dixon, 2002).

Simple and choice reaction times are relatively straightforward to conceive and to perform, compared to many other mental tasks that are used within experimental and differential psychology. Of course, even such simple psychological tasks are founded on a number of more basic psychological operations and parameters, which can be bound in complex models (e.g., Luce, 1991; Ratcliff, 2008). The stimulus-response contingencies of reaction time procedures are such that, when no time pressure is applied, errors are rare, and the time to complete an item is much less than that for a typical IQ-type test item. Despite the apparent lack of cognitive demand required to perform reaction time tasks, they have produced an interesting set of findings. Reaction times, especially choice reaction times, show marked slowing with age, which begins from young adulthood and accelerates after middle adulthood (Deary & Der, 2005a; Der & Deary, 2006). Indeed, reaction times have been viewed as capturing the capacity of processing speed that is a major foundation of the age-related declines in higher-level cognitive functions (Madden, 2001; Salthouse, 1996). Reaction times, especially choice reaction times, are moderately to strongly correlated with measures of general fluid intelligence (Jensen, 2006). For example, in one large (n = 900), representative sample of 55-year-olds in Scotland, four-choice reaction time correlated 0.49 with a measure of general intelligence (the Alice Heim 4 test; Deary, Der, & Ford, 2001). Reaction times, both simple and choice, and both their means and their intraindividual variability, are associated with survival. For example, in the same large group of 55-year-olds from Scotland, four-choice reaction time mean was strongly associated with survival over the next 15 years (Deary & Der, 2005b); and this was replicated in a sample of about 7,000 individuals aged from 18 to 80 (Shipley, Der, Taylor, & Deary, 2006).
These are just a few empirical associations that make reaction time valuable in studying aspects of human psychology and health. In addition to these, reaction times are widely used in experimental psychology, psychopharmacology, medical studies, and areas beyond these (e.g., Strachan et al., 2001). Therefore, reaction time is a much-valued predictor and outcome variable in psychology. The examples cited above, some of them drawn from our own work, illustrate the range of basic and applied psychological research in which reaction times are used.

In view of the long period over which reaction times have been used, and their importance with regard to key aspects of human life, it is surprising that there is no standard reaction time measure. For example, when we reviewed the literature on something as straightforward as reaction time and age, it was remarkable that each study had used a different reaction time procedure, making comparisons difficult or impossible (Deary & Der, 2005a; Der & Deary, 2006). Therefore, it would be useful for a broad range of psychological disciplines and applications if there were a freely available reaction time test with some basic stimulus-response associations, a set of parameters which could be varied, and all set on a common platform. This lack and need were argued strongly by Jensen (2006, p. 241): “it would also be advantageous to provide standardized computer programs for a number of classical paradigms, which were originally intended to measure the speed of various information processes”. The purpose of the present study is to fill this gap. It aims to provide a free-to-all, easy-to-use programme that will allow means and standard deviations to be derived from simple and four-choice reaction times. We provide some initial reliability and validity data for the task. We also provide a location from which other researchers can download the reaction time programme and instructions.

Method

Participants

Fifty young adults aged between 18 and 25 years (mean = 20.5, SD = 2.6), 50 middle-aged adults aged between 45 and 60 (mean = 53.7, SD = 4.9), and 50 older adults aged between 61 and 80 (mean = 69.1, SD = 6.2) took part in the study. Participants were either students at the University of Edinburgh or residents from the City of Edinburgh. The students received course credit for their participation and all other adults were paid a small sum for taking part.

Reaction time tasks and other mental tests

The digit-symbol coding subtest of the Wechsler adult intelligence scale III (Wechsler, 1997), the matrix reasoning subtest of the Wechsler abbreviated scale of intelligence (Psychological Corporation (The), 1999), and the Wechsler test of adult reading (WTAR) (Psychological Corporation (The), 2001) were used as higher-level cognitive measures. Digit-symbol coding was included as a test of processing speed, matrix reasoning as a fluid-type (age-sensitive) intelligence task, and WTAR as a test of crystallised-type (age-insensitive) intelligence. The tests were applied according to instructions in the tests’ manuals.

Two reaction time tasks were used. These will be referred to as the Deary-Liewald reaction time task, and the numbers reaction time box. The Deary-Liewald task is the new, computer-based task of principal interest. The numbers reaction time box was employed for comparison, because there is much previous information about it: it has been used in large, epidemiological surveys in the UK, and its parameters’ associations with age, intelligence and mortality are known and replicated (Cox, Huppert, & Whittington, 1993; Deary et al., 2001; Deary & Der, 2005a, 2005b; Der & Deary, 2006; Huppert & Whittington, 1993; Shipley et al., 2006). Simple reaction time (SRT) and four-choice reaction time (CRT) means and standard deviations were measured for each participant on both tasks. In the SRT, participants had to press a button or key in response to a single stimulus. In the CRT, there were four stimuli and participants had to press the button that corresponded to the correct response. For both reaction time tasks, the SRT involved eight practice trials and twenty test trials. The CRT for both tasks involved eight practice trials and forty test trials. Subjects undertook a third reaction time task, but it is not reported further here.

Deary-Liewald reaction time task

This was designed by IJD and programmed by DL, with several iterations between the initial design and the final programme that was used here. The programme was run on a screen with a vertical refresh rate of 60 Hz. For the SRT, one white square was positioned approximately in the centre of a computer screen, set against a blue background (see Fig. 1). The stimulus to respond was the appearance of a diagonal cross within the square. Each time a cross appeared, participants had to respond by pressing a key as quickly as possible. Each cross remained on the screen until the key was pressed, after which it disappeared and another cross appeared shortly after. The inter-stimulus interval (the time interval between each response and when the next cross appeared) ranged between 1 and 3 s and was randomised within these boundaries. The computer programme recorded the response time and the inter-stimulus interval for each trial.
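The randomised inter-stimulus interval described above can be sketched in a few lines. This is a minimal illustration, not the programme's actual code: the text says only that the interval was randomised between 1 and 3 s, so the uniform distribution, the function name `draw_isi_ms` and the fixed seed are all assumptions made for the example.

```python
import random

def draw_isi_ms(rng, low_ms=1000.0, high_ms=3000.0):
    """Draw one inter-stimulus interval in milliseconds.

    Assumption: a uniform draw between the two bounds. The source only
    states that the interval was randomised within 1 to 3 s.
    """
    return rng.uniform(low_ms, high_ms)

# A seeded generator makes the illustrative schedule reproducible.
rng = random.Random(0)

# One interval per test trial (the SRT used twenty test trials).
isis = [draw_isi_ms(rng) for _ in range(20)]
```

Varying the interval in this way prevents the participant from anticipating the onset of the next stimulus.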

Fig. 1 Screen shots of the Deary-Liewald task for the simple reaction time task (left) and the choice reaction time task (right)

For the CRT, four white squares were positioned in a horizontal line across approximately the middle of the computer screen, set against a blue background (see Fig. 1). Four keys on a standard computer keyboard corresponded to the different squares. The position of the keys corresponded in alignment to the position of the squares on the screen: the ‘z’ key corresponded to the square on the far left, the ‘x’ key to the square second from the left, the ‘comma’ key to the square second from the right and the ‘full-stop’ key to the square on the far right. The stimulus to respond was the appearance of a diagonal cross within one of the squares. Participants were instructed to gently rest the index and middle fingers of their left hand on the ‘z’ and the ‘x’ keys, and the index and middle fingers of their right hand on the ‘comma’ and ‘full stop’ keys. A cross appeared randomly in one of the squares and participants were asked to respond as quickly as possible by pressing the corresponding key on the keyboard. Each cross remained on the screen until one of the four keys was pressed, after which it disappeared and another cross appeared shortly after. The inter-stimulus interval ranged between 1 and 3 s and was randomised within these boundaries. The computer programme recorded the response times for each cross, the inter-stimulus interval for each trial, which key was pressed and, in the case of four-choice reaction time, whether the response was correct or wrong. It also calculated the mean, median, variance, standard deviation, skewness, and kurtosis of the response times.
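The scoring and descriptive statistics described above can be illustrated with a short sketch. It is not the programme's own code. The key-to-square mapping follows the text, but the function names, the example trials, the restriction of the summary to correct responses, and the moment-based formulas for skewness and (excess) kurtosis are assumptions; the programme may use different estimators.

```python
import statistics

# Mapping from the text: 'z' and 'x' for the two left squares,
# comma and full stop for the two right squares (0 = far left).
KEY_TO_SQUARE = {"z": 0, "x": 1, ",": 2, ".": 3}

def score_trial(stimulus_square, key):
    """Return True when the pressed key matches the square holding the cross."""
    return KEY_TO_SQUARE.get(key) == stimulus_square

def describe(rts_ms):
    """Summarise response times (ms) for one participant."""
    n = len(rts_ms)
    mean = statistics.mean(rts_ms)
    # Central moments with an n denominator (assumed estimator).
    m2 = sum((x - mean) ** 2 for x in rts_ms) / n
    m3 = sum((x - mean) ** 3 for x in rts_ms) / n
    m4 = sum((x - mean) ** 4 for x in rts_ms) / n
    return {
        "mean": mean,
        "median": statistics.median(rts_ms),
        "variance": statistics.variance(rts_ms),  # sample variance (n - 1)
        "sd": statistics.stdev(rts_ms),           # sample SD (n - 1)
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,           # excess kurtosis
    }

# Hypothetical trials: (square holding the cross, key pressed, response time in ms).
trials = [(0, "z", 410.0), (3, ".", 455.0), (1, ",", 390.0),
          (2, ",", 520.0), (1, "x", 475.0)]

correct_rts = [rt for square, key, rt in trials if score_trial(square, key)]
errors = len(trials) - len(correct_rts)
stats = describe(correct_rts)
```

The coefficient of variation mentioned in the Introduction would follow directly as `stats["sd"] / stats["mean"]`.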

Numbers-based reaction time box

The numbers reaction time box was a rectangular, stand-alone box, originally designed for the UK Health and Lifestyle Survey (Cox et al., 1993; Fig. 2). It provided the data on ageing, correlations with intelligence, and associations with mortality that were summarised in the Introduction. On the top surface, there was a liquid crystal display (LCD) screen and five response buttons, each with a number written above it. The buttons were arranged underneath the LCD screen in a gentle curve to fit the natural position of the participant’s fingers. From left to right, the buttons were labelled with the numbers 1, 2, 0, 3, and 4 (see Fig. 2). The stimulus for response was the appearance of a number on the LCD screen. Subjects were asked to respond as quickly as possible when a number appeared. A number remained on the screen until participants made a response, after which it disappeared and another number appeared shortly after. The inter-stimulus interval ranged between 1 and 3 s and was randomised within these boundaries.

Fig. 2 Illustration of the top surface of the numbers task box

For the SRT, only the number ‘0’ appeared on the screen. Participants were instructed gently to rest the index finger of their preferred hand on the button labelled ‘0’, and told that they would only be using this button. For the CRT, one of the numbers 1, 2, 3, or 4 appeared on the screen. Participants were instructed gently to rest the index and middle fingers of their left hand on the buttons labelled ‘1’ and ‘2’, and the index and middle fingers of their right hand on the buttons labelled ‘3’ and ‘4’, and to press the button which corresponded to the number that appeared on the screen. For the SRT, the box recorded mean and standard deviation of response times. For the CRT, the box recorded the number of errors and the means and standard deviations of response times for correct and incorrect responses. The numbers box does not record individual trial data.

Procedure

Participants first completed a short social and demographic questionnaire which asked questions about their age, gender, education (number of years in full-time education), and occupation (graded according to the SOC2000, based on the UK’s standard classification of occupations; Rose & Pevalin, 2003). The younger group was asked about their parents’ occupations. They then completed the tasks in the following order: reaction time task (a), matrix reasoning, reaction time task (b), WTAR, digit-symbol coding, reaction time task (c). The order in which the different reaction time tasks were completed was varied equally among the participants.

Results

Background and cognitive measures

Table 1 describes the means (SD) and Table 2 describes the frequencies for the background measures, cognitive measures and the reaction time results for the total sample and for different age groups. Percentiles of the Deary-Liewald reaction time task scores for the different age groups are shown in Appendix 1. The mean (SD) overall age was 47.7 years (20.9). The mean (SD) number of years in full time education was 15.1 (2.9). There was a significant difference between the age groups with regard to the Standard Occupational Classification (SOC2000; χ²[12, n = 150] = 24.46, p < 0.009; see Table 2). With regard to the cognitive measures, the mean (SD) total score for the WTAR was 44.3 (5.4), the mean (SD) total score for the Matrix Reasoning test was 24.6 (4.7), and the mean (SD) total score for the Digit-Symbol Coding test was 74.8 (15.4). One-way ANOVAs with a between subjects factor of age (three levels: young, middle-aged and old) revealed a significant effect of Age on the WTAR (F[2, 147] = 13.05, p < 0.01, η² = .15), the Matrix Reasoning test (F[2, 147] = 33.73, p < 0.01, η² = .32), and the Digit-Symbol Coding test (F[2, 147] = 22.73, p < 0.01, η² = .24). Younger adults scored higher on the Matrix Reasoning and Digit-Symbol Coding tests, and lower on the WTAR, than the middle-aged and older adults. There was no difference between the middle-aged and old groups in any of these tests (see Table 1). The full correlation matrix for these variables is shown in Table 3. Most notable are the strong inverse correlations between age and Matrix Reasoning and Digit-Symbol Coding tests, and a substantial positive correlation between age and WTAR.

Table 1 Means and standard deviations (SD) for background, cognitive and reaction time task measures
Table 2 Frequencies, percentages and non-parametric tests for gender, handedness and occupational classification
Table 3 Pearson correlations among background and cognitive measures

Reaction time tasks

Comparison of the two reaction time tasks

With regard to the SRT measures, repeated measures t-tests revealed that the mean response time for the Deary-Liewald task (274.4 ms) was significantly longer than for the numbers task (255.7 ms; t[149] = –6.30, p < 0.01). The mean SRT SD was lower for the Deary-Liewald task (45.3 ms) than for the numbers task (49.7 ms; t[149] = 2.24, p < 0.05). With regard to the CRT measures, mean response time was lower for the Deary-Liewald task (474.5 ms) than for the numbers box (555.8 ms; t[149] = 18.08, p < 0.01). This may be due to the different stimuli used in the two tasks. The stimulus-response arrangement in the Deary-Liewald task was designed to rely on spatial coding, and hence may have been more straightforward than the numbers box, which required participants to recode a centrally placed number into the appropriate response. The mean SD of CRT response times was slightly lower for the Deary-Liewald task (100.1 ms) than for the numbers task (108.2 ms; t[149] = 3.25, p < 0.01). The mean number of errors made with the Deary-Liewald task was 2.4, and with the numbers box was 2.5; there was no significant difference between them.
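For readers who wish to run the same kind of comparison on their own data, a repeated measures (paired) t-test can be sketched as follows. The function name and the five pairs of values are hypothetical, and the direction of subtraction (numbers task minus Deary-Liewald task) is an assumption chosen only because it matches the signs of the t values reported above.

```python
import math
import statistics

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical mean SRTs (ms) for five participants on each device.
numbers_box = [250.0, 262.0, 241.0, 270.0, 255.0]
deary_liewald = [268.0, 275.0, 266.0, 280.0, 279.0]

# A negative t here means slower responses on the Deary-Liewald task.
t, df = paired_t(numbers_box, deary_liewald)
```

With 150 participants the degrees of freedom are 149, as in the tests reported above.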

The correlations between the reaction time measures are shown in Table 4. With regard to the simple reaction time (SRT) tasks, there was a large, significant positive correlation between the mean response times of the Deary-Liewald task and the numbers task (r[148] = .68, p < 0.01). There was also a significant positive correlation between the standard deviations (SD) of response times of the Deary-Liewald task and the numbers task (r[148] = .40, p < 0.01). The correlations between the means and SDs within both reaction time tasks were also significant: Deary-Liewald task (r[148] = .56, p < 0.01); numbers task (r[148] = .56, p < 0.01).

Table 4 Correlations among the measures of the simple and choice reaction time tasks for the Deary-Liewald task and numbers task

With regard to the choice reaction time (CRT) tasks, there was a very large, significant positive correlation between the mean response times of the Deary-Liewald task and the numbers task (r[148] = .82, p < 0.01). There was a large, significant positive correlation between the standard deviations (SD) of response times for the Deary-Liewald task and the numbers task (r[148] = .64, p < 0.01). The correlations between the means and SD within each task were also large and significant: Deary-Liewald task (r[148] = .82, p < 0.01); numbers task (r[148] = .78, p < 0.01). Faster participants were less variable. There was a small, significant positive correlation between the number of errors made in the Deary-Liewald task and the numbers task (r[148] = .18, p < 0.05). There were few errors overall. The number of errors and mean response times within each task were slightly negatively correlated: Deary-Liewald task (r[148] = –.24, p < 0.01); numbers task (r[148] = –.25, p < 0.01). Faster participants made more mistakes.

Reliability of the Deary-Liewald task

Internal consistency for the Deary-Liewald task was measured using Cronbach’s alpha and was very high for both the SRT (α = .94) and for correct responses on the CRT (α = .97). Reliability of the SD of response times was measured using a split-half analysis. The SD of the first half of responses was correlated with the SD of the second half of responses, which revealed a high, significant correlation for correct responses on the CRT (r[148] = .64, p < 0.01). The correlation was not significant for the SRT (r[148] = .15, p = 0.07). A further experiment on 20 participants was conducted to provide another measure of period-free reliability. Each participant completed the SRT and CRT twice, one immediately after the other. Means and SDs for these tests are shown in Table 5. Correlations between the first test and second test were significant for the SRT mean (r[18] = .64, p < 0.01) and SRT SD (r[18] = .47, p < 0.05), and highly significant for the CRT mean (r[18] = .83, p < 0.01) and CRT SD (r[18] = .62, p < 0.01). The correlation was not significant for the number of errors made in the CRT (r[18] = .34, p = 0.14).
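The two reliability indices used above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: it treats each trial as an 'item' when computing Cronbach's alpha, and the function names and example values are hypothetical.

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a participants-by-trials table of response times.

    Assumption: each trial is treated as one item, so scores[i][j] is the
    response time of participant i on trial j.
    """
    k = len(scores[0])
    item_variances = [statistics.variance(column) for column in zip(*scores)]
    total_variance = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def split_half_sds(rts_ms):
    """SD of the first half and SD of the second half of one participant's trials."""
    half = len(rts_ms) // 2
    return statistics.stdev(rts_ms[:half]), statistics.stdev(rts_ms[half:])

# Hypothetical response times (ms): four participants, three trials each.
scores = [
    [300.0, 310.0, 305.0],
    [400.0, 415.0, 395.0],
    [350.0, 340.0, 360.0],
    [500.0, 490.0, 510.0],
]
alpha = cronbach_alpha(scores)
first_sd, second_sd = split_half_sds([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
```

For the split-half analysis, the two SDs returned for each participant would then be correlated across participants.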

Table 5 Means and standard deviations (SD) for the test-retest reliability study for the Deary-Liewald task

Reaction time correlations with age and intelligence

Table 6 shows the correlations between the background and cognitive variables with the measures from the two reaction time tasks. Age correlated significantly with all of the reaction time measures. Older people were slower and more variable, and made fewer errors. Education did not correlate significantly with any reaction time measure. People in more professional occupations (SOC2000) had faster SRT and CRT, and less variable CRT in both tasks. For the cognitive measures (WTAR, matrix reasoning and digit-symbol coding), we report both raw and age-adjusted correlations because of these measures’ different correlations with age (see Tables 3 and 5). The WTAR showed near-to-zero raw correlations. When age-adjusted, there were significant negative correlations with the CRT means and SDs for the Deary-Liewald and numbers tasks, and the SRT variables in the Deary-Liewald task. Matrix Reasoning was negatively correlated with most of the SRT and CRT variables. The effect sizes were reduced when age was controlled. Digit-symbol coding correlated negatively with the majority of reaction time measures, except errors, and these persisted, though reduced in effect size, when age was controlled. In all instances, the correlations with cognitive tasks were very similar for the Deary-Liewald task and the numbers task.
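One common way to obtain an age-adjusted correlation is the first-order partial correlation, sketched below. The text does not state which adjustment method was used, so the partial correlation is an assumption, as are the function names and the example values.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical raw correlations: reaction time with a cognitive score (r_xy),
# reaction time with age (r_xz), and the cognitive score with age (r_yz).
adjusted = partial_r(-0.50, 0.45, -0.55)
```

When both variables share variance with age, as in this example, the adjusted correlation is smaller in magnitude than the raw one, which matches the pattern of reduced effect sizes described above.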

Table 6 Correlations between background and cognitive variables and the measures of the simple and choice reaction time tasks for the Deary-Liewald task and numbers task

Discussion

We have devised a new reaction time programme that allows the user to conduct simple and four-choice reaction time procedures. It allows certain experimental parameters to be adjusted. It collects data in a file that is straightforward to transfer for analysis. The programme is free, easy to use, and needs no special software. This report aims to let people know about the programme and invites them to use it. It also reports some data from a wide range of ages, spanning 18 to 80 years. The Deary-Liewald reaction time task provides reliable and valid measures. We found the expected associations between reaction time and age, and similarly with fluid intelligence and a psychometric test of processing speed. As expected, there was less association with crystallised intelligence. The associations with the same parameters from a very well studied reaction time device were very high, especially for choice reaction time.

With respect to investigations in intelligence differences (Der & Deary, 2003), ageing (Der & Deary, 2006), mortality (Shipley et al., 2006), and psychopharmacology (Strachan et al., 2001), it would be the four-choice reaction time measures (mean and standard deviation) that are recommended. Simple reaction time measures have lower associations with other variables generally, the distribution of simple reaction time means is less normal and the bivariate distribution with intelligence more problematic (Der & Deary, 2003), and simple reaction time standard deviations (intraindividual variability) have lower reliability here and elsewhere (Deary & Der, 2005a).

This report is intended to meet the need for a reaction time platform that is easily accessible to all relevant researchers. It also attempts to negotiate a tricky combination: of, on the one hand, being flexible enough to allow different researchers to run the test that they wish; and, on the other hand, of being sufficiently restricted so that different researchers can compare data because they are running the same basic task. Intentionally, there is no special software needed to run the test. We understand that many psychologists will wish to use reaction times that are tailor-made, with their own stimulus-response contingencies and manipulations, in order to test specific hypotheses. The Deary-Liewald task is not intended for them. It is intended for the large group of researchers who wish to have a standard simple or four-choice reaction time test to be used as a predictor or outcome variable.

We do not provide norms, and neither should we. We envisage slight between-study differences in overall levels of reaction times, based on their hardware (but see Appendix 3). However, within studies that use the same equipment for all subjects, the results will be useful: for making between-group comparisons, and for examining correlations.

We encourage researchers to download and use this reaction time programme in their studies (Appendix 2) and we offer to provide a summary of their findings on our website to provide a cumulative record of the findings with the task. As it becomes widely used, the validity and reliability data will accrue, and, after more than a century, it will be possible to compare studies that have used basically the same reaction time task.

Characteristics of the Deary-Liewald reaction time programme

The programme is designed to run on all laptop and desktop computers, requiring no special software. We recommend using a monitor with a vertical refresh rate of 60 Hz or better and with a pixel response time of 5 ms or faster (nearly all modern monitors fit this description). A simple, single screen page for the experimenter provides the following with respect to task set-up. The subject identity and the location for the saved data file can be entered. For SRT, the experimenter can indicate the number of practice and experimental trials required, the range (in milliseconds) for acceptable responses, and the range for the inter-stimulus interval. The experimenter can select to run a practice or the experiment proper. For CRT, the experimenter has the same control. Additionally, the response keys that correspond to each stimulus box may be programmed, simply by typing them into boxes on the screen. The programme allows the experimenter to save default settings. Data from the programme are saved to a database on the computer, from where they can be exported easily to a .csv file.
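The adjustable settings listed above can be pictured as a small configuration record. The sketch below is purely illustrative and does not reproduce the programme's internal format: the class and field names, the acceptable-response range of 150 to 1500 ms, the decision to drop out-of-range responses, and the .csv column names are all hypothetical. The trial counts, inter-stimulus interval and response keys are those used in the present study.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class TaskSettings:
    """Experimenter-adjustable settings (field names are hypothetical)."""
    subject_id: str
    n_practice: int = 8                           # practice trials used here
    n_test: int = 40                              # CRT test trials used here
    isi_range_ms: tuple = (1000, 3000)            # 1 to 3 s, as in this study
    accept_range_ms: tuple = (150, 1500)          # hypothetical bounds
    response_keys: tuple = ("z", "x", ",", ".")   # CRT keys, left to right

def filter_valid(rts_ms, settings):
    """Keep responses inside the acceptable range (assumed handling)."""
    low, high = settings.accept_range_ms
    return [rt for rt in rts_ms if low <= rt <= high]

def to_csv(rows):
    """Write trial rows to .csv text; the column names are hypothetical."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["subject_id", "trial", "rt_ms", "correct"])
    writer.writerows(rows)
    return buffer.getvalue()

settings = TaskSettings(subject_id="S01")
valid = filter_valid([100.0, 400.0, 2000.0], settings)
csv_text = to_csv([("S01", 1, 400.0, True)])
```

Keeping the settings in one record of this kind makes it straightforward to report them alongside results, which helps when comparing studies that used different parameter values.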

The location for downloading this programme is given in Appendix 2. There, the user will find the fully operational programme and brief instructions for use. The standard operating procedure for this task is in the supplementary online information.