Introduction
Eyewitnesses’ memory for the face of a perpetrator is commonly tested by means of an identification procedure, for example, a lineup or showup. It is well established that eyewitnesses who undergo such a procedure can help solve a crime by pointing out the actual perpetrator, but it is equally well known that eyewitnesses can err. In the worst case, a wrongful identification decision can lead to a wrongful conviction while allowing the guilty party to remain free and reoffend. Wrongful identifications were involved in 70% of the wrongful convictions uncovered by the Innocence Project (innocenceproject.org; cf. Kassin et al., 2012; Wells et al., 1998). While identification accuracy can vary widely across conditions, different meta-analyses show that, on average, accuracy for six-person lineups (i.e., seven answer options: all six lineup members and the option to reject the lineup) hovers around 50% (e.g., Clark et al., 2008; Fitzgerald & Price, 2015; Steblay et al., 2011). Although proper lineup construction and administration can increase accuracy rates (e.g., Brewer & Palmer, 2010), the risk of false identifications remains and continues to be a major concern in the field.
Scholars have recently questioned researchers’ sustained confinement to the traditional eyewitness identification paradigm (Brewer & Wells, 2011; Wells et al., 2006). More specifically, it has been argued that existing research may not be radical enough, with new procedures merely constituting adaptations of existing ones (Dupuis & Lindsay, 2007), rather than generating fundamentally new approaches for testing eyewitnesses’ memory for faces. Existing procedures rely on explicit identification, often after some deliberation. One possible source of error is constructive identification through reasoning (e.g., the culprit is likely to be included and number 4 looks most like him, so it must be number 4). Grosser errors in explicit identification may come from uncooperative eyewitnesses who deliberately point to the wrong person (e.g., to protect someone else, because they were bribed, or after being threatened; see Leach et al., 2009; Parliament & Yarmey, 2002). In other words, explicit identification is prone to subtle biases in human decision making and to strategic misidentification.
One alternative might be to rely on indirect measures. Such responses are attractive in the sense that they may be unintentional, uncontrollable, goal independent, autonomous, purely stimulus driven, unconscious, efficient, or fast (Moors & De Houwer, 2006). Initial evidence supporting the idea that indirect measures can provide information about face recognition comes from two studies with pre-school and school children (Newcombe & Fox, 1994; Stormark, 2004). In these studies, participants first viewed a slide show of previously familiar faces (playmates or previous classmates) and unfamiliar faces, while their skin conductance, heart rate, or both were recorded. Subsequently, direct face recognition responses were collected. Although both direct and indirect measures scored above chance in both studies, the indirect measures outperformed direct recognition decisions. In the current line of research, we embraced the call for exploring a potential adaptation of the identification procedure by testing an indirect index of eyewitness identification: the concealed information test (CIT; Lykken, 1959).
The CIT is a well-established memory detection technique (for a comprehensive review, see Verschuere et al., 2011). At first glance, the CIT looks much like a multiple-choice examination, presenting the examinee with the correct answer embedded amongst a series of incorrect answers. The CIT is used when the examinee may not be able or willing to explicitly identify the correct alternative and, therefore, does not rely upon an explicit answer but rather on more automatic responses to determine recognition. Suppose an exclusive blue Porsche has been stolen, and the police have a suspect who denies any involvement in or knowledge of that theft. The suspect could be asked about the stolen car: Was it ... a white Bentley? ... A green Mercedes? ... A blue Porsche? ... A yellow Ferrari? ... A black Jaguar? Stronger (e.g., electrodermal) responding to the actual stolen car compared to the other cars is taken as an index of recognition. When combining several questions, the CIT can detect concealed recognition with high validity. Reviewing a range of indices, varying from event-related potentials (ERPs) to reaction times, Meijer et al. (2016) reported the diagnostic efficiency of the CIT (i.e., the area under the curve) to be around 0.82–0.94. This means that in such studies, a randomly chosen person with recognition has an 82–94% chance of responding more strongly in the CIT than a randomly chosen person without recognition. In recent years, there has been growing interest in the use of reaction times as the response measure in the CIT (for a review, see Suchotzki et al., 2017). Reaction times can be collected and analyzed in a cost- and time-efficient manner, requiring only a single computer. In the reaction time-based CIT, the answer alternatives are presented briefly, one by one, on the computer screen. To ensure attention to the stimuli, the examinee engages in a binary classification task, pressing a unique button for a set of stimuli learned just before the test (i.e., the targets) and another button for all other stimuli (including the correct answer, or probe, as well as all foils, called irrelevants). Building on the example above, it may be explained to the examinee that the CIT will examine recognition of the stolen car, and the examinee may be asked to press the YES button whenever encountering the target (a red Maserati) and the NO button for all other cars. For the innocent examinee, all NO-reaction times will be roughly similar. For the guilty examinee, the blue Porsche will stand out and grab attention. Longer reaction times for the blue Porsche as compared to the other NO-reaction times provide an index of recognition. After the initial validation of the reaction time-based CIT (Farwell & Donchin, 1991; Seymour & Kerlin, 2008; Seymour & Schumacher, 2009; Seymour et al., 2000), several recent well-powered studies have confirmed its diagnostic efficiency (Kleinberg & Verschuere, 2015, 2016; Verschuere et al., 2015, 2016; for a discussion of its boundary conditions and limitations, see Verschuere et al., 2011; and Meijer et al., 2016).
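As a minimal sketch of the two quantities described above, the following Python code computes a probe-minus-irrelevant reaction time score for one examinee and an area under the curve as the proportion of (knowledgeable, naive) pairs in which the knowledgeable examinee scores higher. The function names, the scaling by the standard deviation of the irrelevant reaction times, and all example numbers are illustrative assumptions; they are not the scoring procedure of any of the cited studies.

```python
from statistics import mean, stdev
from typing import Sequence


def cit_score(probe_rts: Sequence[float], irrelevant_rts: Sequence[float]) -> float:
    """Probe-minus-irrelevant RT difference for one examinee.

    Scaling by the SD of the irrelevant RTs is an assumed convention.
    """
    return (mean(probe_rts) - mean(irrelevant_rts)) / stdev(irrelevant_rts)


def auc(knowledgeable: Sequence[float], naive: Sequence[float]) -> float:
    """Probability that a randomly chosen knowledgeable examinee has a
    higher score than a randomly chosen naive examinee (ties count 0.5)."""
    wins = 0.0
    for k in knowledgeable:
        for n in naive:
            if k > n:
                wins += 1.0
            elif k == n:
                wins += 0.5
    return wins / (len(knowledgeable) * len(naive))


# Hypothetical RTs in milliseconds for a single examinee.
example_score = cit_score([520, 540], [480, 500, 520])

# Hypothetical CIT scores for two small groups of examinees.
example_auc = auc([0.9, 0.4, 1.2, 0.1], [0.0, -0.3, 0.5, 0.2])
```

Read this way, an area under the curve of 0.82 corresponds to the knowledgeable examinee having the higher score in 82% of such pairs.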
Meijer et al. (2007) conducted two studies to examine whether the ERP-based CIT is sensitive to concealed face recognition. In their first experiment, the CIT was capable of picking up recognition of the faces of siblings and close friends. In their second experiment, the CIT did not reveal students’ recognition of their faculty professors’ faces. Seymour and Kerlin (2008) had participants memorize a set of previously unknown faces, and the reaction time-based CIT showed high accuracy in detecting concealed face recognition. The stimuli used in these studies, however, were not typical of eyewitness identification, because the correct faces were either very familiar or well memorized rather than incidentally encountered, as in the case of eyewitnesses. In addition, they were not matched in terms of their outer appearance. As such, they would not meet the requirements of a formal identification procedure in an investigation (cf. Technical Working Group for Eyewitness Evidence, 1999; Wells et al., 1998). More specifically, Wells et al.’s (1998) rule 3, concerning the structure of lineups and photospreads, states:
The suspect should not stand out in the lineup or photospread as being different from the distractors based on the eyewitness’s previous description of the culprit or based on other factors that would draw extra attention to the suspect. (p. 630).
This rule is further specified with the fit-description criterion, which stresses that distractors should fit the eyewitness’s verbal description of the perpetrator (Technical Working Group for Eyewitness Evidence, 1999; Wells et al., 1998). Thus, when the eyewitness describes the perpetrator as ‘young, white female, blond hair’, the lineup should consist of young white females with blond hair.
Lefebvre et al. (2007) were the first to propose the CIT for the purpose of eyewitness identification, namely, to use incidentally encountered faces and to match faces following guidelines for eyewitness identification. Participants watched four mock crimes across two testing sessions. In the perpetrator-present conditions, participants were presented with photographs of the perpetrator, the victim, and five foils, one by one, on the computer screen, while electrophysiological recordings were made. Deviating from the classic CIT procedure, participants could respond to each picture by pressing one of three buttons, indicating that the picture depicted the perpetrator, the victim, or another person. In other words, participants made an explicit identification in this ERP-based CIT. The CIT revealed recognition of the perpetrator, and so did explicit identification. While the results point to the potential of the CIT for cooperative eyewitness identification, the electrophysiological index of recognition may have been evoked by the explicit identification. In a second ERP-based CIT study (Lefebvre et al., 2009), the effects were replicated and extended by examining the role of active concealment. In the deceptive condition, participants concealed the identity of the perpetrator from the experimenters by pressing the button that corresponded with an innocent individual rather than the perpetrator. Results confirmed the earlier finding, showing that the CIT revealed recognition of the perpetrator’s face even when participants tried to conceal their knowledge.
Taken together, there is preliminary evidence that the ERP-based CIT may be useful for testing the facial memory of cooperative eyewitnesses. In the present research line, we examined whether the findings extend to the reaction time-based CIT, which is much easier to apply. This was tested in a series of five experiments. We expected that the recognition of a face previously encountered in a stimulus event (probes) would be reflected in longer reaction times, compared to reaction times for irrelevants.
Overview of the studies
Participants witnessed a crime involving one or more individuals. The subsequent reaction time-based CIT assessed face recognition of the individuals involved in the crime. Using the classic CIT procedure, participants pressed one specific key for all stimuli (i.e., irrelevants and probes), except for the target stimulus that was memorized prior to the CIT. The progression of the five experiments can be described as follows. In Experiment 1, we used one stimulus film depicting four actors who played a thief, a victim, and two bystanders. The lineup for each actor was presented prior to the corresponding CIT to obtain a lineup performance measure that was unaffected by CIT presentation. In all subsequent experiments, the CIT was presented first, to obtain a CIT performance measure that was unaffected by participants’ lineup decisions. The use of only one stimulus film in Experiment 1 raised the question of whether diverging findings could be attributed to the roles the film featured (e.g., more attention paid to the thief than to a bystander) or to characteristics of certain actors (e.g., higher or lower distinctiveness). Therefore, we used different stimulus film versions for all subsequent experiments, in which actors switched roles across versions while the plot remained identical.
Following null findings and contradictory results in Experiments 1 and 2, and emerging insights into the validity of the reaction time-based CIT, we realized that we may have used a suboptimal CIT protocol. Indeed, Verschuere et al. (2015) showed that using a separate CIT per probe (i.e., one for victim, one for thief, etc.) reduced accuracy, and that it is therefore advisable to use one CIT that presents all items completely intermixed (see also Lukasz et al., 2017). In Experiments 3–5, we therefore administered such a multiple-probe CIT, in which all probes (i.e., all actors that appeared in the stimulus event) and all corresponding irrelevant items were presented in random order. Following small effect sizes in Experiment 3, we considered the possibility that our stimulus films had not allowed for sufficient encoding of the actors’ faces. We therefore prepared a less complex stimulus film with only two actors and optimal viewing conditions (long facial viewing time, including close-ups, for both actors) for Experiment 4. Indeed, small but significant effects materialized for the two actors (thief and victim) in this experiment. The final experiment (Experiment 5) addressed three additional issues. First, Experiment 5 included an additional practice block and required a minimum proportion of accurate responses during practice before a participant could move on to the actual CIT. Second, a virtual reality event was used instead of a real-life film, to better control the actions and exposure of the subjects featured in the mock crime and to offer participants a more realistic experience of the mock crime (cf. Gorini et al., 2007; Kim et al., 2014; Riva, 2005; Schultheis & Rizzo, 2001). Finally, we included two control objects in the stimulus event. Finding an effect for the objects but not the faces would replicate earlier findings concerning objects (e.g., Suchotzki et al., 2014; Verschuere et al., 2004; Visu-Petra et al., 2012), showing the validity of the CIT for objects, and would strengthen the conclusion of the absence of an effect for lineup faces. Anticipating the results, we found a CIT effect for objects, but not for lineup faces. Comparison of the methodology in the current studies and CIT research on memory detection in suspects opens new perspectives on when the reaction time-based CIT can serve as a useful tool to diagnose face recognition in cooperative eyewitnesses.
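The multiple-probe protocol described above, with all probes, their irrelevants, and the targets intermixed in one randomly ordered sequence, could be sketched as follows. This is an illustration under stated assumptions rather than the software used in the experiments: the function name, the item labels, the number of irrelevants per probe, the number of targets, and the number of repetitions are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Trial = Tuple[str, str, str]  # (item label, item role, correct key)


def build_trial_list(
    probes: Sequence[str],
    irrelevants: Sequence[str],
    targets: Sequence[str],
    repetitions: int = 1,
    seed: Optional[int] = None,
) -> List[Trial]:
    """Return every item, fully intermixed in random order.

    Targets require the YES key; probes and irrelevants both require NO.
    """
    block: List[Trial] = (
        [(item, "target", "YES") for item in targets]
        + [(item, "probe", "NO") for item in probes]
        + [(item, "irrelevant", "NO") for item in irrelevants]
    )
    trials = block * repetitions
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    rng.shuffle(trials)
    return trials


# Hypothetical labels: two probes, four matched irrelevants each, two targets.
trials = build_trial_list(
    probes=["thief", "victim"],
    irrelevants=[f"thief_foil_{i}" for i in range(1, 5)]
    + [f"victim_foil_{i}" for i in range(1, 5)],
    targets=["target_a", "target_b"],
    repetitions=3,
    seed=1,
)
```

A separate CIT per probe would instead call such a function once per actor with only that actor's items, which is the protocol that Verschuere et al. (2015) found to reduce accuracy.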
Discussion
The aim of the current line of research was to test an alternative to traditional, explicit lineup identification for testing cooperative eyewitnesses’ memory for faces, using an indirect measure of face recognition. To this end, we transferred the reaction time-based CIT methodology that is well established in the field of memory detection in suspects to the field of eyewitness identification. The idea that reaction times in a CIT task should be longer for faces that were previously encountered in a stimulus event than for irrelevant foils was tested in a series of five experiments. The methodology of the studies progressed sequentially and addressed possible explanations for non-significant and inconsistent findings. Across 16 reaction time comparisons, seven were in favor of reaction time-based CIT predictions, whereas half of the tests returned no significant effects and one effect was opposite to our expectations. A meta-analysis showed that the overall effect size was very small. Our findings do not support the use of the reaction time-based CIT for testing cooperative eyewitnesses’ facial recognition memory. These findings contrast with the finding that the ERP-based CIT may be useful for eyewitness identification (Lefebvre et al., 2007, 2009). At least three explanations need to be considered for this apparent discrepancy.
First, it is possible that the stimulus event did not allow for sufficiently deep encoding of the faces. We think that this explanation is unlikely, because the considerable identification accuracy rates in Experiment 1, where the lineups were presented prior to the CIT, are in line with accuracy rates reported in the literature (e.g., Clark et al., 2008; Fitzgerald & Price, 2015; Steblay et al., 2011) and with those reported in previous experiments using the same stimulus film (Sagana et al., 2015, 2014, Experiments 2a–c, 3; Sauerland et al., 2012). In addition, results from a previous study make it unlikely that the stimulus persons used in our experiments were particularly difficult to encode. Specifically, the films used in Experiments 2 and 3 served as stimulus materials in a study looking at eyewitnesses’ memory reports (Sauerland et al., 2014). Collapsed across different recall conditions, participants reported, on average, about 53 person details (i.e., details referring to the appearance of the individuals shown in the film, including facial details, description of clothing, build, etc.), of which, on average, 73% were accurate. Together, these findings do not seem to support the notion that it was particularly difficult to encode the actors shown in our stimulus films. Finally, while rerunning our meta-analyses including only participants who correctly identified the actor from an actor-present lineup increased the average effect size, this increase was carried by Experiment 1. It appears that viewing the lineup prior to the CIT, which was only the case in Experiment 1, improved CIT performance. Accordingly, it seems most appropriate to consider the average effect sizes excluding Experiment 1 as the true effect of the reaction time-based CIT. These effect sizes were very small (including all participants from Experiments 2–5: d = 0.05; including only participants who accurately identified the actor from the lineup in Experiments 2–5: d = 0.15), regardless of accurate actor identification. This confirms our conclusion that a reaction time-based CIT does not work for lineups, even if explicit recognition occurred.
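To illustrate how per-comparison effect sizes of this kind can be averaged, the sketch below computes a within-subject standardized difference between probe and irrelevant reaction times and pools several such values with inverse-variance weights. The text does not specify the meta-analytic model that was used, so the fixed-effect pooling, the large-sample variance approximation, the function names, and all numbers are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def cohens_dz(probe_rts: Sequence[float], irrelevant_rts: Sequence[float]) -> float:
    """Within-subject effect size: mean of the paired probe-minus-irrelevant
    differences divided by their standard deviation (one pair per participant)."""
    diffs = [p - i for p, i in zip(probe_rts, irrelevant_rts)]
    return mean(diffs) / stdev(diffs)


def dz_variance(d: float, n: int) -> float:
    """Large-sample approximation of the sampling variance of d_z (assumed)."""
    return 1.0 / n + d * d / (2.0 * n)


def pooled_effect(effects: Sequence[Tuple[float, int]]) -> Tuple[float, float]:
    """Fixed-effect, inverse-variance weighted mean of (d, n) pairs.

    Returns the pooled effect size and its standard error.
    """
    weights: List[float] = [1.0 / dz_variance(d, n) for d, n in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    return pooled, sqrt(1.0 / sum(weights))


# Hypothetical (d, n) pairs; these are NOT the values from Experiments 2-5.
pooled_d, pooled_se = pooled_effect([(0.10, 60), (-0.05, 80), (0.12, 70)])
```

Larger comparisons receive more weight in this scheme, so a single small study with a sizeable effect shifts the pooled estimate only slightly.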
Second, a more likely explanation for our findings concerns the careful matching of the employed faces, as required by eyewitness identification procedural guidelines (e.g., Wells et al., 1998). Lineup pictures were deliberately selected to match the general description of the probes, leading to matched hair color and length, body type, and age. In fact, during debriefing, many participants spontaneously commented on the resemblance of the different stimulus faces. While the selection of individuals that match in their general description is a necessity in lineup construction, it might be obstructive for the CIT. Indeed, it was found that the more the irrelevants resemble the probe, the smaller the CIT effect (Ben-Shakhar & Gati, 1987). This may explain why Seymour and Kerlin (2008; see also Meijer et al., 2007) did find the reaction time-based CIT to be responsive to face recognition. They selected their facial stimuli from the Aberdeen Psychological Image Collection, which contains pictures of 116 people who have not been selected to match any criteria. While Lefebvre et al.’s (2007, 2009) facial stimuli were matched for some attributes, such as gender, age, race, and hair length, no information was given about other features such as hair color or hair style, and no measures of effective lineup size were provided. Thus, it is possible that the conditions for creating a fair lineup and those for creating an effective CIT are mutually exclusive. This notion is also supported by our findings for objects in Experiment 5. Here, the expected CIT effect was found. The fact that the crime-related objects (e.g., hotel sign) were quite distinct from the irrelevant foils (e.g., Advent wreath, Dutch flag, art show sign, carnival garland, occupation banner reading “This is ours”) may have contributed to the CIT effect for the objects. One way to test this idea would be to conduct a study with closely matched objects or with non-matched faces.
Third, our findings are in line with the emerging idea that different psychological processes may underlie the reaction time-based CIT and the ERP-based CIT (klein Selle et al., 2017). Lefebvre et al. (2007, 2009) provided evidence that the ERP-based CIT is sensitive to face recognition, independent of active concealment attempts. Our series of studies points to the possibility that the reaction time-based CIT critically depends on active concealment, explaining our observed null effects in cooperative witnesses. This reasoning is supported by Suchotzki et al. (2015), who suggested that the reaction time increase to probes reflects response inhibition (see also Seymour & Schumacher, 2009; Verschuere & De Houwer, 2011). Suchotzki et al. (2015) observed a reaction time-based CIT effect only when mock crime participants attempted to hide crime knowledge, but not when they admitted crime knowledge. Thus, it is possible that stronger forms of active deception than those achieved here are crucial for obtaining the reaction time-based CIT effect. This leads to the intriguing possibility that (1) CIT measures that do not depend on active deception (electrodermal responding and the P300 ERP) may be effective in both cooperative and non-cooperative eyewitnesses and (2) the reaction time-based CIT may be effective in non-cooperative (i.e., deceptive) eyewitnesses (cf. Lefebvre et al., 2009).
To summarize, the results of the five experiments presented here indicate that the reaction time-based CIT is not a valid means of testing facial recognition in cooperative eyewitnesses with matched faces. The findings indicate that it is important to map how stimulus distinctiveness affects the validity of the reaction time-based CIT.