Diagnosing eyewitness identifications with reaction time-based Concealed Information Test: the effect of viewpoint congruency
In 2004, Romano van der Dussen, a Dutchman who lived in Spain, was sentenced to 15½ years in prison on two counts of assault and rape. The two assault victims and a witness initially did not recognize him from mugshots in the police database. When they saw the mugshots a second time 3 weeks later, they did point out van der Dussen as the perpetrator, and they later identified him again from a police lineup. Apart from the undesirable repeated recognition attempts (Wixted et al.,
2021), there were several other issues with the identification procedure in this case. For example, the witness saw the perpetrator from more than 10 m distance, for a very short time, and in the dark, putting her in a very poor condition to identify the perpetrator (Nyman et al.,
2019). Another issue concerned the lineup itself: the blond-haired van der Dussen was placed among black-haired foils, making him the only lineup member who matched the perpetrator description (cf. recommendation #4 from Wells et al.,
2020). Other warning signs included that no physical evidence linked him to the crimes and that an alibi witness testified that van der Dussen had been at a party 30 km away. In 2004, DNA taken from one of the victims matched that of Mark Dixie, a British man convicted of murder. Van der Dussen was only released from prison when Dixie confessed in 2016 (Lindemans,
2019).
Establishing the identity of a perpetrator is at the heart of crime investigation. When investigators have narrowed down their search of a suspect, witnesses may view a live, photo or video lineup that contains the suspect and several foils who are known to be innocent. It is the task of the eyewitness to identify the person who they saw commit the crime—or to reject the lineup if that person is not in the lineup. Decades of research on eyewitness memory have identified conditions that support and impede eyewitnesses in making accurate identification decisions and have resulted in policy changes that are aimed at supporting eyewitness memory. Yet, as the van der Dussen case demonstrates, improper lineup procedures still happen in practice, putting innocent suspects at risk of misidentification and conviction (e.g., Christianson et al.,
1992; Davies & Griffiths,
2008; Epifanio v. Madrid,
2009; Garrett,
2011; Thompson-Cannino et al.,
2009; van Koppen & van der Horst,
2006; Wagenaar,
2009). Under such circumstances, error rates for lineups can be high, with an average of about 50% across conditions (e.g., Clark et al.,
2008; Fitzgerald & Price,
2015). As a result, the use of explicit identification procedures has decreased considerably in some countries (e.g., the Netherlands) and other countries dismiss them altogether (e.g., South Korea, Indonesia). Indirect assessments of recognition, such as the Concealed Information Test (CIT; Lykken,
1959) might provide an alternative. Advantages of indirect measures of recognition include that they are less intentional, faster, and more stimulus-driven than direct measures of recognition. But it is important to map their boundary conditions (Verschuere & Meijer,
2014). Here, we tested the validity of the CIT as a means of diagnosing face recognition under viewing conditions that were congruent or incongruent during encoding and testing.
The CIT is a well-established memory detection technique (Lykken,
1959; for a review see Verschuere et al.,
2011) that resembles lineups in some respects. Similar to a lineup, a CIT includes different types of stimuli: the correct, crime-related stimulus (e.g., murder weapon: a pistol) that is embedded in several plausible stimuli that are not crime-related (e.g., a rifle, a knife, an axe, an injection needle). Instead of relying on explicit responses (“This is the murder weapon”), the CIT infers recognition in an indirect way, namely from neural (e.g., blood oxygen level-dependent response in fMRI; P300 event-related potential), physiological (e.g., skin conductance response), or behavioral (e.g., reaction times) responses. In our example, police could ask the suspect about the murder weapon: Was it … a rifle? … an axe? … a knife? … a pistol? … an injection needle? Differential reactions to the actual murder weapon, the pistol, compared to the other stimuli, indicate recognition. When combining multiple questions, for example about stolen goods, the crime scene, and the location of the crime, the CIT can detect recognition with high validity (Meijer et al.,
2014,
2016).
A variation of the classic CIT, the reaction time-based CIT (RT-CIT) requires only a single computer and enables web-based testing with high reliability and validity (Kleinberg & Verschuere,
2015; for a theoretical analysis, see Verschuere & De Houwer,
2011). The RT-CIT uses reaction times to index recognition of concealed information. To ensure attention to the stimuli and avoid mindless, indifferent responses to all stimuli, the RT-CIT introduced a third type of stimulus, namely targets.
Targets are non-crime-related stimuli that participants need to detect and usually study just before the test. During the RT-CIT task, the stimuli appear on screen sequentially, and participants press one key for the targets and another for all other stimuli. Building on the example above, participants may learn that the CIT will examine recognition of the murder weapon and that they should press the YES key whenever encountering the target (e.g., a rifle) and the NO key for all other stimuli. For innocent (unknowledgeable) participants, all NO reaction times should be similar. For guilty (knowledgeable) participants, the option pistol should stand out and affect their response times. Longer reaction times for NO responses to the crime-related stimulus than for NO responses to irrelevants provide an index of recognition. A meta-analysis reported a large effect size of Cohen’s
d = 1.04 (corrected), confirming the diagnosticity of the RT-CIT (Suchotzki et al.,
2017).
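To make the scoring logic concrete, the RT-CIT index described above can be sketched in a few lines of code. This is an illustrative sketch, not the analysis pipeline of any of the cited studies: the trial tuples, field layout, and example reaction times are hypothetical, and scoring conventions (e.g., error-trial exclusion, outlier trimming) vary across studies.

```python
# Illustrative RT-CIT scoring sketch. Trial structure and reaction times
# are invented for demonstration purposes.
from statistics import mean

# Hypothetical trials: (stimulus_type, response_correct, rt_ms).
# Targets require a YES response; probes and irrelevants require NO.
trials = [
    ("target", True, 620), ("irrelevant", True, 480), ("probe", True, 560),
    ("irrelevant", True, 470), ("probe", True, 575), ("irrelevant", True, 490),
    ("irrelevant", True, 465), ("probe", True, 590), ("irrelevant", True, 485),
]

def cit_effect(trials):
    """Mean RT difference (ms) between NO responses to the probe and NO
    responses to irrelevants, keeping only correct responses; targets
    are excluded from scoring."""
    probe_rts = [rt for kind, ok, rt in trials if kind == "probe" and ok]
    irrelevant_rts = [rt for kind, ok, rt in trials if kind == "irrelevant" and ok]
    return mean(probe_rts) - mean(irrelevant_rts)

# A positive difference (longer probe RTs) indexes recognition of the
# crime-related stimulus; a near-zero difference is what an
# unknowledgeable participant would be expected to produce.
print(f"CIT effect: {cit_effect(trials):.1f} ms")
```

In practice, such per-participant differences are standardized (e.g., as a Cohen's d across participants) and combined over multiple stimulus groups to reach the diagnostic accuracy reported in the meta-analytic literature.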
In the first application of the CIT in the context of face recognition, participants made explicit identifications in an event-related potential-based CIT after they watched four mock crimes (Lefebvre et al.,
2007). Both the CIT and explicit identifications revealed recognition of the perpetrator. Yet, the electrophysiological index of recognition may have been the result of the simultaneous explicit identification. In recent, stricter applications of the CIT protocol in a typical eyewitness paradigm, the RT-CIT showed a good capacity to differentiate the stimulus film actors (i.e., probes) from irrelevants in three experiments (
d = 1.21; Georgiadou et al.,
2019, Experiment 2b;
ds = 0.85 and 0.74; Sauerland et al.,
2023) and moderate capacity in another (
d = 0.39; Sauerland et al.,
2019, Experiment 4). Additionally, one experiment included a probe-absent CIT condition and demonstrated a good capacity of the CIT to differentiate between guilty and innocent suspects (Sauerland et al.,
2023, Experiment 2).
Not all attempts to apply the RT-CIT for diagnosing face recognition were successful, however. A series of five experiments reported a small average effect size (
d = 0.14; Sauerland et al.,
2019). These conflicting findings within facial recognition RT-CIT experiments might originate from differences in encoding conditions and event complexity. Experiments with moderate to large effects included only two rather than four actors and provided ample close-ups of both (Georgiadou et al.,
2019, Experiment 2b; Sauerland et al.,
2019, Experiment 4; Sauerland et al.,
2023). In the experiment with the largest effect size (Georgiadou et al.,
2019, Experiment 2b), encoding was additionally enhanced by presenting the pictures of the actors for 15 s after participants had viewed the stimulus film and prior to taking the RT-CIT. From an applied eyewitness identification perspective, this setup was somewhat flawed, though, because the presented picture was identical to the picture used in the CIT (Burton,
2013). Nevertheless, these experiments combined suggest that a certain degree of memory strength might be required to ensure reliable performance in the CIT. Although encoding conditions are not under the control of investigators, this finding might be useful in cases with good encoding conditions.
In the current experiment, we manipulated the congruency of viewing angle at encoding vs. testing to further investigate the impact of encoding conditions on the validity of the RT-CIT as an index of facial recognition. For half of our participants, the viewing angle at encoding and testing matched (both frontal or both profile view); for the other half, the viewing angles at encoding and testing mismatched (encoding: frontal, testing: profile, and vice versa). Recognition of unfamiliar faces becomes more difficult as the angular rotation between encoding and recognition increases. Face recognition experiments first demonstrated this effect with research designs that used photos at both encoding and testing (Crookes & Robbins,
2014; Johnston & Edmonds,
2009; Liu & Chaudhuri,
2002). Recently, an eyewitness identification paradigm where participants viewed a filmed mock theft at encoding and a photo lineup at testing confirmed this effect (Colloff et al.,
2021). Altogether, these findings suggest that we store unfamiliar faces in a viewpoint-dependent manner.
Congruency of stimuli at encoding and testing can also affect the size of the CIT effect. In one experiment, participants encoded stimulus items either verbally or pictorially (van der Cruyssen et al.,
2021). The subsequent RT-CIT protocol presented stimuli in both modalities. Confirming the idea of a modality-match advantage, the CIT effect was larger when the modalities at encoding and retrieval matched (
ds between 0.40 and 0.60) than when they mismatched (
ds between −0.14 and 0.59). Another experiment tested the effect of encoding–testing congruency by varying the level of abstraction of the presented stimuli (Geven et al.,
2019). Participants viewed either exemplar (e.g., Mercedes) or categorical stimulus items (e.g., car) at encoding and the CIT protocol matched or mismatched this stimulus representation. Again, congruent stimulus presentation at encoding and testing elicited a stronger CIT effect (
ds = 0.47 and 0.55) than incongruent stimulus presentation (
ds = −0.23 and 0.06). Another set of two experiments tested whether angular rotations of the crime-related images in the CIT protocol, compared to encoding, affected the CIT effect (Hsu et al.,
2020). A CIT effect emerged in all conditions, but decreased for more occluding angles such as 90° and 270°. Combined, these findings further support the idea of superior performance under matched conditions across different tests of recognition. However, previous work has not tested the effect of view congruency of face stimuli on the strength of the CIT effect.
In the current line of research, participants viewed a stimulus film that showed one actor primarily from the front and one actor primarily in profile view. At test, participants completed an RT-CIT (Experiments 1 and 3) or made lineup decisions (Experiments 2 and 4). The lineup data served as a benchmark of eyewitness performance. Participants viewed the facial stimuli at test either from the same perspective as at encoding or from a different one. We expected better identification and hence a stronger CIT effect (i.e., a larger difference in reaction times to probes vs. irrelevants) when the viewing angle was congruent rather than incongruent (CIT congruency effect; hypothesis 1). We also predicted that identification performance in lineups would vary as a function of congruency (lineup congruency effect; hypothesis 2). The relative capacity of the CIT and lineups to diagnose face recognition is of strong applied interest, but we had no hypothesis for this comparison. Experiments 1 and 2 did not confirm our hypotheses and showed largely inconclusive results. We therefore conducted two preregistered replication experiments (Experiment 3: RT-CIT; Experiment 4: lineup) for which we strengthened the view congruency manipulation and increased statistical power.
Discussion
The RT-CIT is a well-established memory detection technique that allows for indirect assessments of recognition. It might therefore provide a potent alternative to classic lineups as an identification procedure. Here, we tested the validity of the RT-CIT as a tool for diagnosing facial recognition under congruent or incongruent viewing conditions during encoding and testing. We also tested identification performance in a classic lineup condition to create a benchmark of eyewitness performance. Based on the finding that we store unfamiliar faces in a viewpoint-dependent manner (Johnston & Edmonds,
2009), we expected a stronger CIT effect (hypothesis 1) and better lineup performance (hypothesis 2) when viewing angles during encoding and test were congruent, rather than incongruent. Replicating earlier work (Georgiadou et al.,
2019; Sauerland et al.,
2023), but with entirely different stimulus materials, the RT-CIT showed a good capacity to diagnose face recognition (Experiment 1:
d = 0.91; Experiment 3:
d = 0.63). Only Experiment 3 (but not Experiment 1) supported the idea that view congruency moderates this effect (hypothesis 1): the RT-CIT effect was larger for congruent viewing conditions than incongruent viewing conditions. Yet, the effect size for this comparison was small (
d = 0.25) and may have depended on probe role, as suggested by an exploratory, non-preregistered follow-up analysis. In the two lineup experiments, view congruency moderated lineup performance for one of two lineups, lending only partial support to hypothesis 2. Bayesian analyses suggested that identification performance in the RT-CIT vs. lineups did not differ in our first comparison (Experiment 1 vs. 2), but was much stronger for lineups than the RT-CIT in our second comparison (Experiment 3 vs. 4).
Research in face recognition suggests that people are better at recognizing unfamiliar faces if the viewing angle at test is similar to the viewing angle at encoding (Johnston & Edmonds,
2009). Similarly, although never tested with face stimuli, the diagnosticity of the CIT can vary as a function of congruency of stimuli at encoding and testing (Geven et al.,
2019; Hsu et al.,
2020; van der Cruyssen et al.,
2021). It was therefore unexpected that view congruency did not moderate the CIT effect in Experiment 1. In the replication with a strengthened congruency manipulation and a larger sample, we found a significant interaction effect between the CIT effect and congruency when analyzing both probes together, as preregistered and following the standard procedure in the CIT literature (Suchotzki et al.,
2017; Experiment 3). Taken together, the two experiments suggest that view congruency may have a small to moderate effect on the size of the CIT effect and that Experiment 1 may not have had enough power to detect this effect.
Across two experiments, we found only partial support for the hypothesis that view congruency moderates lineup performance (hypothesis 2). Differences in stimulus materials could explain this deviation from the face recognition literature (Johnston & Edmonds,
2009). Experiments in face recognition use photographs both at encoding and at recognition. To simulate the eyewitness situation more closely, we used videos during encoding and photographs during recognition. Although we carefully edited the stimulus films, especially for Experiments 3 and 4, the videos do not show the actors exclusively from a 0° or 90° view but also include slight rotations. Additionally, the richer information about the probes’ appearance provided by the three-dimensional presentation at encoding might counter the effect of view congruency. Future lineup experiments on the effect of view congruency on identification performance might test this idea further.
To be useful in the field, the capacity of the RT-CIT to diagnose face recognition needs to be better than, or at least equivalent to, people’s lineup performance. To compare the two methods, we tested identification performance with the RT-CIT and with traditional lineups. In our first comparison (Experiment 1 vs. 2), performance in the RT-CIT and lineups was largely equivalent, but in our second comparison (Experiment 3 vs. 4), lineups clearly outperformed the RT-CIT. Two previous experiments that compared RT-CIT and lineup performance were inconclusive (Sauerland et al.,
2023): some Bayes factors supported the idea that the two procedures were equivalent, some that lineups were superior, and some that RT-CIT was superior. Combined with the current findings, we can only conclude that compelling or consistent evidence for the superiority of one method over the other is still lacking.
Limitations and future perspectives
One issue of interest is that the CIT effects we observed here – similar to those in other experiments that tested the validity of the RT-CIT for diagnosing face recognition – were below the average effect size commonly found in RT-CIT experiments (i.e.,
d = 1.04 in a meta-analysis, Suchotzki et al.,
2017; cf. Sauerland et al.,
2023). Those strong effects in memory detection likely derive from the high self-relevance of the probes and from the combination of several stimulus groups in one CIT protocol (e.g., sites of crime, identity of accomplices). Options for enhancing the CIT effect in face recognition – that is, while being limited to facial stimuli – might include the use of familiar targets or increasing the number of targets (cf. Suchotzki et al.,
2018). Furthermore, adding different aspects of a person, such as full body pictures with the face covered, clothing, or accessories (Pryke et al.,
2004; Sauerland & Sporer,
2008; Sauerland et al.,
2013) could be a way of adding more stimulus groups to the CIT protocol.
Another observation concerning the strength of the CIT effect is that, on a descriptive level, Experiment 3 elicited faster reaction times, fewer errors, a weaker CIT effect, and poorer recognition performance on the follow-up photo display than Experiment 1. The differences between those two experiments suggest that the enhanced practice procedure in Experiment 3 may be the cause. Spreading the encoding of the target faces over three rather than two occasions and increasing the number of practice blocks from one or two to three to five likely strengthened memory for the targets while at the same time undermining memory for the probes. This seems to have had both desirable (a low error rate) and undesirable effects (a weaker CIT effect and weaker recognition performance on the follow-up photo display). Future CIT research should keep such effects of the design of the practice phase in mind when fine-tuning the CIT protocol.
Thus far, comparisons of witness performance in the RT-CIT vs. lineups are inconsistent (the current work; Sauerland et al.,
2023). The most relevant question for future investigations could be whether the RT-CIT outperforms lineups under certain conditions, for example whether RT-CIT might be less prone to biases that concern the construction and administration of the procedure than lineups. Because of the indirect character of the RT-CIT, its outcomes might be less vulnerable to the social demands often encountered during lineup administration (cf. Wells & Luus,
1990). Likewise, the CIT may benefit people who perform comparatively poorly in lineups, for example children and older adults (Brackmann et al.,
2019; Fitzgerald & Price,
2015; Martschuk & Sporer,
2018). It is also conceivable that encoding conditions differentially affect the two identification procedures. Indeed, in another comparison between RT-CIT and lineups, observation time did not moderate the CIT effect across two experiments, whereas it did moderate identification performance in probe-absent lineups in one experiment (Sauerland et al.,
2023).