Introduction
Human memory is not a precise tape recorder but a system that encodes and ultimately represents knowledge through active reconstruction rather than passive reproduction (Schacter, 2001; Vecchi & Gatti, 2020). This was first documented in Bartlett’s (1932) pioneering studies on memory. It has since been supported by a large body of research showing that participants tend to forget the precise features of memorized stimuli while extracting the gist of the information (for a review, see Brainerd et al., 2008a; Chang & Brainerd, 2021). That is, humans rely on their semantic memory to encode, store, and retrieve information, adapting new information to what they have previously memorized, and systematic errors may occur during each of these phases (Brewer & Treyens, 1981; Sulin & Dooling, 1974).
The origins of inaccurate or false memories have therefore become a matter of great scientific interest. Different experimental paradigms have been developed to account for the formation of false memories, and the Deese–Roediger–McDermott task (DRM; Deese, 1959; Roediger & McDermott, 1995) is one of the most widely used methods in the verbal domain. In this task, participants are typically first presented once with several lists of words to be memorized. Within each list, the words are related to a non-presented target word, named the critical lure (e.g., word list: door, glass, pane, shade, ledge, etc.; critical lure: window). Then, after a brief distracting task, participants perform a recognition task in which they indicate whether a given word was part of the memorized lists or not. Interestingly, during this latter phase, participants tend to erroneously report the critical lures as “old”. That is, they recognize them as if they had been part of the memorized lists, although these words were never presented during the encoding phase (for a review, see Gallo, 2010).
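The scoring logic of the recognition phase can be made concrete with a short example. The Python sketch below computes the proportion of “old” responses separately for studied words (hits), critical lures (false recognitions), and unrelated new words (false alarms). The `Trial` record, the item-type labels, and the five toy trials are hypothetical illustrations. They are not the data format or data of this study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    word: str
    item_type: str  # hypothetical labels: "studied", "critical_lure", "unrelated"
    said_old: bool  # True if the participant answered "old"


def old_response_rates(trials):
    """Return the proportion of "old" responses for each item type."""
    counts = defaultdict(lambda: [0, 0])  # item_type -> [n_old, n_total]
    for t in trials:
        counts[t.item_type][0] += int(t.said_old)
        counts[t.item_type][1] += 1
    return {k: n_old / n for k, (n_old, n) in counts.items()}


# Toy trials invented for illustration only.
trials = [
    Trial("door", "studied", True),
    Trial("glass", "studied", True),
    Trial("shade", "studied", False),
    Trial("window", "critical_lure", True),
    Trial("table", "unrelated", False),
]
rates = old_response_rates(trials)
print(rates)
```

In a real analysis these proportions would be computed per participant before being aggregated. The point of the sketch is only that critical lures are scored as new items, so an “old” response to them counts as a false recognition.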
Two main theories have been proposed to explain participants’ performance in the DRM task: the activation-monitoring framework (Gallo & Roediger, 2002; Roediger et al., 2001) and fuzzy-trace theory (Brainerd & Reyna, 2002; Reyna & Brainerd, 1995). According to the activation-monitoring framework, the studied words hyperactivate the critical lure related to them, which leads to high levels of false recognition (Roediger et al., 2001). Conversely, according to fuzzy-trace theory, participants studying the words encode a memory trace linked to the semantic content of each list (the gist trace), and this trace is responsible for the false recognitions (Brainerd & Reyna, 2002; Reyna & Brainerd, 1995).
Consistent with these perspectives, previous studies have successfully predicted the occurrence of false memories on both associative and semantic bases (Brainerd et al., 2008b; Roediger et al., 2001). Associative relationships reflect word use, as in “spider-web”, whereas semantic relationships reflect the overlap of conceptual features between words, as in “horse-pony”. In particular, seminal studies have shown two things. First, the association strength between the words composing each list and the critical lure (i.e., the backward associative strength, BAS) is a key factor in determining false memories (Roediger et al., 2001). Second, multiple semantic sub-components underlie false memories (Cann et al., 2011). Whether false memories in the DRM are associative or semantic in nature remains hotly debated (for recent evidence, see Brainerd et al., 2020), as these two processes are generally considered to be independent (Ferrand & New, 2003; Hutchison, 2003). However, recent studies have shown that indexes extracted from natural language use successfully predict both associative and semantic effects in priming tasks (Günther et al., 2016; Jones et al., 2006). This evidence indicates that a single process, namely predicting a target word from the linguistic context in which it typically appears, can explain both associative and semantic processing. In turn, it suggests that the two processes are likely interdependent in natural settings and may converge into (partially) overlapping structural representations in human memory. That is, associative and semantic processes can be dissociated experimentally when certain task demands force participants to rely on a specific component. For example, defining a concept and guessing associates necessarily rely on different components of human memory (for a discussion, see Maki & Buchanan, 2008). Yet, in natural contexts it is almost impossible to isolate such components and thus to find words that are purely semantically or purely associatively related (Jones et al., 2006).
However, little is currently known about how the decision process unfolds when participants accept or reject words in the DRM task. Accuracy and reaction times are the classic dependent variables in memory research. They are informative about some explicit and implicit cognitive components, but they reflect only the final state of the decision process. Hence, they can neither directly measure how this process unfolds nor directly quantify potential response conflict (Freeman, 2018; Stillman et al., 2018). Alternative methods, such as drift-diffusion models (e.g., Krajbich & Rangel, 2011; Ratcliff, 1978; for evidence on memory tasks, see Osth et al., 2020), can be used to isolate certain decision components. Specifically, drift-diffusion models can use reaction time distributions to estimate, for example, (i) how conservative or biased a given participant’s responses are in a certain condition, (ii) the degree of perceptual sensitivity or task difficulty, or (iii) the duration of non-decisional components of the decision process (for further details, see e.g., Voss et al., 2004). The adoption of these models has consequently helped clarify several decision components across a large number of psychological domains, including human memory (e.g., Osth et al., 2020). However, one limitation of drift-diffusion models is that they require a high number of observations within each condition to estimate the parameters reliably (Voss et al., 2004). This, in turn, prevents the use of continuous predictors at the single-stimulus level, such as the semantic predictor employed in the present study.
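The three parameters listed above can be illustrated with a minimal simulation of the diffusion process (Ratcliff, 1978), approximated with a simple Euler scheme. The parameter names follow common convention: `v` for drift rate, `a` for boundary separation, `z` for starting point as a proportion of `a`, and `ter` for non-decision time. The parameter values, the step size `dt`, and the noise scale of 1 are assumptions chosen for illustration. They are not estimates from any study. Fitting such a model to real data requires dedicated software and, as noted above, many trials per condition.

```python
import math
import random


def simulate_ddm_trial(v, a, z, ter, dt=0.001, sigma=1.0, rng=random):
    """Simulate one diffusion-model trial with an Euler approximation.

    v:   drift rate (higher = easier task / greater sensitivity)
    a:   boundary separation (higher = more conservative responding)
    z:   starting point as a proportion of a (0.5 = unbiased)
    ter: non-decision time in seconds (e.g., encoding and motor execution)
    Returns (choice, rt): choice is 1 for the upper boundary, 0 for the lower.
    """
    x = z * a
    t = 0.0
    step_sd = sigma * math.sqrt(dt)
    # Accumulate noisy evidence until one of the two boundaries is crossed.
    while 0.0 < x < a:
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
    return (1 if x >= a else 0), t + ter


def simulate(n, seed=0, **params):
    """Simulate n independent trials with a seeded random generator."""
    rng = random.Random(seed)
    return [simulate_ddm_trial(rng=rng, **params) for _ in range(n)]


trials = simulate(500, seed=1, v=1.5, a=1.2, z=0.5, ter=0.3)
p_upper = sum(c for c, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(f"P(upper) = {p_upper:.2f}, mean RT = {mean_rt:.3f} s")
```

Each parameter maps onto a single aspect of the simulated data. Moving `z` away from 0.5 biases choices toward one boundary, increasing `a` slows responses but makes them more consistent, and `ter` shifts every reaction time by a constant.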
For these reasons, here we opted for mouse tracking, a particularly reliable method for isolating the dynamics of response conflict and indecision and the evolution of the choice (Freeman, 2018). Mouse tracking also allows trial-level estimates to be computed. Moreover, mouse-tracking measures have been shown to outperform reaction times in predicting participants’ performance in decisions involving risk (Stillman et al., 2020). In recent years, mouse tracking has been successfully used to investigate how participants’ decisions unfold across several cognitive domains. These include language (Lins et al., 2019; Spivey et al., 2005), social cognition (Freeman et al., 2011, 2016), and recognition memory (Papesh et al., 2012, 2019), and the method has also been used to detect faking-good behavior when responding to personality questionnaires (Mazza et al., 2020). Recently, we employed a mouse-tracking paradigm to investigate real-time decisions during semantic processing, predicting participants’ performance through distributional semantics (Gatti et al., 2021b). In that study, participants were shown word pairs and performed a two-alternative forced-choice task, selecting either the more abstract or the more concrete word by moving the computer mouse. Mouse trajectories reflected the response conflict and its temporal evolution, with deviation increasing as the semantic relatedness of the words increased. These results support the validity of mouse tracking as a method to detect deep and implicit decision-making features subserving semantic memory (Gatti et al., 2021b).
In mouse-tracking paradigms, participants make decisions by moving their mouse from a starting position, typically at the bottom center of the screen, to one of two response options, typically in the two upper corners of the screen. Motor outputs (i.e., hand movements) are assumed to be executed in parallel with the decision (Freeman et al., 2010). This allows the choice conflict and its evolution to be quantified, which is not possible directly with reaction times alone (Stillman et al., 2018). Mouse-tracking packages (e.g., Kieslich et al., 2019) can extract several dependent variables that are informative about decision-making processes. Decision conflict is generally quantified by the maximum deviation from the direct path (MD), that is, the largest perpendicular distance between the actual trajectory and the idealized straight trajectory from the starting point to the selected option. Decision evolution is quantified by sample entropy, which measures the degree of irregularity and unpredictability of the trajectory (for a complete discussion of other possible indexes, see Freeman & Ambady, 2010; Stillman et al., 2018). For both measures, higher values indicate greater conflict and indecision. Mouse-tracking paradigms also allow for more refined decision time indexes, such as the time at which the trajectory reaches its maximum deviation, taken as the time at which the decision takes place. Participants are not subjectively aware that their manual trajectories deviate as a function of task conditions, so these indexes can be considered implicit and likely automatic measures. Consistent with this, seminal evidence from the recognition domain has shown that mouse-tracking measures correlate with confidence judgements, with higher-confidence judgements showing more linear response trajectories (Papesh et al., 2012). Notably, compared with classic confidence judgements, mouse-tracking paradigms provide continuous (rather than ordinal) dependent variables, which allows for more straightforward analyses. Previous evidence on lexical decision tasks has also shown that entropy quantifies indecision and thus provides information on decisional processes beyond that captured by other decisional measures (Calcagnì et al., 2017). Further direct evidence comes from seminal studies on conflict in semantic categorization. When participants categorized the atypical exemplar “whale” as “mammal” (vs. the tempting yet incorrect option, “fish”), their movements were more strongly attracted toward the “fish” option than when the target was a typical mammal (Dale et al., 2007).
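The spatial and temporal indexes described above can be sketched in a few lines. The function below computes two quantities for a trajectory given as parallel lists of timestamps and x/y coordinates. The first is the maximum signed perpendicular deviation from the straight line joining the trajectory’s first and last points. The second is the time at which that maximum is reached. This is a minimal sketch that assumes y increases upward and that the options sit in the two upper corners. It is not the implementation of any specific package. Packages such as mousetrap (Kieslich et al., 2019) also handle time normalization, coordinate remapping, and their own sign conventions. The toy trajectory is invented.

```python
import math


def max_deviation(times, xs, ys):
    """Return (md, time_at_md) for a single mouse trajectory.

    Assumes y increases upward and that the two response options sit in the
    upper corners. The trajectory is mirrored horizontally when it ends to the
    left of its start, so positive values always mean deviation toward the
    unselected option.
    """
    if not (len(times) == len(xs) == len(ys) >= 2):
        raise ValueError("need at least two samples, with equal-length inputs")
    x0, y0 = xs[0], ys[0]
    if xs[-1] < x0:  # selected option on the left: mirror around the start
        xs = [2 * x0 - x for x in xs]
    dx, dy = xs[-1] - x0, ys[-1] - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end points coincide")
    md, t_md = -math.inf, times[0]
    for t, x, y in zip(times, xs, ys):
        # Signed perpendicular distance from the straight start-to-end path;
        # positive = left of the direction of travel (the unselected side).
        d = (dx * (y - y0) - dy * (x - x0)) / length
        if d > md:
            md, t_md = d, t
    return md, t_md


# Toy trajectory: starts at (0, 0), ends on the right option at (1, 1),
# bowing leftward (toward the unselected option) on the way.
times = [0, 100, 200, 300, 400]  # ms
xs = [0.0, -0.2, -0.1, 0.5, 1.0]
ys = [0.0, 0.3, 0.7, 0.9, 1.0]
md, t_md = max_deviation(times, xs, ys)
print(f"MD = {md:.3f}, time of MD = {t_md} ms")  # prints: MD = 0.566, time of MD = 200 ms
```

Mirroring trajectories that end on the left lets responses to both options share one sign convention. This is also why MD can be compared directly across “old” and “new” responses.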
Here, building upon this evidence, we use mouse tracking to explore the decisional stages subserving recognition memory in the DRM task. Specifically, we applied an established method to compute the semantic similarity between new words and studied words (for a complete discussion, see Gatti et al., 2021a, 2021b, 2021c), using indexes extracted from distributional semantic models (DSMs). DSMs induce word meanings from large databases of natural language data and represent them as high-dimensional numerical vectors. These models are thought to capture the structure of semantic memory well (Günther et al., 2019; Jones et al., 2015). In particular, here we used word embeddings based on a predictive component. These DSMs induce word vectors with a neural network architecture that has one hidden layer and is optimized to predict a target word (Baroni et al., 2014; Mikolov et al., 2013). Briefly, these models are trained on large collections of texts that document natural language use. Nodes in the input and output layers represent words, and the network learns to predict a target word from the lexical contexts in which it appears (i.e., the words it co-occurs with in the text). At each learning event (i.e., every occurrence of the target word), the network incrementally updates a set of weights by minimizing the difference between its predictions and the observed data. The estimated sets of weights eventually capture word meanings. These distributed representations, or vectors, can be quantitatively compared by measuring their distance in a multidimensional space, which in turn is thought to capture the semantic similarity between words (Günther et al., 2019). Similar words occur in similar contexts and thus end up with vectors that are geometrically closer. Importantly, word embeddings have been shown to perform well across a wide range of semantic tasks (for a review of the recent prediction-based class of models, see e.g., Baroni et al., 2014). Moreover, they are equivalent to psychologically grounded associative learning models (Günther et al., 2019; Mandera et al., 2017).
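The learning procedure described above can be illustrated with a toy continuous-bag-of-words (CBOW) model in the spirit of Mikolov et al. (2013). At each learning event, the averaged input vectors of the context words form the hidden layer. A softmax output layer uses this hidden layer to predict the target word, and all weights are updated by gradient descent on the prediction error. This is a deliberately minimal sketch with a full softmax and no negative sampling, subsampling, or learning-rate decay. It is not the model or corpus used in this study. The corpus, dimensionality, window size, initialization range, and learning rate are arbitrary assumptions.

```python
import math
import random


def train_cbow(sentences, dim=8, window=2, lr=0.1, epochs=100, seed=0):
    """Train a toy CBOW model with a full softmax output layer.

    Returns (vocab, input_vectors, loss_per_epoch); the rows of
    input_vectors are the learned word embeddings.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    # Input-to-hidden weights (one row per word) and hidden-to-output weights.
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_words)]
    w_out = [[0.0] * dim for _ in range(n_words)]
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for s in sentences:
            for pos, target in enumerate(s):
                start, stop = max(0, pos - window), min(len(s), pos + window + 1)
                ctx = [idx[s[j]] for j in range(start, stop) if j != pos]
                if not ctx:
                    continue
                # Hidden layer: average of the context words' input vectors.
                h = [sum(w_in[c][k] for c in ctx) / len(ctx) for k in range(dim)]
                # Output layer: softmax over the whole vocabulary.
                scores = [sum(w_out[j][k] * h[k] for k in range(dim)) for j in range(n_words)]
                top = max(scores)
                exps = [math.exp(sc - top) for sc in scores]
                norm = sum(exps)
                probs = [e / norm for e in exps]
                t = idx[target]
                total_loss -= math.log(probs[t])
                # Prediction error: predicted distribution minus observed one-hot target.
                err = [p - (1.0 if j == t else 0.0) for j, p in enumerate(probs)]
                grad_h = [sum(err[j] * w_out[j][k] for j in range(n_words)) for k in range(dim)]
                for j in range(n_words):
                    for k in range(dim):
                        w_out[j][k] -= lr * err[j] * h[k]
                for c in ctx:
                    for k in range(dim):
                        w_in[c][k] -= lr * grad_h[k] / len(ctx)
        history.append(total_loss)
    return vocab, w_in, history


# Tiny invented corpus, for illustration only.
corpus = [s.split() for s in [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "the mouse eats cheese",
    "the cat eats fish",
]]
vocab, vectors, losses = train_cbow(corpus)
print(f"loss first epoch: {losses[0]:.2f}, last epoch: {losses[-1]:.2f}")
```

The falling loss shows the incremental error-driven updating described above. Production models trained on corpora of billions of words rely on approximations such as negative sampling instead of the full softmax used here.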
Previous studies predicted participants’ performance in the DRM task mainly with human-based measures (e.g., backward associative strength, BAS; Roediger et al., 2001). Here, instead, we employed a measure that is not computed from human ratings but automatically extracted from natural language. BAS and DSM-based metrics in the DRM task have been shown to be correlated (r = 0.50; see Gatti et al., 2022, for an in-depth discussion of this relationship). However, an independent-source measure such as data from DSMs may be preferable. Most DRM lists are explicitly constructed from free-association norms, so predicting performance from those norms means using them in a task that necessarily taps the cognitive processes that generated them, which may lead to explanatory circularity (Westbury, 2016). In line with this view, here we aimed to predict participants’ behavior in the DRM task from independent models that replicate the structure of semantic memory by applying a psychologically plausible learning model to environmental regularities, namely word co-occurrences (Günther et al., 2019).
Participants studied several lists of words from a classic DRM task. Then, in the recognition phase, they indicated with their mouse whether each word shown was “old” (i.e., presented in the encoding phase) or “new” (i.e., not previously presented). The spatial and temporal measures extracted from mouse movements were then predicted using a semantic index extracted from a DSM. This method allowed us to investigate whether the decision process differs depending on the position of new (and studied) words in the semantic space, that is, on how semantically similar the words in the recognition phase are to the studied words.
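As an illustration of how a semantic index can locate a recognition-phase word relative to a studied list, the sketch below computes the cosine similarity between the word’s vector and the centroid (mean vector) of the list’s vectors. The centroid-based definition, the function names, and the three-dimensional toy vectors are assumptions for illustration. Real embeddings have hundreds of dimensions, and the exact operationalization used here is described in the cited work (Gatti et al., 2021a, 2021b, 2021c).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (nu * nv)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def list_similarity(word_vec, studied_vecs):
    """Hypothetical index: cosine between a word and its study list's centroid."""
    return cosine(word_vec, centroid(studied_vecs))


# Toy 3-D vectors, invented for illustration only.
emb = {
    "door": [0.9, 0.1, 0.0],
    "glass": [0.8, 0.3, 0.1],
    "pane": [0.85, 0.2, 0.05],
    "window": [0.9, 0.2, 0.05],
    "banana": [0.0, 0.1, 0.95],
}
studied = [emb[w] for w in ("door", "glass", "pane")]
for probe in ("window", "banana"):
    print(probe, round(list_similarity(emb[probe], studied), 3))
```

Because cosine similarity ignores vector length, the index reflects only the direction of the vectors in semantic space. A critical lure such as “window” therefore scores close to 1, whereas an unrelated word scores close to 0.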
Discussion
In the present study, we explored the decisional stages subserving recognition memory in the DRM task by combining mouse tracking with distributional semantic models. Participants memorized several lists of words in a classic DRM task and then recognized them among new words using their mouse. Decision-making processes were indexed through different variables computed from mouse trajectories, and these variables were predicted through item-level semantic metrics extracted from distributional semantic models (for a complete discussion, see Gatti et al., 2022). Overall, our findings indicate that mouse trajectories are affected by the semantic similarity between each word in the recognition phase and the previously studied words. Specifically, when participants correctly rejected new words, higher semantic similarity was associated with greater conflict driving the choice and greater irregularity in the trajectory, measured with the maximum deviation from the direct path and with sample entropy, respectively. As for the temporal evolution of the decision, semantic similarity predicts complex temporal measures indexing the online decision processes subserving task performance. More specifically, regardless of stimulus type (old or new), when participants responded that a word was “old”, the higher the semantic similarity, the earlier the decision was reached. Conversely, when they rejected a word (i.e., responded that it was “new”), the higher the semantic similarity, the later the decision was reached.
These findings complement the key assumptions of the two main theories accounting for false memory in the DRM task, namely the activation-monitoring framework and fuzzy-trace theory. Indeed, both theories trace the origin of false recognitions back to associative/semantic mechanisms. In both accounts, adequate episodic and source memory processes counteract these mechanisms and enhance veridical recognition (Brainerd et al., 2002; Gallo & Roediger, 2002; Reyna & Brainerd, 1995; Roediger et al., 2001; for evidence on individual differences, see Gatti et al., 2021a). In interpreting our findings in light of these theories, we first note that the dependent variable used to measure decision conflict, the maximum deviation from the direct path, is an implicit measure of the attractiveness of the unselected option (Freeman et al., 2010). The more attractive the unselected option, the more curved the mouse trajectory and thus the higher the maximum deviation. Accordingly, our results show that even when participants correctly rejected new words (i.e., by selecting the “new” button), the “old” button exerted a strong attraction. Notably, this attraction varied as a function of semantic similarity, with greater conflict for more semantically similar new words. Previous studies have shown that semantic memory is involved in the production of false memories (Gatti et al., 2022; see also Montefinese et al., 2015). Here, we demonstrate that in the DRM task semantic processes are also involved in correctly rejecting new words, that is, the items that could give rise to false memories. The same interpretation holds for the irregularity and unpredictability of mouse movements, because semantic similarity affected the evolution of the choice in the same way: when participants correctly rejected new words, movement irregularity was higher for more semantically similar new words.
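Sample entropy, the irregularity index discussed above, can be computed from a sequence of movement steps with the standard definition SampEn(m, r) = -ln(A / B). Here B counts pairs of length-m templates that match within tolerance r, and A counts the same for length m + 1. The sketch below assumes three things. The input is the step-to-step change in the x coordinate, the defaults are m = 2 and r = 0.2 times the series’ standard deviation, and the two toy series are invented. These are common conventions, not necessarily the settings used in this study or in any particular package.

```python
import math
import random


def sample_entropy(series, m=2, r=None):
    """Sample entropy SampEn(m, r) = -ln(A / B).

    B counts pairs of length-m templates (i < j) whose Chebyshev distance is
    <= r; A counts the same for length m + 1. Both use the first N - m
    templates so that the counts are comparable. If r is None, it defaults to
    0.2 * the sample standard deviation of the series (an assumed convention).
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series too short for the chosen m")
    if r is None:
        mean = sum(series) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in series) / (n - 1))
        r = 0.2 * sd

    def count_matches(length):
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined; usually reported as missing
    return -math.log(a / b)


def x_steps(xs):
    """Step-to-step changes of the x coordinate along a trajectory."""
    return [b - a for a, b in zip(xs, xs[1:])]


zigzag_x = [0.0, 0.1] * 21         # x coordinate oscillating back and forth
regular = x_steps(zigzag_x)        # perfectly alternating +0.1 / -0.1 steps
rng = random.Random(0)
irregular = [rng.gauss(0.0, 0.1) for _ in range(41)]
print(sample_entropy(regular), sample_entropy(irregular))
```

A perfectly predictable movement pattern yields an entropy of zero. Erratic back-and-forth steps, of the kind expected when a participant hesitates between options, yield higher values.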
The increased conflict in rejecting new words as a function of their semantic similarity can be interpreted in terms of the conflict that emerges when participants must judge whether a new word was actually studied. In particular, the activation-monitoring framework assumes that the critical lure is associatively hyperactivated in the encoding phase and that this hyperactivation is responsible for false recognition in the recognition phase (Gallo & Roediger, 2002; Roediger et al., 2001). Fuzzy-trace theory, on the other hand, assumes that participants studying the word lists encode two memory traces. The semantic trace is linked to the semantic content of each list, and the episodic trace is linked to contextual and perceptual features. When participants judge a new word, these two traces counteract each other: the semantic trace increases the likelihood of false recognition, while the episodic trace operates in the opposite direction. For studied words, instead, both traces boost veridical recognition (Brainerd et al., 2002; Reyna & Brainerd, 1995). Here, conflict and uncertainty in rejecting new words increased with semantic similarity. This directly documents the online decision processes underpinning false memories and thus supports both the activation-monitoring framework and fuzzy-trace theory. That is, for new words, for which the episodic trace is lacking, conflict and uncertainty increase with the semantic similarity of the new word. More broadly, our study shows that mouse tracking can extract several behavioral metrics indexing the conflict and uncertainty underlying human performance in the DRM task. Additionally, across “old” responses, new and studied words showed similar conflict and timing indexes in their mouse trajectories. This can be considered the implicit counterpart of the seminal evidence that falsely recognized critical lures and correctly recognized studied words receive similar confidence ratings (e.g., Roediger & McDermott, 1995).
Previous studies investigating associative and semantic involvement in the DRM task have established three findings. Backward associative strength is a major predictor of false memories (Roediger et al., 2001; but see also Brainerd et al., 2020), multiple semantic sub-components underlie false memories (Cann et al., 2011), and memory performance follows a continuous semantic gradient (Gatti et al., 2022). This last finding was replicated in the present study: for new words, the higher the semantic similarity, the higher the occurrence of false memories. Critically, here we extended this evidence. The semantic similarity between the words presented in the recognition phase and those previously studied affects not only explicit memory judgements (i.e., “yes” and “no” recognition responses) but also more implicit measures extracted from participants’ motor outputs. These findings are consistent with previous evidence suggesting that participants memorizing the words implicitly and automatically activate the semantic trace of each list and then use this trace when judging new words during the recognition task (Gatti et al., 2022). This effect has been explained by the conditions of the encoding task, in which the word lists were shown without any cues that clustered the words. Under these conditions, the sequential presentation of each word within a list would incrementally activate a meaning cluster composed of the same words in semantic memory (Gatti et al., 2022). Extending these findings, we further show that the semantic similarity between studied and new words affects decision-making in memory retrieval. The closer the cluster of words composing a list is to a new word in semantic space, the greater the participants’ conflict and uncertainty when correctly rejecting that word. This indicates that the structure of semantic memory and the activation of specific word clusters affect memory retrieval. The degree of overlap between the vectors representing new and studied words accounts for the level of conflict in the decision-making process.
For the main timing measure (i.e., the time at which the mouse trajectory finally deviates), we found that semantic similarity affected participants’ performance for both veridical and false memories. In particular, the higher the semantic similarity, the faster participants decided that a word was old, whereas the opposite held for “new” judgements. Semantic similarity thus shifted the timing of the decision in opposite directions, indicating that this variable affected the temporal dynamics subserving task performance as a whole. This pattern may suggest that semantic memory affects temporal and spatial measures of decision-making differently in the DRM task. On this view, the time needed to decide whether a word was old or new is dissociated from deeper decision-making components, such as the conflict and indecision underlying memory judgements. As maintained by the activation-monitoring framework and fuzzy-trace theory, different cognitive processes (i.e., associative/semantic and episodic) come into play simultaneously in the DRM task and in turn generate different outcomes (Brainerd et al., 2002; Gallo & Roediger, 2002; Reyna & Brainerd, 1995; Roediger et al., 2001). Specifically, the conflict observed in the spatial measures, which can be traced back to the interplay between associative/semantic and episodic traces, was not observed in the main timing measure. This suggests that two main decision components, spatial and temporal, are active in parallel during memory retrieval in the DRM task. Dynamical decision-making frameworks can explain this dissociation, as they argue that several explicit and implicit processes compete simultaneously when a decision is made (Freeman & Ambady, 2011; Melnikoff & Bargh, 2018).
Our findings have relevant theoretical and methodological implications. On a theoretical level, our results clarify the involvement of semantic memory in a complex memory task such as the DRM when participants’ performance is measured through fine-grained hand movements. In particular, we provide evidence that semantic memory may be differentially involved in the timing, conflict, and uncertainty of participants’ decisions. Additionally, by successfully predicting mouse-tracking measures with a semantic predictor extracted from a distributional semantic model, we provide further support for the idea that these models are highly effective at capturing the structure of human semantic memory (Günther et al., 2019). Indeed, previous studies have predicted participants’ performance with distributional semantic models across a wide range of semantic tasks. These include multiple-choice tests (Bullinaria & Levy, 2007), word categorization (Baroni & Lenci, 2010), word relatedness ratings (Bruni et al., 2014), word naming and lexical decision (Marelli & Amenta, 2018), semantic priming (Gutherther et al., 2016), and recognition memory (Gatti et al., 2021c, 2022), and such predictions have also been made with mouse tracking (Gatti et al., 2021b). Our study, however, is the first to report their effect on mouse-tracking variables in a complex memory task such as the DRM. On a methodological level, we show that pairing distributional semantic models with mouse tracking makes it possible to investigate deep decision-making features of human behavior. This opens new avenues for probing the detailed processes subserving human memory. The method is also promising for specific manipulations of the DRM task, such as testing the effect of warnings on false memories. The methods used here may also be particularly well suited to understanding individual differences in false memories. This is a topic of intense research (e.g., Ball et al., 2021; Leding, 2011, 2012; Nichols & Loftus, 2019; Unsworth & Brewer, 2010; Watson et al., 2005), and mouse tracking could provide additional insights into the implicit and semantic processes involved in false remembering. For instance, recent evidence has shown that individuals with higher source-monitoring abilities are better at recalling contextual information from encoding to correctly reject lures (Ball et al., 2021). Future studies could thus address the extent to which individuals with better source-monitoring abilities also show decreased indecision and conflict, as measured with mouse tracking, when rejecting a critical lure.
In conclusion, by combining distributional semantic models with mouse tracking, we document the semantic decision-making processes underpinning false memories. Our findings are consistent with previous theories of participants’ behavior in the DRM task and provide novel insights into the impact of semantic memory on different decision-making components.