Introduction
Personality questionnaires are the most popular tool used to measure personality for a variety of purposes, from pre-employment assessment to forensic evaluation (e.g., in the context of child custody hearings) (Burla et al., 2019; Mazza, Orrù, et al., 2019; Mazza, Monaro, et al., 2019; Roma, Piccinni, & Ferracuti, 2016; Roma et al., 2013, 2014, 2019). However, the most favorable responses to items on these tests are often easily determined. For this reason, test-takers may decide, depending on their motivation, to distort their responses to achieve personal goals; such behavior is known as faking (Mazza, Orrù, et al., 2019; Sartori, Zangrossi, Orrù, & Monaro, 2017; Ziegler, MacCann, & Roberts, 2011). Faking-good, more specifically, is a behavior in which subjects present themselves in a favorable manner, endorsing desirable traits and rejecting undesirable ones. The general prevalence of faking-good is unknown; however, Baer and Miller (2002) estimated its rate at approximately 30% among job applicants. Indeed, up to 63% of applicants admit to faking on personality tests (Dwight & Donovan, 2003); 50% admit to exaggerating positive qualities, while 60% admit to de-emphasizing negative traits (Donovan, Dwight, & Hurtz, 2003).
Most tests include validity scales designed to detect response bias (Paulhus, 2002), defined as the systematic tendency to answer the items of a self-report test in a way that interferes with accurate self-presentation. However, these validity scales are often composed of highly transparent items and are thus not always effective in detecting faking. For this reason, some authors have developed indices, based on optimal combinations of scales, to differentiate honest respondents from fakers (Bosco et al., 2020; Martino et al., 2016), while other authors have suggested that indirect behavioral measures could be accurate in detecting deception.
In the early 1970s, Dunn, Lushene, and O'Neil (1972) suggested that response times (RTs) could assist in distinguishing fakers from honest respondents. The idea behind this approach is that the cognitive processes involved in lying differ from those involved in answering truthfully. Specifically, the literature indicates that lying requires more time, as it is cognitively more demanding than telling the truth; therefore, fakers typically record longer RTs (Foerster et al., 2013; Holden & Kroner, 1992; Mazza, Orrù, et al., 2019; Mazza, Burla, et al., 2019; McDaniel & Timm, 1990; Roma et al., 2018; Roma, Giromini, et al., 2020; Roma, Mazza, et al., 2020; Verschuere, 2018; Walczyk, Roper, Seemann, & Humphrey, 2003). A meta-analysis indicated that honest and faking respondents show significantly different RTs when endorsing an item, but similar RTs when rejecting an item, suggesting that the type of answer may play a role in this regard (Maricuţoiu & Sârbescu, 2016). Moreover, there is evidence that the introduction of a false alibi may invalidate these effects, facilitating dishonest responses and making honest retrieval more effortful (Foerster, 2017).
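The RT-based rationale above amounts to a simple group comparison on response latencies. A minimal sketch with simulated data (all RT values below are hypothetical; the effect-size computation is the standard Cohen's d with pooled standard deviation):

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups,
    using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma, mb = statistics.mean(group_a), statistics.mean(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (ma - mb) / pooled_sd

# Hypothetical per-item RTs in seconds: fakers tend to answer more slowly.
honest_rts = [0.9, 1.0, 1.1, 0.8, 1.0]
faker_rts = [1.3, 1.5, 1.4, 1.6, 1.2]
d = cohens_d(faker_rts, honest_rts)  # positive d: fakers slower on average
```

A positive d here simply reflects the hypothesized direction of the effect; the actual magnitudes reported in the literature vary widely across paradigms.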
Another line of research suggests that time pressure (i.e., limited time available to answer) leads to less ethical decision making and to responses that emphasize socially approved traits and behaviors (Gunia et al., 2012; Khorramdel & Kubinger, 2006; Neubauer & Malle, 1997; Shalvi, Eldar, & Bereby-Meyer, 2012, 2013; Sutherland, 1964). In detail, when respondents are presented with an immediate choice or have limited time available to answer, they tend to lie more frequently; this makes their faking more easily detectable. In contrast, when participants have sufficient time to reflect, they tend to choose their answers more cautiously and moderate their faking behavior. Roma et al. (2018) found support for this idea in research using the Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF) (Ben-Porath & Tellegen, 2008; Tellegen & Ben-Porath, 2011): in a sample of 135 male volunteers, participants instructed to fake under time pressure obtained significantly higher T-scores\(^1\) on the L-r and K-r scales than fakers in the unspeeded condition (\(\eta_{\text{p}}^2\) = 0.243). These findings were later confirmed by a study (Roma, Mazza, et al., 2020) using the MMPI-2 underreporting scales (L, K, S) (Butcher, 2001; Hathaway, McKinley, & Committee, 1989): faking-good respondents in the speeded condition obtained higher T-scores on the L and K scales than faking-good respondents without time pressure (MMPI-2 L scale \(\eta_{\text{p}}^2\) = 0.481; MMPI-2 K scale \(\eta_{\text{p}}^2\) = 0.457; MMPI-2 S scale \(\eta_{\text{p}}^2\) = 0.011). Furthermore, the latter study highlighted that the effect of time pressure was noticeable only in the faking condition, while honest respondents remained honest in both conditions; this suggests that speeded answering may not always trigger faking. Finally, a recent analysis employing machine learning (ML) models trained on behavioral features (e.g., RT, time pressure) to identify fakers in self-report questionnaires indicated that time pressure was the most reliable method for identifying faking-good behavior (Mazza et al., 2019). However, the effect of speeded tests on RT is debated: a recent meta-analysis (Verschuere, 2018) indicated that cognitive load (e.g., time pressure) can increase RTs in honest subjects, thereby decreasing the RT difference between faking and honest respondents by impeding respondents' ability to quickly tell the truth (g = −0.184).
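For reference, the partial eta squared values reported above are the standard ANOVA effect size, computed from the sums of squares as

\[
\eta_{\text{p}}^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}},
\]

i.e., the proportion of variance attributable to the effect after excluding variance explained by other factors in the design.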
In recent years, research has evaluated the efficacy of using mouse dynamics to detect deception. Specifically, mouse tracking records the cursor's position, enabling researchers to follow mouse trajectories from the beginning to the end of a movement (Freeman & Ambady, 2010). This procedure has yielded promising results in lie detection studies, highlighting that trajectory data can be a powerful and rich source of cues for detecting liars.
One of the pioneering studies in this field recorded hand dynamics through a Nintendo Wii controller while subjects were engaged in an instructed lying task (Duran, Dale, & McNamara, 2010). The analysis of motor trajectories revealed that instructed lies could be distinguished from truthful responses according to the motor onset time, the overall response time, and the trajectory, velocity, and acceleration of the movement. Similarly, it has been shown that the analysis of movement trajectories of participants engaged in mouse-tracking (Pfister et al., 2016) and finger-tracking paradigms (Wirth et al., 2016) can reveal the ongoing conflicts caused by a voluntary and deliberate rule violation. More recently, a series of studies conducted by Monaro et al. suggested that, when completing autobiographical inventories, honest respondents follow a direct trajectory from the starting point to the desired answer, whereas fakers show larger and less straight trajectories that initially point towards the actual autobiographical information and then switch in the direction of the alternative (Monaro, Gamberini, & Sartori, 2017; Monaro et al., 2018). Other studies have demonstrated that it is possible to identify patients simulating symptoms of depression and amnesia with accuracies ranging from 80 to 90% by analyzing their mouse dynamics when responding to questions about their symptoms (Monaro et al., 2018; Monaro, Gamberini, et al., 2018; Zago et al., 2019). A more recent study (Mazza et al., 2020) highlighted that honest respondents are faster than fakers in moving along the x-axis when responding to the MMPI-2 underreporting scales (S, K, L); they are also faster in moving along the y-axis when responding to the K scale and the Psychopathic Personality Inventory-Revised (PPI-R) VR scale. Furthermore, this study found significantly longer RTs and MD-times (i.e., maximum deviation time: the time taken for the mouse to reach the point of maximum distance between the actual and the idealized trajectory) in the faking-good condition than among honest test-takers, but only for the L scale.
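To make the trajectory features concrete, the sketch below computes the maximum deviation (MD) and MD-time from a sampled cursor path. The function name and the (x, y, timestamp) data format are our own illustrative choices, not those of any specific mouse-tracking package:

```python
import math

def max_deviation(trajectory):
    """Given a cursor path as a list of (x, y, timestamp) samples,
    return (MD, MD-time): the maximum perpendicular distance between
    the recorded path and the idealized straight line joining its
    start and end points, plus the timestamp at which it occurs."""
    (x0, y0, _), (x1, y1, _) = trajectory[0], trajectory[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end points coincide")
    best_dist, best_time = 0.0, trajectory[0][2]
    for x, y, t in trajectory:
        # perpendicular distance of (x, y) from the idealized line
        dist = abs(dy * (x - x0) - dx * (y - y0)) / length
        if dist > best_dist:
            best_dist, best_time = dist, t
    return best_dist, best_time
```

An honest respondent's direct path yields a small MD, whereas a trajectory that first veers towards the non-chosen alternative yields a large MD and a correspondingly informative MD-time.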
While mouse tracking software also enables researchers to record RTs, it is worth noting that these RTs are not exactly equivalent to the simple RTs used in the aforementioned studies (Foerster et al., 2013; Holden & Kroner, 1992; Mazza, Orrù, et al., 2019; Mazza, Burla, et al., 2019; McDaniel & Timm, 1990; Roma et al., 2018; Roma, Giromini, et al., 2020; Roma, Mazza, et al., 2020; Verschuere, 2018; Walczyk, Roper, Seemann, & Humphrey, 2003), since they include both cognitive and motor components. Nonetheless, mouse dynamics have proven useful in lie detection research, as they can be used to collect a large number of features (e.g., initiation time, time to reach the point of maximum mouse deviation) that serve as predictors of deception.
To date, studies investigating the relationship between faking and behavioral indicators have largely used tests with dichotomous choice alternatives (i.e., true vs. false). However, many personality inventories adopt Likert scales as a response mode (e.g., strongly agree, agree, moderately agree, disagree, strongly disagree). For this reason, the present study used the underreporting scales of the Personality Assessment Inventory (PAI) and the Psychopathic Personality Inventory-Revised (PPI-R), which were designed to detect overly favorable self-presentations on items with four choice alternatives. To the best of our knowledge, this was the first study on faking-good to use exclusively multiple-choice items, specifically with four alternatives. While the literature on this topic is scarce, it indicates that subjects take longer to react to four stimuli than to two (Garner, 1962; Kiesler, 1966); therefore, the number of response alternatives may affect RT and mouse dynamics and interact with the effects of deception and time pressure. Williams, Bott, & Lewis (2013) reported that increasing the number of possible lie responses, from one to two or three, leads to a greater lying latency effect in subjects.
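The classic formalization of this choice-set effect is the Hick–Hyman law, under which mean RT grows with the information content (in bits) of the choice. A minimal sketch, with the intercept and slope parameters below chosen purely for illustration:

```python
import math

def hick_hyman_rt(n_alternatives, a=0.2, b=0.15):
    """Predicted mean RT (in seconds) for an n-alternative choice under
    the Hick-Hyman law: RT = a + b * log2(n + 1).
    a (base time) and b (seconds per bit) are hypothetical values."""
    return a + b * math.log2(n_alternatives + 1)
```

Under any positive slope, a four-alternative item is predicted to take longer than a two-alternative one, which is the direction of the effect reported by Garner (1962) and Kiesler (1966).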
The aim of the present study was to evaluate the usefulness of T-scores on underreporting scales and of behavioral features (i.e., RT and mouse dynamics) in detecting faking-good behavior when items have four, rather than two, choice alternatives. Building on previous findings (Mazza et al., 2020), the hypotheses were as follows: H1) mouse movements (temporally described by RT, MD-time, velx, and vely) would be slower in the faking-good condition than in the honest condition;
H2) T-scores on the PPI-R VR scale and the PAI PIM scale would be higher in the faking-good speeded condition than in the faking-good unspeeded condition, while the T-scores of honest respondents would not differ significantly between speeded and unspeeded conditions.
Finally, similarly to previous studies (Monaro et al., 2018; Monaro, Gamberini, et al., 2018; Zago et al., 2019), we assessed the accuracy of the above-mentioned measures (T-scores and temporal mouse-tracking features) in predicting whether a subject is engaging in faking-good behavior. Focusing on prediction rather than explanation in data analysis is a recent and increasingly widespread trend in different scientific fields (Yarkoni & Westfall, 2017), including a wide range of human research areas, such as smart applications (Spolaor et al., 2018), genetics (Navarin & Costa, 2017), clinical medicine (Obermeyer & Emanuel, 2016), and clinical psychology (Monaro et al., 2018; Monaro, Gamberini, et al., 2018). This trend is becoming increasingly popular also thanks to the rapid growth of ML, a branch of artificial intelligence concerned with training algorithms to automatically learn from a set of data and make predictions on a completely new set of unseen data without being explicitly programmed. ML techniques have already been used in behavioral science to predict malicious human behaviors, for example to identify people who declared false identities (Monaro, Gamberini, & Sartori, 2017) or who simulated depression (Monaro et al., 2018; Monaro, Gamberini, et al., 2018) or amnesia (Zago et al., 2019). From an applied point of view, one of the main advantages of ML is that it makes it possible to make predictions at the individual level, whereas traditional statistical methods only draw inferences at the group level (Orrù et al., 2020). In other words, ML algorithms provide a useful and automatic tool to identify people who produce malicious behaviors in a clinical setting. In this research, ML algorithms were trained to investigate the accuracy of T-scores and temporal mouse-tracking variables in identifying faking-good respondents on the PPI-R VR scale and the PAI PIM scale.
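As a minimal sketch of this prediction-oriented approach (illustrative only: the feature values, scaling, and training scheme below are hypothetical and are not those used in the present study), a simple logistic-regression classifier can be trained on an underreporting T-score and an RT feature to separate faking-good from honest respondents:

```python
import math

def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit a logistic-regression classifier with plain stochastic
    gradient descent. labels: 1 = faking-good, 0 = honest."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(faking-good)
            err = p - y                      # gradient of the log-loss
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return 1 (faking-good) if the predicted probability is >= 0.5."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Hypothetical feature vectors: [underreporting T-score / 100, mean RT in s].
# Fakers are assumed to show higher T-scores and slower responses.
honest = [[0.50, 0.9], [0.55, 1.0], [0.48, 0.8]]
fakers = [[0.80, 1.4], [0.85, 1.5], [0.78, 1.3]]
w, b = train_logistic(honest + fakers, [0, 0, 0, 1, 1, 1])
```

In practice, such models are evaluated with cross-validation on held-out participants (as in the cited studies), so that the reported accuracy reflects prediction on genuinely unseen subjects rather than fit to the training sample.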