Introduction
Self-report measures are among the core methodologies in many disciplines of psychology, including educational, developmental, clinical, social, and personality psychology. Using questionnaires, researchers have quantified people’s insight into their skills, intelligence, cognitive ability, personality, or mood, and have built psychological models and theories on these data. Despite the prevalence of self-report in psychological measurement, the correspondence between self-evaluations of ability and objective performance has been debated. Zell and Krizan (2014) synthesized meta-analyses across diverse disciplines and ability domains and reported that the mean correlation between ability self-evaluations and behavioral performance was moderate (M = 0.29). This finding suggests that people have only modest insight into their abilities, perhaps reflecting not only the inaccuracy or imprecision of self-evaluations but also the biases (e.g., social desirability or self-esteem) inherent in self-report questionnaires (Choi & Pak, 2005).
Although this meta-synthesis indicated only a moderate relationship between people’s insight into their abilities and actual performance, recent studies using the 20-item prosopagnosia index (PI20) have reported that people have good insight into their face recognition ability (Livingston & Shah, 2017; Shah, Gaule, Sowden, Bird, & Cook, 2015; Shah, Sowden, Gaule, Catmur, & Bird, 2015). Shah et al. developed the PI20 as a new self-report measure for estimating face recognition ability and developmental prosopagnosia (DP) risk, while criticizing a pre-existing 15-item questionnaire developed in a Hong Kong population (Kennerknecht, Ho, & Wong, 2008; hereafter, the HK questionnaire) on the grounds that it correlates poorly with objective face recognition performance (Palermo et al., 2017; but see Johnen et al., 2014; Stollhoff, Jost, Elze, & Kennerknecht, 2011). However, although the PI20 was intended to overcome the weaknesses of the HK questionnaire (namely, that it contains items irrelevant to face recognition and that it has only a ‘weak relationship’ to actual behavioral performance), its performance has not been formally validated against the HK questionnaire. No direct comparison between the two questionnaires has been performed, either in terms of their relations to behavioral performance or in terms of their relationship to each other. Thus, whether the PI20 outperforms the HK questionnaire remains unclear.
Moreover, whether people have insight into their face recognition ability also remains to be investigated. Recent studies have reached different conclusions regarding the association between self-report and actual face recognition performance (Livingston & Shah, 2017; Palermo et al., 2017; Shah, Gaule, et al., 2015). These studies differ not only in the questionnaire used but also in their participant demographics. Shah, Gaule, et al. (2015) reported that people have good insight into their face recognition ability (r = − 0.68); they used the PI20 and recruited individuals who ‘identified themselves as suspected prosopagnosics’ in addition to a normal population. In contrast, Palermo et al. (2017) reported that people have only moderate insight into their face recognition ability (r = − 0.14); they used the HK questionnaire and recruited a normal population without ‘suspected prosopagnosics’. (The distinction between ‘good’ and ‘moderate’ insight has been arbitrary and appears to rest on researchers’ intuition or convention rather than explicit criteria; here we regard a significant correlation coefficient of r = 0.5 or larger as ‘good’ insight and a significant correlation coefficient smaller than r = 0.5 as ‘moderate’ or ‘modest’ insight.) These inconsistent results likely stem from two methodological differences. First, although the PI20 and the HK questionnaire are very similar, both simply asking how good (or bad) people are at recognizing faces, subtle differences in their item wording might produce different correlations between self-report and behavioral performance. Second, because Shah and colleagues used an extreme-group approach (i.e., they recruited ‘suspected prosopagnosics’), which almost always leads to upwardly biased estimates of standardized effect size (Preacher, Rucker, MacCallum, & Nicewander, 2005), they might have observed an inflated correlation between self-report and behavioral performance. It is therefore crucial to administer the two questionnaires to the same population and to assess both the relationship between the questionnaires and their relations to behavioral face recognition performance. We examined this issue by administering the two questionnaires to a large population and performing a set of analyses, including correlation analysis, hierarchical clustering, a brute-force calculation and comparison of reliability coefficients, and a behavioral validation using the Taiwanese Face Memory Test (TFMT) (Cheng, Shyi, & Cheng, 2016), an East Asian version of the Cambridge Face Memory Test (CFMT) (Duchaine & Nakayama, 2006). If the PI20 is a better self-report instrument for estimating face recognition ability than the pre-existing HK questionnaire, it should show distinct or more desirable features (i.e., a low or moderate correlation with the HK questionnaire, a PI20-specific item cluster, or higher reliability) and greater accuracy in predicting behavioral face recognition performance than the HK questionnaire.
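To make this planned comparison concrete, the following minimal sketch illustrates how questionnaire totals could be correlated with TFMT performance and how the pooled items could be clustered hierarchically. It is our illustration only; the file name, column names, and clustering settings are assumptions, not the actual analysis code.

```python
# Minimal sketch of the analyses described above (correlation + hierarchical
# clustering). File name, column names, and clustering parameters are
# illustrative assumptions, not the authors' actual analysis code.
import pandas as pd
from scipy.stats import pearsonr
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

df = pd.read_csv("questionnaire_and_tfmt.csv")  # one row per participant (hypothetical file)

# Correlation between each questionnaire total and behavioral performance
for score in ["PI20_total", "HK11_total"]:
    r, p = pearsonr(df[score], df["TFMT_accuracy"])
    print(f"{score} vs TFMT: r = {r:.2f}, p = {p:.4f}")

# Hierarchical clustering of the pooled PI20 and HK items,
# using 1 - |inter-item correlation| as the distance
item_cols = [c for c in df.columns if c.startswith(("PI20_item", "HK_item"))]
dist = 1 - df[item_cols].corr().abs()
Z = linkage(squareform(dist.values, checks=False), method="average")
print(dict(zip(item_cols, fcluster(Z, t=2, criterion="maxclust"))))
```

A questionnaire-specific cluster, if present, would show up as PI20 items and HK items falling into separate branches of the resulting tree.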
General discussion
Whether people have insight into their face recognition ability has recently been debated. Although some studies using the PI20 reported that people have good insight into their face recognition ability (Livingston & Shah, 2017; Shah, Gaule, et al., 2015), other studies using the HK questionnaire showed that people have only modest insight (Bobak et al., 2019; Murray, Hills, Bennetts, & Bate, 2018; Palermo et al., 2017). Because this discrepancy might reflect the choice of questionnaire and/or the bias introduced by including an extreme group, we examined the relationship between self-reported face recognition ability and actual behavioral performance using both questionnaires. Our results showed that both questionnaire scores correlated moderately with behavioral face recognition performance (about r = 0.3) and that the correlation was stronger for the HK11 than for the PI20. This suggests that people have modest, not good, insight into their face recognition ability and calls for a revision of the view that the PI20 overcomes the weaknesses of the pre-existing questionnaire.
Although Kennerknecht’s HK questionnaire was criticized for its ‘weak relationship’ to actual face recognition performance (Shah, Gaule, et al., 2015), our findings showed a significant correlation between HK11 scores and behavioral performance. The earlier criticism might be partially explained by the fact that most studies using the HK questionnaire summed the score over all 15 items (Johnen et al., 2014; Kennerknecht et al., 2008; Palermo et al., 2017; Stollhoff et al., 2011), even though the questionnaire includes four dummy questions. Incorporating irrelevant items into a questionnaire not only reduces its reliability but also reduces its ability to predict behavioral performance. Using the reduced subset of the pre-existing questionnaire (HK11), which excludes the dummy items, we showed that Kennerknecht’s HK questionnaire may have greater potential to capture face recognition ability than the PI20.
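For illustration, this reliability argument can be made concrete with Cronbach’s alpha computed with and without the dummy items. The sketch below is ours; the input file and the dummy-item labels are hypothetical placeholders, not the actual item identifiers.

```python
# Illustrative comparison of Cronbach's alpha for the full 15-item HK
# questionnaire versus an 11-item subset (HK11) that drops the dummy items.
# The item data file and the dummy-item labels are hypothetical.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

hk_items = pd.read_csv("hk_item_responses.csv")                 # 15 columns, one per item
dummy = ["HK_item03", "HK_item08", "HK_item12", "HK_item15"]    # hypothetical labels

print("alpha (all 15 items):", cronbach_alpha(hk_items))
print("alpha (HK11 subset): ", cronbach_alpha(hk_items.drop(columns=dummy)))
```

If the dropped items are truly irrelevant to face recognition, alpha for the 11-item subset would be expected to equal or exceed that of the full scale.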
Furthermore, the use of an extreme-group approach might explain the inconsistency between studies. Selecting individuals on the basis of (expected) extreme scores in a sample distribution can inflate effect size estimates, which in turn leads to inappropriate expectations or conclusions (Preacher et al., 2005). Indeed, although the correlation between PI20 scores and behavioral performance was originally reported to be high (r = − 0.68) (Shah, Gaule, et al., 2015), it decreased markedly when the data from suspected prosopagnosics were excluded (r = − 0.34) (Livingston & Shah, 2017). Studies that have reported moderate correlations also did not include suspected prosopagnosics in their samples (Bobak et al., 2019; Palermo et al., 2017). In addition, a recent study reported that people who had previously been informed of their exceptionally high performance (i.e., ‘super-recognizers’; Russell, Duchaine, & Nakayama, 2009) indeed performed well, whereas naïve participants had only moderate insight into their face recognition ability (Bobak et al., 2019). Thus, if the studied population includes individuals already known to have poor (Shah, Gaule, et al., 2015a, b) or good (Bobak et al., 2019) face recognition ability, the correlation between insight and behavioral performance may be inflated, and it might be difficult to generalize such findings to naïve individuals across the full range of face recognition abilities. One should be careful about these kinds of participant-selection biases, which can produce a circular analysis (i.e., double dipping) in which the resulting statistics inherently depend on the selection criteria (Kriegeskorte, Simmons, Bellgowan, & Baker, 2009).
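A toy simulation, ours and not taken from any of the cited studies, illustrates how appending an extreme group to a general sample can inflate the observed correlation even when the underlying relationship is modest; the sample sizes, group means, and true correlation below are arbitrary assumptions.

```python
# Toy simulation (not from the cited studies) showing how adding an
# extreme group inflates a correlation estimate.
import numpy as np

rng = np.random.default_rng(0)
true_r = 0.3
cov = [[1, true_r], [true_r, 1]]

# General population: self-report and performance with a modest true correlation
general = rng.multivariate_normal([0, 0], cov, size=200)

# 'Extreme group' that is low on both variables (e.g., suspected prosopagnosics),
# mimicking recruitment from the tail of the distribution
extreme = rng.multivariate_normal([-2.5, -2.5], cov, size=30)

r_general = np.corrcoef(general[:, 0], general[:, 1])[0, 1]
combined = np.vstack([general, extreme])
r_combined = np.corrcoef(combined[:, 0], combined[:, 1])[0, 1]

print(f"general population only: r = {r_general:.2f}")
print(f"with extreme group:      r = {r_combined:.2f}")  # noticeably larger
```

The combined-sample correlation exceeds the general-population correlation because the extreme group stretches the joint distribution along both variables at once.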
Surprisingly, our findings are in line with a recent meta-synthesis showing that the mean correlation between ability self-evaluations and performance is moderate (M = 0.29) (Zell & Krizan, 2014). Although individual effects varied from 0.09 to 0.63, the meta-synthesis indicates that people have limited insight into their abilities. If the correlation between self-report and behavioral face recognition performance is not strong in a naïve population, what do questionnaire-based measures tell us about face recognition, and how do we reconcile self-report with objective performance? Unfortunately, there may be no straightforward way to reliably estimate an individual’s face recognition ability or DP risk from self-report alone. Instead of simply asking participants about their face recognition ability, we may have to improve measurements and/or analytical methods, for example by elaborating the design and wording of a questionnaire, extracting latent cognitive factors from a battery of behavioral tests (e.g., Miyake & Friedman, 2012), or building a reliable predictive model based on machine learning techniques.
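As one hedged illustration of the last point, and not an analysis reported here, item-level questionnaire responses could be fed into a cross-validated regression model to estimate how well they predict behavioral performance out of sample; the column names and model choice below are assumptions made for the sake of the example.

```python
# Illustrative sketch (not an analysis from this study): predicting TFMT
# performance from item-level questionnaire responses with a cross-validated
# ridge regression. File and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

df = pd.read_csv("questionnaire_and_tfmt.csv")
X = df[[c for c in df.columns if c.startswith(("PI20_item", "HK_item"))]]
y = df["TFMT_accuracy"]

model = RidgeCV(alphas=[0.1, 1.0, 10.0])
r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f} ± {r2.std():.2f}")
```

Cross-validation is the key ingredient here: it estimates predictive accuracy for unseen individuals rather than fit to the sample at hand.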
In conclusion, our results suggest that the two representative self-report face recognition questionnaires (Kennerknecht et al., 2008; Shah, Gaule, et al., 2015) measure similar but slightly different traits, and that people have modest, not good, insight into their face recognition ability. Although the HK11 and/or the PI20 may serve as a moderate (albeit non-definitive) measure for estimating face recognition ability and DP risk (Livingston & Shah, 2017), our findings suggest that, contrary to Shah et al.’s claims, the reliability and validity of the PI20 may be lower than those of the pre-existing questionnaire (more precisely, its reduced subset, the HK11) (Kennerknecht et al., 2008). Given the current state of DP research, in which neither objective diagnostic criteria nor biological markers have been established (Barton & Corrow, 2016; Susilo & Duchaine, 2013), we might need to focus on creating a reliable face recognition questionnaire (rather than a ‘DP questionnaire’) that can predict behavioral face recognition performance (Arizpe et al., 2019). Alternatively, more exploratory research using not only the HK11 and PI20 together (or a combination thereof) but also a range of other face-processing measures could aid the extraction of latent prosopagnosia traits and dimensions and the development of a valid DP taxonomy. In either case, self-report, at least in its current form, may not be a reliable measure for estimating face recognition ability or DP risk, because it offers limited power to predict naïve individuals’ face recognition performance.