Introduction
According to the empathizing–systemizing theory (E–S theory) of sex differences (Baron-Cohen 2009), empathizing is defined as “the drive to identify another person’s emotions and thoughts, and to respond to these with an appropriate emotion” (Baron-Cohen 2002). According to the theory, the cognitive style complementary to empathizing is systemizing, which is the drive (1) to analyse the variables in a system, (2) to derive the underlying rules that govern the behaviour of a system, and (3) to construct systems (Baron-Cohen 2002). Systemizing allows a person to predict and control the behaviour of a system. Approximately one decade ago, two self-report questionnaires were introduced to measure the extent to which people possess these cognitive styles: the Empathy Quotient (EQ) (Baron-Cohen and Wheelwright
2004) and the Systemizing Quotient (SQ) (Baron-Cohen et al.
2003). To date, numerous studies have found that females adopt on average a more empathizing style, while males adopt on average a more systemizing style of information processing, with sex differences reaching effect sizes of half to one standard deviation. The E–S theory distinguishes different brain types that can be determined by means of the standardized scores on the EQ and SQ (Baron-Cohen
2002; Wheelwright et al.
2006). Individuals with higher standardized scores on the EQ than on the SQ are categorized as having an empathizing or ‘female brain’ (type E), whereas individuals with higher standardized scores on the SQ than on the EQ are categorized as having a systemizing or ‘male brain’ (type S). Individuals with equal standardized scores on the EQ and SQ are categorized as having a ‘balanced brain’ (type B). Consequently, the difference score (D) between the EQ and SQ can be used to characterize a person’s cognitive style or brain type.
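The classification rule described above can be illustrated with a minimal sketch. It assumes that scores are standardized as z-scores within the sample and that D is computed as standardized SQ minus standardized EQ; both choices, the function names and the optional `tolerance` band around zero are illustrative assumptions, not the scoring procedure of the original questionnaires.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores within the sample (assumed standardization)."""
    m = mean(scores)
    s = pstdev(scores)
    return [(x - m) / s for x in scores]


def brain_type(z_eq, z_sq, tolerance=0.0):
    """Classify one person from standardized EQ and SQ scores.

    D is taken as z_sq - z_eq (assumed sign convention): positive D means a
    more systemizing style. `tolerance` is a hypothetical band around zero
    within which scores are treated as equal.
    """
    d = z_sq - z_eq
    if d > tolerance:
        return "S"  # systemizing score higher than empathizing score
    if d < -tolerance:
        return "E"  # empathizing score higher than systemizing score
    return "B"      # equal standardized scores: balanced


def classify_sample(eq_raw, sq_raw, tolerance=0.0):
    """Standardize both questionnaires and classify every participant."""
    z_eq = standardize(eq_raw)
    z_sq = standardize(sq_raw)
    return [brain_type(e, s, tolerance) for e, s in zip(z_eq, z_sq)]
```

For example, `classify_sample([30, 40, 50], [70, 50, 60])` labels the first (invented) participant as type S and the other two as type E.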
The E–S theory originated from the research on autism spectrum disorder (ASD) (Baron-Cohen et al.
1985; Baron-Cohen
2009). Individuals with ASD are characterized by difficulties in social interaction and communication, alongside unusually strong and narrow interests and repetitive behaviour (American Psychiatric Association
2013). Early theories explained the social and communicative difficulties of individuals with ASD by “mind-blindness”, which is the inability to put oneself into someone else’s shoes, to imagine their thoughts and feelings (Baron-Cohen et al.
1985). The E–S theory extended this mind-blindness theory by adding difficulties in emotional reactivity, forming the empathizing factor, and by adding the systemizing factor that could also explain the non-social characteristics of the disorder (such as the narrow interests and attention to detail) (Baron-Cohen
2002,
2009). According to this theory, individuals with ASD lie at the extreme end of the normally distributed difference between systemizing and empathizing (D), and consequently possess an above-average systemizing cognitive style but a low and/or deficient empathizing style, i.e. an extreme type S or extreme male brain. A large number of studies using the EQ and/or SQ have provided support for this Extreme Male Brain (EMB) hypothesis of ASD by demonstrating that males report lower EQ scores, higher SQ scores and hence a more systemizing brain type than females, while patients with ASD (both males and females) report even lower EQ scores, even higher SQ scores and an even more systemizing brain type than males (Baron-Cohen and Wheelwright 2004; Baron-Cohen et al.
2014; Berthoz et al.
2008; Sucksmith et al.
2013; Wakabayashi et al.
2007; Wheelwright et al.
2006). Moreover, autistic traits as measured by the Autism Spectrum Quotient (AQ) could be successfully predicted by both EQ and SQ in a community sample as well as in a sample of patients with ASD (Wheelwright et al.
2006). For both groups, factor analysis demonstrated that EQ and SQ both had strong loadings on AQ and together accounted for approximately 75% of the variance in the AQ scores. The EMB hypothesis is further supported by the results of neuropsychological studies of children with ASD, demonstrating poorer performance on tests of social cognition (e.g. the ‘seeing leads to knowing test’, the ‘false belief test’ and the ‘reading the mind in the eyes test’) compared to typically developing children, and intact or superior performance on visuospatial tests (e.g. ‘physics test’, ‘picture sequencing test’) (for a review, see Baron-Cohen 2009).
The EQ and SQ have shown good cross-cultural stability (see Tables 1 and 2 for an overview of the psychometric properties of the EQ and SQ across different countries). Although the majority of studies have been conducted in the UK (Baron-Cohen and Wheelwright 2004; Baron-Cohen et al.
2014; Lawrence et al.
2004; Manson and Winterbottom
2012; Muncer and Ling
2006; Sucksmith et al.
2013; Wheelwright et al.
2006), a large number of studies have validated the EQ by demonstrating the typical sex differences in other European countries (Dimitrijevic et al.
2012; Preti et al.
2011; Vellante et al.
2013; Von Horn et al.
2010; Zeyer et al.
2012), as well as in Canada and the US (Berthoz et al.
2008; Wright and Skagerberg
2012), but to a lesser degree in Asian countries (Kim and Lee
2010; Wakabayashi et al.
2007). The typical sex differences are also present for the SQ in European, Asian and US samples (Baron-Cohen et al.
2003; Ling et al.
2009; Manson and Winterbottom
2012; Von Horn et al.
2010; Wakabayashi et al.
2007; Wheelwright et al.
2006; Wright and Skagerberg
2012; Zeyer et al.
2012). Good cross-cultural validity of the measures is also demonstrated by lowered EQ scores and elevated SQ scores in international research on samples of individuals with ASD (Baron-Cohen et al.
2003; Baron-Cohen and Wheelwright
2004; Berthoz et al.
2008; Wakabayashi et al.
2007; Wheelwright et al.
2006).
Table 1
Overview of the psychometric properties of the 40-item Empathy Quotient (EQ) across countries
Baron-Cohen and Wheelwright (2004) | UK | 0.92 | .97b | 197 (71) | 47.2 (10.2) | 41.8 (11.2) | 0.50 |
| UK | n.r. | .84b | 172 (79) | 49.6 (9.6) | 40.9 (11.9) | 0.80 |
| UK | 0.85 | n.r. | 362 (156) | 46.3 (9.5) | 37.9 (10.5) | 0.84 |
Wheelwright et al. (2006) | UK | n.r. | n.r. | 1761 (723) | 48.0 (11.3) | 39.0 (11.6) | 0.79 |
Wakabayashi et al. (2007) control group | Japan | 0.86 | n.r. | 137 (71) | 36.9 (10.7) | 31.1 (10.7) | 0.54 |
Wakabayashi et al. (2007) student group | Japan | 0.86 | n.r. | 1250 (616) | 36.1 (10.4) | 30.6 (9.9) | 0.54 |
| Canada (French) | 0.81 | .93c | 410 (201) | 41.4 (7.7) | 37.7 (10.0) | 0.41 |
| Korea | 0.78 | .84d | 478 (156) | 35.8 (9.2) | 34.7 (10.5) | 0.11 |
Dimitrijevic et al. (2012) | Serbia | 0.78 | n.r. | 694 (293) | 43.1 (9.0) | 37.1 (9.4) | 0.65 |
| Sweden | n.r. | n.r. | 299 (114) | 51.1 (9.7) | 43.4 (10.3) | 0.78 |
| Italy | 0.79 | .85d | 256 (118) | 45.4 (9.3) | 41.8 (9.4) | 0.39 |
Manson and Winterbottom (2012) | UK | n.r. | n.r. | 321 (133) | 46.4 (12.6) | 39.0 (11.7) | 0.61 |
Wright and Skagerberg (2012) a | US | 0.86-0.87 | n.r. | 5186 (n.r.) | 3.1 (0.30) | 2.9 (0.31) | 0.66 |
| Switzerland | 0.86 | n.r. | 500 (250) | 43.8 (8.3) | 37.7 (10.2) | 0.66 |
| UK | n.r. | n.r. | 187 (93) | 48.5 (14.1) | 37.7 (13.5) | 0.78 |
| Italy | 0.80 | n.r. | 200 (92) | 48.3 (8.4) | 41.8 (8.7) | 0.72 |
Baron-Cohen et al. (2014) | UK | n.r. | n.r. | 3906 (2562) | 48.5 (13.7) | 38.0 (13.7) | 0.76 |
Present study | Netherlands | 0.89 | .78e | 685 (270) | 49.0 (10.4) | 39.1 (12.0) | 0.88 |
Table 2
Overview of the psychometric properties of the Systemizing Quotient (SQ) across countries
Baron-Cohen et al. (2003) a | UK | 0.79 | n.r. | 278 (114) | 24.1 (9.5) | 30.3 (11.5) | 0.59 |
Wakabayashi et al. (2007) control group b | Japan | 0.88 | n.r. | 137 (71) | 17.3 (10.9) | 29.5 (10.4) | 1.15 |
Wakabayashi et al. (2007) student group b | Japan | 0.88 | n.r. | 1250 (616) | 17.7 (9.0) | 27.8 (11.8) | 0.96 |
Wheelwright et al. (2006) c | UK | 0.90 | n.r. | 1761 (723) | 51.7 (19.2)/(27.7) f | 61.2 (19.2)/(32.6) f | 0.49 |
| UK | 0.83 | n.r. | 167 (84) | 22.5 (8.5) | 32.1 (10.4) | 1.01 |
| Sweden | n.r. | n.r. | 299 (114) | 23.9 (8.6) | 31.7 (10.4) | 0.82 |
Manson and Winterbottom (2012) a | UK | n.r. | n.r. | 321 (133) | 23.7 (9.6) | 33.2 (11.6) | 0.89 |
Wright and Skagerberg (2012) d | US | 0.91-0.94 | n.r. | 5186 (?) | 2.6 (0.37) | 2.8 (0.38) | 0.53 |
| Switzerland | 0.83 | n.r. | 500 (250) | 17.7 (10.2) | 28.4 (9.0) | 1.11 |
Baron-Cohen et al. (2014) | UK | n.r. | n.r. | 3906 (2562) | 55.1 (21.1)/(29.4) f | 68.1 (21.6)/(36.3) f | 0.41 |
Present study | Netherlands | 0.87 | .79e | 685 (270) | 49.4 (15.3)/(26.3) f | 61.9 (17.9)/(33.0) f | 0.75 |
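The final column of Tables 1 and 2 appears to report the standardized sex difference (Cohen's d). The following is a minimal sketch of how such a value could be obtained from two group means and standard deviations. It assumes that d is the mean difference divided by the square root of the average of the two group variances; the source does not state which pooling formula was used, so this is an assumption, and the function name is illustrative.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference between two groups.

    Assumption: the pooled SD is the root of the unweighted mean of the two
    variances (group sizes are ignored). Other pooling formulas exist.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# "Present study" row of Table 1: group means (SD) of 49.0 (10.4) and 39.1 (12.0)
d_eq = cohens_d(49.0, 10.4, 39.1, 12.0)
print(round(d_eq, 2))  # 0.88
```

Under this assumption the sketch reproduces the 0.88 listed for the present study in Table 1; a sample-size-weighted pooled SD would give a slightly different value.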
Across international studies, criterion validity of the EQ is indicated by correlations between the EQ and other measures of empathy or measures related to emotional functioning. For example, a strong correlation was found between the EQ and the Friendship Questionnaire, measuring the enjoyment and importance of friendships (Baron-Cohen and Wheelwright
2004), weak to moderate correlations between EQ and both the Interpersonal Reactivity Index measuring affective and cognitive aspects of empathy (Dimitrijevic et al.
2012; Kim and Lee
2010) and the Toronto Alexithymia Scale measuring alexithymia (Preti et al.
2011; Vellante et al.
2013), but only negligible to weak correlations between EQ and the Reading the Mind in the Eyes Test (Vellante et al.
2013). In contrast to the EQ, evidence for cross-cultural validity of the SQ is limited, because only one study outside the UK investigated an ASD sample (Wakabayashi et al.
2007). This study, however, demonstrated good groups validity, because a typical sex difference on the SQ was demonstrated for Japanese participants. Furthermore, Japanese patients with ASD scored higher on the SQ and had more systemizing brain types (as measured by D) compared to the control participants. Only one study investigated the association of SQ with other measures (Ling et al.
2009), and supported its criterion validity by demonstrating that SQ was associated with mental rotation performance and not with general intelligence (Ling et al.
2009).
While several international studies suggest good cross-cultural stability of the EQ and SQ, to date no psychometric properties of a Dutch variant of these questionnaires are available. The aim of the present study was to evaluate the basic psychometric properties of Dutch translations of the EQ and the revised version of the SQ (SQ-R) and to investigate whether the mean EQ and SQ-R scores of Dutch males and females are comparable to those reported for other countries in international studies. The SQ-R was created to improve the original SQ by adding more items that might be relevant to females, because the items of the original SQ had primarily been selected from male domains (Wheelwright et al.
2006). Short versions of the EQ have previously been developed containing 28 items (Lawrence et al.
2004) or 15 items (Muncer and Ling
2006). In these short versions, items loading highly on social desirability were removed, and factor analyses demonstrated a clear three-factor structure with the factors Cognitive Empathy (CE), Emotional Empathy (EE), and Social Skills (SS), which has been partly confirmed in translated versions of the EQ (Berthoz et al.
2008; Dimitrijevic et al.
2012; Preti et al.
2011). Since the questionnaires were developed within the scope of the male brain hypothesis of autism (Baron-Cohen
2009), the groups validity of the questionnaires will be explored by testing for sex differences and ASD-control differences. This study may contribute to the availability of measures for empathizing and systemizing behaviour for Dutch-speaking individuals and moreover to the literature on the cross-cultural stability of the E–S theory of sex differences and autism.
Discussion
The aim of this study was to describe the psychometric properties of Dutch translations of the EQ and SQ-R questionnaires and to review the cross-cultural validity of the EQ and SQ. To this end the reliability and validity of the original 40-item EQ, the short versions of the EQ (15- and 28-item versions), and the SQ-R were tested in a Dutch-speaking healthy sample and a patient sample of males with ASD. The psychometric properties of the Dutch EQ and SQ-R of the healthy sample were compared to the psychometric properties described in the international literature. For this purpose, the international studies on the EQ and SQ had been systematically reviewed in the introduction section and a synthesis on the cross-cultural validity of the EQ and SQ is explicated below.
The EQ mean scores of the Dutch sample reported in the present study are comparable to the scores of other Western countries (see Table
1). The sex differences are also comparable in magnitude, with medium effect size, as compared to the medium to large effect sizes of sex differences in the other countries. Reviewing the current literature on the EQ revealed that the average EQ scores of both males and females in Asian countries (for both student and community samples) are roughly one standard deviation lower compared to Western countries, and also the sex differences in these Asian countries are only small in effect size (and not always significant for the total EQ scale). This may be explained by cultural differences in the emotional and social habits of people in Western and Asian countries, e.g. in Western countries it is much more desired to openly express one’s emotions than in Asian countries (Eid and Diener
2001). In Asian countries, empathy may therefore be expressed to a lesser extent in social situations, and sex differences in the inner emotional life may therefore be underestimated or less well recognized when completing the EQ. Concerning cross-cultural stability of the EQ, it can be concluded that findings are stable in Western countries, but that the EQ is characterised by lower stability and lower sensitivity to sex differences in Asian countries. It remains unclear to what extent there are cultural differences in the interpretation of the EQ items, and therefore the difference between Asian and Western countries might partly stem from a lack of measurement invariance.
With regard to systemizing, we investigated the revised version of the SQ (SQ-R) in the present Dutch sample, and the obtained scores for both sexes were comparable to those of the two other studies using this version, both in British samples (Baron-Cohen et al.
2014; Wheelwright et al.
2006). In order to compare the SQ-R score to the scores of the other countries that made use of the original 40-item SQ, we recalculated the SQ-R score [SQ = (SQ-R/75) × 40]. The recalculated SQ-R scores of the present sample as well as the British samples were slightly higher than the scores of the other countries making use of the original 40-item SQ. This is most likely explained by the characteristics of the SQ-R, which includes more items that are less specific for males and more suitable for both sexes. We therefore recommend against directly comparing SQ and SQ-R scores. Reviewing the current literature on the SQ showed that Asian samples (both student and community samples) score similarly to Western samples on the SQ and that the sex differences are also similar in magnitude, ranging from medium to large across international studies. Concerning cross-cultural stability of the SQ, it can be concluded that, unlike the EQ, the SQ is stable regarding mean scores and sex differences across cultures.
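The recalculation SQ = (SQ-R/75) × 40 can be written out as a short sketch. The constant and function names are illustrative; the input values are the SQ-R means reported for the present study in Table 2.

```python
SQ_R_ITEMS = 75  # number of items in the revised SQ (SQ-R)
SQ_ITEMS = 40    # number of items in the original SQ


def rescale_sqr_to_sq(sqr_score):
    """Rescale an SQ-R score to the original SQ metric: SQ = (SQ-R / 75) * 40."""
    return sqr_score / SQ_R_ITEMS * SQ_ITEMS


# SQ-R means of the present study (Table 2): 49.4 and 61.9
print(round(rescale_sqr_to_sq(49.4), 1))  # 26.3
print(round(rescale_sqr_to_sq(61.9), 1))  # 33.0
```

The rescaling is purely proportional: it puts the two versions on the same numerical range but does not correct for the difference in item content between them.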
In the present study, the good reliability and validity of the EQ, especially of its short 28-item version, were replicated (the Dutch EQ, scoring key, and norm table can be requested from the corresponding author). The 28-item EQ had good overall validity, as evidenced by (a) significant sex differences with a medium effect size, (b) significant differences between males with and without an ASD diagnosis with mostly large effect sizes, (c) weak to strong positive correlations with a questionnaire assessing the enjoyment and importance of friendships and interest in other people (FQ), and (d) negative correlations with the AQ in an ASD sample. A three-factor structure with the factors CE, EE and SS was supported by factor analysis of the 28 of the 40 original items that had been proposed in previous psychometric studies of the EQ (Berthoz et al.
2008; Dimitrijevic et al.
2012; Lawrence et al.
2004; Muncer and Ling
2006; Preti et al.
2011). The 28-item EQ had good overall internal consistency and good test–retest reliability across a time span of 15 months. At the subscale level, SS had lower consistency and lower intercorrelations than the CE and EE scales, which could be due to the smaller number of SS items (6) compared to the CE (11) and EE (11) scales. Another factor that may contribute to its low reliability is that the SS scale mainly consists of reversed items, and reversed items were shown to have lower overall consistency than forward items. The moderate intercorrelations with the SS scale could therefore be due to the lower reliability of this scale, but could alternatively suggest that the SS scale is related to, but nevertheless distinct from, the general construct of empathy. Another point of discussion at the subscale level is that, unlike the CE and SS subscales, the EE scale had only moderate test–retest reliability. We speculate that the EE score may, in addition to empathic trait factors, also measure state factors. The majority of its items refer to feelings in relation to other people that may vary with the current social context or the affective state. Clinicians and researchers should therefore be cautious in interpreting the EE subscale as a fixed emotional empathic trait, and should consider the social context or affective state at the time of the assessment.
The SQ-R also appeared to be a reliable and valid measure (the Dutch SQ-R, scoring key, and norm table can be requested from the corresponding author). The factor analysis on the 75-item SQ-R (Wheelwright et al. 2006) demonstrated that a one-factor structure was preferable, because no statistically or psychologically meaningful clusters were found in a multifactor solution. Given the high internal consistency of the total scale, we decided, in line with Wheelwright et al. (2006), that it was more appropriate to interpret the SQ-R as a single scale without any specific subscales. The test–retest reliability of the SQ-R was also good, and divergent validity was reasonable, as indicated by weak to moderate negative correlations with the EQ and FQ in the community sample, and a weak positive correlation with the AQ in the ASD sample. Although the divergent validity of the SQ-R appeared reasonable, convergent validity was not tested in this study; testing it would be necessary for further validation of the SQ-R. Some studies with the original SQ did demonstrate good convergent validity, as higher SQ scores went along with higher scores on visuospatial tasks, such as mental rotation and ball targeting (Cook and Saucier
2010; Ling et al.
2009). With regard to criterion validity, typical sex differences of medium effect size could be demonstrated, but surprisingly no differences were found between males with and without ASD. Patients with ASD scored in the same range as the males of the norm group. Furthermore, ROC analyses exploring the accuracy of SQ-R in detecting males with ASD yielded poor predictive validity for the SQ-R.
In the light of the EMB hypothesis of ASD, the outcomes of this study provide support for reduced empathy in ASD but not for increased systemizing. SQ-R scores alone were not predictive of having ASD. The EQ and ‘brain type’ were better predictive measures, as ROC analyses revealed that both EQ and D could detect patients with ASD above chance level. However, the combinations of sensitivity and specificity were suboptimal, so the instruments are not suited for predictive or diagnostic purposes. It must be noted that the predictive value of ‘brain type’ is most likely carried by the predictive value of the EQ, which partly constitutes the ‘brain type’ measure. Furthermore, only a weak negative association between the EQ and SQ-R was found in the community sample, and this correlation was absent in the ASD sample. This implies that there is only a weak trade-off between empathizing and systemizing, which is inconsistent with the EMB hypothesis stating that these cognitive styles are complementary. The latter findings could, however, relate to the inclusion of a relatively heterogeneous ASD sample (see “Limitations” section). Wheelwright et al. (2006), for example, did find a stronger negative association between EQ and SQ-R in a sample of ASD patients compared to a typical group, suggesting a stronger trade-off between empathizing and systemizing in patients with ASD. Other studies did provide support for increased systemizing in ASD (Baron-Cohen et al.
2014; Wakabayashi et al.
2007; Wheelwright et al.
2006). More support for the systemizing part of the EMB theory in adult samples is necessary, not only by means of the SQ but also by neuropsychological assessments.
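To make the ROC terminology used above concrete, the sketch below computes the area under the ROC curve (AUC) and the sensitivity and specificity at a single cut-off. It is a generic illustration, not the analysis performed in this study: the scores are invented, higher scores are assumed to indicate ASD (as for D), and the function names are hypothetical.

```python
def roc_auc(patient_scores, control_scores):
    """Area under the ROC curve.

    Computed as the proportion of patient-control pairs in which the patient
    scores higher than the control; ties count as half. An AUC of 0.5
    corresponds to chance-level detection.
    """
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def sensitivity_specificity(patient_scores, control_scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are flagged as 'patient'."""
    true_pos = sum(s >= cutoff for s in patient_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(patient_scores), true_neg / len(control_scores)


# Invented scores for illustration only (not data from this study)
patients = [3, 1, 2, -1]
controls = [0, -2, 1, -3]
print(roc_auc(patients, controls))                     # 0.84375
print(sensitivity_specificity(patients, controls, 1))  # (0.75, 0.75)
```

An instrument can exceed chance level on the AUC while still offering no cut-off at which sensitivity and specificity are both acceptable, which is the situation described for the EQ and D.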
Limitations
The actual sex differences in empathy and systemizing could be smaller than the sex differences reported in this study for several reasons. Firstly, regarding empathy, participants may fill out the EQ in a socially desirable or sex-stereotypical way. Previous studies found somewhat smaller sex differences for EQ when controlling for social desirability (Berthoz et al.
2008; Preti et al.
2011) and an association was found between EQ and social desirability, which is larger in females than males (Vellante et al.
2013). We expect that the influence of social desirability is smaller in the short 28-item version of the EQ, because this version excludes those items with high loadings on social desirability (see Lawrence et al.
2004). Secondly, it is not known whether males and females differ in the way they interpret the items of the EQ and SQ-R (i.e. to what extent there is measurement invariance), and therefore part of the sex difference could be due to measurement artefacts. As these limitations specifically apply to self-report measures, it is advisable to rely not only on self-report measures for the assessment of empathy and systemizing, but to also include more objective measures, such as social-cognitive tasks (e.g. Vellante et al.
2013). Finally, the sample of the present study was not randomly selected from the community and may therefore suffer from self-selection bias. It is possible that empathic males and females are more likely to participate in studies like this one. However, since the mean scores and the magnitude of the sex differences are in line with other international studies, we do not consider this limitation a serious threat to the validity of the findings.
No back-translation has been performed on the Dutch EQ and SQ-R translations, which may have caused minor differences between the Dutch versions and the original English versions. These minor differences are not likely to have influenced the validity of the questionnaire, because the psychometric properties of the Dutch questionnaires were very similar to those reported in previous studies.
The included high-functioning ASD sample can be described as a heterogeneous sample including the different conditions from the broad autistic spectrum, ranging from mild to severe. Although the patients were all diagnosed with a DSM-IV classification in the autistic spectrum, a large proportion had not been assessed with an instrument that is regarded as a gold standard for the assessment of ASD, such as the ADOS. The majority did not reach the AQ cut-off score of 32 proposed by Baron-Cohen et al. (2001). Interestingly, the vast majority of the patients scoring below this cut-off were rated as having clinical problems in the autistic spectrum according to their friends, families or professionals on the ADOS or SRS-A. However, in the present study such other-report measures were unfortunately not available for all patients in order to objectify their autistic spectrum problems. The heterogeneity of the sample, however, might have influenced the results, in that even stronger EQ differences and actual SQ-R differences might be found in more severe ASD samples.
Clinical Use
Although lowered EQ is a consistent finding in ASD, the EQ cannot be used to predict or diagnose whether a person has ASD, because its predictive value appeared insufficient for this purpose. Following the methodological framework for assessing health indices (Kirshner and Guyatt
1985), the EQ is not regarded as a discriminative or predictive measure, but is rather useful as an evaluative measure. It yields information about an individual’s experience of empathy and the individual’s strengths and weaknesses regarding particular aspects of empathy. The EE subscale should be carefully interpreted in the light of the social context and affective state at the time of assessment, because its test–retest reliability appeared only moderate. Regarding the SQ-R, poor predictive validity was found in a heterogeneous sample of ASD patients. Based on the present study, we therefore recommend always interpreting the SQ-R score in relation to the EQ, because the SQ-R score may lie in the normal range, whereas the discrepancy between empathizing and systemizing in the brain may be large. As with the EQ, the SQ-R should merely be viewed as an evaluative measure of an individual’s systemizing style.
The EQ and SQ are self-report measures that depend on the participant’s capacity for self-reflection. Although healthy individuals may in general be well able to reflect upon their own cognitive style, i.e. possess the ability of meta-cognition, this ability may be limited in patients with autism. For example, patients with Asperger syndrome were shown to be impaired in self-reflection and self-awareness (Jackson et al. 2012). When using the EQ and SQ as assessment tools (as well as other self-report tools such as the AQ and SRS-A), they can therefore only be interpreted reliably when the examinee (e.g. a patient with ASD) has good self-reflection abilities. In this context, it is important to consider that self-awareness is regarded as an important part of empathy, because it allows an empathic person to clearly differentiate between his/her own experience and that of the person being observed (Decety and Meyer
2008). This means that patients with ASD who are more impaired in self-reflection abilities may also suffer from greater impairments in empathy, while at the same time they might overestimate their empathic skills on self-report questionnaires like the EQ. Therefore it is important to consider self-reflection or meta-cognitive skills when assessing or interpreting self-reports of empathy. This issue also underscores the importance of using other informants for assessing empathy (Johnson et al.
2009).