Introduction
Autism spectrum disorder (ASD) is characterized by persisting deficits in social communication and interaction, alongside repetitive, stereotyped behavior and restricted interests (APA,
2013). The number of adults diagnosed with ASD has increased dramatically in the past decade and ASD now accounts for a large burden on health care (Fombonne
2009; Keyes et al.
2012). The global prevalence varies greatly but is approximately 1% (Elsabbagh et al.
2012), with 1.8% in men and 0.2% in women (Brugha et al.
2011). Autistic traits have moderate to high heritability, are highly stable, and are distributed on a continuum in the general population, with ASD at one extreme of the distribution (Hoekstra et al.
2007; Robinson et al.
2011).
Studying autistic traits can give further insight into how they relate to mental processes (Kuo et al.
2014), individual differences (Rivet and Matson
2011), and psychiatric disorders such as anxiety and depression (Rosbrook and Whittingham
2010). Screening for autistic traits in the general population may be helpful in epidemiological research because it can provide the sample sizes needed to investigate relationships between autism phenotype severity and theoretically important factors. Furthermore, studies of autistic traits in general population samples can serve as ‘analogue studies’ for ASD, providing access to larger, more easily recruited samples and thus allowing more complex statistical analyses to be conducted (e.g., Jackson and Dritschel
2016; Kunihira et al.
2006).
Among the screening tools developed to quantify autistic traits, the most commonly used over the past decade is probably the Autism-Spectrum Quotient (AQ; Baron-Cohen et al.
2001a). The AQ has been used to screen clinical samples (Woodbury-Smith et al.
2005) and to predict performance on cognitive tasks (Stewart et al.
2009), social cognition (Baron-Cohen et al.
2001b), spontaneous facial mimicry (Hermans et al.
2009), gaze preference to social and non-social stimuli (Bayliss and Tipper
2005), and auditory speech perception (Stewart and Ota
2008).
The AQ is a self-administered questionnaire for measuring the degree to which adults with normal intelligence show autistic traits. It consists of 50 questions, 10 for each of five domains relevant to autistic traits (social skill, attention switching, attention to detail, communication, and imagination). Adequate test–retest reliability has been shown for the AQ (Baron-Cohen et al.
2001a) and the AQ sum scores are normally distributed in the general population (Hurst et al.
2007). Cross-cultural equivalence in Dutch and Japanese samples has also been shown (Hoekstra et al.
2008; Kurita et al.
2005; Wakabayashi et al.
2006).
However, some aspects of the AQ are still questionable. Baron-Cohen et al. (
2001a) originally proposed a unidimensional structure of the AQ based on descriptive item analysis and sum score distribution across ASD and non-ASD groups. The sum score is by far the most commonly used AQ result, yet Baron-Cohen et al. (
2001a) only found adequate internal consistency (defined as Cronbach’s alpha above 0.70; Nunnally and Bernstein
1994) in one of the five autism trait domains in the AQ. Low Cronbach’s alpha indicates a lack of correlation between the items in a scale, which suggests deviation from unidimensionality. The low degree of internal consistency in the AQ has been extensively replicated (e.g., Austin
2005; Hoekstra et al.
2008; Hurst et al.
2007; Kloosterman et al.
2011; Stewart and Austin
2009).
To date, studies using more advanced statistical methods, such as factor analysis, have indicated that the AQ may consist of five (Kloosterman et al.
2011; Lau et al.
2013), four (Stewart and Austin
2009), three (Austin
2005; Hurst et al.
2007) or two (Hoekstra et al.
2008) dimensions. The two-factor model (actually two higher-order factors and four primary factors) was confirmed in a validation of a 28-item short form of the AQ (Hoekstra et al.
2011). Thus, the unidimensional structure assumed by Baron-Cohen et al. (
2001a) has not been replicated.
A common feature of previous studies is that the psychometric analyses are mostly based on non-ASD samples. The choice of mainly student samples may be reasonable, given that the AQ is directed towards autistic traits in the general population. However, the feasibility of the AQ and the theoretical basis of an autistic trait continuum require that the properties of the AQ are similar among those with and without ASD.
Another common feature of these studies is that they apply classical test theory techniques, such as principal component analysis, exploratory factor analysis or confirmatory factor analysis. As shown by Gorsuch (
1997), factor analysis of ordinal data that are treated as interval data can result in spurious factors. In addition, item distributions may differ from each other, and items will therefore tend to load on the same factor as other items with similar distributions. One may thus draw erroneous conclusions about the scale, especially when the sum score, as in the AQ, is used to define the degree of an underlying trait. Consistent with this, Stewart and Austin (
2009) noted that their initial exploratory factor analysis suggested a large number of poorly defined factors. These numerous factors may therefore reflect distribution properties rather than the underlying construct being measured. We therefore take a different approach in the present study and examine the dimensionality of the AQ using Rasch analysis.
Rasch models (Rasch
1960) have been applied in the development and validation of unidimensional scales with interval scale properties based on frequency questions or Likert items. They facilitate calibration of the observed test values with the underlying latent property (Linacre
1994). Rasch analysis can thus determine the degree to which items in the AQ accurately characterize autistic traits. Rasch models facilitate analysis of whether an instrument meets the requirements of invariance; for instance whether the scale works in a similar manner among men and women with and without ASD. Finally, the Rasch model is a method to validate the interval properties of a scale. An advantage of Rasch analysis is that it makes no assumptions about the distribution of the latent property, whereas in classical test theory techniques, normally distributed latent variables are required. Hence, the aim of the study was to test the scale properties of the Swedish AQ using Rasch analysis.
Methods
Participants
Two samples, an ASD group and a non-ASD group, were recruited for this study. The ASD group was recruited from the Centre for Adult Habilitation, Region Örebro County, Sweden. A total of 401 adults diagnosed with ASD and without intellectual impairment (i.e., IQ > 70) were invited to participate and 130 of them volunteered (68 men and 62 women, age 18–62, mean = 29.3 years, SD = 9.9). No age difference was found between the participants and the non-participants; however, the proportion of participating men (28%) was significantly lower than the proportion of participating women (40%) (χ2 = 6.25, p < 0.05).
The non-ASD group consisted of 219 university students recruited from various departments at Örebro University (93 men and 126 women, age 18–55 years, mean = 23.8 years, SD = 5.7). None of them reported having an ASD diagnosis. No age difference was found between men and women (t(217) = 0.68, p = 0.50) and the sex ratio of the sample was equivalent to that of the university (i.e., 60% women).
The ASD and non-ASD groups differed in regard to sex and age. The ASD group had significantly more men than the non-ASD group (χ2 = 9.43, p < 0.01) and the ASD group was on average older than the non-ASD group (t(347) = 9.06, p < 0.001).
The Autism-Spectrum Quotient
The Autism-Spectrum Quotient (AQ; Baron-Cohen et al. 2001a) is a 50-item self-report questionnaire for measuring the degree to which an adult with normal intelligence has the traits associated with the autistic spectrum. The items, which are given in Table 3, assess five different domains (10 items per domain): social skill, attention switching, attention to detail, communication, and imagination. All items are scored on a four-point rating scale ranging from 1 = definitely agree to 4 = definitely disagree. The scoring is reversed (from 4 = definitely agree to 1 = definitely disagree) for the items in which an “agree” response indicates an autistic trait. The following items were reversed: 2, 4, 5, 6, 7, 9, 12, 13, 16, 18, 19, 20, 21, 22, 23, 26, 33, 35, 39, 41, 42, 43, 45, and 46. All item scores are summed; thus, the AQ sum score can vary between 50 (at the lowest extreme of the autistic trait continuum) and 200 (at the highest extreme of the autistic trait continuum).
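The scoring rule described above can be sketched in Python as follows. This is a minimal illustration rather than the scoring script used in the study; the function name `aq_sum_score` and the input format (a dictionary mapping item number to a raw response coded 1 = definitely agree to 4 = definitely disagree) are assumptions made for the example.

```python
# Items for which an "agree" response indicates an autistic trait,
# as listed in the text above; these are reverse-scored.
REVERSED_ITEMS = {2, 4, 5, 6, 7, 9, 12, 13, 16, 18, 19, 20, 21, 22, 23,
                  26, 33, 35, 39, 41, 42, 43, 45, 46}


def aq_sum_score(responses):
    """Return the AQ sum score (50-200) from raw responses.

    `responses` maps item number (1-50) to a raw rating coded
    1 = definitely agree ... 4 = definitely disagree (assumed input format).
    """
    if set(responses) != set(range(1, 51)):
        raise ValueError("expected responses for items 1-50")
    total = 0
    for item, raw in responses.items():
        if raw not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: rating must be 1-4")
        # Reverse-keyed items: 4 = definitely agree ... 1 = definitely disagree.
        total += (5 - raw) if item in REVERSED_ITEMS else raw
    return total
```

A respondent who gives the trait-consistent extreme response on every item obtains 200, and one who gives the opposite extreme on every item obtains 50, matching the range stated above.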
The AQ was translated into Swedish after permission from Professor Simon Baron-Cohen. The translation was performed independently by two professional translators. The two translations were compared and the few minor discrepancies that emerged, which consisted of different choices of synonymous words or sentence structure, were discussed with the translators. Subsequently, a third professional translator translated the Swedish version back into English to confirm equivalence with the original. Hence, the Swedish version of AQ is linguistically similar to the English original. The Swedish translation is available from the first author.
Procedure
The adults with ASD received the study information, the study consent form, the AQ questionnaire, and a prepaid envelope by post. The students (non-ASD group) were informed verbally about the study and completed the AQ questionnaire during lectures. No course credit was received.
Data Analysis
IBM SPSS Statistics version 22 (IBM Corp, Armonk, NY) was used to summarize participant characteristics and to evaluate group differences using t-tests. A p value below 0.05 was regarded as significant. The AQ rank-ordered scores were analyzed using the Rasch rating scale model with Winsteps 3.81.0 (Linacre
2014). Detailed explanation of Rasch models is given elsewhere (Engelhard
2013). In brief, Rasch analysis converts rank-ordered data into interval logit measures, giving each person and each item a logit measure. Logit stands for log-odds unit, and logits form an equal-interval linear scale. The logit scale is unaffected by variations in the distribution of measures and is independent of the particular items included in a test or the particular sample of people (Wright
1993). Thus, an ‘AQ person measure’ represents the degree to which a person shows autistic traits (the higher the logits, the higher the degree of autistic traits). An ‘AQ item measure’ represents how difficult any particular item may be to endorse given a specific degree of autistic traits (the higher the logits, the more difficult to endorse). Rasch analysis enables the researchers to identify whether any items are misleading and whether the rating categories have been used as intended by the instrument developer.
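To make the relation between person measures, item measures, and rating categories concrete, the following sketch computes category probabilities under the standard formulation of the Andrich rating scale model. It is an illustration only and does not reproduce the Winsteps estimation procedure; the function name and the threshold values used in the usage note are hypothetical. Categories are indexed 0 to 3 here, corresponding to the item scores 1 to 4.

```python
import math


def rsm_category_probs(theta, delta, thresholds):
    """Category probabilities under the Andrich rating scale model.

    theta: person measure in logits; delta: item measure in logits;
    thresholds: Rasch-Andrich thresholds tau_1..tau_M shared by all items.
    Returns a list of probabilities for categories 0..M.
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with the hypothetical thresholds `[-1.0, 0.0, 1.0]`, a person whose measure equals the item measure has a symmetric probability distribution over the four categories, and the expected score rises as the person measure increases relative to the item measure.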
Rating Categories
The four rating categories were examined according to four criteria (Linacre
2002):
(i) there should be at least 10 responses in each rating category,
(ii) the average AQ person measure should be lower in a category representing low AQ than in one representing high AQ,
(iii) the transition point between each two categories (threshold) should follow an increasing level of the underlying autistic trait, and
(iv) the category outfit mean square should be less than 2.0.
The rating scale graphs generated by Winsteps were used to examine the ordering of thresholds and how the rating categories were positioned along the latent variable.
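The four criteria can be expressed as simple checks on category-level summary statistics. The sketch below assumes that these statistics have already been obtained from the Rasch software; the function name, argument names, and the values in the usage note are hypothetical.

```python
def check_rating_categories(counts, avg_measures, thresholds, outfit_mnsq):
    """Apply the four rating category criteria listed above.

    All arguments are lists ordered from the lowest to the highest category;
    `thresholds` has one entry fewer than the number of categories.
    Returns a dict mapping each criterion to True (met) or False (not met).
    """
    return {
        # (i) at least 10 responses in each category
        "min_10_responses": all(c >= 10 for c in counts),
        # (ii) average person measure increases with category
        "ordered_average_measures": all(
            a < b for a, b in zip(avg_measures, avg_measures[1:])),
        # (iii) thresholds increase along the latent trait
        "ordered_thresholds": all(
            a < b for a, b in zip(thresholds, thresholds[1:])),
        # (iv) category outfit mean square below 2.0
        "outfit_below_2": all(m < 2.0 for m in outfit_mnsq),
    }
```

Calling the function with, say, counts `[120, 340, 410, 95]`, average measures `[-1.2, -0.4, 0.3, 1.1]`, thresholds `[-1.5, 0.1, 1.4]`, and outfit values `[1.1, 0.9, 1.0, 1.2]` (made-up numbers) returns `True` for all four criteria.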
Item Properties
Point–measure correlations, local item independence, and fit statistics were used to examine the item properties. The point–measure correlation of each item reports the relationship between the group’s performance on the item and the group’s performance on the whole instrument. All items are expected to correlate positively in the direction of the latent variable; items showing negative correlations are considered invalid. The test of local item independence assessed whether responses to any item were unrelated to responses to any other item when trait level was controlled; thus, the endorsement of any item should not affect the probability of endorsement of the other items. Violation of local item independence may affect parameter estimates. An item residual correlation of at least 0.7 (i.e., common variance approximately 0.50) was set as a criterion for item dependency (Linacre
2009). Fit statistics detect the extent to which the response pattern observed in the data matches the one expected by the model. In this study, an item was considered misfitting if its infit and outfit mean squares were greater than 1.50.
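As an illustration of how the two fit statistics are conventionally defined, the sketch below computes infit and outfit mean squares for one item from observed scores, model-expected scores, and model variances. The function names are hypothetical, the inputs are assumed to come from a fitted Rasch model, and the misfit rule is read here as requiring both statistics to exceed 1.50, which is one interpretation of the criterion stated above.

```python
def fit_mean_squares(observed, expected, variances):
    """Return (infit, outfit) mean squares for one item.

    observed: observed scores across persons;
    expected: model-expected scores for the same persons;
    variances: model variances of those scores (all > 0).
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(observed)
    # Infit: information-weighted mean square.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def is_misfit(infit, outfit, cutoff=1.50):
    """Flag an item when both mean squares exceed the cutoff (assumed reading)."""
    return infit > cutoff and outfit > cutoff
```

Values near 1.0 indicate that the observed variation in responses is about what the model predicts; values well above 1.0 indicate more unexplained variation than expected.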
Differential Item Functioning
Differential item functioning (DIF) was used to examine whether an item performed differently for the ASD group than for the non-ASD group. For this study, item DIF was considered present if the difference between two groups on an item measure was 0.5 logits or more and reached significance (p < 0.05) in a t test (Karami
2012).
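The two-part DIF criterion can be sketched as follows. The sketch assumes that item measures and their standard errors have been estimated separately for each group, and it replaces the exact t-test p value with a large-sample critical value of 1.96, which is an approximation introduced for this example and not necessarily the computation performed by Winsteps. The function and parameter names are hypothetical.

```python
import math


def dif_flag(measure_a, se_a, measure_b, se_b, min_contrast=0.5, t_crit=1.96):
    """Return (contrast, t, flagged) for one item compared across two groups.

    measure_a, measure_b: item measures (logits) estimated in each group;
    se_a, se_b: their standard errors;
    t_crit: critical value standing in for p < 0.05 (large-sample assumption).
    """
    contrast = measure_a - measure_b
    t = contrast / math.sqrt(se_a ** 2 + se_b ** 2)
    # DIF requires both a sizeable contrast and statistical significance.
    flagged = abs(contrast) >= min_contrast and abs(t) >= t_crit
    return contrast, t, flagged
```

An item with a contrast of 0.8 logits and small standard errors would be flagged, whereas the same contrast with large standard errors, or a significant contrast below 0.5 logits, would not.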
Scale Reliability
Scale reliability was evaluated in terms of person reliability, an index similar to Cronbach’s alpha: for the range 0–1, coefficients above 0.70 are considered as a minimum for group use and coefficients above 0.85 for individual use (Tennant and Conaghan
2007).
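Person reliability is closely tied to the separation index reported in Rasch analyses. The sketch below uses the conventional relations G = sqrt(R / (1 - R)) for separation and (4G + 1) / 3 for the number of statistically distinct levels (strata); these formulas come from the general Rasch literature rather than from the text above, and the function names are hypothetical.

```python
import math


def separation_from_reliability(reliability):
    """Separation index G implied by a reliability coefficient R."""
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1 - reliability))


def strata(separation):
    """Number of statistically distinct levels, (4G + 1) / 3."""
    return (4 * separation + 1) / 3
```

Under these relations, a reliability of 0.80 corresponds to a separation of 2.0 and to three distinguishable levels of the trait.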
Unidimensionality
Principal components analysis of residuals was used to examine whether the five AQ domains measure different dimensions or work together to measure one dimension. We used two criteria: at least 50% of the total variance should be explained by the first latent variable and any additional factor should explain less than 5% of the remaining variance after removal of the first latent variable (Linacre
2009).
Targeting
We explored the potential use of the AQ to measure a clinical population by examining the targeting of item difficulty (not too easy, not too hard) to the individual’s trait level in the person–item map. The map orders person and item measures along the same scale, which enables us to examine whether the AQ has enough items to discriminate people with different levels of autistic traits. The item difficulty range is expected to match the range of autistic trait levels in the ASD group. Because the mean item measure is set at zero logits by convention, a mean person measure around zero indicates that the items are well targeted for the people in the sample (Tennant and Conaghan
2007).
Sensitivity and Specificity
Sensitivity and specificity of the AQ as a screening tool for ASD were evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), calculated for the full AQ scale and the five AQ domains. The Youden index (Youden
1950), defined as sensitivity + specificity − 1 and maximized at the point where the tangent to the ROC curve is parallel to the chance line, was used to find the optimal cut-off scores. This index has been used in the development of diagnostic assessments for ASD (Cohen et al.
2010) and is regarded as one of the most stringent statistical methods for identifying a cut-off or threshold in diagnostic measures.
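The cut-off selection can be illustrated with a short sketch that evaluates the Youden index, J = sensitivity + specificity − 1, at every observed score and returns the cut-off with the largest J. The function name and the scores in the usage note are hypothetical, and a score at or above the cut-off is assumed to count as a positive screen.

```python
def youden_cutoff(scores_asd, scores_non_asd):
    """Return (cutoff, J, sensitivity, specificity) maximizing the Youden index.

    A score >= cutoff is treated as a positive screen (assumed convention).
    When several cut-offs tie, the lowest one is returned.
    """
    best = None
    for cutoff in sorted(set(scores_asd) | set(scores_non_asd)):
        sensitivity = sum(s >= cutoff for s in scores_asd) / len(scores_asd)
        specificity = sum(s < cutoff for s in scores_non_asd) / len(scores_non_asd)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best
```

With the made-up sum scores `[110, 125, 130, 140, 150]` for an ASD group and `[90, 100, 105, 120]` for a non-ASD group, the function selects a cut-off of 125, at which sensitivity is 0.8 and specificity is 1.0.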
Discussion
The study tested the scale properties of the Swedish AQ using the Rasch rating scale model, with mixed results: several scale properties were good to excellent whereas others were poor. On the one hand, the AQ fulfilled the rating scale criteria, had minimal DIF, adequate item properties, adequate item and person separation and reliability, and excellent targeting for the ASD group; on the other hand, the AQ did not meet the criteria for a unidimensional scale.
In regard to item properties, five items misfit the model: item 21 in the domain Imagination and items 9, 29, 30, and 49 in Attention to detail. Three of the items (29, 30, and 49) had negative point–measure correlations, with the scoring orientation on these items opposite to the orientation of the latent variable (the degree of autistic traits). Reasons for negative point–measure correlations can, for instance, be person-specific knowledge, guessing, or reverse scoring. It is notable that all three items are negatively worded and that these items were also scored higher by the non-ASD group than the ASD group, suggesting that the items do not represent a measure of autistic traits and need revision. This is in line with previous studies finding low or negative domain loadings for these items (Austin
2005; Hoekstra et al.
2008; Hurst et al.
2007; Stewart and Austin
2009). It should be noted that in the development of the AQ, Baron-Cohen and colleagues (Baron-Cohen et al.
2001a) found that items 29 and 30 were scored higher by controls than adults with Asperger’s syndrome or high-functioning autism, but nevertheless were retained in order to reduce the group differences.
No item pair was locally dependent, although item residuals were moderately correlated between “I enjoy social chit-chat” (item 17) and “I am good at social chit-chat” (item 38), and between “I enjoy social occasions” (item 44) and “I enjoy meeting new people” (item 47). In both pairs, the items are similar in meaning. Even if they fit the model, items with highly similar wording will boost the items’ correlation with the total score while providing no unique information about the responder. In the presence of local dependency, it is recommended that one of the similar items be excluded because of potential redundancy.
Five of the 50 items showed DIF: three from the Social skill domain, one from the Imagination domain, and one from the Attention to detail domain. Interestingly, the DIF indicated that these items exaggerated the group differences in the expected direction. That is, people with ASD are expected to be less socially skilled and imaginative and more attentive to details than those without ASD; these items thus highlight the group differences more distinctly than the other items in the AQ. Absence of DIF is crucial for an adequate scale (Tennant and Conaghan 2007). However, given that the DIF took the form of this overestimation bias, that only five of the 50 items showed DIF, and that all but one of these items were below 1 logit, it would appear that the AQ items, for all practical purposes, are adequate for people with as well as without ASD.
The AQ items were well targeted to the individuals with ASD. However, as shown in the person–item maps, most of the non-ASD respondents were clustered at the lower end of the measures, indicating a low position on the autistic continuum, while many of the items were concentrated at the higher end of the continuum. This would suggest that the set of AQ items is less appropriate for measuring degree of autistic traits in the non-ASD group. This result is reasonable given that the AQ was developed to screen adults with Asperger’s syndrome or high-functioning autism, who are more likely to endorse many of the items. During piloting of the AQ, Baron-Cohen et al. (
2001a) excluded items (except items 29 and 30) if non-ASD people selected ‘definitely disagree’ or ‘slightly disagree’ more often than did people with Asperger’s syndrome or high-functioning autism. Consequently, non-ASD respondents are less likely to endorse items on the AQ, and the items are thus less well targeted to them.
The Rasch analysis supported most of the AQ’s scaling properties but failed to support Baron-Cohen et al.’s (
2001a) assumption that AQ measures a single latent variable, namely, the degree of autistic traits. This result is in line with previous research using factor analysis (Austin
2005; Hoekstra et al.
2008; Hurst et al.
2007; Stewart and Austin
2009) and Mokken scaling (Stewart et al.
2015). The hypothesized single latent variable is not consistent with the multidimensional nature of ASD, as expressed in the Diagnostic and Statistical Manual of Mental Disorders, DSM-5 (American Psychiatric Association
2013), or with the fact that Baron-Cohen et al. (
2001a) selected the AQ items from the domains in the “triad” of autistic symptoms. The use of a single AQ sum score may therefore not adequately express the multifaceted aspect of ASD.
By reducing the AQ to 12 items from the Social skill, Attention switching, and Communication domains, we were able to meet both criteria for unidimensionality. Intriguingly, nine of these items (11, 13, 17, 22, 26, 34, 38, 44, and 47) are among the ten items that passed the Mokken scaling test on people with ASD (Stewart et al.
Hoekstra et al. (2008), using confirmatory factor analysis (CFA), found that the AQ consisted of two second-order factors, one of them including Social skill, Attention switching, and Communication. Using different evaluation methods, we thus converged on a similar conclusion: the AQ measures more than one latent variable and contains more items than are needed to measure a unidimensional autistic trait. Despite this, a majority of empirical studies use the AQ sum score as the sole measure of an autistic tendency. If the AQ measures a set of (somewhat related) constructs, what exactly does an AQ sum score mean and what consequences does this have for our understanding of autism?
According to the psychometric literature, if the assumption of unidimensionality is violated, any statistical analysis based on it would be misleading. Specifically, estimates of the latent variables and item parameters will generally be biased because of model misspecification, which in turn leads to incorrect decisions on subsequent statistical analysis, such as testing group differences and correlations between latent variables (e.g., Horton et al.
2013).
It should be noted that unidimensionality is a relative matter. The judgment of whether a scale is sufficiently unidimensional should ultimately come from outside the data and be driven by the purpose of measurement and by clinical and theoretical considerations (Andrich
1988; Cano et al.
2011; Rasch
1960).
A pragmatic way to salvage a situation like this would be to treat the AQ sum score as an index, in other words, a formative latent variable (see Simonetto (
2012) for an overview). A formative latent variable is defined by a number of non-interchangeable composite indicators, such as income, education, and occupation in the variable socioeconomic status, or weight and height in the variable body mass index. Consequently, a formative latent variable does not exist at a deeper conceptual level than its defining composite indicators (Law et al.
1998). Following this path, the AQ sum score will lose content validity and serve as a mere observable outcome and predictor variable.
To what extent, then, can the AQ predict presence of ASD? The person reliability and separation indices of the AQ were adequate, as were the item reliability and separation indices. The AQ has the potential to classify three groups of people (low, average, and high degree of autistic traits) and is at a level of sensitivity required for both group and individual use (Tennant and Conaghan
2007). The AQ may also be able to separate more than ten item difficulty levels, which confirms its item difficulty hierarchy, in other words, its construct validity. The AQ sum score differentiated well between the ASD group and the non-ASD group. The AUC was above that found on similar populations in Britain (e.g. Woodbury-Smith et al.
2005) but lower than that reported in the Netherlands (Wouters and Spek
2011) or Australia (Broadbent et al.
2013). Regarding the AQ domains, the ROC analysis indicated that the domains Social skill, Attention switching, and Communication had adequate AUC (above 80%), whereas the AUC of Imagination was fair and the AUC of Attention to detail, though above chance, was poor (below 60%). This is in line with the large proportion (40%) of misfit items in this domain and with previous studies showing that Attention to detail is the poorest domain in the AQ for differentiating people with and without ASD diagnoses (Allison et al.
2012; Wouters and Spek
2011).
The AQ logits and sum scores obtained for each individual were highly correlated (r = 0.998), suggesting that summed raw scores adequately reflected true change along the autistic traits continuum that the AQ quantifies. However, it should be borne in mind that the conversion to logits would be justified only if the sample characteristics were similar to those of the present study. Consequently, Rasch analyses are needed prior to using the AQ on other populations.
Limitations
Although this study provides an important contribution to our understanding of the AQ and the assessment of autistic traits in people with and without ASD, there are a number of limitations that warrant discussion. First, the groups were not matched for sex and age. The participants in the non-ASD group were younger and included a larger proportion of women than the ASD group. Despite sex and age differences, the DIF analyses showed few discrepancies between the ASD and non-ASD groups. Consistent with previous research, there was no difference between mean AQ sum scores of men and women with ASD (Baron-Cohen et al.
2001a,
2006; Hoekstra et al.
2008).
Moreover, the sample size fulfilled the requirement of stable calibration for Rasch analysis but the subgroups for DIF analysis were too small (see Linacre
2013) to draw a definite conclusion regarding whether, for example, sex- or age-related DIF was present in the items in either the ASD or the non-ASD group. Therefore, any conclusions regarding sex or age differences between groups should be interpreted with caution.
Furthermore, some of the ASD participants attached comments to their questionnaires saying that it was somewhat challenging for them to complete so many questions. It is reasonable to conclude that some people with ASD, regardless of their motivation to complete the questionnaire, may have lacked the ability to do so. Although all people with ASD registered in the county were invited to participate, the results are only generalizable to those with the ability to complete the AQ questionnaire. This may have less impact on estimated AQ scale properties, because the reported level of autism traits as quantified by the AQ is probably an underestimation of the true level in the ASD population. In addition, the non-ASD sample completed the AQ anonymously, which meant that we could not verify whether any of them had an ASD diagnosis or would fall within that category.