Introduction

Autism spectrum disorders (ASD) comprise a group of complex neurodevelopmental disorders characterized by restricted interests, impaired social interaction, and stereotyped and repetitive behaviors1. Epidemiological studies reflect that around 1% of worldwide population could have ASD diagnosis2. The lack of empirical evidence to support the appropriateness of different ASD subtypes has led to their classification into one single category. Part of the clinical heterogeneity of ASD has to do with the coexistence with psychiatric and medical conditions3,4,5,6,7,8,9. Differential psychiatric comorbidities may have an etiopathological relationship with the ASD subtype, as previously conceptualized.

Major depression (MDD), anxiety (ANX), attention deficit hyperactivity disorder (ADHD), schizophrenia (SCZ), and obsessive-compulsive disorders (OCD) are among the most prevalent psychiatric comorbidities in ASD patients. This clinical overlap has been more extensively studied in high functioning autism (HFA) and in Asperger syndrome (AS), which may be biasing the findings10,11,12,13,14.

Recent advances in ASD genomics have demonstrated a clear pattern of biological overlap between ASD and other psychiatric conditions14,15,16,17,18. While ASD predisposing rare variation shows a clear overlap with severe neurodevelopmental disorders, intellectual disability and, in lesser degree, schizophrenia16, is the common variation that seems to be shared across major psychiatric disorders14,15. For instance, strong genetic correlations between ASD and MDD (rG = 0.412; P < 10−24), SCZ (rG = 0.212; P < 10−24) or ADHD (rG = 0.360; P < 10−11) have been described in terms of common variation15. In this sense, the latest ASD GWAS to date demonstrated that common variation contribution to Asperger (AS) subtype was almost double the contribution to other DSM-IV ASD subtypes15. Given this main difference in the genetic architecture of AS, a reasonable question is whether common predisposing variation from comorbid disorders may be present to a greater extent in AS than in other ASD subtypes.

In addition, the criteria for an appropriate AS diagnosis is, still, a matter of important debate19,20,21,22. In a previous study, our group surveyed subclinical psychopathology in children and adolescents diagnosed with AS following DSM-IV and Gillberg’s criteria, describing high sub-syndromic MDD, OCD, ADHD symptomatology in our AS sample23.

Therefore, in this study, we aimed to assess whether polygenic risk for ASD and comorbid disorders are differently transmitted in AS compared with other autism subtypes (Non_AS). Using polygenic transmission disequilibrium test, a method previously performed in a larger ASD cohort18, we analyzed polygenic transmission using an ASD cohort of 379 trios and tested whether comorbid disorders were over-transmitted in AS but not in Non_AS trios. We then used hierarchical clustering to classify subjects based on their transmitted polygenic scores. Indeed, to more deeply understand the significance of this polygenic contribution, we used recent developed software, PrediXcan24 and S-PrediXcan25 to analyze how over-transmitted alleles in AS subjects affect brain cortical gene expression.

Methods

ASD sample for polygenic score calculation (target sample)

ASD complete trios (N = 379) were recruited in two sites in Spain, Santiago de Compostela (N = 138; sample 1), and Madrid (N = 241; sample 2). Subjects from Madrid were recruited as part of AMITEA program, at the Department of Child and Adolescent Psychiatry, Hospital General Universitario Gregorio Marañón. Subjects from Santiago were recruited from an ongoing project from the Galician Public Foundation of Genomic Medicine. Informed consent signed by each participating subject or legal guardian and approval from the corresponding Research Ethics Committee were obtained before starting the study. Only individuals with 3 years of age or above and ASD diagnosis were included. All trios recruited were complete. Parents had no diagnosed psychiatric disorder when trios were recruited.

Diagnosis of ASD for sample 2 was done by Child Psychiatrists with extensive experience in ASD, clinically trained in the autism diagnostic interview-revised (ADI-R) and research-trained in the autism diagnostic observation schedule (ADOS). All diagnoses followed Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition Text Revision and Fifth Edition (DSM-IV-TR and DSM-5) criteria26. When necessary (due to inconsistencies found between all the information available), the ADOS and/or ADI-R were also administered. For the comorbidity study, we used only participants from the Madrid site, with 39 trios with this specific AS diagnosis, and 202 with other ASD diagnoses (Non_AS), since only these 241 trios were phenotyped enough to ensure a possible diagnosis of AS. These subjects were diagnosed with AS if they had an ASD without mental retardation diagnosis and met Gilbert’s criteria for AS19. In fact, 41 probands without mental retardation were included within the 202 Non_AS trios, as they did not meet both DSM and Gilbert’s criteria. No screening for copy number variants (CNVs) carriers was performed. The average age for recruitment and diagnoses was 15.41 years (CI 95%: 14.55–16.66).

Polygenic scores calculation

We calculated polygenic risk scores (PRS) in the 379 Spanish ASD trios that were sequenced as part of the Autism Sequencing Consortium (ASC). Firstly, exome-based VCF files were imputed at Michigan Imputation Server (https://imputationserver.sph.umich.edu/) using 1000 Genome Project phase 3v.5 reference panel in order to capture genomic variants beyond exome. From 654,286 variants, a total of 38.7 million were obtained. Only biallelic variants with imputation quality score > 0.9 and minor allele frequency (MAF) > 0.1% were considered (72,292 genotyped and 997,210 imputed SNPs were retained). Then, exome-based PRS scores were calculated. Briefly, GWAS summary statistics from PGC repository (https://www.med.unc.edu/pgc/results-and-downloads) were used as discovery sample (Supplementary Table 1). PRS was assigned to each individual from target sample as the sum of number of effect alleles weighted by their effect in the discovery sample. Indels were also excluded. Clumping was performed using PLINK v1.9 code “--clump-r2 0.1 --clump-kb 500”. We used Flip Strand and remove ambiguous genomic positions. For each individual, we calculated PRS scores for ASD15 and for five comorbid disorders here surveyed: MDD27, SCZ28, ADHD29, ANX30, and OCD31. We generated different PRS using various P thresholds (P < 0.01, 0.05, 0.1, 0.2, 0.5, and 1) in the discovery dataset for inclusion of SNPs below these thresholds in PRS and calculated polygenic transmission disequilibrium test (pTDT), using parent-child trios PRS information, as previously performed in ASD18. Briefly, the expected PRS distribution of the offspring is compared with the average PRS distribution of the parents, and its deviations is tested with a one-sample t-test. For each disorder, the P threshold with the highest estimate value of transmission from pTDT on the whole sample (sample 1 and sample 2) was selected and henceforth used for comparison between AS (N = 39) and Non_AS (N = 202 trios; sample 2) trios. As a negative control, pTDT analysis was repeated using summary statistics of body mass index (BMI), from recent meta-analysis32, a trait without genetic correlation with ASD.

Clustering classification based on transmitted genetic scores

In order to analyze substructure of our ASD cohort based on polygenic transmission, we performed an agnostic procedure of clustering classification (without specifying the number of clusters to be generated). We first determined the more suitable method using the R package clValid33. Briefly, clValid performs clustering to analyze the input data by using many different algorithms (hierarchical, self-organizing maps (SOM), K-means, self-organizing tree algorithm (SOTA), and model-based). The most suitable method is suggested after an internal validation procedure implemented. The internal measures include the connectivity (the degree of connectedness of the clusters), Silhouette Width (the degree of confidence in a particular clustering assignment), and Dunn Index (the ratio between the smallest distance between observations not in the same cluster to the largest intra-cluster distance). We then used NbClust R package34 to determine the optimal number of clusters making use of different indexes and varying all combinations of number of clusters, distance measures, and clustering methods.

Hierarchical clustering using Ward’s minimum variance method was performed with hclust function from cluster R package. Goodness of clustering algorithm results was assessed by determining the silhouette width coefficient, which measures how well an observation is clustered and it estimates the average distance between clusters. Observations with negative silhouette width were removed. fviz_silhouette function from factoextra R library was used. Finally, ASD subjects within each cluster were analyzed to test enrichment of AS subjects within any group.

Gene expression prediction

PrediXcan, a recently published software that imputes tissue-specific gene expression levels from subject’s genetic profile24, was used to assess gene expression differences between patients from AS and Non_AS trios. Briefly, PrediXcan makes use of available expression quantitative trait loci (eQTLs) data from GTEx project35 to predict expression levels in a tissue-dependent manner and compares expression levels across different phenotypes. Genotype files from ASD trios with available diagnose information (N = 241) were converted from.bed to.dosage format. Predixcan was run with --predict and –assoc --logistic arguments to predict expression levels per individual and perform statistical comparisons per gene between AS and Non_AS subjects. Given the extensive literature connecting cortical dysfunctions and autism, either from imaging36, gene expression37,38,39 or rare disrupting variation40 studies, GTEx brain frontal cortex was used as reference transcriptome. Differences in predicted gene expression in frontal cortex between AS and Non_AS patients were assessed.

S-PrediXcan25, an extension of PrediXcan software that infers its results using only summary statistics from GWAS, was then used on GWAS summary data from ASD and comorbid disorders with significant polygenic transmission (SCZ, MDD, ADHD) to estimate gene expression differences between cases and controls using imputed expression data from GTEx (brain frontal cortex).

Correlation between gene expression differences between AS and Non_AS (measured by per gene Z-values derived from PrediXcan test) and gene expression differences between cases and controls from comorbid disorders (measured by per gene Z-values derived from S-PrediXcan test) was analyzed. This relation was studied under different P thresholds from S-PrediXcan.

GTEx expression data from lung and cerebellum, other brain tissue with less consistent evidences in psychiatry that frontal cortex41 were used as comparative controls. Body mass index (BMI) summary statistics32 were also used in S-PrediXcan analyses in same tissues as a negative control, as it has not been associated with ASD across literature and repeatedly used as control. Finally, Alzheimer disease (ALZ) summary statistics42 were used in S-PrediXcan analyses in same tissues as a comparison between neurodevelopment and a neurodegenerative disorder.

Statistical analyses

To assess for polygenic over-transmission in AS and Non_AS, one-sided t-test were performed in form of pTDT. Full procedure is described in previous work18. Comparisons of polygenic transmission values between AS and Non_AS groups were performed with two-sample t-test. Data normality was contrasted with Shapiro–Wilk test. Paired t-test was used to test significance of transmission across all comorbid disorders in AS and Non_AS subgroups of trios. Correlation analyses were performed using Spearman correlation. In case of multiple test comparison, Benjamini–Hochberg FDR correction was performed.

Results

Polygenic transmission disequilibrium

After imputation and quality control (QC) of genotypic data, we retained 221,418 SNPs, of which a total of 51,718 variants with MAF > 0.1% were used after clumping (methods). Average pTDT and deviation were calculated in our ASD trios cohort using GWAS data from ASD and five comorbid disorders assessed (SCZ, MDD, ADHD, ANX, and OCD; Table 1). All subsequent analyses were based on the P threshold in which pTDT had higher transmission values at each disorder.

Table 1 Polygenic transmission estimate values (from pTDT tests) of ASD and studied comorbid disorders (ASD, SCZ, MDD, ADHD, ANX, and OCD) in all ASD trios’ population (N = 379).

ASD trios with available full diagnosis (N = 241) were used to compare pTDT values between AS and Non_AS. Polygenic risk for SCZ, ADHD and MDD was significantly over-transmitted to AS affected children in trios (P < 0.05; Fig. 1), but not to Non_AS affected children (P > 0.05). We confirmed these results by conducting 1000 random permutations of diagnosis status (permutation-PSCZ = 0.0409; permutation-PMDD = 0.051; permutation-PADHD = 0.0301). As a negative control, pTDT analysis was repeated using BMI summary statistics. Polygenic over-transmission was neither found in AS nor in Non_AS subgroup (P > 0.05 for both comparisons; Supplementary Table 2). Although no significance was found in ASD, OCD, or ANX polygenic risks in the AS subgroup of patients, we observed higher polygenic transmission in the AS group for all psychiatric-related disorders assessed in this study. This pattern of over-transmission of psychiatric-related polygenic risks is, in fact, unlikely to happen by chance (Paired t-test P = 0.02175, CI (95%) = 0.04562–0.37237).

Fig. 1: Polygenic transmission of ASD, comorbid disorders and BMI AS trios (N = 39), and Non_AS trios (N = 202).
figure 1

Transmission disequilibrium is represented as standard deviations of the mid-parent distribution. Colored geometric lines represent 95% confidence intervals. P-values over geom error bars measure the probability that the mean of the pTDT deviation distribution is higher than 0 (two-sided, one-sample t-test). For each disorder, pTDT values were calculated based on the P threshold in which highest transmission were found for the whole ASD trios’ population (N = 379). Significant over-transmissions were confirmed with random permutation of AS subgroup. “*” Permutation P-value < 0.05; “+” Permutation P-value < 0.1.

Correlation of pTDT deviations across disorders was assessed (Supplementary Fig. 1). Although SCZ and OCD pTDT deviations are significantly correlated within our ASD cohort, there are no correlation between any of significantly over-transmitted pTDT scores (ADHD, SCZ, and MDD). Therefore, multiple polygenic over-transmission does not occur due to genetic overlap between disorders. Instead, different genetic etiologies contribute to the development of AS condition is a more likely outcome.

We finally performed clustering using all psychiatric pTDT deviation values from AS and Non_AS trios. Hierarchical clustering was determined as the most suitable method (Supplementary Table 3). Two different clusters (C1 and C2) were observed with the highest probability (Supplementary Table 4; Supplementary Fig. 2; 165 and 76 subjects in C1 and C2, respectively). Individuals with negative silhouette classification values were removed (Supplementary Fig. 3), pruning clusters with only confident results (118 and 74 individuals in C1 and C2, respectively). AS subjects were overrepresented in cluster C2 (18 of 74 vs. 13 of 118; Fisher two-tailed P = 0.025; OR (95%) = 2.60 (1.10–6.19)).

Prediction of brain frontal cortex gene expression

In order to analyze whether polygenic over-transmission related with comorbid disorders (MDD, ADHD, and SCZ) in AS subgroup has biological consequences, we used two recent software, Predixcan and S-PrediXcan (methods), to infer gene expression differences between AS and Non_AS, and to compare these differences with expression patterns related with those comorbid disorders.

We used gene expression predictions from brain frontal cortex, a tissue extensively related with ASD. Gene expression was predicted from our imputed exome data. Two-hundred twenty-six out of 4111 imputable genes were differently expressed (P < 0.05) in AS vs. Non_AS subjects. However, no gene reached statistical significance in predicted gene expression differences after correcting for multiple testing (Supplementary Table 5).

Z-values from PrediXcan test, regarding gene expression differences in brain frontal cortex between AS and non AS patients, and Z-values from S-PrediXcan test, regarding gene expression differences in cases and controls for each comorbid disorder, were compared to study whether cortical gene expression profiles in AS compared to Non_AS resemble gene expression profiles in comorbid disorders with significant polygenic transmission in AS trios (ADHD, MDD, and SCZ; Fig. 2 and Supplementary Table 6). We observed a significant pattern of correlation between ADHD and AS gene expression patterns in brain frontal cortex (Fig. 2; most significant Spearman Corr = 0.135; P = 8.832 × 10−4 (Pthreshold = 0.1)). When restricting analysis to genes more differently expressed from S-PrediXcan results (ADHD vs. controls), correlation systematically grows, indicating that the more specific ADHD genes in terms of predicted gene expression, the higher correlation with gene expression differences between AS and Non_AS samples. Interestingly, this growing correlation pattern was neither reproducible in cerebellum nor in lung, reflecting that polygenic overlap between ADHD and AS may have high impact in brain cortex but not in other tissues (Fig. 2).

Fig. 2: Spearman correlation between gene expression diferences in AS against Non_AS and gene expression differences in comorbid disorder (ADHD, MDD, and SCZ).
figure 2

Correlation under various P cutoffs from S-PrediXcan results (imputed gene expression differences in cases vs. controls from ADHD, MDD, SCZ, and BMI GWAS), were assessed. Predicted expression relationships were studied in brain frontal cortex, cerebellum, and lung. BMI was used as a negative control disorder. *P < 0.05; **P < 0.01; ***P < 0.001. Gene numbers and correlation results are described in Supplementary Table 6.

We also identified significant correlation between MDD and AS gene expression patterns in brain frontal cortex (Fig. 2; most significant Spearman Corr = 0.490; P = 7.65 × 10−3 (Pthreshold = 0.001)), and similar growing patterns of correlation were observed when restricting genes analyzed to those more differentially expressed from S-PrediXcan results (lowering Pthreshold). However, when comparing expression differences across other tissues, significant correlation and same growing correlation trends were observed in cerebellum (Fig. 2; most significant Spearman Corr = 0.68; P = 1.64 × 10−5 (Pthreshold = 0.001)) and lung (Fig. 2; most significant Spearman Corr = 0.44; P = 4.49 × 10−3 (Pthreshold = 0.001)). Thus, expression similarities between MDD and AS conditions appear not to be restricted to brain cortex but distributed across other tissues.

Contrary to MDD and ADHD, polygenic overlap between AS and SCZ could not be explained in terms of predicted expression differences in brain frontal cortex, where no significant correlations were found (Supplementary Table 6). Significant but less consistent correlations were found in cerebellum (Fig. 2; most significant Spearman Corr = 0.176; P = 0.016 (Pthreshold = 0.001)), but to a significant lesser degree than findings in ADHD or MDD.

To assess whether these correlations patterns are extendable to a neurodegenerative disorder, AS and ALZ gene expression correlation was also explored. In contrast with ADHD or MDD, no consistent growing pattern of correlation was observed (Supplementary Table 6), which suggests that expression similarities between AS and neuropsychiatric disorders are restricted to neurodevelopment.

Indeed, no correlation pattern was observed when using BMI results from S-PrediXcan in any of the three tissues analyzed (Fig. 2).

Discussion

The results presented here confirm that the ASD polygenic contribution is shared with other psychiatric disorders, such as SZ, MDD, and ADHD, but restricted to AS (DSM-IV plus Gillberg criteria) instead of the whole ASD sample. Indeed, for the five described comorbid disorders studied here (SCZ, MDD, ADHD, ANX, and OCD), the polygenic contribution is higher in AS subtype of ASD than in Non_AS autism (Fig. 1). This higher contribution of common variation in AS has been recently described in the largest ASD GWAS performed to date15. Although diagnostic criteria used in both studies vary slightly, the results presented here are consistent with those from Grove et al. Insofar, a higher contribution from common variation arises in HFA compared to syndromic forms or other subtypes of autism with marked intellectual affectation, who seem to be more affected by rare disruptive variation40. The differential genetic contributions in different clinical subtypes of ASD could partially explain the weaker correlations observed between ASD and other psychiatric disorders in previous cross-disorder studies based on both genomic14 or transcriptomic43 data.

A main utility of PRS is the possibility to evaluate the genetic contributions of different disorders on well-characterized subtypes of a target sample under study15,44,45,46. In this study, by performing pTDT18, which uses family-based samples, results are not affected by ancestral stratification or other environmental factors that potentially biases case–control studies, such as socioeconomic status.

Given that no correlation between ASD and comorbid disorders (SZ, MDD, and ADHD) was observed at the pTDT level (Supplementary Fig. 1), we performed hierarchical clustering using pTDT values and observed two main clusters with one of them enriched in AS subtype (P = 0.025; Supplementary Fig. 2). Thus, by using an agnostic procedure of hierarchical clustering we strengthen the rationale of our initial hypothesis.

Whole-genotype data was used to predict gene expression differences between AS and Non_AS subjects24. Similarly, GWAS summary statistics were used to infer gene expression differences between cases and controls from studied comorbid disorders25. By comparing AS–Non_AS and case–control differences from comorbid disorders, we described significant gene expression relationships between AS and ADHD or MDD.

These results suggest that the polygenic risk shared between AS and other psychiatric disorders is mediated mainly through changes in brain gene expression. In this sense, while we show that expression correlations between AS and ADHD are restricted to prefrontal brain cortex, we observe gene expression correlations between AS and MDD extending to cerebellum and beyond brain tissue, with notorious correlation even in lung tissue, suggesting involvement of genes with broad expression related to general biological processes. Although these expression analyses from PrediXcan24 and S-PrediXcan25 software rely on imputed data using eQTL information from GTEx database35, they have been already extensively used in psychiatric genetics as a reliable approach to infer tissue-specific gene expression values47,48.

One of the greatest novelties of our study is the combination of DSM-IV and Gillberg criteria to diagnose AS subtype. This coincides with Klin et al.20 proposal of giving precedence to the diagnosis of AS when both AS and Autism criteria are met. Unique features of AS such as social motivation, awkward one-sided social approaches or precocious language, among others20,49, are considered when clinically diagnosing AS as a subtype separated from HFA. Lack of empirical evidence for distinct subtypes derived in lumping all subtypes of ASD within a same category in the DSM-550. Apart from brain connectivity differences51,52,53,54, very little neurobiological data support the distinction between AS and HFA. The qualitative differences between AS and HFA with respect to psychiatric comorbidity have been hitherto explained by psychological mechanisms50. Thus, having social motivation but being incompetent in appropriately approaching others might lead to greater difficulties in social adjustment than being more uninterested in social relationships (as in the aloof subtype described by Lorna Wing22), which may lead to greater comorbidity with depressive disorders, particularly in adolescence and the transition to adult life. However, the debate whether these features characterize a subtype within HFA or personality traits is beyond the scope of this article. Data presented here are consistent with previous clinical studies23, and we extract from our results that AS meeting Gillberg’s criteria might be a highly polygenic disorder and strongly affected by other psychiatric contributions.

However, our observations do contrast with those from other recent studies15, and place AS, in terms of genetic contributions, closer to conditions as MDD or ADHD rather than to average ASD. From a nosological perspective these results may add to the discussion whether genetic contribution may help subtyping the group of disorders comprised in the term ASD55.

We have to be aware of important limitations in our study, including the small sample size of the sample and its limited phenotypic characterization. Intriguingly, we did not observe significant polygenic over-transmission of ASD common variation in either AS or Non_AS subtypes (Fig. 1). There are two main reasons that may be behind this counterintuitive result. Firstly, a bigger sample size could be necessary to increase the power of the analyses and reduce the pTDT confidence intervals’ width. Secondly, PRS prediction power in target sample strongly depends on discovery sample size, which varies among the disorders here tested (for instance: NASD_GWAS = 46,351, whereas NMDD_GWAS = 480,359). Moreover, other power dependencies as phenotypic heterogeneity of ASD discovery sample might also explain these results. In this sense, similar inconsistencies have been previously found56, pointing to ASD clinical heterogeneity as a big obstacle for explaining the expected variance in PRS analyses. A deeper phenotypic characterization of the sample (particularly within ASD without intellectual disability) could help better understanding which patients may have higher polygenic risk for comorbid conditions. Indeed, we may not discard the possibility that some forms of high functioning autism, i.e., AS meeting Gillberg criteria, or any HFA with social motivation, could have stronger polygenic psychopathological contributions57 than others.

The results herein presented of a polygenic overlap and correlated gene expression profiles between AS and other psychiatric disorders support the idea of AS being qualitatively distinct from Non_AS autism. This adds to previous evidence showing high psychopathological comorbidity (at the symptom level) between AS and SCZ, ADHD, and MDD23. The data presented here, particularly that related to comorbidity with ADHD, adds on previous elucubrations giving the possibility of a more biological-based explanation of the high psychiatric comorbidity observed in AS.