Introduction
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder caused by a combination of genetic and environmental factors resulting in a range of various phenotypes across ASD-affected individuals (Lai et al.,
2014). The US Center for Disease Control estimates that 1 in every 44 people have ASD with more males affected than females at a 4:1 ratio (Maenner et al.,
2021). While ASD can be diagnosed as early as 2 years of age, diagnosis usually occurs around the age of four (Baio,
2018). Subtraits, including repetitive behavior, challenges in social settings, and sensory issues will vary among ASD-affected individuals in their presentation and in severity (Lai et al.,
2014). Comorbidities that occur with ASD can include brain-based conditions, such as epilepsy or anxiety, as well as non-brain-based conditions, such as inflammatory bowel disease and cardiac dysrhythmia (Doshi-Velez et al.,
2014; Lord et al.,
2020; Somekh et al.,
2016). Environmental risks can include increased paternal age, birth trauma, and caesarean section delivery (Modabbernia et al.,
2017; Wu et al.,
2017a; Wu et al.,
2017b). In addition to environmental risks, many molecular variations have been associated with ASD.
Complex human phenotypes involve allelic variation across multiple genes (Geschwind,
2008; Tang & Siegmund,
2002). ASD is clearly a complex trait where as many as 1,000 genes have been associated with the phenotype as found in the Simons Foundation Autism Research Initiative (SFARI) Gene database (Abrahams et al.,
2013). Genomic features ranging in size from large DNA copy number variants (CNV) to single nucleotide variants (SNV) have been associated with the disorder (Geschwind,
2011; Lai et al.,
2014; State & Levitt,
2011). CNVs, large-scale duplications or deletions, have an impact on gene dosage and can result in haploinsufficiency or altered transcription patterns (Sebat et al.,
2007). Present in up to 10% of ASD-affected individuals, CNVs have been associated with genes SHANK2, SYNGAP1, DLGAP2, and others (Pinto et al.,
2010). These can also occur in large chromosomal regions, such as 15q11-13, 16p11.2, and 22q11.2 (Geschwind,
2011; Velinov,
2019). Single gene mutations have also been identified, some of which as a result of comorbidity, such as FMR1 for fragile X syndrome and MECP2 for Rett syndrome (Betancur,
2011). For idiopathic cases, mutations have been identified in genes associated with synaptogenesis, like neuroligins (NLGN3, NLGN4), shank proteins (SHANK2, SHANK3), and neurexins (CNTNAP2, NRXN1) (Berkel et al.,
2010; Durand et al.,
2007; Jamain et al.,
2003; Tan et al.,
2010; Tromp et al.,
2021). However, among these associations and others, no single association accounts for more than 2% of all cases (Abrahams & Geschwind,
2008). It is possible that for most ASD-affected individuals, the cause is more likely to be a network of altered genetic interactions rather than singular change with a global effect.
In addition to allelic variation in gene dosage and protein coding regions, it is likely that many human traits are modulated by variation in expression of genes within gene regulatory networks (GRNs) (Boyle et al.,
2017). One technique to detect variation in gene output is through expression quantitative trait locus (eQTL) analysis by which a DNA variant at a specific genomic position is associated with the RNA expression level of a gene in a tissue-specific context (Dimas et al.,
2009). eQTLs are identified through genome sequencing and transcriptome analysis, via microarray or RNA-seq, where RNA expression levels of each gene are tested for association with DNA polymorphisms across individuals in a defined DNA window surrounding the locus (Albert & Kruglyak,
2015). eQTLs afford a better understanding of natural variation and disease by offering a mechanistic explanation for variant alleles that segregate with a phenotype. With the publication of normal human tissue-specific eQTLs, such as those described by the Genotype-Tissue Expression (GTEx) project (GTEx Consortium,
2015), one can generate tissue-specific gene expression phenotype control hypotheses using genotype calls in affected and unaffected groups without the need of destructive sampling of human tissue.
eQTLs have previously been employed in ASD research. Sun
et al. combined ASD GWAS SNV genotypes with frontal cortex RNA gene expression to identify eQTLs with altered expression levels across 76 genes between ASD-affected and control individuals (Sun et al.,
2019). Most eQTLs were found to influence regulation due to their overlap with histone marks or an association with the methylation level of each associated gene. Other studies have uncovered multiple variants associated with expression levels of a particular gene. LoParo and Waldman looked exclusively at the oxytocin receptor gene (OXTR) and found four SNVs (three intronic and one in the promoter) in which an allele was associated with a higher risk of ASD (LoParo & Waldman,
2015). Wu
et al. found three significant SNVs associated with ASD in the transcription start site of the MEGF10 gene, which is thought to be active in modulating neural connections during development (Wu et al.,
2017a;
2017b).
By expanding polymorphism searches into intergenic genomic regions through whole genome sequencing, as opposed to exome sequencing, there are likely to be many genotype–phenotype associations to uncover that function in gene expression control. However, because of this widened scope, new techniques including machine learning may be necessary for the detection of genetic associations with ASD. Artificial neural networks (ANNs) are a machine learning technique used to detect patterns within an input of features (e.g. histology images, gene expression profiles, clinical data) and are capable of classifying labeled groups (Hyde et al.,
2019). Deep learning ANNs, especially the Multi-Layer Perceptron (MLP) model, have been shown to be more effective in predicting disease compared to other traditional methods such as logistic regression models (Yu et al.,
2019). MLPs were first used in 2001 to distinguish an individual’s specific type of small, round blue cell tumor based on their gene expression profile (Khan et al.,
2001). Because the initial application of ANNs was in image analysis (Egmont-Petersen et al.,
2002), it is not surprising that a large majority of ASD-related machine learning studies have been completed on brain imaging data (Bi et al.,
2018; Sherkatghanad et al.,
2020). A valuable resource used in many of these studies is the ASD brain imaging data exchange database, which contains structural and functional imaging data from roughly 2,000 individuals, half of which are ASD-affected (Di Martino et al.,
2017). The projects employing this data tend to have classification accuracies ranging from 70 to 90% (Bi et al.,
2018).
There are fewer studies that have used DNA sequence patterns as input to machine learning methods. Jiao
et al. used the genotypes of 29 SNVs to predict symptom severity (mild, moderate, severe) among individuals diagnosed with ASD (Jiao et al.,
2012). The SNVs fell within the introns, exons, promoters, or untranslated regions of nine genes that had previously been associated with ASD subtraits. This model was able to accurately predict the symptom severity in 67% of individuals (Jiao et al.,
2012). Another study used DNA sequencing data as input into an ANN to measure the impact of a variant on splicing patterns between individuals with and without ASD (Xiong et al.,
2015). Although the number of variants between the two groups did not significantly differ, the variants within the ASD-affected individuals were associated with more brain-expressed genes compared to controls, as well as a higher enrichment for brain development and function (Xiong et al.,
2015).
A geometric driver of genetic discovery for ASD and other phenotypes are databases filled with huge numbers of genotyped individuals who are often tagged with rich phenotypic data. The Simons Foundation Powering Autism Research (SPARK) dataset published by SFARI contains DNA sequencing and phenotypic information for 27,615 individuals, including 9,779 affected individuals and their immediate family members (Feliciano et al.,
2018). The sequencing data for each individual includes exome sequencing and whole genome genotyping performed on a saliva sample, while the phenotypic data includes an individual’s background history, a basic medical screening, a developmental coordination disorder questionnaire, a repetitive behavior questionnaire, and a social communication questionnaire (Feliciano et al.,
2018). This dataset has allowed for new insights and discoveries in the field of ASD research (Bhat,
2021; Chang & Kochel,
2020; Matoba et al.,
2020).
Additional databases contain genotyped individuals that serve to estimate allele frequencies and can be used to control for confounding population structure based on known ancestry. The Genome Aggregation Database (gnomAD) is a resource of genetic information spanning 105 million indels and 602 million SNVs produced by whole genome sequencing of 71,702 individuals (Karczewski et al.,
2020). Over 52 projects have contributed to this dataset including the 1000 Genomes Project (Auton et al.,
2015) and GTEx (GTEx Consortium,
2015). While gnomAD has published genotype summary information for many populations, GTEx has completed whole genome sequencing for a total of 818 Caucasian and African American individuals (phs000424.v8.p2) and contains individual-specific genotyping information available for analysis. Together, the gnomAD and GTEx datasets contain both population-specific and individual-specific genotype data to compare with populations of interest, like those of ASD-affected individuals (An et al.,
2020; Novelli et al.,
2020).
Another goal for GTEx was to publish tissue-specific gene expression data in relation to
cis-regulation variants. In the eighth version of the database, there are 17,382 tissue samples, 15,201 of which underwent eQTL analysis. While GTEx investigated tissues across major human organs, PsychENCODE is a genomics database that focuses solely on brain features (Wang et al.,
2018). These derived brain-specific lists include enhancers specific to the pre-frontal cortex, acetylation peaks specific to the pre-frontal cortex, temporal cortex, and cerebellar cortex, active brain eQTLs, and differential gene expression results comparing ASD-affected individuals to controls (Wang et al.,
2018). These gene regulation data available through GTEx and PsychENCODE have been used to further investigate ASD (Chau et al.,
2021; Pain et al.,
2019; Reilly et al.,
2020). Chau et al. used the BrainSpan RNA-seq dataset published by PsychENCODE to find that loss-of-function mutations were present in isoforms and were expressed at higher levels in ASD-affected individuals prenatally compared to controls (Chau et al.,
2021; Wang et al.,
2018). Others used information from GTEx, along with other datasets, to identify 14 differentially expressed genes in ASD and to find common trends in pathways and functions among ASD genes (Pain et al.,
2019; Reilly et al.,
2020).
Many large effect genetic associations have been identified in ASD-affected individuals, such as the genes SHANK3 and MECP2 as well as the CNV regions 15q11.2 and 16p11.2 (Abrahams et al.,
2013). However, these large effects can only be identified in roughly 20% of the ASD population (Lord et al.,
2020). It is possible that for most ASD-affected individuals, the disorder stems from the accumulation of many small effect changes (Klei et al.,
2012). eQTLs are one small effect genetic feature that has not been fully investigated on a genome-wide scale in relation to ASD. In this study, we used tissue-specific eQTLs as our search space for genetic associations with ASD. While some studies have investigated eQTLs surrounding specific genes, we have not limited our search to genes already associated with the brain or with ASD (LoParo & Waldman,
2015; Wu et al.,
2017a;
2017b). After discovering eQTL alleles associated with ASD, we employ machine learning models to test their classification accuracy.
Results
In this study we were interested in finding alleles associated with ASD at eQTL loci. An overview of the workflow can be seen in Fig.
1. All tissue-specific eQTLs were downloaded from GTEx (N = 1,207,976 eQTLs), however only those that met the significance threshold recommended by GTEx (qval < 0.05) were retained for further analysis (Table
1). Of the 475,829 eQTLs that met the significance threshold, 98,934 were associated with a brain tissue (Table
1). The variants associated with the 475,829 eQTLs were extracted and used for the following analysis.
The primary analysis was organized into a Discovery stage and a Classification stage. In the Discovery stage, variants with significantly different allelic distributions between ASD-affected and ASD-unaffected individuals were discovered. In the Classification stage, we used tSNE and a neural network to observe the classification of ASD-affected and ASD-unaffected individuals into their respective groups based on the genotype profiles at the significant variants. The Discovery stage was composed of two experiments: investigating the SPARK and GTEx datasets as well as the SPARK and gnomAD datasets. The SPARK dataset was made up of 360 ASD-affected individuals. For the controls, one experiment used 360 GTEx ASD-unaffected individuals and the other used 37,543 gnomAD ASD-unaffected individuals. From the SPARK and GTEx datasets the 360 individuals were chosen at random, and only one member of each family was chosen to avoid familial substructures. In the gnomAD dataset, no individual-specific data was available, so the population-specific genetic data for each variant was used.
Once the experimental and control groups were determined, the variants from the GTEx eQTLs were compared to the genomic information available in each control dataset. For the SPARK and GTEx datasets, variants had to have genotype calls for at least 90% of the Discovery individuals to be considered for further analysis. The SPARK-gnomAD experiment was able to investigate 12,933 eQTL-associated variants, while the SPARK-GTEx experiment investigated 12,789 eQTL-associated variants (Table
1).
For each experiment, every variant underwent a Fisher’s Exact Test to compare the number of reference and alternate alleles between the ASD-affected and ASD-unaffected individuals. The variants were then ranked by their p-values from lowest to highest. As each variant-ASD association test represents a single hypothesis, we corrected for false discovery using the Benjamini–Hochberg procedure and a false discovery rate of 1%. For each variant, the critical value was calculated (critical value = (rank/total number of variants) * 0.01). We then moved down the ranked variant list to find the last variant where the p-value was less than the critical value. The variant information (including the reference allele counts, alternate allele counts, ranks, p-values and critical values) for the SPARK-GTEx experiment and SPARK-gnomAD experiment is also available (Additional File 2: Supplemental Table 1 and Supplemental Table 2).
The 34 significant variants discovered in the SPARK-GTEx experiment were then compared to the 83 significant variants from the SPARK-gnomAD experiment. There were 30 variants that were found to be significant among both experiments (Additional File 2: Supplemental Table 3). These were deemed to be the
significant variants. These significant variants were then compared to the 475,829 GTEx eQTLs (with a q-val < 0.05) to find their gene and tissue associations. The significant variants aligned with a total of 174 tissue-specific eQTLs (Table
1, Additional File 2: Supplemental Table 4), representing 34 genes and 49 tissues. The 174 associated eQTLs were then known as
significant eQTLs, and the 34 associated genes as
significant genes. The significant genes and their tissue associations can be seen in Table
2. The expression levels of the significant genes across multiple brain tissues can be seen in Fig.
2. An eQTL summary by tissue can be seen in Additional File 2: Supplemental Table 5.
We were interested in any known brain associations among the significant eQTLs. PsychENCODE has published a list of active brain enhancers, lists of acetylation peaks specific to the pre-frontal cortex (PFC), cerebellar cortex (CBC), and temporal cortex (TC) areas of the brain, a list of active brain eQTLs, and a list of genes differentially expressed in ASD (Wang et al.,
2018). These associations were downloaded and compared to the list of significant variants and significant genes. Fourteen of the significant genes had an association of interest. They were associated with a variant that fell within a brain regulatory region or were associated with a brain or endocrine tissue through GTEx. These genes and their relevant associations can be seen in Table
3.
To further investigate the classification accuracy of the genotype profiles of the significant variants, we used tSNE and a MLP neural network in the Classification stage of the study. The Classification stage consisted of three subsets of individuals: the Caucasian subset used in the Discovery stage, a new African American subset, and a new Caucasian subset. For this stage all the ASD-affected individuals came from the SPARK dataset while all the ASD-unaffected individuals came from the GTEx dataset. The Caucasian Discovery subset was made up of the same individuals used earlier (180 ASD-unaffected females, 180 ASD-unaffected males, 180 ASD-affected females, 180 ASD-affected males), while the African American Classification subset (8 ASD-unaffected females, 40 ASD-unaffected males, 8 ASD-affected females, 40 ASD-affected males) and the Caucasian Classification subset (60 ASD-unaffected females, 60 ASD-unaffected males, 60 ASD-affected females, 60 ASD-affected males) introduced new individuals. Two new subsets were included to verify that the significant variants found in the Discovery stage were not solely an artifact of the random group of Caucasian individuals in the Discovery stage but could also apply to a separate group of Caucasians as well as an additional population.
First, tSNE was used to visualize how individuals from the Caucasian Discovery subset segregated based on their genotypic profiles at the significant variants and all variants tested. In Fig.
3A, the genotype profiles at the significant variants were able to segregate ASD-affected (blue dots) and ASD-unaffected individuals (orange dots) apart from a few GTEx outliers that clustered with the SPARK individuals. However, this was not true when considering the genotype profiles at all variants tested as seen in Fig.
3C. To ensure that the individuals were not segregating by sex, sex labels were used in Fig.
3B and D: males (green dots) and females (purple dots). In Fig.
3B, there are no distinguishable clusters based on sex, which showed that the two main clusters in Fig.
3A were not sex related. In addition to sex labels, labels based on ASD severity can be found in Additional File 1: Supplemental Fig.
1. We observed that when reducing the number of variants to only those found to be significant, tSNE was able to separate individuals based on their genotype profiles and therefore their ASD status.
The tSNE approach employed above was used similarly for the African American and Caucasian Classification subsets. In Fig.
4 the distribution of individuals based on the genotype profiles at the 30 significant variants and 30 random variants can be seen. For both populations the genotype profiles at the significant variants (Fig.
4A and B) were able to separate ASD-affected and ASD-unaffected individuals better than the same number of random variants (Fig.
4C and D). The separation of African American ASD-affected and ASD-unaffected individuals in Fig.
4B was not as distinct as the Caucasian individuals in Fig.
4A. This could possibly be attributed to the lesser number of African American individuals available within the SPARK dataset.
The matrices used to create the tSNE plots in Figs.
3 and
4 were also used with a MLP neural network to observe how well it could classify individuals into ASD-affected and ASD-unaffected groups based on their genotype profiles at relevant loci (Fig.
5). Across all three groups of individuals (Caucasian Discovery, Caucasian Classification, and African American Classification) the classification accuracy based on the genotype profiles of the 30 significant variants was higher (ranging from 90 to 100%) than that of 30 random variants or all variants tested (accuracy ranging from 30 to 70%). The random and complete variant lists also have more sporadic fluctuations in accuracy. Nonetheless, their average classification accuracy centered around 50%, which is expected for a random classification of individuals into two groups.
Finally, we tested if the significant variants found using GTEx and gnomAD as control datasets would also be discovered if we used ASD-unaffected siblings from the SPARK dataset as the control. This is referred to as the secondary analysis in this study, an overview of which can be seen in Fig.
6. The same Caucasian Discovery subset was used in conjunction with 360 Caucasian ASD-unaffected siblings from the SPARK dataset (180 ASD-unaffected males, 180 ASD- unaffected females). While these ASD-unaffected controls were siblings of ASD-affected probands, there were enough individuals within the dataset to ensure that there were no familial relations within or between the ASD-affected and ASD-unaffected groups. The same approach was taken for this experiment as the initial Discovery stage of the study. For each variant investigated, 90% of individuals from the SPARK ASD-affected and SPARK ASD-unaffected subsets had to have a genotype call. A Fisher’s Exact Test was used at each variant to compare the allelic distributions of the ASD-affected and ASD-unaffected groups. Between these two groups, no significant variants were discovered after Benjamini–Hochberg correction. A tSNE plot was used to compare the segregation of SPARK ASD-affected, SPARK ASD-unaffected, and GTEx ASD-unaffected individuals at the 30 significant variants found in the earlier experiments (Fig.
6). The plot shows all SPARK individuals clustering together, while the GTEx ASD-unaffected individuals segregate separately except for two individuals (seen as two blue dots among the yellow and green).
Discussion
In this study, we set out to identify variants at eQTL-associated loci with significantly different allele distributions between ASD-affected and ASD-unaffected individuals. Using controls from non-ASD datasets, 30 significant variants associated with 34 genes were identified. Interestingly, when using unrelated ASD-unaffected siblings of ASD-affected individuals, no variants had significantly different allele distributions. When visualizing the distribution of ASD-affected and ASD-unaffected individuals from the Caucasian Discovery subset, Caucasian Classification subset, and African American Classification subset in Figs.
3 and
4, ASD-affected and ASD-unaffected individuals were seen to easily segregate based on their genotype profiles at the 30 significant variants identified in the Discovery stage. It is unlikely that the ASD-affected and ASD-unaffected groups are separating based on any possible dataset bias. In Fig.
3A, several GTEx ASD-unaffected individuals (blue dots) separated with the SPARK ASD-affected individuals (orange dots) when analyzing the genotype profiles from the significant variants. If there were ascertainment bias (Lachance & Tishkoff,
2013), we would expect to see separation of the datasets in Figs.
3C,
4C, and D as well when observing the distribution of individuals at all variants tested or 30 random variants. It is possible that the few GTEx ASD-unaffected individuals that separated with the ASD-affected SPARK individuals (seen in Figs.
3,
4,
6, and Additional File 1: Supplemental Fig. 1) have ASD-affected relatives. That information was not included in the phenotypic surveys for the GTEx dataset but could be examined in future studies.
Another interesting characteristic found in several of the tSNE plots was the formation of clusters within the ASD-affected and ASD-related groups at significant variants. In Figs.
3,
6, and Additional File 1: Supplemental Fig. 1 there are three sub-clusters when observing individuals from the SPARK dataset. We tested to see if this sub-cluster formation was due to gender (Fig.
3B) or to ASD severity (Additional File 1: Supplemental Fig. 1), however neither trait segregated with the clusters. These clusters were also not distinctly present in Fig.
3C (when observing all variants) or Fig.
4 (when using the Discovery subsets). Because they were not present when using genotype profiles at all variants, it is likely something specific about these significant 30 variants that results in the formation of the sub-clusters. They were not seen in Fig.
4, but it is possible this was because of the fewer number of individuals in the Classification subsets, as there were 120 Caucasian ASD-affected individuals and 48 African American ASD-affected individuals in the Classification subsets, compared to 360 individuals in the Discovery subset. This is reinforced by the slight cluster formation seen in Additional File 1: Supplemental Fig. 1, which contained 90 less severe ASD-affected individuals and 90 more severe ASD-affected individuals. Therefore, the sub-cluster formation among ASD-affected individuals at the significant variants seems to become more distinct as the number of individuals increases. This is typical of tSNE visualization, where the patterns become clearer as the number of individuals increases. The sub-cluster formation at the significant variants can also be seen when SPARK ASD-related individuals were added in. In Fig.
6, the ASD-related individuals are segregating into sub-clusters along with the ASD-affected individuals, while the GTEx ASD-unaffected individuals do not. We hope to further investigate any other possible associations with these sub-clusters in future studies.
The datasets that underwent tSNE visualization were also subject to a neural network analysis where a machine learning model predicted if an individual was ASD-affected or ASD-unaffected based on their genotype profile. The model was able to classify individuals with a high accuracy when inputting the genotypes for the significant variants for all three groups tested: Caucasian Discovery subset, Caucasian Classification subset, and African American Classification subset. When using the genotypes for all variants or random variants, the classification accuracy was much lower, fluctuating between 30 and 70%. For these results, the neural network predictions seem to become more stable as more individuals are added. For example, the Caucasian Discovery subset has the most individuals (720 total) and the most consistent trends in the classification accuracy, while the African American Classification subset has the least individuals (96 total) and the least consistency in classification accuracy when using genotypes at 30 random variants as input.
In the secondary analysis all variants were tested to find those that had significantly different allele distributions between SPARK ASD-affected and SPARK ASD-unaffected individuals, but no variants were identified as significant. To visualize this, a tSNE plot was used for the genotype profiles at the significant variants of the primary analysis for 360 Caucasian SPARK ASD-affected, 360 Caucasian SPARK ASD-unaffected, and 360 Caucasian GTEx ASD-unaffected individuals. In Fig.
6, the SPARK ASD-unaffected individuals can be seen segregating with the SPARK ASD-affected individuals into sub-clusters rather than with the GTEx ASD-unaffected individuals. The SPARK ASD-unaffected individuals were siblings of SPARK ASD-affected individuals that were not included in these analyses. However, if we can assume that the SPARK ASD-affected siblings of the SPARK ASD-unaffected group follow similar genotypic trends as those in the Discovery and Classification subsets, then it is fair to say that the ASD-unaffected siblings are more likely to follow the ASD genotype pattern because of the inherent familial relationships compared to unrelated controls. It is possible that the significant variants identified in this study contribute to the hypothesis that the accumulation of otherwise common variants can also confer ASD risk. Previous research has also shown that ASD-unaffected siblings of ASD-affected siblings can still show milder symptoms of ASD, even as high as 15% of siblings exhibiting a ‘broad autism phenotype’ (Bolton et al.,
1994; Gamliel et al.,
2007; Losh et al.,
2009). Nonetheless, there seems to be no significant genetic differences between ASD-affected individuals and unrelated ASD-unaffected siblings at the variants we have identified.
The significant variants in this study were associated with 174 tissue-specific eQTLs derived from the GTEx database. Based on the log2aFC information of each variant and the allele distributions between ASD-affected and ASD-unaffected individuals, the direction of expression in ASD can be determined. Of the 174 eQTLs, 96 had higher gene expression in their associated tissue compared to 78 eQTLs with lower gene expression (Additional File 2: Supplemental Table 4). When looking solely at the 27 eQTLs associated with a brain tissue, 8 had lower gene expression in ASD individuals while 19 had higher gene expression (Table
1, Additional File 2: Supplemental Table 4). CES1 and MAN2C1 had the most GTEx brain associations. CES1, a drug metabolizer, has been shown to have significantly different mRNA levels in peripheral leukocytes of ASD-affected individuals and unrelated ASD-unaffected mothers of ASD-affected individuals, compared to controls (Johnson et al.,
2013; Kuwano et al.,
2011). Microdeletions in the area surrounding MAN2C1 have been linked to gene expression of MAN2C1 in the brain, a lower verbal and non-verbal IQ, and ASD (Cáceres et al.,
2016; McInnes et al.,
2010; Roetzer et al.,
2010).
When comparing the significant eQTLs to data published by PsychENCODE, more brain related associations were found (Table
3). For example, POLR1B is associated with a variant that falls into an acetylation peak that is active in the PFC, TC, and CBC. POLR1B is a target of a microRNA (miR-365a-3p) that has been shown to be upregulated in ASD-affected individuals (Kichukova et al.,
2016). The variant associated with SIRPB1 falls within a PsychENCODE enhancer and has been identified as an eQTL in the brain. SIRPB1 was deleted in a case study of monozygotic ASD twins, and has been linked with impulsive-disinhibited personality, meaning the individuals struggle to control their impulses, which has been implicated in ASD (Hill,
2004; Laplana et al.,
2014a,
2014b; Laplana et al.,
2014a,
2014b). One other example of an association with PsychENCODE can be found in the CIDEA gene, which was identified by our study and was shown to be significantly down-regulated in ASD-affected individuals by PsychENCODE. Additionally, CIDEA has shown to be down-regulated in obesity and associated with ADHD, both of which often appear in tandem with ASD (Cortese & Vincenzi,
2012; Croen et al.,
2015; Kweon et al.,
2018; Lee & Ousley,
2006; Nordström et al.,
2005). In total, the eQTLs described here are evidence of tissue-specific regulation across both brain and non-brain tissues. While it is likely that neurological traits originate in the brain, ASD-affected individuals share common non-neurological traits as well. It is possible that eQTLs are one avenue in which a variant can result in changes of gene expression across multiple tissues, like can be observed for genes CES1 and MAN2C1 in this study. The significant variants and eQTLs discovered here will need to be further investigated to elucidate if a global regulatory variant and its consequent tissue-specific changes could be linked to ASD as a possible mechanism.