Introduction

Autism is a severe neurodevelopmental disorder and one of the most heritable neuropsychiatric syndromes, with a male to female ratio of 4:1.1 The diagnosis of autism is based on impairments in reciprocal social interaction and communication, and restricted and stereotyped patterns of interests and activities, with abnormal development apparent within the first 3 years of life.2 Autistic spectrum disorders (ASDs) include: autistic disorder, childhood disintegrative disorder, pervasive developmental disorder-not otherwise specified, Asperger syndrome and Rett syndrome. The prevalence of childhood autism is estimated at 38.9 per 10 000 with the total prevalence for all ASDs at 116.1 per 10 000.3

There is substantial evidence from twin and family studies4, 5, 6, 7 to support the involvement of genetic factors in ASDs. Statistical modelling of the pattern of inheritance in the idiopathic form of the disorder implicates 3–15 genes, suggesting an oligogenic mode of inheritance.8, 9

The contribution of chromosomal structural variation to differences in genetic makeup within the human population has recently been highlighted.10 Advances in technology have enabled the screening of large cohorts of individuals for copy number variants (CNVs) genome-wide. A number of such studies have been published,11, 12, 13, 14, 15, 16 which have found that CNVs appear to have a larger causal role within autism than previously thought.

Several CNVs have been found in SHANK3 (SH3 and multiple ankyrin repeat domains protein 3, also known as ProSAP2). SHANK3 belongs to the family of SHANK proteins, which were first identified in the rat17 and subsequently in humans.18 These proteins are believed to function as molecular scaffolds in the post-synaptic density (PSD)19 of excitatory synapses and aid in the assembly of protein complexes within cells by containing multiple binding domains for protein–protein interaction.20 SHANK3 is located on chromosome 22q13.3, spans approximately 58 Kb of genomic sequence, and contains 24 exons, 7 of which are alternatively spliced,21 which may influence the spectrum of SHANK-interacting proteins at the PSD. Its expression pattern in the brain is confined to the cortex, hippocampus and cerebellum.20

SHANK3 was first implicated in the field of neuropsychiatric disorders when a patient with terminal 22q13.3 deletion syndrome was found to have a de novo reciprocal translocation that disrupted the gene.18 22q13.3 deletion syndrome includes the phenotypes of severe expressive language delay, severe/profound mental retardation and at times autism. Since the initial report, a number of studies22, 23, 24, 25 have described further cases of 22q13.3 deletion syndrome with disruption or deletion of SHANK3. This finding has led to the hypothesis that haploinsufficiency of SHANK3 may cause the behavioural phenotypic consequences of 22q13.3 deletion syndrome.

Owing to its emerging role in neuropsychiatric disorders and the overlap of phenotypes between autism and 22q13.3 deletion syndrome, SHANK3 was further analysed in patients with ASDs. Durand et al21 found three separate families containing disruptions or deletions of SHANK3 that were not identified in controls. Interestingly, each individual with either a deletion of SHANK3 or a truncated protein had severe language problems, whereas the individual with 22qter partial trisomy had precocious language development and fluent speech, but social communication difficulties.21 Subsequent studies have discovered de novo deletions and mutations of SHANK3 in individuals with autism,11, 13, 26, 27 adding support to the role of the gene within the autistic phenotype.

Given the emerging evidence for SHANK3 haploinsufficiency in some individuals with autism, DNA samples from 319 multiplex families and 11 trios, from the International Molecular Genetic Study of Autism Consortium (IMGSAC) multiplex cohort, were analysed for CNVs and SNP association across this locus. DNA samples from 76 IMGSAC probands with autism from Italian singleton families were also investigated for CNVs.

Materials and methods

Subjects

Multiplex families were identified, collected and assessed by IMGSAC as described earlier.28, 29 In short, after passing an initial screen, parents were administered the Autism Diagnostic Interview-Revised (ADI-R)30 and the Vineland Adaptive Behavior Scales.31 Probands were assessed using the Autism Diagnostic Observation Schedule-Generic (ADOS-G),32 and a medical examination was carried out to exclude cases of known aetiology, for example tuberous sclerosis. IQ was assessed using the Ravens Progressive Matrices coloured33 or standard and the Picture Vocabulary Test BPVS34 or PPVT.35 Karyotyping was performed when possible on probands and molecular genetic testing for Fragile X syndrome on one affected individual in each family. Written informed consent was obtained from all parents/guardians or, where appropriate, the proband. The relevant ethical committees approved the study. A blood sample was taken from all affected individuals and first-degree relatives and a lymphoblastoid cell line created. If blood was unavailable a buccal swab was taken instead. DNA was extracted from blood, cell lines and buccal swabs by means of a DNA purification kit (Nucleon BACC2 Blood and Cell Culture DNA purification kit, Tepnel Life Sciences, Manchester, UK). A total of 319 IMGSAC multiplex families and 11 IMGSAC trios were genotyped, with a male/female ratio of 4.3:1 in probands. Seventy-six IMGSAC Italian probands from a singleton family collection, who all meet IMGSAC criteria for inclusion, were also screened for distal chromosome 22q CNVs by multiplex ligation-dependent probe amplification (MLPA).

SNP genotyping

CEU genotypes for the region on chromosome 22q13.3 covering the SHANK3 gene, including 5 kb of sequence upstream (chromosome 22: 49 398 550–49 464 810 bp, NCBI Build 35), were downloaded from the HapMap phase II (release 21). Fifteen haplotype-tagging SNPs were chosen across SHANK3 using Tagger in Haploview v4.0beta1236 (aggressive tagging, r2 > 0.8, MAF > 0.05), that tagged in total 23 SNPs (including themselves), of which 95.7% of alleles were captured with an r2>0.8. DNA was quantified using the Picogreen dsDNA Quantitation Kit (Invitrogen). A total of 40 ng of genomic DNA was genotyped using the Sequenom MassARRAY i-PLEX Gold assay (Sequenom, San Diego, USA). Two control DNA samples were included on each plate to check for inter-plate reproducibility. SNP genotype calls were verified visually using the Sequenom MassARRAY Typer software (version 3.1.4.0). Merlin error checks37 were performed to identify unlikely genotypes within the data set and Pedwipe used to remove them from the pedigree file. Genotypes were uploaded to the Integrated Genotyping System.38

Statistical analysis of SHANK3 genotypes

Hardy–Weinberg equilibrium was verified for the SNPs genotyped, in all individuals and in unrelated individuals, using PEDSTATS.39 All families were checked for Mendelian inconsistencies by PedCheck40 and two were subsequently removed because of inheritance errors found in the SHANK3 SNPs and in those of another gene genotyped simultaneously, thus reducing the likelihood of detection of de novo deletions. Association analysis was carried out on the IMGSAC Caucasian families. STATA (StataCorp: Stata Statistical Software Release 7.0, Stata Corporation: College Station, TX, USA) was used to perform single-locus transmission disequilibrium test (TDT)41 association analysis, taking into account pedigrees, and case/pseudo–control analysis,42 which uses untransmitted genotypes as pseudo–controls. Genotypes were constructed for three ‘pseudo–controls’, which consist of the other possible genotypes that the offspring could have received from their heterozygous parents. Two haplotype-tagging SNPs together tagged another SNP, and so TDT was also performed for this haplotype within Haploview.
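As a rough illustration of the two family-based tests named above, the following Python sketch computes the basic single-locus TDT statistic from transmission counts and enumerates the pseudo-control genotypes for one parent-offspring trio. It is not the STATA procedure used in the study: the function names and example values are hypothetical, and the sketch ignores the correction for related affected individuals within multiplex pedigrees.

```python
from itertools import product

def tdt_statistic(b, c):
    """McNemar-type TDT chi-square (1 degree of freedom).

    b: heterozygous parents transmitting the test allele
    c: heterozygous parents transmitting the other allele
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)

def pseudo_controls(father, mother, child):
    """Genotypes the child could have received from these parents but did not.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    Raises ValueError if the child's genotype is not a possible
    combination of one paternal and one maternal allele.
    """
    possible = [tuple(sorted((f, m))) for f, m in product(father, mother)]
    # Drop one copy of the transmitted genotype; the remaining three
    # combinations are the pseudo-controls.
    possible.remove(tuple(sorted(child)))
    return possible

# Hypothetical example: 60 versus 40 transmissions from heterozygous parents
example_chi2 = tdt_statistic(60, 40)
# Hypothetical trio with two heterozygous parents
example_controls = pseudo_controls(("A", "G"), ("A", "G"), ("A", "G"))
```

With two heterozygous parents the four possible offspring genotypes are A/A, A/G, A/G and G/G, so removing the observed genotype always leaves three pseudo-controls.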

Fluorescence in situ hybridisation

The extent of SNP homozygosity across SHANK3 was examined as a potential indicator of hemizygosity. Overall, 53 individuals (23 affected individuals, 25 parents and 5 unaffected siblings) were selected for analysis by FISH (fluorescent in situ hybridisation), as they were homozygous for all 14 haplotype-tagging SNPs successfully genotyped. One individual who was homozygous at only two SNPs was used as a control. Chromosome preparations for FISH analysis were obtained from the EBV-transformed lymphoblastoid cell lines following standard techniques. Two fosmid probes kindly provided by Dr. Stephen Scherer, G248P86064C8 and G248P86149G7, were labelled with Digoxigenin-11-dUTP (Roche) by nick-translation (Vysis kit) and separately co-hybridised with an FITC-labelled ‘paint’ for chromosome 22 (Cambio), following standard procedures. The fosmid probe signals were detected with Rhodamine anti-Digoxigenin (Roche), and the chromosomes were counterstained with DAPI in Vectashield (Vector Laboratories). The slides were examined using an Olympus BX-51 epifluorescence microscope coupled to a Sensys charge-coupled device camera (Photometrics). At least 20 informative cells were analysed for each hybridisation experiment.

MLPA

Multiplex ligation-dependent probe amplification reactions using probe mix SALSA P188 MLPA KIT 22q13 Lot 0407, 0906 were performed on 100 ng genomic DNA from 94 IMGSAC multiplex affected individuals, 76 Italian singleton affected individuals and one positive control individual harbouring a known SHANK3 deletion24 of 100 kb (DNA kindly provided by Dr. Maria Clara Bonaglia). The P188 kit and protocol43 were supplied by MRC-Holland (Amsterdam, The Netherlands). PCR products were separated on a 96-capillary Applied Biosystems 3700 DNA Analyzer. Peak size was verified using Genotyper 2.0 (Applied Biosystems, Foster City, USA). Coffalyser software v844 was used to analyse the MLPA data for CNVs. Bin sizes were adjusted for the peak sizes observed. Data were normalised by dividing each probe's peak area by the average peak area of the seven control probes in the probe mix obtained from the sample set. The normalised data were then divided by the median peak area of all samples to obtain an indication of copy number variation for each probe. Ratios of 0.7 or below and 1.3 or above were set as thresholds for losses and gains, respectively. Deletion or duplication of two consecutive probes was required for further investigation.
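The normalisation and calling steps described above can be sketched in Python as follows. This is one reading of the description, not the Coffalyser implementation: it assumes that control-probe normalisation is done within each sample and that the median is taken per probe across samples. All function names are hypothetical; only the 0.7 and 1.3 thresholds and the two-consecutive-probe rule come from the text.

```python
from statistics import mean, median

LOSS_THRESHOLD = 0.7  # ratio at or below this is called a loss (from the text)
GAIN_THRESHOLD = 1.3  # ratio at or above this is called a gain (from the text)

def copy_number_ratios(peaks, control_idx):
    """peaks: one list of probe peak areas per sample, same probe order.
    control_idx: positions of the control probes within each list."""
    # Step 1: divide each probe by the mean of the sample's control probes.
    normalised = []
    for sample in peaks:
        reference = mean(sample[i] for i in control_idx)
        normalised.append([area / reference for area in sample])
    # Step 2: divide each probe by its median normalised value across samples.
    n_probes = len(peaks[0])
    medians = [median(s[j] for s in normalised) for j in range(n_probes)]
    return [[s[j] / medians[j] for j in range(n_probes)] for s in normalised]

def call_probes(ratios):
    """Label each probe ratio as 'loss', 'gain' or 'normal'."""
    return ["loss" if r <= LOSS_THRESHOLD
            else "gain" if r >= GAIN_THRESHOLD
            else "normal" for r in ratios]

def has_consecutive_event(calls, kind, min_run=2):
    """True if at least min_run adjacent probes share the call `kind`."""
    run = 0
    for call in calls:
        run = run + 1 if call == kind else 0
        if run >= min_run:
            return True
    return False

# Toy data (hypothetical): 3 samples, 4 probes, probe 0 as the only control.
toy_peaks = [[100, 100, 100, 100],
             [100, 100, 100, 100],
             [100, 50, 50, 100]]
toy_ratios = copy_number_ratios(toy_peaks, control_idx=[0])
toy_calls = call_probes(toy_ratios[2])
```

In the toy data the third sample has two adjacent probes at half the expected signal, so it is the only sample flagged for follow-up under the two-consecutive-probe rule.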

The affected IMGSAC individuals were chosen for MLPA analysis by identifying runs of consecutive homozygosity in the 14 successfully genotyped haplotype-tagging SNPs. All affected individuals were ranked according to the number of consecutive homozygous calls and the 94 individuals with the largest number were selected. This sample included 18 individuals also tested by FISH and 76 others who had between 8 and 14 homozygous calls. Only one affected individual per multiplex family was chosen, preferentially IMGSAC Case Type One (clinical diagnosis of autism, meets ADI-R and ADOS or ADOS-G criteria for autism, history of language delay, and performance IQ of >35).29 The affected individuals in total comprised 51 Case Type One, 30 Case Type Two, 2 Case Type Three and 11 Case Type Four (see Supplementary Table 1 for Case Type description). Among the affected Italian individuals, 80% were either Case Type One or Two. Overall, the probands selected were high functioning and met strict criteria for autism.
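The ranking by consecutive homozygous calls can be illustrated with a short Python sketch. The data layout, the function names and the treatment of a missing genotype (here it ends the current run) are assumptions made for illustration. The one-individual-per-family rule and the Case Type preference are not modelled.

```python
def longest_homozygous_run(genotypes):
    """Length of the longest run of consecutive homozygous calls.

    genotypes: calls ordered by SNP position, each a two-character
    string such as "AA" or "AG"; None marks a missing call, which is
    assumed here to end the current run.
    """
    best = run = 0
    for call in genotypes:
        if call is not None and call[0] == call[1]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

def select_top(individuals, n):
    """Return the n individual IDs with the longest homozygous runs.

    individuals: dict mapping an ID to its ordered genotype calls.
    """
    ranked = sorted(individuals,
                    key=lambda ind: longest_homozygous_run(individuals[ind]),
                    reverse=True)
    return ranked[:n]

# Hypothetical example with five SNPs per individual
example = {
    "ind1": ["AA", "AG", "GG", "GG", "CC"],  # longest run: 3
    "ind2": ["AA", "AA", "AA", "AA", "CC"],  # longest run: 5
    "ind3": ["AG", "AG", None, "CC", "CT"],  # longest run: 1
}
top_two = select_top(example, 2)
```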

Results

SNP genotyping

Fourteen haplotype tagging SNPs were successfully genotyped on the Sequenom system, all of which conformed to Hardy–Weinberg equilibrium (P>0.001). As only 14 of the 15 selected haplotype-tagging SNPs were successfully genotyped, there was a reduction in tagging efficiency, with 91.3% of alleles being captured with an r2>0.8 rather than 95.7%. Sample and SNP genotyping success rates were 99 and 93%, respectively, and inter-plate reproducibility 99.5%.
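The two tagging-efficiency figures are consistent with 22 and then 21 of the 23 tagged SNPs being captured at r2>0.8. These counts are not stated in the text and are inferred here from the percentages. A minimal Python check of that arithmetic:

```python
TOTAL_SNPS = 23  # SNPs tagged across the region, as stated in Materials and methods

def percent_captured(n_captured, total=TOTAL_SNPS):
    """Percentage of SNPs captured at the r2 threshold, to one decimal place."""
    return round(100 * n_captured / total, 1)

# Inferred counts (assumption): 22 SNPs captured with all 15 tag SNPs,
# 21 captured once the failed tag SNP is dropped.
with_15_tags = percent_captured(22)
with_14_tags = percent_captured(21)
```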

Association analysis of SHANK3 SNPs

Family-based association analysis was carried out using the STATA package on a subset of the samples genotyped, comprising Caucasian individuals from 297 IMGSAC multiplex families and 11 IMGSAC trios. Single SNP TDT and a case/pseudo–control approach were performed and no association was found (Table 1).

Table 1 Results of single SNP TDT and case/pseudo–control analysis in Caucasian families for the 14 haplotype-tagging SNPs genotyped across SHANK3

Copy number analysis by FISH mapping

Fluorescent in situ hybridisation was used to test for chromosome rearrangements in individuals identified by the SNP genotyping as displaying potential hemizygosity in SHANK3. Of the 53 individuals selected, lymphoblastoid cell lines from 44 were tested, consisting of 21 affected individuals (16 male and 5 female), 12 fathers, 6 mothers and 5 unaffected siblings. Eight individuals did not have a cell line available and one cell line failed to grow. A sample that was heterozygous at 12 out of 14 SNPs in SHANK3 was also probed as a negative control.

Two fosmid clones, G248P86064C8 and G248P86149G7, were used as FISH probes across a region of 64 kb covering the SHANK3 gene to search for deletions in the selected IMGSAC samples. No deletions were identified in any of the 45 lymphoblastoid cell lines probed, including the control.

MLPA analysis

Multiplex ligation-dependent probe amplification was used as a second method to look for smaller deletions across SHANK3 not detectable by FISH. The 37 probes within the P188 probe set cover a larger distance than the fosmids, approximately 19 Mb, enabling characterisation of potential deletions. MLPA is also a higher throughput technique than FISH; therefore, a greater number of individuals could be analysed simultaneously.

Ninety-four affected individuals from IMGSAC multiplex families, seventy-six affected individuals from IMGSAC Italian singleton families and one positive control were analysed using MLPA (Figure 1). Two individuals were removed as they had SDs for the copy number ratios above 0.25, which was used as the cutoff for this sample. All probes performed accurately apart from two within the gene MLC1, MLC1 Probe 6335-L5910 and MLC1 Probe 6338-L5913, which have been removed from Figure 1. No individual had two or more consecutive probes showing deletions or duplications within SHANK3, apart from the positive control, which showed all seven probes deleted from exon 9 in SHANK3 to the telomere. Three affected individuals exhibited two probes duplicated consecutively and one affected individual had two probes deleted; however, these events were all more than 150 kb proximal to SHANK3.

Figure 1

Copy number analysis using MLPA on the region chr22q12.2-q13.33 in 168 autistic probands from IMGSAC multiplex families and Italian singleton families. The thresholds for gains and losses are marked as 1.3 and 0.7, respectively. The positive control sample clearly shows the loss of SHANK3, ACR and RABL2B. Graph not to scale. For a complete list of the probes and their positions see MRC-Holland probe mix P188 lot 0407, 0906.

Discussion

The IMGSAC multiplex family collection was analysed for association and CNVs within SHANK3, using SNP genotyping, FISH and MLPA analysis. The SNP, fosmid and MLPA probe positions are shown in Figure 2, comprehensively covering the gene SHANK3. A smaller group of affected individuals from an Italian cohort of IMGSAC singleton families was also analysed by MLPA for CNVs in SHANK3.

Figure 2

Positions of SNPs, fosmids and MLPA probes within SHANK3 (UCSC May 2004, NCBI Build 35). Exons annotated as those in the study by Durand et al.21

The association analysis results from the SNP genotyping in the IMGSAC families presented here were all nonsignificant. Nevertheless, no previous reports of association with SHANK3 have been published, and one would not expect to observe association with a rare deletion present in only a small number of individuals. Sequence analysis was not conducted, and so rare mutations within SHANK3, such as those found in previous studies,21, 26, 27 could have been missed. The sample size of the IMGSAC cohort of families is relatively small in comparison with larger genome-wide association and CNV detection studies that are currently being carried out,12, 16 and so this must also be taken into consideration.

The SNP genotyping was used as a crude indication of potential hemizygosity across SHANK3 within the IMGSAC multiplex families. Unfortunately, the Sequenom system was unable to provide quantitative information about signal intensity from the genotype calls, and so a true homozygous call could not be distinguished from a hemizygous call by simple analysis of an output file. Therefore, those individuals homozygous for all haplotype-tagging SNPs across the region were probed using FISH.

The two probes used for FISH are each approximately 40 kb in length, and so if a deletion occurred within the probe sequence but still allowed the probe to hybridise, a fluorescent signal would still be seen and the deletion missed. To overcome this problem, MLPA was used to identify potentially smaller aberrations. This technique relies on hybridisation of probes on either side of a target sequence and subsequent ligation that permits amplification. The amount of PCR product, dependent on ligation, indicates copy number. No CNVs within SHANK3 were identified by this technique. This finding could have been due to the method of selection of the affected individuals tested: the 94 individuals with the most consecutive homozygous SNPs. Individuals with just a few homozygous calls at either end of SHANK3 were not included in the analysis, but could be representative of a deletion extending outwards from the extremities of the gene. This method of choosing affected individuals preferentially selected for deletions and so duplications may have been missed, whereas the Italian singleton sample was not preselected and hence was equally likely to harbour deletions or duplications. Four affected individuals did show two consecutive probes duplicated or deleted, positioned more than 150 kb proximal to SHANK3; these regions were deemed not to directly affect the expression of the SHANK3 gene when interrogated using an expression database (mRNA_bySNP_browser_v1.0.1.exe45).

The majority of IMGSAC families analysed were multiplex in origin. Recent reports have suggested that the prevalence of de novo CNVs in singleton families is between 7 and 10%, whereas in multiplex families it is 2–3%.11, 13 As the majority of SHANK3 CNVs identified are de novo in origin, this would decrease the probability of detecting an event within our multiplex family sample set, particularly as no Mendelian errors were found during the SNP genotyping. Although the Italian individuals that were tested by MLPA originated from singleton families, no CNVs were identified within these affected individuals.

Another reason for the lack of CNVs identified within this study could be the inclusion criteria used to select IMGSAC multiplex and singleton families. The majority of affected individuals are higher functioning and meet a more narrowly defined diagnosis of autism than those that have been reported with SHANK3 CNVs in previous papers.21, 26 Seventy-six per cent of the IMGSAC affected individuals assessed by MLPA had performance IQ data available, with a mean of 78 (SD=30), whereas severe mental retardation was described for some autistic individuals with SHANK3 aberrations by Durand et al.21

SHANK3 is a binding partner of the neuroligins,46 which in turn have ligands in the form of the neurexins. Both of these protein families contain genes within which rare mutations have been identified in autism: NLGN347, 48 and NLGN4,47, 48, 49 and CNTNAP250 and NRXN1.12, 51 Nevertheless, coding mutations in NLGN3 and NLGN4 have not been found to be associated with autism in IMGSAC families,52 and no CNVs have been identified in SHANK3 using the methodology described here, although sequence analysis may have detected rare variants. Despite this negative finding, a number of genes within the synaptic network could still be harbouring mutations, which could lead to a similar phenotypic end point. Further analysis of these genes for CNVs within the IMGSAC cohort is warranted.