Introduction

The defining features of major depressive disorder (MDD) are marked and persistent depressed mood, associated with physical and cognitive signs and symptoms (anhedonia, insomnia or hypersomnia, appetite and weight changes, psychomotor agitation or retardation, fatigue or loss of energy, excessive guilt or worthlessness, poor concentration or indecisiveness, and recurrent thoughts of death or suicide). MDD is distinguished from ‘normal’ sadness by its persistence (ie, ≥2 weeks), the presence of accompanying signs and symptoms, and its associated impairment. The definition of MDD excludes other disorders that can have substantial depressive symptoms. These include bipolar disorder, schizophrenia, schizoaffective disorder, and depressive symptoms resulting from another disorder (eg, a medical condition such as hypothyroidism).1

Combining available estimates, the lifetime prevalence of MDD is ∼15% and is twofold higher in women than in men.2 The course of MDD is typified by recurrence of illness. In a meta-analysis, 76% of affected individuals had ≥1 recurrence over a 10-year follow-up.3 MDD is associated with substantial morbidity (greater than chronic medical conditions such as diabetes and arthritis),4, 5, 6 excess mortality from suicide and other causes,7, 8, 9, 10 and substantial direct and indirect costs (>$43 billion/year in the US).11 The WHO projected MDD to be the second leading cause of disability worldwide by 2020.12 Thus, MDD is a first-rank public health problem.

It has been firmly established that variation in MDD liability is in part genetic.13 Family studies find a significantly higher lifetime prevalence of MDD in biological relatives of MDD probands (pooled odds ratio=2.84, 95% CI=2.31–3.49). Twin studies have established that the familial component reflects genetic vulnerability rather than shared environmental risk. The meta-analytic heritability estimate is 37% (95% CI=31–42%) with a minimal contribution of environmental effects common to siblings (0, 95% CI=0–5%), and substantial individual-specific environmental effects/measurement error (63, 95% CI=58–67%).

Despite the evidence for heritability of MDD, as with most complex traits, identification of vulnerability genes has not yet been very successful. Results from studies of first-stage genome-wide linkage scans for MDD or related personality traits are summarized in Figure 1. One study was of MDD,14 two of recurrent MDD,15, 16 and three of recurrent, early-onset MDD.17, 18, 19 Five studies investigated quantitative traits related to MDD, that is, harm avoidance20 and neuroticism.21, 22, 23, 24 Two studies were secondary analyses of cohorts selected for alcoholism14 or nicotine dependence.23

Figure 1

First-stage genome-wide linkage scans for MDD or related personality traits (neuroticism and harm-avoidance). The x-axis shows the human genome from 1ptel to 22qtel. Each row shows the findings for that study. Studies of bipolar disorder, candidate region linkage studies, fine-mapping results, and statistical analyses of epistasis or studies that included covariate models were excluded. The width of the coloured bars indicates the genomic physical distance/region implicated by a particular study. LOD/equivalents are plotted. Key: red LOD≥3, green LOD≥2, blue LOD≥1.5, grey otherwise.

Regions on chromosomes 1, 4, 5, 7, 8, 11, 12, and 13 show a significant linkage signal (LOD>3) in at least one study (Figure 1). However, taken together, these studies show little replication. This lack of success so far may be due to the relatively small sample sizes of most linkage studies. Likewise, association studies of candidate genes, with their generally small sample sizes, differing phenotype definitions, and limited numbers of polymorphisms, have not led to consistent results. This is true even for candidate genes that have been assessed in large studies.25, 26, 27

However, since 2000 there have been exceptional advances in our knowledge of the human genome along with a precipitous drop in the cost of genotyping. These advances directly led to genome-wide association studies, in which large case–control samples are individually genotyped for 500 000 or more single nucleotide polymorphisms. Examples where genome-wide association studies have provided new genetic insights include the CFH gene and age-related macular degeneration,28 the FTO gene and body mass index in children and adults,29 the TCF7L2 gene and type-2 diabetes30 and the IL23R gene and inflammatory bowel disease.31 Recently, the Wellcome Trust Case Control Consortium32 identified 24 association signals for seven major diseases, including bipolar disorder, using 2000 cases and 3000 controls.

In 2006, a consortium of investigators at the VU University Amsterdam, universities in Groningen and Leiden, and UNC-Chapel Hill in the US was selected for genome-wide association (GWA) genotyping as one of the six Genetic Association Information Network (http://www.fnih.org/GAIN) studies.33 The GAIN initiative is a component of the public–private partnership of the Foundation for the National Institutes of Health Inc., and funding is via Pfizer, Affymetrix, and Abbott Laboratories. GWA genotyping for 600 000 single nucleotide polymorphisms in MDD cases and population-based controls has been conducted by Perlegen Sciences. Stage 1 genome-wide association results are scheduled to be available before the end of 2007 and will be released to the scientific community via the NCBI dbGaP web portal (http://www.ncbi.nlm.nih.gov). Approved users can download genotype and phenotype data, with the restriction that the data are used only for psychiatric health and related somatic conditions.

In this paper, we describe the approach and logistics of the two large-scale projects in the Netherlands collecting biological samples for genetic studies, which together constitute the GAIN-MDD study. We give an overview of the phenotyping for MDD and the selection of case and control subjects for GWA genotyping in the GAIN-MDD study.

Subjects

Subjects eligible for inclusion in the GWA study of MDD come from two large longitudinal projects: 1702 depressed cases come from the Netherlands Study of Depression and Anxiety (NESDA, www.nesda.nl), and 1700 controls come from the Netherlands Twin Registry (NTR, www.tweelingenregister.org). In addition, the original selection included 160 cases from the NTR, 157 controls from NESDA, both parents of 33 controls (forming 33 trios), and 21 duplicate samples.

MDD cases

Depressed subjects (cases) are mainly derived from NESDA, a longitudinal cohort study designed to be representative of those with depression and anxiety disorders in different health care settings and in different stages of the developmental history of the disorders. Details on objectives, recruitment, and methods of NESDA have been described elsewhere.34 In short, recruitment of participants took place from September 2004 through February 2007. Cases from mental health care organizations were recruited from seven outpatient regional facilities in the Netherlands (total catchment area=1 175 000 persons). When new patients were diagnosed with a depressive and/or anxiety disorder during a standardized intake assessment, they were asked to participate in NESDA. For recruitment of respondents in primary care, a three-stage screening procedure was used. A random sample of patients who consulted a GP in the last 4 months, irrespective of the reason for consultation, filled out a screening questionnaire.35 Those who screened positive were interviewed by phone with the short-form of the Composite International Diagnostic Interview (CIDI).36 Those with current MDD or an anxiety disorder were asked to participate in NESDA and were invited for a baseline assessment, which included a full CIDI interview. A community sample of NESDA cases was derived from the Netherlands Mental Health Survey and Incidence Study, a community-based study examining the prevalence of psychiatric disorders in the Netherlands.37 A multistage, stratified, random sampling procedure recruited 7076 respondents from different households in 90 Dutch municipalities. Those who were diagnosed with lifetime MDD or anxiety disorder during one of the CIDI interviews in 1996, 1997 or 1999, were approached for participation in NESDA. 
Finally, the NESDA sample includes a subgroup previously recruited for the Adolescents at Risk for Anxiety and Depression Study,38 a prospective cohort study examining the development and course of MDD and anxiety disorders among the offspring (18–25 years) of parents with depressive or anxiety disorders. Potential cases in the NTR were identified based on a multivariate composite score that combined survey data on depression, anxiety, and neuroticism.39 From this group of potential cases, we selected unrelated subjects, who received a CIDI interview by telephone.

Regardless of recruitment setting, similar inclusion and exclusion criteria were used to select MDD cases. Inclusion criteria were a lifetime diagnosis of DSM-IV MDD as diagnosed through the CIDI psychiatric interview (section E, version 2.1), an age between 18 and 77 years, and self-reported western European ancestry. Persons who were not fluent in Dutch and those with a primary diagnosis of a psychotic disorder, obsessive compulsive disorder, bipolar disorder, or severe substance use dependence were excluded. The 1862 cases included in GAIN were recruited through mental health care organizations (785), primary care practitioners (603), and community samples (218 Netherlands Mental Health Survey and Incidence Study, 96 Adolescents at Risk for Anxiety and Depression Study and 160 NTR).
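As a quick consistency check, the setting-specific counts above can be tallied against the cohort totals given in the Subjects section. The sketch below is illustrative only; the dictionary keys are labels chosen here and are not identifiers from the study databases.

```python
# Case counts by recruitment setting, as quoted in the text.
# Keys are illustrative labels, not study database identifiers.
cases_by_setting = {
    "mental_health_care": 785,
    "primary_care": 603,
    "community_mental_health_survey": 218,
    "community_adolescents_at_risk": 96,
    "ntr": 160,
}

total_cases = sum(cases_by_setting.values())
# Every setting except the NTR belongs to the NESDA cohort.
nesda_cases = total_cases - cases_by_setting["ntr"]

print(total_cases, nesda_cases)
```

The two figures printed are the GAIN case total and the number contributed by NESDA; they should agree with the 1862 and 1702 reported in the text.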

Control subjects

Control subjects are mainly derived from the NTR, which has collected longitudinal data by mailed surveys every 2–3 years since 1991. Data collection has occurred in 1991, 1993, 1995, 1997, 2000, 2002/3, and 2004/5 and takes place in twins and their family members. There are nearly 22 000 participants from 5546 families (some families are linked and come from larger pedigrees). The majority of families were recruited when the twins were young adults through City Council registration systems in 1990–91 and in 1992–93. After 1993 an effort was made to recruit adult and older twins through a variety of approaches. Details on recruitment, response rates, response bias, and demographic characteristics of the sample have been described previously.40 The sex distribution is 40% men and 60% women for twins, 45 and 55% for the siblings of twins, 47 and 53% for parents, and 64 and 36% for spouses. Longitudinal phenotyping by survey includes assessment of depressive symptoms (multiple instruments), anxiety, neuroticism, and other personality measures. Additionally, subjects are asked about life events, lifestyle, education, health, and religious background.

From the group of NTR participants for whom both survey data and biological samples were available, we selected controls for the MDD cases. Controls never scored high (>0.65) on a general factor score for anxious depression. The factor score is a combined measure of neuroticism, anxiety, and depressive symptoms assessed via longitudinal questionnaires.39 It has a mean of 0 and a SD of 0.7. Subjects never reported a history of MDD in any survey or at the blood sampling visit (either as a complaint for which treatment was sought from a specialist, reported medication use, or via the CIDI). Controls and their parents were born in the Netherlands or western Europe. If there were multiple eligible controls in a family, we first matched on sex and age, and used the highest number of completed questionnaires as an additional criterion. Only biologically unrelated subjects were selected.
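The control selection rules described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the procedure actually run on the NTR database: the record layout and all field names are hypothetical, the sex and age matching step is omitted, and relatedness is approximated by keeping at most one subject per family identifier (linked families are not handled).

```python
from dataclasses import dataclass
from typing import Dict, List

FACTOR_SCORE_CUTOFF = 0.65  # threshold quoted in the text


@dataclass
class Candidate:
    # Hypothetical record layout; field names are illustrative only.
    subject_id: str
    family_id: str
    factor_scores: List[float]       # anxious-depression factor score per completed survey
    ever_reported_mdd: bool          # any survey, blood sampling visit, or CIDI
    western_european_origin: bool    # subject and parents born in NL or western Europe
    n_questionnaires: int


def is_eligible(c: Candidate) -> bool:
    """Apply the exclusion rules: never a high factor score, never MDD, ancestry."""
    if not c.factor_scores:
        return False  # no survey data, so eligibility cannot be established
    never_high = all(score <= FACTOR_SCORE_CUTOFF for score in c.factor_scores)
    return never_high and not c.ever_reported_mdd and c.western_european_origin


def select_controls(candidates: List[Candidate]) -> List[Candidate]:
    """Keep one eligible subject per family, preferring more completed questionnaires."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        if not is_eligible(c):
            continue
        current = best.get(c.family_id)
        if current is None or c.n_questionnaires > current.n_questionnaires:
            best[c.family_id] = c
    return sorted(best.values(), key=lambda c: c.subject_id)
```

Requiring every longitudinal score to stay at or below the cutoff, rather than only the most recent one, is what makes the selected subjects low-liability controls rather than merely currently unaffected ones.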

The NESDA controls (133 from general practice, 24 from Adolescents at Risk for Anxiety and Depression Study) came from a larger healthy control group. Controls did not have a lifetime diagnosis of MDD or an anxiety disorder as assessed by the CIDI, and all controls reported low depressive symptoms at baseline (<16 on K-10 (29), <4 on Inventory of Depressive Symptoms41).

IRB approvals and informed consent

The NESDA and NTR studies were approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Center, Amsterdam, an Institutional Review Board certified by the US Office of Human Research Protections (IRB number IRB-2991 under Federalwide Assurance-3703; IRB/institute codes, NESDA 03-183; NTR 03-180). All subjects provided written informed consent. As part of the GAIN application process, consent forms were re-reviewed. For NESDA, only 22 respondents refused informed consent for genetic research (1.7% of all respondents approached). NTR participants were approached with a broad consent form for the entire NTR biobank project.

Phenotyping

MDD, depressive symptoms, and other psychopathology indicators

NESDA

The Composite International Diagnostic Interview (CIDI), section E, version 2.1,36 was used to diagnose MDD. The interview also provides information on age of onset, number of episodes of MDD, and specific symptoms of depression. Information on lifetime comorbid panic disorder with or without agoraphobia, generalized anxiety disorder, social phobia, and alcohol use and dependence was also collected with the CIDI. Depression and anxiety severity indicators include the Inventory of Depressive Symptoms-self-report (IDS-SR41), the Fear Questionnaire42 and the Beck Anxiety Inventory.43 Neuroticism, an endophenotype for MDD, was assessed with the NEO.44 The family tree inventory was used to examine depression in first-degree relatives.

NTR

Phenotyping for depression in the NTR survey studies took place with the Beck depression inventory (BDI45) in 1993 and 1997 and with the depression scale from the YASR46 in 1991, 1995, 1997, and 2000. Neuroticism was assessed with the ABV47 in five out of six surveys (not in 1995) and with the NEO44 in 2004. Anxiety (STAI48) was assessed at five out of seven surveys (not in 1995 and 2004). NTR cases and a subsample of controls underwent the CIDI protocol by phone, either as part of an earlier study49 on the heritability of major depression and anxiety disorders or as part of the selection procedure for GAIN.

Biobank procedures and DNA isolation

Before the start of the NESDA and NTR biological sample collection, processing and storage protocols were harmonized. Blood sampling for the NESDA participants took place during the baseline visit and was done between 0830 and 0930 h at one of the seven field sites, all within walking distance of laboratory facilities for immediate processing.

For NTR, biological samples were taken at the respondents' homes between 0700 and 1000 h. Starting in 2004, adult participants registered with the NTR were invited by letter to participate in a project in which blood and morning urine samples would be collected. Eligible participants were ≥18 years and had returned at least one survey or came from a family in which at least one person completed a questionnaire. The letter was followed by a telephone call to schedule an appointment for a home visit. With fertile women, an appointment was made for the third to fifth day of the menstrual cycle when possible. For women taking oral contraceptives, an appointment was made for their pill-free week. In twin individuals, we also took buccal swabs for DNA isolation50 as blood group chimerism in twins is not rare.51 During transport, tubes were stored, as appropriate, in melting ice (0–2°C), at room temperature, or in an insulated box with a constant temperature of 37°C. Effects of transportation and storage on blood quality, RNA with and without challenge, lymphocytes and other parameters were examined in a pilot project52 and appear to be negligible.

All venous blood samples were taken after overnight fasting using a safety-lock butterfly needle. For NESDA participants, a total of 10 blood tubes were drawn in the following order: 3 × 7 ml and 1 × 2 ml EDTA, 2 × 7 ml and 1 × 2 ml Heparin, 1 × 4.5 ml CTAD, 1 × 2 ml NaF, and 1 × 6 ml ACD. For the NTR participants, a total of seven blood tubes were drawn in the following order: 2 × 9 ml EDTA, 2 × 9 ml Heparin, 1 × 4.5 ml CTAD, 1 × 7 ml serum, and 1 × 2 ml EDTA. Blood collected in the EDTA anticoagulant tubes was used for DNA isolation. For NESDA samples, the Qiagen FlexiGene® DNA AGF3000 Kit for large volumes of fresh whole human blood was used on an AutoGenFlex 3000 workstation (Autogen, Holliston, MA, USA). The GENTRA Puregene® DNA isolation kit (manual) was used on frozen whole blood samples of the NTR. DNA concentrations were determined using the PicoGreen® dsDNA Quantitation kit from Molecular Probes. All procedures were performed according to the manufacturers' protocols.

GAIN sample

For the case and the control groups we selected unrelated subjects. Baseline characteristics of cases and controls, and of an additional NTR comparison group, are summarized in Tables 1 and 2. Tables 1 and 2 describe the data as deposited on the dbGaP website (www.ncbi.nlm.nih.gov/dbgap, 9 October 2007). After excluding subjects with genotyping or sample problems, data from 1821 MDD cases and 1822 controls are available for analyses. The actual numbers for genome-wide association analyses may turn out to be slightly smaller, as extra analyses of the genotype data may lead to removal of additional subjects. For both MDD cases and controls, information is available on the website about demographics, personality traits (eg, neuroticism), and lifestyle (eg, smoking and alcohol use). The phenotype information for NESDA respondents is at present mainly from the baseline interview data. The baseline interview took 4 h on average and consisted of the CIDI, a series of self-report questionnaires, a medical interview, and a fasting blood draw. For NTR respondents, a total of up to seven surveys were mailed to the participating twin families over a period of 15 years. NTR respondents who took part in the biobank protocol were visited at home for an additional medical interview, a fasting blood draw, and collection of urine samples.

Table 1 Characteristics of MDD cases by setting and total MDD group
Table 2 Characteristics of NTR comparison sample and GAIN controls

Table 1 presents the phenotype data for MDD cases, as a function of the recruitment setting, and for the total group. Patients from mental health care are on average somewhat younger, have less often attained a higher educational level, less often have a spouse/partner and tend to smoke more heavily. Their profiles on the neuroticism and depression scales are more unfavourable. Comparing the total MDD group with the GAIN control group in Table 2 shows that the controls are slightly older, better educated, and more often have a partner/spouse. With respect to lifestyle, MDD cases smoke more often (42 vs 20%) but are less often current drinkers (66 vs 80%).

Table 2 shows characteristics of GAIN controls. In addition, Table 2 provides descriptive statistics, based on the survey data, for a large group of NTR participants who are not in GAIN. The subjects in the comparison group were selected as follows: the subject was not included in GAIN and was not a family member of a GAIN participant; was 18 years or older and survey data on personality traits were available. Only unrelated persons were included in the comparison group and subjects from the comparison group were matched on age and sex with the GAIN control group. Table 2 shows that the GAIN controls are ‘hypercontrols’: they are better educated, are more often married, and smoke less often. Their scores on personality traits related to MDD and on depression indices from the survey data show a somewhat more favourable profile than the NTR comparison sample.

Discussion

Progress in unravelling the genetic architecture of multifactorial traits such as MDD has been slow. The reasons have been noted numerous times and include inadequacies in study design, inappropriate sample-recruitment strategy, and inadequately powered samples.53 We have tried to select both patients and controls carefully, choosing controls who are at low liability of developing MDD. Given the sample size of our study, we have 80% power to detect a QTL with a genotype relative risk of 1.33 if the minor allele frequency is at least 0.40 (under the assumption of a log-additive model with a population prevalence for MDD of 15%54). At the same power, the detectable genotype relative risk increases to 1.38 and 1.58 if the minor allele frequency is 0.25 and 0.10, respectively. The majority of genetic studies performed to date have had insufficient power to uncover genetic variants of small effect.
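To make the dependence of power on genotype relative risk and allele frequency concrete, the following is a minimal sketch of an approximate power calculation for a single-marker allelic test under a log-additive (multiplicative) model. It is not the calculation used for the figures above: the significance threshold `alpha` is an assumption that the text does not state, the minor allele is assumed to be the risk allele, the test is a plain two-proportion z-test on allele counts, and controls are treated as merely unaffected rather than as selected hypercontrols. The function names are ours.

```python
from math import sqrt
from statistics import NormalDist


def allele_frequencies(maf, grr, prevalence):
    """Risk-allele frequency in cases and in unaffected controls.

    Assumes Hardy-Weinberg proportions in the population and genotype
    relative risks of 1, grr, grr**2 for 0, 1, 2 risk alleles.
    """
    p_case = maf * grr / (1.0 - maf + maf * grr)
    # The population frequency is a prevalence-weighted mix of cases and controls.
    p_ctrl = (maf - prevalence * p_case) / (1.0 - prevalence)
    return p_case, p_ctrl


def allelic_test_power(n_cases, n_controls, maf, grr,
                       prevalence=0.15, alpha=1e-7):
    """Approximate power of a two-sided allelic z-test (alpha is an assumption)."""
    p_case, p_ctrl = allele_frequencies(maf, grr, prevalence)
    # Each subject contributes two alleles.
    se = sqrt(p_case * (1.0 - p_case) / (2 * n_cases)
              + p_ctrl * (1.0 - p_ctrl) / (2 * n_controls))
    z_effect = abs(p_case - p_ctrl) / se
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(z_effect - z_crit)


if __name__ == "__main__":
    for maf, grr in [(0.40, 1.33), (0.25, 1.38), (0.10, 1.58)]:
        print(maf, grr, round(allelic_test_power(1821, 1822, maf, grr), 3))
```

Because the threshold, test statistic, and treatment of controls are all assumptions here, the printed values should not be expected to reproduce the 80% figure exactly. The sketch is meant only to show why a lower minor allele frequency requires a larger relative risk for the same power.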

The genetic basis of liability to MDD and related disorders has been established through meta-analysis of data from twin and family studies.13 The meta-analytic heritability estimate was 37%. We found in the Dutch population a comparable estimate of heritability, based on twice the correlation in dizygotic twin and sibling pairs, of 36% for major depression.49 Furthermore, no qualitative or quantitative sex differences were observed in genetic architecture, which is a promising starting point for genetic linkage and association analyses, as it allows pooling of data from men and women.
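The "twice the correlation" estimate rests on a simple identity. Dizygotic twins and full siblings share on average half of their segregating alleles, so if familial resemblance is due only to additive genetic effects (no shared environment, no dominance), the expected correlation is r = h²/2 and hence h² = 2r. A minimal sketch follows; the function name is ours, and the input correlation of 0.18 is simply the value implied by the reported 36%, not a figure taken from the cited study.

```python
def heritability_from_dz(r_dz):
    """Heritability as twice the DZ twin / sibling correlation.

    Valid only under a purely additive model without shared
    environmental effects; the result is clipped to [0, 1].
    """
    if not -1.0 <= r_dz <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return min(1.0, max(0.0, 2.0 * r_dz))


# Correlation implied by the reported 36% heritability (illustrative input).
h2 = heritability_from_dz(0.18)
print(round(h2, 2))
```

The negligible shared-environment component in the meta-analysis quoted in the Introduction is what makes this simplification reasonable for MDD; with appreciable shared environment, doubling the sibling correlation would overestimate heritability.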

The GAIN-MDD study will generate GWA data on MDD cases and controls at low risk of MDD derived from two large biobank enterprises in the Netherlands. All cases were diagnosed as having DSM-IV MDD using the CIDI psychiatric interview. Because cases were ascertained in a variety of health care settings, they are a highly representative sample of those affected with this disorder in the Netherlands. The control sample consists of so-called ‘hypercontrols’ who were selected from the lower end of the relevant risk distribution.32 This should increase the power of the study, compared to the use of unselected control samples.

One of the major strengths of the NTR and NESDA studies is that participants are followed longitudinally and that additional phenotyping and new biological sample collection are both possible. This means that genome-wide genotype data can be used in future studies of, for example, treatment outcomes or of MDD trajectories. Also, to follow up on the independently replicated genes, detailed information is available on two major theoretical correlates of MDD, that is, hypothalamic–pituitary–adrenal axis (HPA-axis) and autonomic nervous system functioning. Additional association to such correlated biological phenotypes would give construct validity to the identified risk genotypes and point to biological mechanisms. A genotype that is associated with both depression and increased sympathetic nervous system activity, for instance, would suggest that the biological action of the gene may involve the noradrenergic synapse and its signalling cascade.

NESDA and NTR subjects have provided information on major life events, and will, in follow-up studies, also be queried about the occurrence of these events. Longitudinal data, including the course of MDD, will be available from NESDA follow-up assessments at 1, 2, 4, 6, and 8 years after baseline. For NTR participants, up to seven surveys are available and the eighth survey is scheduled for 2008. These data can be used for Stage 2 studies, in which gene-environment interactions are assessed. Further plans for future use of the GAIN data include data pooling for meta-analytic studies of MDD and other psychiatric disorders and detailed analyses of genotypic and phenotypic data in monozygotic twins who are discordant for MDD.