  • Article

Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits

Abstract

Genetic correlations estimated from genome-wide association studies (GWASs) reveal pervasive pleiotropy across a wide variety of phenotypes. We introduce genomic structural equation modelling (genomic SEM): a multivariate method for analysing the joint genetic architecture of complex traits. Genomic SEM synthesizes genetic correlations and single-nucleotide polymorphism heritabilities inferred from GWAS summary statistics of individual traits from samples with varying and unknown degrees of overlap. Genomic SEM can be used to model multivariate genetic associations among phenotypes, identify variants with effects on general dimensions of cross-trait liability, calculate more predictive polygenic scores and identify loci that cause divergence between traits. We demonstrate several applications of genomic SEM, including a joint analysis of summary statistics from five psychiatric traits. We identify 27 independent single-nucleotide polymorphisms not previously identified in the contributing univariate GWASs. Polygenic scores from genomic SEM consistently outperform those from univariate GWASs. Genomic SEM is flexible and open ended, and allows for continuous innovation in multivariate genetic analysis.
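The abstract describes genomic SEM as fitting multivariate models to genetic correlations and SNP heritabilities estimated from GWAS summary statistics. The Python sketch below illustrates two of the underlying steps. It first rescales a genetic covariance matrix S into a genetic correlation matrix. S is assumed to hold SNP heritabilities on the diagonal and genetic covariances off it, as a method such as LD score regression would supply. It then fits a single common genetic factor to the correlation matrix. The factor model is fitted with iterated principal-axis factoring, an unweighted technique swapped in purely for illustration. It is not the GenomicSEM package's estimator, which also uses the sampling (co)variances of the entries of S. All function names and numbers here are hypothetical.

```python
import numpy as np


def cov_to_cor(S):
    """Rescale a genetic covariance matrix to a genetic correlation matrix."""
    S = np.asarray(S, dtype=float)
    sd = np.sqrt(np.diag(S))          # square roots of the SNP heritabilities
    return S / np.outer(sd, sd)       # r_g[i, j] = S[i, j] / (sd[i] * sd[j])


def one_factor_paf(R, n_iter=500, tol=1e-10):
    """Fit R ~ lam @ lam.T + diag(theta) by iterated principal-axis factoring.

    Illustrative, unweighted stand-in: it ignores the sampling covariance
    of R that a weighted SEM estimator would use.
    """
    R = np.asarray(R, dtype=float)
    # Start communalities at the squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    lam = np.zeros(R.shape[0])
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)          # replace unit diagonal by communalities
        vals, vecs = np.linalg.eigh(reduced)   # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        new_h2 = lam ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    if lam.sum() < 0:                          # eigenvector sign is arbitrary
        lam = -lam
    theta = np.diag(R) - lam ** 2              # residual, trait-specific variance
    return lam, theta


# Hypothetical example: four traits whose genetic covariance follows one factor.
lam_true = np.array([0.9, 0.8, 0.7, 0.6])      # hypothetical standardized loadings
h2_snp = np.array([0.20, 0.10, 0.05, 0.30])    # hypothetical SNP heritabilities
R_true = np.outer(lam_true, lam_true)
np.fill_diagonal(R_true, 1.0)                  # genetic correlation matrix
D = np.diag(np.sqrt(h2_snp))
S = D @ R_true @ D                             # implied genetic covariance matrix

R_hat = cov_to_cor(S)
lam_hat, theta_hat = one_factor_paf(R_hat)
print(np.round(lam_hat, 3))
```

Because S is built here from an exact one-factor structure, the recovered loadings should match `lam_true` up to numerical tolerance. With real summary-statistic input the fit would only be approximate, and standard errors would require the sampling covariance matrix that this sketch omits.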

Fig. 1: Genomic SEM solutions for p- and neuroticism-factor models with SNP effect.
Fig. 2: Manhattan plots of unique, independent hits from genomic SEM.
Fig. 3: Out-of-sample prediction using genomic SEM- and univariate-based PGSs for psychiatric traits.
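Fig. 3 compares polygenic scores (PGSs) built from genomic SEM and from univariate GWAS summary statistics by their out-of-sample predictive performance. The Python example below is a minimal sketch of that general procedure, not of the authors' pipeline. It scores simulated target-sample genotypes with SNP weights derived independently of that sample and reports the incremental R² of the PGS over a baseline covariate. The simulated data, the noisy stand-in for discovery-GWAS weights and all names are hypothetical.

```python
import numpy as np


def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts (0/1/2) across SNPs for each person."""
    return genotypes @ weights


def r_squared(y, X):
    """In-sample R^2 from an ordinary least-squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n, m = 2000, 500                                  # hypothetical target-sample size and SNP count
freqs = rng.uniform(0.1, 0.5, size=m)             # hypothetical allele frequencies
G = rng.binomial(2, freqs, size=(n, m)).astype(float)
true_beta = rng.normal(0, 0.05, size=m)
covariate = rng.normal(size=n)                    # hypothetical baseline covariate
y = G @ true_beta + 0.3 * covariate + rng.normal(size=n)

# Noisy weights stand in for summary statistics from an independent discovery sample.
gwas_beta = true_beta + rng.normal(0, 0.05, size=m)
pgs = polygenic_score(G, gwas_beta)

r2_base = r_squared(y, covariate)
r2_full = r_squared(y, np.column_stack([covariate, pgs]))
print(f"incremental R^2 from the PGS: {r2_full - r2_base:.3f}")
```

Comparing two sets of weights, for example univariate versus multivariate, would amount to computing the incremental R² once per weight vector in the same target sample.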

Data availability

The data that support the findings of this study are all publicly available. Links to the location of summary statistics, linkage disequilibrium scores, reference panel data and the code used to produce the current results can all be found at https://github.com/MichelNivard/GenomicSEM/wiki.

Code availability

GenomicSEM software is an R package that is available from GitHub at https://github.com/MichelNivard/GenomicSEM. Instructions for installing the GenomicSEM R package are provided at https://github.com/MichelNivard/GenomicSEM/wiki. Example GenomicSEM code, including the code used to produce the results, is provided for each set of analyses at https://github.com/MichelNivard/GenomicSEM/wiki.

Acknowledgements

E.M.T.-D., K.P.H. and A.D.G. were supported by NIH grant R01HD083613. E.M.T.-D., S.J.R. and I.J.D. were supported by NIH grant R01AG054628. E.M.T.-D. and K.P.H. were each supported by Jacobs Foundation research fellowships. E.M.T.-D. and K.P.H. are members of the Population Research Center at the University of Texas at Austin, which is supported by NIH grant P2CHD042849. M.G.N. is supported by a Royal Netherlands Academy of Science Professor Award to D. I. Boomsma (PAH/6635), ZonMw grant: ‘Genetics as a research tool: a natural experiment to elucidate the causal effects of social mobility on health’ (pnr: 531003014) and ZonMw project: ‘Can sex- and gender-specific gene expression and epigenetics explain sex-differences in disease prevalence and etiology?’ (pnr: 849200011). H.F.I. is supported by the ‘Aggression in children: unraveling gene–environment interplay to inform treatment and intervention strategies’ (ACTION) project. ACTION receives funding from the European Union Seventh Framework Program (FP7/2007-2013) under grant agreement number 602768. P.D.K. and R.d.V. were supported by ERC Consolidator Grant 647648 EdGe. I.J.D., A.M.M., S.J.R., R.E.M. and W.D.H. are members of the University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, which is part of the cross-council Lifelong Health and Wellbeing Initiative (MR/K026992/1). W.D.H. is supported by a grant from Age UK (Disconnected Mind Project). PGS analyses for the p factor were conducted using the UK Biobank resource under application number 4844. PGS analyses for neuroticism were conducted using data from Generation Scotland. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates (CZD/16/6) and the Scottish Funding Council (HR03006).
Genotyping of the Generation Scotland: Scottish Family Health Study samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, Edinburgh, Scotland, and was funded by the Medical Research Council UK and Wellcome Trust (Wellcome Trust Strategic Award ‘STratifying Resilience and Depression Longitudinally’ (STRADL) reference 104036/Z/14/Z). Ethical approval for the Generation Scotland: Scottish Family Health Study was obtained from the Tayside Committee on Medical Research Ethics (on behalf of the National Health Service). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

A.D.G., M.R., H.F.I., M.G.N. and E.M.T.-D. developed the software. A.D.G., M.G.N. and E.M.T.-D. developed the theory underlying genomic SEM. A.D.G., M.R., R.d.V., M.G.N. and E.M.T.-D. developed the techniques and mathematical derivations. A.D.G., T.T.M., M.G.N. and E.M.T.-D. performed the simulation studies. S.J.R., R.E.M. and E.M.T.-D. performed the polygenic prediction analyses. A.D.G., M.G.N. and E.M.T.-D. wrote the manuscript. M.R., S.J.R., T.T.M., W.D.H., A.M.M., I.J.D., R.E.M., P.D.K. and K.P.H. provided feedback and edited the manuscript.

Corresponding author

Correspondence to Andrew D. Grotzinger.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary Results, and Supplementary Figures 1–27.

Reporting Summary

Supplementary Dataset

Study raw data presented in Supplementary Tables 1–21.

About this article

Cite this article

Grotzinger, A.D., Rhemtulla, M., de Vlaming, R. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat Hum Behav 3, 513–525 (2019). https://doi.org/10.1038/s41562-019-0566-x

  • DOI: https://doi.org/10.1038/s41562-019-0566-x
