Introduction

Strong evidence for substantial heritability of human personality comes from family, twin, and adoption studies [1]. However, the genetic and phenotypic architecture of human personality is complex and has remained uncertain despite recent advances in genomics and phenomics [2,3,4]. In general, geneticists should expect that many genes affect each trait and that each gene affects many traits [5]. When the architecture is complex, the same genetic networks may lead to different phenotypic outcomes (a phenomenon called multifinality in development or pleiotropy in genetics) [6,7,8]. Likewise, different genetic networks in complex systems may lead to the same outcome (equifinality, which is also described as heterogeneity) [8, 9].

Human personality is a striking example of the challenges involved in identifying the specific genes and molecular processes that influence complex traits. Twin studies indicate that between 30% and 60% of the phenotypic variance in personality, as assessed by a variety of instruments, is genetic in origin [10,11,12,13,14]. However, adoption studies and studies that include other family members along with twins show that most of the heritability of personality, as assessed by a variety of instruments, is likely to depend on complex interactions among multiple gene loci (i.e., epistasis) or multiple alleles at a locus (i.e., dominance), rather than the average effects of individual genes [11, 13,14,15,16,17]. Put another way, many genes are likely to operate in concert, not separately, to influence the heritability and development of personality. Nevertheless, despite extensive past effort, genome-wide association studies (GWAS) of personality have found few significant associations using a variety of personality instruments [18,19,20]. The frequent failure to account for most of the heritability of complex traits has been called the “missing” [21] or “hidden” [22] heritability problem.

The Temperament and Character Inventory (TCI) measures two domains of personality hypothesized to be related to different genetic and neuronal networks [23]. Imaging studies show that TCI character traits are associated with brain networks for intentional and meta-cognitive processes, such as self-reflection, goal-setting, empathy, and episodic learning, whereas temperament traits are related to generating and conditioning automatic behaviors, such as stress reactions [24,25,26,27,28]. In this article, we focus on the TCI character traits of self-directedness (i.e., purposeful, responsible vs. aimless, blaming), cooperativeness (i.e., helpful, empathic vs. hostile, self-centered), and self-transcendence (i.e., altruistic, spiritual vs. individualistic, skeptical). These are the self-regulatory components of personality that determine the degree to which a person's adaptive functioning is healthy or unhealthy [29]. In related articles, we examine temperament traits and their relations with character in the same samples.

We have chosen to apply strictly data-driven machine learning methods in a person-centered approach to GWAS to uncover the complex genotypic and phenotypic architecture of personality [6, 30, 31] (Supplementary Figure S1). We postulate that personality heritability is not missing, but is distributed in multiple networks of interacting genetic and environmental variables that influence different people [6, 31,32,33].

Subjects and methods

Description of the samples

Our discovery sample was the Young Finns Study, an epidemiological study of 2149 healthy Finnish children followed regularly from 1980 (ages, 3–18 years) to 2012 (ages, 35–50 years) [34]. Childhood environments were directly assessed with the rearing parents in 1980 and 1983 [35,36,37,38,39]. Adult environments and life events were assessed with subjects in 2001 [40, 41]. All subjects (56% women) had thorough standardized genotypic and phenotypic assessments, including administration of the TCI in 1997, 2001, 2007, and 2012 [34, 42].

We replicated the results in two independent samples of healthy adults from Germany [43, 44] and Korea [45, 46], in which comparable genotypic and phenotypic features were available (see Supplement). The Korean study involved 1052 unrelated individuals extracted from a national register (aged 28–81, 57% women). The German study involved 902 subjects (aged 20–74, 49% women) randomly selected from Munich city registry and screened to exclude anyone with a history of psychiatric illness in themselves or their first-degree relatives.

Personality assessment

All subjects completed the TCI to assess seven heritable dimensions of personality [23, 47]. The TCI measures four dimensions of temperament and three dimensions of character (self-directedness, cooperativeness, and self-transcendence) with strong reliability, as described in Supplementary Section 1 and Supplementary Table S1 [23, 47]. The 13 subscales of character from the TCI were used as the primary data about character in all three samples (Supplementary Section 2). Character profiles for each person were based on median splits of each subscale to distinguish high and low scorers [48].
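The median-split step can be illustrated with a minimal sketch. It assumes subscale scores are held in a plain dictionary per subject; the subject identifiers, subscale names, and scores are hypothetical, and assigning scores equal to the median to the low group is an assumption, since the tie rule is not stated here.

```python
from statistics import median

def median_split_profiles(scores):
    """Binarize each subscale at its sample median.

    scores: dict mapping subject id -> dict of subscale name -> raw score.
    Returns dict mapping subject id -> dict of subscale name -> True (high) / False (low).
    Assumption: a score equal to the median counts as low.
    """
    subscales = sorted(next(iter(scores.values())).keys())
    medians = {s: median(subj[s] for subj in scores.values()) for s in subscales}
    return {
        sid: {s: subj[s] > medians[s] for s in subscales}
        for sid, subj in scores.items()
    }

# Hypothetical toy data: three subscales, four subjects.
toy_scores = {
    "s1": {"SD1": 30, "CO1": 12, "ST1": 5},
    "s2": {"SD1": 22, "CO1": 18, "ST1": 9},
    "s3": {"SD1": 27, "CO1": 15, "ST1": 2},
    "s4": {"SD1": 19, "CO1": 20, "ST1": 7},
}
profiles = median_split_profiles(toy_scores)
```

Each subject's profile is then a pattern of high and low flags across subscales, which is the kind of binary input a clustering step could operate on.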

Personality health indices

People at risk of unhealthy personality were identified as the bottom decile of the sum of TCI self-directedness and cooperativeness [48]. Prior work shows this criterion indicates ill-being or personality disorder (i.e., poor physical, mental, and social functioning) [49, 50]. In contrast, people with healthy personalities were identified as the top decile of the product of all three TCI character traits. Prior work shows that this criterion indicates well-being or flourishing (i.e., superior physical, mental, and social functioning) [29, 48, 51]. These indices provided consistent measures of the health status of subjects in all three samples. The health value of a set (i.e., group of people) is the average value of its members.
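The two indices described above can be sketched as follows. This is only an illustration under stated assumptions: deciles are taken by rank (the lowest or highest 10% of subjects, ties broken by input order), the "health value" of a set is read as the proportion of flagged members, and all function names and toy scores are hypothetical.

```python
def decile_flags(values, bottom=True):
    """Flag the subjects in the bottom (or top) decile of `values` by rank.

    Assumption: the decile is the lowest (or highest) 10% of subjects,
    with at least one subject flagged.
    """
    n = len(values)
    k = max(1, n // 10)
    order = sorted(range(n), key=lambda i: values[i], reverse=not bottom)
    flagged = set(order[:k])
    return [i in flagged for i in range(n)]

def health_indices(sd, co, st):
    """Return (ill, well) flags from trait scores.

    ill:  bottom decile of self-directedness + cooperativeness.
    well: top decile of the product of all three character traits.
    """
    ill = decile_flags([a + b for a, b in zip(sd, co)], bottom=True)
    well = decile_flags([a * b * c for a, b, c in zip(sd, co, st)], bottom=False)
    return ill, well

def set_health_value(member_indices, flags):
    """Average of the flag over the members of a set (a proportion)."""
    return sum(flags[i] for i in member_indices) / len(member_indices)

# Hypothetical toy scores for 20 subjects.
sd = list(range(1, 21))
co = list(range(1, 21))
st = list(range(1, 21))
ill, well = health_indices(sd, co, st)
```

With these toy scores, a set consisting of the first four subjects would have an ill-being value of `set_health_value([0, 1, 2, 3], ill)`, that is, the share of its members who fall in the bottom decile.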

We also identified an empirical index of character functioning by clustering the 13 character subscales of the TCI (Supplementary Section 3 and Table S2). The empirical index of character provided a single comprehensive measure of character functioning that could be associated a posteriori with each SNP set based on semi-supervised learning [52] and used in the SNP-set Kernel Association Test (SKAT) [32, 33] and heritability analyses. It was highly correlated with the other health indicators (p < 1E-20, RMSE 0.03).

Genotyping

The Finnish sample was genotyped using Illumina Human670-Quad Custom (i.e., Illumina 670k custom) arrays [53]. The Korean sample used Affymetrix Genome-Wide Human SNP Array 6.0 and Illumina HumanCore [45]. The German sample used Affymetrix Genome-Wide Human SNP Array 6.0, Illumina OMNI Express and the 300 Array, prephased and imputed with SHAPEIT2 and IMPUTE2. Some German individuals had also been genotyped on Illumina Omni1-Quad. Quality control was performed for all samples as in prior work [6] (Supplementary Section 3).

After quality checks, a subset of SNPs was preselected with the PLINK software suite [54] to reduce the large search space, using a generously inclusive threshold (p-value <0.01 without Bonferroni correction) for possible association with character and taking gender and ethnicity into account as covariates of the individual SNPs. The preselection retained SNPs with weak associations with character that are not individually significant genome wide after Bonferroni correction, but that provide presumptive candidates for epistatic interactions in a SNP set. It also retained SNPs with a strong additive effect individually, thereby providing a manageably sized initial pool of SNPs as candidates for both the additive and non-additive components of the genetic architecture of character. We accounted for ethnicity in each sample by using the first three principal components for ancestral stratification of SNP genotypes (Supplementary Section 3) [55].

Computational procedures

The cluster analyses used the validated Generalized Factorization Method, which utilizes deep non-negative matrix factorization (NMF) to uncover naturally occurring (i.e., unsupervised) associations between patterns across different types of data, including genetics [56,57,58,59] and neuroimages [30, 60]. The clustering was entirely data driven without restrictive assumptions about the number or content of the clusters [31]. For example, clusters may have different features, and one subject can belong to more than one cluster [6, 30, 31, 56, 61]. The recurrent application of the clustering process is summarized and schematically related to unsupervised deep NMF learning in Supplementary Figure S1 [62]. The advantages of this clustering approach over alternative analyses of single or multiple markers are described in Supplementary Section 4.
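To make the idea of factorization-based, overlapping clusters concrete, the following is a minimal sketch of plain single-layer NMF with Lee-Seung multiplicative updates. It is not the Generalized Factorization Method or the deep NMF used in this study; the toy matrix, the rank, and the 0.5 membership threshold are all hypothetical choices made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (subjects x features) as W @ H."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # Update H: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical toy data: the last subject mixes both feature patterns.
v_toy = [
    [2.0, 2.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [1.0, 1.0, 1.5, 1.5],
]
w, h = nmf(v_toy, rank=2)

# Soft membership: a subject joins every cluster whose loading is at least
# half of its largest loading (hypothetical threshold), so overlap is allowed.
memberships = [[k for k, x in enumerate(row) if x >= 0.5 * max(row)] for row in w]
```

Because membership is read from the loadings in `w` rather than from a hard partition, one subject can belong to more than one cluster, which mirrors the overlapping sets described above.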

Our web server application for phenotype–genotype many-to-many relations analysis (PGMRA) in GWAS is published [31] and online at http://phop.ugr.es/fenogeno. The PGMRA method and algorithm are also summarized in Supplementary Sections 5 and 6, which include a semi-supervised classifier of phenotypes from genotypes. PGMRA accounts for linkage disequilibrium (LD) efficiently (i.e., without loss of information about complex genotypic–phenotypic relations) (Supplementary Section 4). Statistical analysis correcting for multiple comparisons, as well as gender and ethnicity as covariates of the SNP sets, was performed by the SKAT [32, 33], also accessible via PGMRA. Heritability was estimated from a trimmed regression of SNPs on the empirical index of character controlling for outliers and environmental variables [63, 64] (also see Supplementary Section 7).

Replicability of results was evaluated in the three independent samples for SNP sets, phenotypic sets, and genotypic–phenotypic relations using multi-objective optimization techniques [6], as detailed in Supplementary Section 8. We also evaluated how well the individual genotypic sets were able to predict the classification of the phenotypes in each sample using the PGMRA classifier (Supplementary Section 9). Further details are available in Supplementary Information and elsewhere [56,57,58,59].

Results

Identifying SNP sets as candidates for causal variability

We exhaustively identified 902 non-identical but possibly overlapping SNP sets in the Finnish sample using PGMRA without knowledge of the phenotype. The SNP sets comprised different numbers of SNPs and/or subjects, regardless of their phenotypic status. The SNPs were mapped to diverse functional classes of genetic variants that may be located on different chromosomes, frequently even within a single SNP set (Figs. 1a, 2a–d). SNP sets are organized as networks of multilocus genotypes (Fig. 1a, b; Supplementary Figure S2, Supplementary Table S3). They were labeled by a genotypic identification ‘G’, followed by two numbers: the first indicates the maximum number of clusters and the second indicates the order of selection by the algorithm. SNP sets were associated with different health risks (Table 1, Supplementary Table S2).

Fig. 1

a Two examples of SNP sets are represented as heatmap submatrices or biclusters. SNP sets were identified by distinct patterns of molecular features of SNPs in subgroups of subjects. Allele values are indicated as BB (dark blue), AB (intermediate blue), AA (light blue), and missing (black). SNP sets were labeled for specificity by a pair of numbers representing the maximum number of clusters from which the bicluster was selected (e.g., 33 clusters may produce more specific sets than 21) and the order in which they were selected by the method (e.g., 4th bicluster or factor selected by FNMF when the maximum number of clusters was 21) and usually have a prefix G for genotype or P for phenotype. Only a subset of optimal and cohesive sets is selected across all numbers of clusters (see Supplementary Methods). The SNPs within each SNP set can map to different chromosomes (e.g., 6 and 8) and exhibit distinct molecular consequences (see Supplementary Table S3). The pie chart shows the percentage of SNPs within a SNP set that belong to each type of consequence. b Dissection of a GWAS in a Finnish population to identify the genotypic and phenotypic architecture of personality measured by the TCI. The genotypic network is depicted as nodes (SNP sets) linked by shared SNPs (blue lines) and/or subjects (red lines) (see also Supplementary Figure S3A for additional subnetworks). Each SNP set maps to one or more genes (see Supplementary Table S6 for full list of genes associated with each SNP set). SNP sets associated with each of the five general character profiles are distinguished by color-coding as shown in the legend (see Table 3). c, d Comparison of the level of ill-being (c, where high values indicate ill-being) and the level of well-being (d, where high values indicate well-being) in groups of subjects with each of the five character profiles specified by both phenotypic and genotypic information (evaluated by ANOVA). (Compare with either genetic or phenotypic assessment alone in Supplementary Figure S6). e Variation in health status of SNP sets: well (blue, see d), ill (orange, see c), intermediate (gray). f 12 genotypic-phenotypic pipelines connect different sets of genes to the same character dimension (see also Supplementary Tables S9–S12). Red lines indicate direct connections, whereas blue lines and “&” indicate composite connections. g Surface showing the pattern of health status of the subjects in this study based on SNP set information only (i.e., interpolation from Table 1). The probability of well-being in the z-axis varies from high (red for high well-being) to low (green). The order of the SNP sets is based on shared subjects (x-axis) and on shared SNPs (y-axis) measured by hypergeometric statistics, so SNP sets sharing more SNPs and/or subjects are nearby (see ill health surface in Supplementary Figure S4). h Surface showing the pattern of health status of subjects based on both genotypic information (SNP sets) and phenotypic information (character sets) (as in Table 3). The probability of well-being in the z-axis varies from high (red, high well-being) to low (green). The sharing of subjects is shown for both SNP sets (x-axis) and character sets (y-axis) (see ill health surface in Supplementary Figure S5)

Fig. 2

a, b Types of genetic variants mapped by SNP sets associated with character: a Specific molecular consequences (Supplementary Table S5) and b their subtypes. Genes related only to character sets (red) were less often protein coding and more often RNA genes than those also associated with temperament sets (blue color). c Cell displaying the molecular pathways containing genes associated only with the organized profile. The uncovered genes influence the phosphatidyl inositol/calcium second-messenger signaling system that regulates the seeking of food and other goals in response to external environmental signals (see also Supplementary Tables S4, S7). d Multiple SNPs within a SNP set can affect a single or multiple genes in many ways (Supplementary Table S3). Within the MTA3 gene, SNPs in the SNP set G_12_1 may affect both coding and regulatory regions (thereby inhibiting transcription), whereas SNPs from SNP set 40_26 are mostly located in intronic regions (thereby blocking or decreasing protein production). The SNP sets are associated with profiles exhibiting distinct character features (creative vs. apathetic)

Table 1 Description of 42 SNP sets associated with character sets (p < 1E-05)

Identifying clusters of subjects with distinct character profiles

We identified 342 non-identical but possibly overlapping character sets using the 13 character subscales without knowledge of the genotype. Character sets were labeled by a phenotypic identification “C” to distinguish them from the SNP sets. These fine-grained character sets were nested within five character supersets that were identified by recurrently applying PGMRA to maximize the cophenetic correlation coefficient (Table 2) [62]. In other words, five groups of people had highly distinct character profiles.

Table 2 Description of the five character profiles (supersets) and composite character sets identified by PGMRA from profiles of TCI subscales (Y = yes)

The people in three of the five character profiles had healthy personalities, which we named resourceful, organized, and creative to be consistent with traditional labels for TCI profiles (Table 2). For example, people with the "organized" character profile were high in most subscales of self-directedness and cooperativeness, but were low in all subscales of self-transcendence (i.e., they were controlling, individualistic, and skeptical). People with the "creative" profile were high in all aspects of character, whereas the "resourceful" were only self-directed.

In addition, there were two profiles of people with unhealthy personalities. The people with a "dependent" character profile were highly forgiving when abused (CO4), conscientiously considerate of others (CO5), self-deprecating (SD4), and otherwise low in self-directedness and self-transcendence. The people with an "apathetic" character were low in all aspects of character development (Table 2).

Association of SNP sets with character

We tested the association of SNP sets with character. The empirical index of character, a single quantitative measure of character functioning, was more strongly associated with SNP sets than with the average effects of their constituent SNPs according to SKAT (Table 1). Forty-two SNP sets had significant associations with character (p < 1E-05). For example, the SNP set G_11_4 has a p-value of 1.22 E-19, whereas the best and average SNPs within this set have 9.00 E-05 and 3.29 E-02 p-values, respectively (Table 1). SKAT [32] and PLINK [54] methods estimated similar p-values for the individual SNPs (R2 = 0.99, F statistics, p < 3.8 E-46), showing that SKAT did not inflate results.

Forty-two SNP sets significantly associated with character are described in Table 1. We assigned names to the SNP sets based on prominent molecular processes and pathways that distinguished them (Supplementary Table S4). The character-related SNP sets comprised networks of SNPs that mapped to 727 genes, nearly all of which are known to influence individual differences in brain functions, particularly regulation of neurodevelopment, neuroplasticity, neuroprotection, connectivity, energy metabolism, stress reactivity, resilience, longevity, learning, and memory (Supplementary Tables S5, S6).

Complex genotypic–phenotypic relationships in personality profiles

We found that 55 of the 342 character sets were significantly associated with particular SNP sets (hypergeometric statistics, 1E-11 < p < 1E-03, Table 3). The genotypic–phenotypic relations were complex, demonstrating pleiotropy and heterogeneity. For example, G_5_1 involved neuroplasticity and was frequently associated with dependent character sets, but sometimes with apathetic or creative profiles (Table 3). The 55 character sets were associated with the 42 SNP sets in 128 relationships that were significant by a permutation test (Table 3, empirical p < 4.7 E-03).
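As an illustration of how the overlap between a SNP set and a character set might be scored with hypergeometric statistics, the sketch below computes the upper-tail probability of sharing at least a given number of subjects by chance. It is a generic one-sided test, not the PGMRA implementation; the function name is hypothetical, and the set sizes and overlap in the usage lines are invented for illustration (only the sample size of 2149 comes from the text).

```python
from math import comb

def hypergeom_overlap_pvalue(population, size_a, size_b, overlap):
    """P(X >= overlap) for the number of shared subjects X.

    population: total number of subjects in the sample.
    size_a:     number of subjects in the SNP set.
    size_b:     number of subjects in the character set.
    overlap:    observed number of subjects in both sets.
    Null model: the size_b subjects are drawn at random without replacement.
    """
    upper = min(size_a, size_b)
    total = comb(population, size_b)
    tail = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 120 and 90 subjects sharing 30, out of 2149.
p_value = hypergeom_overlap_pvalue(2149, 120, 90, 30)
```

A small p-value indicates that the two sets share more subjects than expected if membership in one were unrelated to membership in the other.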

Table 3 The strength of the genotypic–phenotypic relationships among SNP and character sets and their corresponding health measurements

SNP sets (Fig. 1b, Supplementary Figure 2A) often had similar character profiles associated with particular molecular processes (Table 3, Supplementary Tables S4, S7). For example, the organized profile was strongly associated with many SNP sets involving the regulation of inositol–calcium signaling for obtaining food and other goals (e.g., G_8_8, G_11_4) and for neuroprotection against injury (G_12_8). SNP sets regulating episodic learning and hippocampal neurogenesis (e.g., G_7_3, G_12_1) were associated with a creative profile.

Relations among SNP sets to one another and to molecular processes

We found 12 single and disjoint nodes, and at least three subnetworks composed of highly connected nodes, shown in Fig. 1b and Supplementary Figure S3A. These networks were relatively disjoint (i.e., sharing few SNPs and subjects; see Supplementary Information 9. Identification of Sub-networks), suggesting that these are distinct antecedents of personality. These nearly disjoint networks vary in size and complexity: one subnetwork connected eight SNP sets (Supplementary Figure S3A), whereas others had only a single SNP set.

One network contained SNP sets primarily connected by shared SNPs, but not subjects (e.g., G_10_1 learning/memory and G_7_7 olfaction, Fig. 1b), as expected when the same SNPs had different allele values. This network was associated with dependent and organized personality profiles (Fig. 1b).

Both shared subjects and SNPs connected the other two networks (Fig. 1b), as occurs when one network is a subset of another. The first network was primarily composed of organized (e.g., components of inositol signaling by G_11_4, G_8_8, G_3_1) and apathetic (e.g., G_21_3 cellular senescence, G_7_2 GPCR dysregulation) profiles. The second network displayed creative (e.g., G_3_2, G_7_3, G_9_8) and dependent (e.g., G_38_8, G_5_1) profiles.

Finally, some SNP sets within a network do not share SNPs, but independently specify almost the same individuals (e.g., G_8_8 inositol/chemokine signaling, G_7_2 GPCR dysregulation, Fig. 1b), as expected when distinct subsets of genotypic features influence a common pathway or consequence.
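The network view described in this section can be sketched as a graph whose nodes are SNP sets and whose edges record shared SNPs or shared subjects, with subnetworks recovered as connected components. This is a simplified illustration: any non-empty intersection is treated as a link, whereas the analysis here scored sharing with hypergeometric statistics, and the set names, SNP identifiers, and subject identifiers are hypothetical.

```python
from itertools import combinations

def build_links(snp_sets):
    """Return (a, b, kind) links, where kind is "snp" or "subject".

    snp_sets: dict mapping set name -> (set of SNP ids, set of subject ids).
    Assumption: any shared SNP or subject creates a link.
    """
    links = []
    for a, b in combinations(sorted(snp_sets), 2):
        snps_a, subjects_a = snp_sets[a]
        snps_b, subjects_b = snp_sets[b]
        if snps_a & snps_b:
            links.append((a, b, "snp"))
        if subjects_a & subjects_b:
            links.append((a, b, "subject"))
    return links

def connected_components(nodes, links):
    """Group nodes into subnetworks with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in links:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Largest subnetworks first; isolated nodes appear as singletons.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))

# Hypothetical toy sets: A and B share a SNP but no subjects,
# B and C share a subject but no SNPs, D is disjoint.
toy_sets = {
    "A": ({"snp1", "snp2"}, {1, 2}),
    "B": ({"snp2", "snp3"}, {3, 4}),
    "C": ({"snp9"}, {4, 5}),
    "D": ({"snp7"}, {8}),
}
links = build_links(toy_sets)
components = connected_components(toy_sets, links)
```

Keeping the link kind separate makes it possible to distinguish sets connected only by shared SNPs from sets that independently specify the same individuals.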

Heterogenic pathways influence the same character trait

The genes associated with each of the five character profiles are largely different. In all, 68% of the 727 genes associated with character were unique to a single character profile: 208 with organized, 89 with creative, 70 with dependent, and 130 with apathetic (Supplementary Table S8). Consequently, there were multiple groups of genes that led to each individual character trait, as depicted in Fig. 1f. For example, high self-directedness occurs in individuals with the resourceful, organized, and creative profiles, even though these profiles have different genetic backgrounds. Put another way, individual character traits were genetically more heterogeneous than the multidimensional character profiles.

We refer to the multiple genotypic–phenotypic networks that contribute to individual traits as pipelines, as outlined in Fig. 1f. Detailed descriptions of the specific genes and molecular processes we found in the pipelines for each of the three character traits are presented in Supplementary Tables S9–S12.

Complex genotypic–phenotypic relationships influence health status

The combination of genotypic and phenotypic information provided more information than either alone for both well-being (Fig. 1g vs. Fig. 1h) and ill-being (Supplementary Figures S4 vs. S5). When health status was based on the joint relationship of SNP sets and character sets, all five character profiles were well distinguished in terms of the probabilities of ill-being (p < 3.89E-26, ANOVA statistics, Fig. 1c) and well-being (p < 3.68E-65, ANOVA, Fig. 1d). In contrast, when health status was based on character scores only, the probability of ill-being was greater in only two profiles and that of well-being was greater in only one profile (Supplementary Figure S6).

We identified candidate regulatory genes that we called switch genes because of their relationship to changes in health status among people with the same character profile (Fig. 1e). For example, all apathetic SNP sets were associated with ill-being except G_9_3, which was associated with well-being. In contrast, the creative SNP sets were associated with well-being except for G_7_7, which was associated with ill-being. The 150 switch genes included 50% protein coding genes, 18% RNA genes, 15% pseudogenes, 3% transcription factors, and 4% others (Supplementary Table S13).

Overall, about 67% of the 727 genes associated with character sets may be involved in regulatory processes: these included transcriptional regulators (10%), lncRNAs (24%), other RNA genes (6%), and targets of microRNAs (27%), as identified in the TRANSFAC® release 2017.1 database (Supplementary Table S14). We identified two microRNAs (MIR431, MIR1762) in association with character, and they target 74 and 119 of the 727 genes we found associated with character in TRANSFAC, respectively. In particular, lncRNAs were more commonly associated with character only than with both temperament and character, whereas protein-coding genes were more commonly associated with both temperament and character, as shown in Fig. 2a, b.

Replication of results in two independent samples

We tested the replicability of our findings in the Finnish study by carrying out the same analyses in the German and Korean samples. In all, 95% of the 42 SNP sets associated with character sets in the Finnish sample were identified in one or both of the replication samples: 36 were identified in both the Korean and German samples, three in the Korean sample only, and one in the German sample only (Supplementary Table S15). In addition, 96% of the 55 character sets associated with SNP sets in the Finnish sample were replicated in one or both of the replication samples: 46 in both, six in the Korean sample only, and one in the German sample only (Supplementary Table S16). The genotypic–phenotypic relations between SNP and character sets identified in the Finnish sample closely matched those observed in the Korean study (94%) and in the German study (84%) (Supplementary Table S17). The replication of the 25 character sets associated with ill-being in the Finnish sample was reduced in the German sample (72%) compared with the Korean sample (84%) (ANOVA, p = 0.01), as expected because the Germans had been screened to exclude psychopathology, including personality disorders, in themselves or their first-degree relatives (Supplementary Figure S7). The strength of the identity of replicated sets was calculated using hypergeometric statistics and multi-objective optimization techniques (see Pareto values in Supplementary Tables S18, S19).
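The replication percentages for SNP sets and character sets follow directly from the counts reported above, as this short arithmetic check shows. The helper name is hypothetical; all counts are taken from the text.

```python
def replication_rate(both, korean_only, german_only, total):
    """Return (number replicated in at least one sample, rounded percentage)."""
    replicated = both + korean_only + german_only
    return replicated, round(100 * replicated / total)

# SNP sets: 36 in both samples, 3 Korean only, 1 German only, out of 42.
snp_set_replication = replication_rate(36, 3, 1, 42)

# Character sets: 46 in both samples, 6 Korean only, 1 German only, out of 55.
character_set_replication = replication_rate(46, 6, 1, 55)
```

This gives 40 of 42 SNP sets (95%) and 53 of 55 character sets (96%) replicated in at least one sample.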

We also systematically surveyed prior literature in PubMed using TCI character-related keywords, and identified genes that had been reported to be associated with one or more of the TCI character traits in one or more investigations (Supplementary Tables S20, S21). We found that 116 of our detected genes were related to genes, families of proteins, or pathways of genes previously associated with TCI traits (Supplementary Table S20). Among the genes in character-related SNP sets, we also detected 74% of the 111 genes that had been previously associated with TCI traits, and 75% of the 63 genes that had previously been reported in association with TCI character traits (Supplementary Table S21). Considering all genes previously related to the TCI (Supplementary Table S21), we recovered seven genes with exactly the same name, another 34 variants from the same family of proteins, and another 41 genes in the same KEGG pathway previously reported.

Estimation of heritability and environmental influences

The heritability of character controlling for outliers was estimated as 57% in the Finns, 58% in the Germans, and 50% in the Koreans (Supplementary Table S22). In addition, 95% of the SNP sets were strongly associated with the empirical character index (5E-11 > p-value > 5E-77). In other words, the SNPs that comprise different SNP sets strongly distinguished the character values of the subjects in each set, indicating that each individual SNP set contributed significantly to explain the total distributed heritability (Supplementary Section 9). Consequently, when the genotypic sets were used to classify the well-being and ill-being of the subjects as measured by their character values, the predicted values were highly accurate (average areas under curve of the classifications were 0.928 and 0.932, respectively) (Supplementary Figure S9).
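For readers unfamiliar with the area-under-curve metric reported above, the sketch below computes it from classifier scores using its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is a generic illustration and not the PGMRA classifier; the function name, scores, and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve from scores and binary labels.

    Counts, over all positive/negative pairs, how often the positive case
    has the higher score; ties contribute one half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted well-being scores and observed status (1 = well).
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.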

We also considered environmental influences in the Finnish sample. There were direct associations of sets of environmental influences in childhood and adulthood with character sets (Supplementary Table S23A) and with SNP sets (Supplementary Table S23B). The impact of these correlations was small, so the heritability estimate was still 56% in the Finnish sample when adjusted for gene-environment correlation (Supplementary Table S23C). In addition, five novel associations between SNP sets and character sets emerged when environmental influences were used as mediators: years of education in childhood and stressful life events in adulthood had significant effects on organized and dependent character profiles (Supplementary Table S23D, p < 2.9E-03 to p < 8.4E-03).

Discussion

This is the first data-driven study to examine the genotypic–phenotypic architecture of human character traits, which are the self-regulatory components of personality that modulate physical, mental, and social well-being [48, 65]. As such, it represents a pioneering effort to describe the psychobiology of character as a complex network of genotypes with specific molecular processes and neuronal functions that regulate personality development. We explained 50–58% of the heritability of human character and replicated our results in independent samples, thereby accounting for nearly all the heritability expected from twin studies.

Complexity of genotypic–phenotypic pipelines

We observed that 68% of the 727 genes for character were unique to a single character profile and were regulated by distinct molecular processes and neuronal functions. Such minimal overlap in genes and molecular mechanisms between personality profiles is very surprising from a trait perspective. For example, both the organized and creative character profiles are high in self-directedness and cooperativeness, and differ only in self-transcendence. The resourceful profile differs from the apathetic profile only in being high in self-directedness. Thus, we hypothesize that people can become highly self-directed by multiple mechanisms: a creative or intuitive route involving enhancing self-awareness in episodic memory, an organized or analytical route involving executive control of what is known from past experience, and/or taking initiative by learned resourcefulness.

Likewise, there are three or more routes via distinct genetic pipelines to cooperativeness and/or self-transcendence. Consequently, individual personality traits are genetically heterogeneous and their development depends on multiple mechanisms that can only be distinguished by consideration of the whole person. Individual traits may still be important for study of development or treatment, but they do not appear to be the fundamental building blocks of personality.

Regulatory processes and functions associated with character

We observed that 67% of the 727 character genes were involved in regulatory systems. In particular, lncRNAs were more common in association with character only than with both temperament and character (Fig. 2a, b). The identified genes are reported to influence neuroplasticity, energy metabolism, and the regulation of adaptations to a wide variety of biological, psychological, and social stressors through processes for intentional goal-seeking, self-control, empathy, and episodic memory (Table 1). These genetic findings are supported by independent neuroimaging findings that TCI character traits are associated with brain networks for these same intentional and meta-cognitive functions [24,25,26,27].

An interesting sign of the high predictability of variability in health status was our finding that a few genes could dramatically alter the health status of people with each specific SNP set, including 150 putative switch genes across all 42 SNP sets. The dramatic effect that a few switch genes can have on overall health status is further evidence of the importance of epistasis for understanding personality and its development.

Strengths and limitations

Our unbiased analytical PGMRA method used deep cluster analysis to identify association between possibly interactive sets of features instead of between individual SNPs or character traits. The results were strongly replicated in independent samples, demonstrating remarkable robustness. Furthermore, the neuronal functions of the identified genes are supported by independent research about brain networks related to TCI character.

Our initial pool of SNPs was preselected to be the best candidates to have additive and/or non-additive effects on character. The threshold for possible association (p-value of 0.01 without Bonferroni correction) in our initial pool of SNPs was more than five orders of magnitude less stringent than what is required for genome-wide significance. We sought to evaluate the cooperative effects of groups of SNPs with possible non-additive gene–gene interactions and those with strong additive effects individually (i.e., very low p-values). Therefore, we included SNPs that were either weakly or strongly associated with character singly, and then compared their significance as a group vs. that of the best SNP within the group. Consequently, these candidate SNPs may have no main (additive) effect on the phenotype at all, but when organized as SNP sets, they presented consistent evidence of epistasis (i.e., each SNP set had stronger associations with character than its best single constituent). In addition, the SNPs we identified were sufficient to account for nearly all the heritability expected from twin studies (about 50%), which includes both additive and non-additive effects.

Our findings are based on associations only, which precludes definite conclusions about causation. Nevertheless, the circumstantial evidence for our causal hypotheses is strong and merits further testing.

Conclusions and recommendations for future research

We were able to characterize and replicate the complexity of the genotypic–phenotypic risk architecture of self-regulatory character traits in three large samples. Our findings demonstrate that data-driven analysis of the architecture of genotypic–phenotypic relationships enables investigators to overcome the hidden heritability problem (i.e., the consistent inability to account for most of the heritability of complex traits when only the average effects of genes are considered). We conclude that self-regulatory personality traits are strongly influenced by organized interactions among more than 700 genes, despite variable cultures and environments. We recommend studies that dissect detailed phenomic and genomic data, including brain images and physiological measurements, and integrate these in a multi-faceted view of each person. We also recommend an extended replicability analysis, in which a marker can be replicated at different multi-omic levels, such as genes, family of proteins, or pathways. The precision of our person-centered approach now allows such in-depth analysis and replication, even for complex traits in moderate-sized samples.