Introduction

The current paper presents a developmental study of behavior problems (BP) and their relationship to timing of menarche. The unique measurement feature of our data is the availability of measures of behavior/conduct problems at three different age periods (two of which use comparable instrumentation): childhood, young adolescence, and older adolescence. There are two unique design features. First, in relation to timing of menarche, BP are measured before, approximately concurrently, and after menarche. Second, we use kinship patterns to decompose the multivariate relationship across our developmental measures into biometrical components, with timing of menarche as a moderator.

The timing of menarche marks an important developmental moment in the life of an early adolescent girl. Social scientists have proposed that it may also mark, directly or indirectly, other developmental processes as well. Recent research has focused on the relation between BP and menarcheal timing (MT). But the causal directionality involved in the relation between MT and BP is still unspecified. We define a taxonomy of six possible causal directions: (a) MT has some causal influence on BP; (b) BP have an influence on MT; (c) both causal processes occur simultaneously; (d) MT moderates the developmental process associated with BP; (e) some other mediating or moderating set of processes cause them to covary with no direct causal influence of any kind; (f) there is no relationship. Social scientists confidently have stated each interpretation above (but seldom have accounted for the possibility of simultaneity).

Both theoretical and empirical research has marked MT as one of the lynchpins in a complex and interrelated set of processes. Although timing of menarche clearly contains important information, the causes and correlates of MT are not easily modeled. In particular, timing of menarche cannot be manipulated experimentally. Therefore, quasi-experimental designs are used to study its causes and consequences, and such designs are replete with threats to internal validity (see Shadish et al. 2002).

To investigate alternative a–c above, we use one of the strongest leverage points within quasi-experimental designs, the logic of temporal precedence. Simply stated, it is logically impossible for something later in time to directly cause something that occurred earlier in time. Pearl (2000) discussed “causal symmetry,” which implies that a natural threat to the validity of the assertion that A causes B is the possibility that B may cause A instead of (or in addition to) A causing B. Pearl noted (p. 250), “By imposing the constraint that an effect never precede its cause, the symmetry is broken and causal inference can commence.” A prerequisite for a causal link from BP to MT would be a correlation between BP and MT, with BP measured before menarche. A prerequisite for a causal link from MT to BP would be a correlation between MT and BP, with BP measured after menarche. Ge and Natsuaki (2009) identify several theoretical reasons to expect such a correlation, and focus on this latter directionality. Or, both correlations of MT with BP measured before and after menarche could be significant, supporting investigation of reciprocal causes. A cross-sectional twin study (Burt et al. 2006) found that MT moderated biometrical structure in early adolescent conduct disorder. It is possible for there to be no causal relation between BP and MT (i.e., none of a–c are correct), and yet MT could moderate developmental patterns (alternative d above). Or, causal patterns could occur in addition to the moderating patterns. Obviously, the complexity of the causal questions have only begun to be addressed.

The ultimate methodological and substantive goals of the current study are to compare and interpret patterns in developmental behavior problem data in relation to the explanations in a–f above (with a particular focus, motivated by previous literature, on (d)). We begin with a review of the literature relating menarche and BP. Next, we describe our data and analyses, and fit phenotypic and biometrical models to the correlations/covariances reflecting multivariate relationships between our critical variables. Finally, we discuss our results, and place them in the context of past findings in the literature.

Literature review

Research on timing of menarche and behavior problems

Several scholars have developed theories of childhood and adolescent BP (e.g., Conradt et al. 2013; Ellis 2004; Lahey and Waldman 2003; Moffitt 1993; Repetti et al. 2002). Belsky, Steinberg, and Draper (1991; also see Belsky 2000; Draper and Harpending 1982) suggested that stress/violence in a female’s early family environment could stimulate a reproductive strategy of early menarche, early sexual involvement, and early reproduction, compared to females raised in less-stressful environments, an adaptive evolutionary response to insure reproductive success. The mechanisms stimulating this response include father absence, and family aggression and stress, a process Ellis (2004) referred to as “psychosocial acceleration theory;” the role of father absence he called “paternal investment theory.” Researchers evaluating this theory have found correlations between father absence and MT (Chisolm et al. 2005; Deardorff et al. 2011; Doughty and Rodgers 2000; Hoier 2003; Mustanski et al. 2004; Rowe 2002; see Brooks-Gunn 1988 for an exception).

Others have challenged the Belsky et al. hypothesis. Maccoby (1991) and Rowe (2000) proposed an alternative suggesting that genetic and/or environmental contingencies from previous generations are the causes of both early family stress and later child outcomes. This argument specifies a selection model, in which the causal interpretation is not a direct path from family stress to child outcomes, but rather suggests that other background factors select parents for whom family stress and giving birth to children with certain child outcomes will co-occur. Molecular/behavior genetic studies have empirically evaluated this argument (Comings et al. 2002; Jorm et al. 2004; Rowe 2002); each supported the cross-generational selection model. Mendle et al.’s (2006) analyses of children-of-twins designs “strongly suggest selection, and not causation, accounts for the relationship between stepfathering and early menarche” (p. 533). Belsky has expanded his perspective to partially accommodate cross-generational selection, by defining the differential susceptibility of children to both “negative effects of risky environments,” as well as “positive effects of supportive environments” (Belsky et al. 2007 p. 300). The theoretical origins of differential susceptibility extend back before the child’s specific family environment (as in the selection argument), but have evolutionary origins.

This larger research agenda is relevant to the current paper because the link between BP and MT has also been used to distinguish cross-generational selection from direct causation. Moffitt et al. (1992) tested genetic influence on problem behaviors as the basis for selection (also see Pickles et al. 1998). Of particular relevance for the current study, they found a nonsignificant correlation (r = .04) between BP at age seven and self-reported age at menarche among 378 adolescent girls in New Zealand, and this correlation remained small when BP were separated into internalizing and externalizing dimensions, findings supportive of the selection interpretation. Caspi et al. (1993) used the same data to link MT and later adolescent delinquency. As in the current study, they obtained ratings from different development periods, conduct problems from childhood, norm-breaking behaviors in early adolescence, and self-reported delinquency at middle adolescence. However, rather than analyzing the trajectory of conduct problems, they compared girls in all-girls schools to those in mixed-sex schools. They found a moderating effect of MT on conduct problems at age 13, but only for girls in mixed-gender schools.

Burt et al. (2006) used data from 354 twin pairs (218 MZ and 136 DZ) from the Minnesota Twin Family Study to evaluate the link between MT (defined into three categories: early, average, and late) and conduct disorder (CD; measured using a DSM-III-R symptom-count) during the age range 13–16. They found a significant, though small, correlation between CD and MT, r = .09 (with MT reverse-coded). Further, mean age at menarche did not differ for girls with early-onset CD versus those with average timing of CD, further supporting the absence of a MT and CD link, measured both earlier in childhood, and during early adolescence. The results of their biometrical analysis will be reported in the next section. Several researchers have focused on shared genes as the source of the selection process. Empirical work by Van Hulle et al. (2009) showed overlapping genetic variance in antisocial behavior between late childhood and early adolescence (though MT was not a part of this framework). Button et al. (2005) found that a “risk genotype conferring susceptibility to family dysfunction is responsible for most of the variance in antisocial symptoms in childhood and adolescence” (p. 115). The findings reviewed above treat MT as an outcome in the causal framework (though no explicit link from childhood problem behaviors to MT has been proposed). But MT also has been treated as endogenous to child outcomes, and to problem/antisocial behaviors in particular.

Several different theories directly link MT and BP. First, the accentuation hypothesis (Caspi and Moffitt 1991) posits that disruptive transitions in the life course, such as puberty, can potentially magnify already-present behaviors (e.g., problem behaviors). They used data from the Dunedin Multidisciplinary Health and Development Study, and their empirical results studying timing of menarche in relation to BP supported an “early timing hypothesis.” Second, Ge and Natsuaki (2009) reviewed four theories (including the accentuation hypothesis). The maturation disparity hypothesis focuses on the difference between physical and social/psychological development to explain how early puberty could cause problem behaviors. They suggested that this hypothesis has seldom been directly tested, and noted that “the higher rates of psychopathology among early maturers are expected because their slow developing neurocognitive systems are mismatched with the fast-approaching social and affective challenges at the onset of puberty” (p. 329). The other theoretical orientations they review—contextual amplification and hormonal influence—are also designed to explain the directional link from MT to BP. These past results provide motivation for looking at disparate sources and directions for the causal links among MT and problem behaviors.

To summarize, there is a consistent and important empirical finding that has not been given its proper attention in past research. Because of the natural interest in the link between father’s absence and early family stress with subsequent timing of menarche, the absence of a correlation between problem behaviors/conduct problems and MT is notable. But MT also played a moderating role in Burt et al. (2006). Caspi et al. (1993) and others support that the relation between MT and behavior/conduct problems can be subtle (see Celio et al. 2006; Posner 2006).

Biometrical research on behavior/conduct problems and timing of menarche

Considerable biometrical research exists studying behavior/conduct problems in univariate and multivariate contexts. These studies support moderate heritability in BP/delinquency, and low levels of shared environmental variance (Ellis 2004). Rhee and Waldman (2002) conducted a meta-analysis of biometrical studies of conduct disorder, and estimated the proportion of additive genetic variance = .32, slight evidence of non-additive genetic variance (dominance) = .09, the proportion of shared environmental variance = .16, and the proportion of nonshared environmental variance/measurement error = .43. Using data from the Children of the National Longitudinal Survey of Youth (NLSYC), (Rodgers et al. 1994a) documented heritability in the Behavior Problem Inventory (BPI) in the NLSYC, and (Rodgers et al. 2001a, b) reported delinquency heritability patterns in the original adolescent NLSY79 sample (which includes the mothers of the NLSYC).

Menarche has also been studied in univariate biometrical analyses, and consistent patterns have been obtained. Twin studies showing heritability in MT include Pickles et al. (1998) and Kaprio et al. (1995). Rowe (2000) used the Add Health twin data, and reported a menarche heritability (estimating proportion of menarche variance that is genetic) of h2 = .44 and shared environmental variance of c2 = .08. Doughty and Rodgers (2000) used the NLSY79 family data to examine the etiology of MT and obtained h2 = .51 and c2 = .01. Mustanski et al. (2004) used longitudinal pubertal development data from Finnish female twins, and found h2 = .42 and c2 = .21 (obtained by squaring the standardized menarche coefficients in their Fig. 1). Ellis (2004) summarized the twin literature as having “converged on the conclusion that genotypic effects account for 50–80 % of the phenotypic variation in MT, and that the remaining variance is attributable to nonshared environmental effects and measurement error” (p. 922). Meyer et al. (1991) suggested that there exists some nonadditive genetic variance in MT, which could hide potential shared environmental influence.

Fig. 1
figure 1

a Design structure for phenotypic correlational analysis (note: no correlations reported in Caspi et al.). b Design structure for biometrical moderation analysis

The previous subsection identified a correlational finding that appears inconsistent with direct causation from the early family to later BP, the absence of a bivariate correlation between timing of menarche and conduct problems. Past arguments suggest that if the family environment affects menarche directly, significant c2 estimates would emerge (Rowe 2000). But whether smaller c2 values are supportive of selection models is questionable, because of the potential for the psychosocial acceleration theory processes to operate through nonshared environmental as well as shared environmental sources (Ellis 2004).

Examining the menarche-behavior problems link within a developmental biometrical design

The review above identifies a gap in otherwise extensive past research, which the current research will help fill. Few studies exist that link timing of menarche and behavior/conduct problems using genetically-informative data. Burt et al. (2006) reported a moderator analysis in which they found that heritabilities for conduct problems were higher for females who reached menarche with average timing than for females reporting either early or late menarche, and the shared environmental patterns were the reverse. This same study found relatively low phenotypic correlation between menarche and conduct disorder. No bivariate structure linking either genetic or shared environmental variance in conduct disorder and menarche was identified.

Harden and Mendle (2012) used the biometrical structure in 924 twin and sibling pairs from the Add Health data to study the link between pubertal timing and later delinquency measures. They found significant heritability in both female pubertal timing and delinquency measures. In biometrical analysis, they found a moderating effect of pubertal timing on the heritability of both violent and nonviolent delinquency. For non-violent delinquency, h2 increased and c2 decreased for increasing age at menarche; for violent delinquency, the reverse patterns was obtained. When they modeled the overlapping variance, they found that pubertal timing and both types of delinquency shared common genetic etiology, but did not share any environmental variance. They concluded that “our results challenge a purely socioenvironmental account of why early maturing girls show increased involvement in delinquent behavior” (p. 81–82).

A study by Dick et al. (2000) used a Finnish twin sample to link pubertal timing and adolescent substance use. They used within-family correlations to control for between-family selection bias. If the apparent causal link were indeed through background selection processes, the relation between pubertal timing and substance use would be fully explained by between family confounds. Instead, a direct link between pubertal timing and adolescent substance use remained. Other studies linking early puberty and increased adolescent substance use include Lanza and Collins (2002) and Stice et al. (2001). Both of these studies found small to moderate correlations between MT and subsequent substance abuse, but Stice et al. found no correlation between MT and comorbidity among outcomes such as substance abuse and eating disorders.

To summarize, there is consistent heritability in behavior/conduct problems, at various ages. There is heritability in menarche. But few multivariate designs have linked the two, and none developmentally. Menarche may function as a moderator, rather than as a primary causal variable itself. In an earlier review, (Moffitt 2005) noted the value of genetically-informed data in studying family influences on antisocial behavior: “…causation is not well understood. Studies that cannot disentangle genetic and environmental influences cannot help” (p. 533).

Method

The data and the kinship links

The data for the current study come from the National Longitudinal Survey of Youth Children (NLSYC) dataset. On December 31, 1978, a probability sample of US households containing 14–21 year old adolescents was drawn, the NLSY79, which included an oversample of minority and poor white households and a small military sample of older adolescents, with N = 12,686. Since 1979, the 5,827 females in the non-military sample have provided complete pregnancy and birth histories. By 2004, when NLSY79 females were 39–46, childbearing was almost completed. 11,428 biological children had been born, well over 90 % of the projected ultimate childbearing within this cohort (Center for Human Resource Research, 2006, p. 5).

Since 1986, a biennial survey effort has collected in-depth data from all biological children born to the NLSY79 mothers, the NLSYC sample. In 2004, those children ranged in age from infants to adults in their mid-30′s (older NLSYC respondents had mothers who had already given birth before the first year of the NLSY79 survey), and are representative of the US population of children born to women in this age cohort in the US (except for attrition, nonresponse, and design features that will be discussed).

There is a well-known selection bias that characterizes the NLSYC data, which necessarily corresponds to children born to younger mothers. Although this selection bias is disappearing over time, it is still a feature of most NLSYC research (Chase-Lansdale et al. 1991). Methods to measure and model this selection bias have been developed (see Rodgers and Rowe 1988, among others). We further discuss this form of selection bias in our Discussion section. Another potential bias in the retrospective reports in the NLSY data is the potential for telescoping, whereby the time since an event causes a time-dependent bias in reporting the event; telescoping has also been investigated in past research (Rodgers et al. 1992).

Our design dictated that we trade-off some advantages of the probability sampling process against practical requirements of our design. Our sample still supports higher external validity than most similar studies, because it originated as a probability sample. However, a complex screening process precludes valid use of the sampling weights that adjust for attrition, nonresponse, and oversampling. Though our study is focused on internal validity, we note that using the NLSYC automatically provides more external validity than many other comparison samples that did not originate from a probability sample (many of which are based on twin registries, with well-known problems of generalizability; see Lykken et al. 1987; Stoolmiller 1999).

Our research design used longitudinal information on behavior/conduct problems for ages 6–9, 12–14, and 15–17, as well as information from intact female kinship pairs in the NLSYC. Although the NLSY79 females have virtually completed childbearing, some children have not aged past late adolescence, and many families do not contain two female children. When we screened for these requirements, we obtained 890 females in 445 female–female kinship pairs, representing five different types of genetic relatedness across pairs. This is a reduced sub-sample of the NLSYC database, though it is comparable in size to many of the larger genetically informed databases used in biometrical studies (many of which are based on twin samples of MZ and DZ twins). For example, the Burt et al. (2006) study reviewed above used 354 female twin pairs, with conduct disorder measured at a single point in time.

Although the 2004 NLSYC database did not contain any information to directly ascertain genetic relatedness, a kinship linking algorithm to ascertain kinship relatedness was developed in 1994, with an update completed in 2005. This algorithm identified genetic relatedness for the NLSYC kinship pairs, and also identified cousin pairs (whose NLSY79 mothers are full siblings). The kinship pair categories (and associated genetic coefficients) included MZ twins (R = 1.0), DZ twins/full siblings (R = .50), half siblings (R = .25), cousins (R = .125), and an ambiguous full/half sibling category (R = .375; see below). The sample sizes within these categories are approximately representative of those in the population; there are 104 cousin pairs, 168 half-sibling pairs, 110 ambiguous half/full siblings pairs, 502 full sibling/DZ twin pairs, and 6 MZ twin pairs. These sample sizes are disproportionate because they are structured that way in the population. We have run many sensitivity analyses, to assess the impact of the small MZ twin category, the presence of cousins (who did not share a family environment), and the effect of the ambiguous sibling category. Results were relatively robust to these adjustments, except for a few cases which are discussed in detail below.

By design, all sibling pairs in the NLSYC are biological offspring of an NLSY79 mother, and thus all kinship pairs share a common biological mother (or a biological mother and an aunt, in the case of cousins). Whether siblings share a father is not directly indicated, but is logically implied by maternal responses. If two children have lived in the same household with their biological mother and father for some time, they are assigned full sibling status. If two siblings lived in the same household with their biological mother, but only one lived with her biological father, they are assigned half-sibling status. If neither lived with their biological father, additional location questions are used for classification. If a sibling pair cannot be classified with available NLSY information, then they are assigned a mid-range genetic coefficient of R = .375 (halfway between the sibling R = .50 and the half-sibling R = .25). A number of research studies have been published using the 1994 kinship links (e.g., Rodgers et al. 1994a, b; van den Oord and Rowe 2000), and more recent studies using the updated kinship links have been conducted (D’Onofrio et al. 2008; Goodnight et al. 2012; Rodgers et al. 2008a, b; Van Hulle et al. 2007; Van Hulle et al. 2009). The algorithm is documented in (Rodgers et al. 2005), which contains a validity analysis showing that the biometrical structure of adult height is similar to that obtained in meta-analyses of height published in the behavior genetic literature (see e.g., Plomin 1990; Mittler 1971). The kinship links are publicly available from the first author.Footnote 1

Measurement

We used four critical variables for which we obtained complete longitudinal information for a relatively large sample size of NLSYC female kinship pairs: BP in young childhood (ages 6–9), BP in early adolescence (ages 12–14), BP (delinquency) in late adolescence (ages 15–17), and timing of menarche. Descriptive statistics for MT and the three behavior problem scales are presented in Table 1.

Table 1 Overall descriptive statistics for the four constructed measures

Behavior problem scale scores for the first two age ranges were obtained from the NLSYC BPI, which is obtained by summing maternal rating of 28 behavior problem indicators, with mothers indicating “often true,” “sometimes true,” and “not true” in regards specific BP exhibited within the previous 3 months among children age 4 to 14. The NLSYC BPI instrument is a short form of the Child Behavior Check List (Achenbach 1978; Peterson and Zill 1986). We used ages 6–9 for the younger age category (specifically, between 6 years, 0 months and 9 years, 11 months at interview), and an age range of 12–14 for the older. We used a four-year age range for the younger age category to increase sample sizes; there were no systematic patterns across age in standardized female BPI scores for either age range. The overall BPI is already normed for age and gender. When respondents had two BPI measures within an interval, we averaged those scores. The age-and-gender standardized versions of the BPI scores were used, which are scaled to have a population mean of 100 and a standard deviation of 15; in our sample, the range was 71–146 (with higher values indicating higher BP), with means around half a standard deviation above the population mean (at least partially reflective of the selection bias discussed earlier of this sample being children born to younger mothers). The overall BPI reliability is .87.

Because the BPI scales contained non-conduct trait-like items (measures of dependence, anxiety, etc.), we also created a subscale specifically measuring conduct problems, based on the following seven items: cheats or tells lies; has trouble getting along with teachers; disobedient at home; disobedient at school; bullies or is cruel or mean to others; breaks things on purpose or deliberately destroys his/her own or another’s things; does not seem sorry after misbehaving. Across ages, the median coefficient alpha reliability of the seven-item scale was α = .74. This measure has been used in other NLSYC studies focused on conduct problems (e.g., Lahey et al. 2006). We will report correlations, heritabilities, and models for both complete BPI (BP_6-9 and BP_12-14) and the conduct problem subscale (CP_6-9 and CP_12-14). Results were slightly cleaner using the conduct problem scores, but the BPI models had notably better goodness of fit indicators. Final interpretations are similar across the two measure.

The later-adolescent behavior problem scale was based on delinquency scores (BP_15-17; note that there was no corresponding conduct problem subscale associated with this measure). This scale was difficult to construct, because of changes in the items used to measure delinquency across time.Footnote 2 The BP_15-17 scale was a count of the number out of seven delinquency items that the respondent indicated they had performed within the last year. We standardized the overall NLSYC female sample at each single age (using, e.g., all respondents between 15 years 0 months to 15 years 11 months to do the age 15 norming), to adjust for age changes in delinquency levels. Finally, we used separate standardization samples before and after 2000, to account for delinquency changes over time. The pre-2000 sample standardization means (and variances/sample sizes) before standardization were M = 1.52 (V = 2.49, N = 474) for age 15, M = 1.52 (V = 2.58, n = 419) for age 16, and M = 1.56 (V = 2.14, N = 307) for age 17. The equivalent post-2000 statistics were M = 1.03 (V = 1.46, N = 384) for age 15, M = 1.03 (V = 1.21, N = 404) for age 16, and M = 1.08 (V = 1.37, N = 412) for age 17. Because we used only female respondents in this study, gender norming was not an issue. As before, we averaged when there were multiple standardized delinquency measures within the three-year age range from 15–17 years.

Virtually no skewness existed in the BP_6-9, BP_12-14, and MT measures; means, medians, and skewness coefficients are presented in Table 1. Even for BP_15-17, which had a moderate skewness coefficient, the mean and median differed by less than a third of a standard deviation. We ran a few analyses using a square root transformation (pre-standardization) of the BP_15-17 scale, and results were virtually identical to those without the transformation. The raw and square root transformed BP_15-17 scales correlated r = .99. After this investigation of skewness, we ran all analysis on the constructed variables.

The final variable to support our design was timing of menarche (MT), which was self reported by each girl in two ways, as age and as month and year of menarche. Such retrospective reports are considered reliable and valid (Brooks-Gunn et al. 1987; Damon et al. 1969; Moffitt et al. 1992), especially when reporting occurs shortly after menarche. We constructed measures for stated age, and calculated age, and used the more precise calculated age based on month/year reports. In a few cases with differences greater than 2 years between calculated and stated age, we imputed stated age as the more likely correct response (N = 37 cases), or if calculated age was missing, stated age was used (the two measures correlated r = .91). Age at menarche responses ranged from 0–19; 14 respondents were disqualified for giving responses of less than 7 years of age. Finally, when stated age was imputed in place of calculated age, .5 was added to correct for the step-function feature of reported age (e.g., stated age 12 would refer to an average of 12.5 from the interval from age 12.00 to 12.99). Most respondents were only asked once, but for 604 respondents the question was re-asked at a second time (and their responses were averaged). The correlation between these—a test–retest reliability estimate—is r = .65, a somewhat low reliability for this type of response. However, 75 % of the responses were within 1 year of one another.

Scores on all four critical variables were obtained for N = 1651 NLSYC females. However, we only used the 890 respondent subset of these respondents who also had a female twin, sister, or cousin within the dataset. In a very few cases, one NLSYC female contributed to more than one pair (e.g., if there were three daughters in the family, or if an NLSYC female had both a sister and cousin in the sample). We reran analyses by dropping duplicate kinship links, and results were virtually identical to those for the full sample.

Several demographic control variables were included within the analysis. The standardization procedures described above already adjusted for age and test-administration period differences between female respondents within each of our age categories. Race was accounted for within the design, although only the Black and Non-white Non-Hispanic (Caucasian) categories were large enough to support separate analyses.

Data analysis and modeling

The phenotypic analysis helps address the question, “Do behavior problems and timing of menarche covary?”(see Fig. 1a), the first time a data source has allowed simultaneous investigation of the correlation between measures of BP at three ages to timing of menarche (Caspi et al. 1993, could have evaluated this question, but had different goals).

The biometrical modeling emerged from the part of our design that allows the separation of environmental and genetic variance for each of the three time periods in which BP were measured. This is a straightforward developmental analysis, except with the data conditioned in two interesting and useful ways. First, timing of menarche is treated as a moderator within the biometric analysis of BP. Second, relationships across the developmental periods are separated by kinship category, supporting partitioning of variance and covariances into genetic and environmental components. This analysis is motivated empirically as a developmental extension of the Burt et al. (2006) study, and is motivated theoretically in relation to the causal goals a–f defined in the introduction.

Whether genetic and environmental factors that influence BP among young children are the same (at least in part) as those still causing BP in older adolescents is an important research problem. Van Hulle et al. (2009) addressed this question using a different subset of the same data, and found significant overlap, especially for the latter two age categories. Our design allows this question to be addressed simultaneously in relation to timing of menarche as a moderator of these patterns. We partition timing of menarche in two different ways, as early, average, and late menarche (e.g., Burt et al. 2006), by using the cut points of MT < 11.5, MT between 11.5 and 13, and MT > 13; and also as a two-category indicator, separating females who matured early, MT < 11.5, from those who do not, MT > 11.5 (see Fig. 1b). The biometrical analysis models we use are based on correlation/regression procedures and structural equation models (SEMs). The regression procedure is based on a biometrical approach called DF analysis (named for DeFries and Fulker 1985). This approach uses simple OLS regression to obtain unbiased estimates of the proportion of total variance attributable to genetic and shared environmental variance (Rodgers and McGue 1994). Many DF analysis innovations have been published (Cherny et al. 1992; Kohler and Rodgers 1999; Kohler and Rodgers 2001; LaBuda et al.1992; Purcell and Koenen 2005; Rodgers and Kohler 2005; Waldman et al. 1992; Waller 1994). More complex univariate moderator analyses, and multivariate analyses, rely on Mx, a SEM program originally developed as a dedicated biometrical software system (Neale and Cardon 1992). Our analytic approach—using DF analysis models in combination with SEM methods—has been presented in methodological detail in past NLSY studies (see Rodgers et al. 1994a for a DF analysis of NLSYC BPI measures; Rodgers et al. 2001a, Rodgers et al. 2001b, for a study of NLSY79 fertility patterns using DF analysis, SEM, and a comparison of the two; and Rodgers, Buster, and Rowe, Rodgers et al. 2001a, for a DF analysis of delinquency in the NLSY79).

In both DF analysis and SEM, the biometrical model is an ACE model; A refers to additive genetic variance, C refers to common (or shared) environmental variance, and E refers to nonshared variance/measurement error. Using the different genetic categories, and the methods and assumptions of quantitative genetics (Falconer 1979; Jinks and Fulker 1970), these sources of variance can be partitioned into the ACE variances (Neale and Cardon 1992). The C component was assumed constant across all categories. Assumptions of the additive genetic model are ignorable assortative mating, equal shared environments, and an additive genetic model (Plomin et al. 1990). Within our study, there is likely to be some assortative mating on BP, although the level of that assortment is unlikely to substantially bias our results (Maes et al. 1998; Taylor et al. 2000; but also see Krueger et al. 1998). The equal environments assumption (EEA) is usually considered a larger threat to internal validity in relation to MZ twins than to other kinship categories; we have only six MZ twin pairs in our study, and these make only a minor contribution to our outcomes. However, we do include cousins, who grew up in different households, and for them the EEA is conceptually tenuous. Half-siblings may also differ from full siblings in relevant ways (e.g., they have typically experienced divorce), although the half siblings within our study virtually all reside in a common household (with their mothers). We evaluate the EEA within our analytic approach in sensitivity analysis by deleting cousins, who do not share the family environment. Violations of the additive model are diagnosed through negative variance estimates (though small negative estimates can occur by sampling error).

Results

Menarcheal timing-behavior problem relationships

In the first set of analyses we present bivariate estimates of the relationships between BP and timing of menarche (see Table 2; Fig. 1a). Low correlations occur for the link between BP and menarche with BP measured before, approximately concurrently, and after menarche. In many studies these patterns would simply be the end of the discussion, because of the consistently low correlations. The first column of Table 2 shows a lack of relationship between menarche and BP within an individual, using all 890 individuals within our study. In the second through fifth columns, we show the kinship pair correlations for different genetic categories. There is only one significant correlation among the 25 presented in Table 2; the median across these 25 correlations is r = −.03. Earlier work (e.g., Burt et al. 2006; Caspi et al. 1993; Posner 2006) provided theoretical motivation to view precocious/early menarche as of particular relevance. Thus, we also defined a dummy variable accounting for timing of menarche before the age of 11.5 or not, and recomputed the correlations; they were very similar to those in Table 2.

Table 2 Correlations between menarcheal timing (MT) and behavior problems (BP), overall and separately by levels of kinship relatedness (R is the coefficient of genetic relatedness); MZ Twins are not included because of small sample size; conduct problems subscale correlations after slashes
Table 3 Biometrical results (ACE Models) using DF analysis, behavior problems before menarche (ages 6–9)—BP_6-9; approximately concurrent with menarche (ages 12–14)—BP_12-14; after menarche (ages 15–17)—BP_15-17 (conduct problem subscale analysis results after slashes)

Burt et al. (2006) also found low correlations, but then found an interesting moderating role for timing of menarche in studying conduct disorder. Thus, we redefined menarche into categories reflecting early, average, and delayed menarche in a set of univariate and multivariate biometrical analyses.

Longitudinal relationships across behavior problems, conditional on menarcheal timing

Although there are no relationships between behavior/conduct problems and MT, there are moderate-to-strong relationships between the behavior/conduct problems measures across time (the developmental patterns in Fig. 1b). Using the overall BPI-based measures, correlations within individuals (N = 890) are r(BP_6-9,BP_12-14) = .53, r(BP_6-9,BP_15-17) = .14, and r(BP_12-14,BP_15-17) = .24 (all p < .0001). Not surprisingly, the BPI-based measures correlated more highly than either measure with late-adolescent delinquency; this pattern likely reflects similar instrumentation and rater effects (BPIs at the first two ages were based on maternal measures, delinquency scores at the third age were adolescent self reports), but also reflects that the nature of BP shifts from childhood to adolescence. There were similar correlations for the conduct problem subscales.

The three correlations in the paragraph above are within-individual correlations. The critical relationships for our biometrical analysis are the kinship correlations. Kinship pair correlations are presented for the overall dataset in Tables 4 and 5. In Table 4 the correlations are separated into kinship pair categories using the five different kinship categories described in the Methods section (cousins, genetic coefficient R = .125; half-siblings, R = .25; ambiguous siblings, R = .375; full siblings, R = .50; and MZ twins, R = 1.0). In Table 4, we further partition these by timing of menarche. In a preliminary analysis, we defined three MT categories (MT < 11.5, 11.5 < MT < 13, and 13 < MT), which creates approximate 25, 50, and 25 % categories and approximately matches Burt et al. (2006). However, correlation patterns were virtually identical across average and late menarche, and these were combined into a single category in Table 5. The correlations in Table 5 suggest that there may be lower univariate heritabilities for BP in the early menarche group (compare corresponding on-diagonal correlations across the kinship categories). For the correlations reflecting multivariate relationships, the correlations for early menarche between BP_6-9 and BP_12-14 suggest shared environmental variance, and for average/later menarche possible heritability; few of the cross-variable correlations involving BP_15-17 are high enough to suggest any patterning (compare corresponding off-diagonal correlations across kinship categories). These results will be formalized in the modeling sections.

Table 4 Overall behavior problem kinship correlations (note:1 refers to the first member of the kinship pair, 2 refers to the second member; these correlations are based on double-entered data, and therefore are intra-class correlations; correlations after slashes are for Conduct Problem subscale scores; N = 890)
Table 5 Behavior problem kinship correlations, separate for early menarche (<11.5 years old) and combined average/late menarche (>11.5 years) (Note: 1 refers to the first member of the kinship pair, 2 refers to the second member; these correlations matrices are based on double-entered data, and therefore are intra-class correlations; matrices are asymmetric because both members of a kinship pair will not necessarily be in the same menarche category; these can be interpreted as “proband—co-kin” correlations—see text for further explanation; correlations after slashes are for conduct problems subscale)

Univariate biometrical analysis

The univariate DF Analyses (ACE Models) are based on the full double-entered dataset of 890 individual observations, representing 445 kinship pairs across the five categories (see Rodgers et al. 2001a, b, for discussion of double entry). Kinship correlations are presented in Tables 1, 4, and 5, and univariate heritabilities (h2′s) and shared environmental variances (c2′s) are presented in Table 3, for the overall dataset and for the two separate menarche categories. For the first two age categories, analyses are reported using both the overall BPI and the CP measures (from the BPI conduct problem subscales). The older two menarche categories were similar in h2 and c2 patterns (as they were for their correlations), and thus were combined. A statistical test of the similarity between the two older MT categories was run for each BP measure using a dummy variable accounting for category differences; each test failed to reject equivalence of the two categories, providing statistical support for combining these two categories.

Within Table 3, a clear and statistically distinguishable difference emerges that differentiates the females who reached early menarche from those who were average/late in their MT (based on testing dummy variables coding for menarche categories). All heritabilities for the average/older menarche category for the BP_6-9 and BP_12-14 measures were significant, and of substantial magnitude; for the younger menarche category, all were small and nonsignificant. None of the c2‘s from this table were significant, although several were moderate in size. The patterns for the CP_6-9 and CP_12-14 conduct problem subscales were identical to those for the overall BPI results, with one exception. The CP_12-14 measure for early menarche showed significant heritability of approximately the same magnitude as the overall and average/later menarche categories. However, there is reason to doubt the stability of this result, based on sensitivity analyses of these patterns. When we dropped the six MZ twin pairs from the analysis (which accounted for 1 % of the total pairs), the Table 3 results were virtually identical, with the exception of the early menarche CP_12-14 h2 and c2, which were somewhat unstable in relation to the MZ twin correlations; the other 14 categories showed little sensitivity to the presence/absence of the MZ twin pairs.

The menarche measure itself had a significant heritability (h2 = .48), and nonsignificant shared environmental variance (c2 = .12). The kinship correlations underlying these patterns were r = .38 for cousins (N = 52 pairs), r = .06 for half siblings (N = 84 pairs), r = .37 for ambiguous siblings pairs (N = 55 pairs), and r = .39 for full sibling pairs (N = 251 pairs). In previous research, similar ACE models were estimated for mothers of the NLSY females, with h2 = .52 and c2 = .01 (Doughty and Rodgers 2000), and for the NLSYC females themselves, with h2 = .66 and c2 = .00 (Bard and Rodgers 2006).

To assess the EEA in relation to cousin pairs we can study correlation patterns along the diagonals in Table 5. The cousin correlations do not appear inconsistent with other patterns. When analyses were replicated without cousins, results were similar to Table 3. Heritabilities increased slightly for the BP measures in the early menarche category, but none changed in statistical significance. Heritabilities for the average/late menarche category remained large and statistically significant. The c2 estimates remained small and non-significant.

We also replicated the analyses above by race using three categories, Hispanic, Black, and Non-Black/Non-Hispanic; only the latter two were large enough to run DF Analysis (although barely, and power was uniformly low). For Black females, there were N = 114 in the early menarche and N = 226 in the average/late menarche category. For the non-Black/non-Hispanic (primarily Caucasian) female category, there were N = 64 in the early menarche, and N = 312 in the average/late menarche category (N’s reflect earlier puberty for Black females; Rowe and Rodgers 1994, document and discuss these patterns in the NLSY79 females). For BP_6-9 and BP_12-14, the heritabilities were uniformly higher for the average/older menarche category than for the younger menarche category, consistent with the basic finding in Table 3. For the BP_15-17, heritability patterns were idiosyncratic and unstable.

Multivariate biometrical analysis

The purpose of performing multivariate biometrical modeling is to estimate shared or overlapping variance with genetic and/or environmental etiology (Fig. 1b). Thus, for example, significant heritabilities for the average/late menarche category identified in the previous section could derive from genetic influences shared in common across the longitudinal measures of BP or from different genetic sources. In Table 5 there are no obvious multivariate patterns associated with BP_15-17 that are suggestive of overlapping genetic or shared environmental variance (compare the 1,3 and 3,1 elements and the 2,3 and 3,2 elements in Table 5 across genetic categories; there are low correlations, and no genetically-related pattern). However, early menarche bivariate correlations linking BP_6-9 and BP_12-14 are suggestive of overlapping shared environmental variance (note consistently moderate correlations in 1,2 elements across genetic categories), and the average/late menarche correlations are suggestive of overlapping genetic variance, and also some shared environmental variance overlap (note generally increasing patterns in 1,2 elements across genetic categories).

The univariate ACE models estimated and reported in the previous section were moderated by menarcheal category, including a statistically significant test using dummy variables. We performed a similar multivariate test using the Box M statistic to evaluate the moderating effect of menarcheal category on the covariance matrices that will be subjected to biometrical modeling. We defined covariance matrices for the overall dataset using the four behavior problem variables BP_6-9_1, BP-_12-14_1, BP_6-9_2, and BP_12-14_2, defined separately for early-menarche respondents and those who reached menarche on time or late. A standard statistical test of the null hypothesis that two independent covariance matrices came from the same population was developed by Box (1950), based on the Box M statistic (see Timm 1975, p. 251 for explanation and elaboration). When we compared the separate covariance matrices, we obtained M = 31.6, p < .001, rejecting the null of equal covariances and supporting that the early and on-time/late menarche group correlation patterns came from different populations. (Note that it is well-known that the Box M statistic is highly sensitive to non-normal distributions; all four of the BP measures defining this correlation matrix had skewness coefficients between .1 and .3, indicating only small deviations from normality. To give some perspective, the mean BP_6-8 was 107.2, with a median of 107.0. The mean BP_12-14 was 108.3, with a median of 107.0.)

We defined a cholesky model (Fig. 2) for BP_6-9 and BP_12-14 for each member of the kinship pair (i.e., four variables), in which BP_6-9 logically preceded BP_12-14, according to the temporal precedence (these cholesky models are biometrical models with temporal ordering). The model was fit to variances/covariances separately for the two menarche categories. To obtain stable covariance estimates, we combined genetic categories into two kinship levels. The R = .125, R = .25, and R = .375 kinship pairs (whose kinship correlations are defined in Table 5) were combined, and assigned their average genetic coefficient of R = .25 (note the approximately equal sample sizes in these three categories). The 502 full sibling R = .50 pairs defined their own partition (and we also ran analyses that included the six MZ twin pairs in this category; the number of MZ twins was small enough that assigning them to the R = .50 category should not affect the estimates; sensitivity analyses verified this).

Fig. 2
figure 2

Cholesky model of bivariate relationship between behavior problems pre-menarche, ages 6–9 (bppre) and behavior problems approximately concurrent with menarche, ages 12–14 (bpcon); bppre1 and bpcon1 refer to the first member of the kinship pair, and bppre2 and bpcon2 refer to the second member of the kinship pair (bppre is BP6-9/CP6-9, bpcon is BP12-14/CP12-14); A is an additive genetic source of variance, C is a shared environmental source of variance, and E is a nonshared environmental source of variance/measurement error

For each set of covariances, Mx estimated the cholesky model in Fig. 2, which optimizes a maximum likelihood (ML) function to estimate model parameters. Mx estimated 90 % confidence intervals (CI) for each parameter (choosing 90 % instead of 95 % results in erring on the side of including variables). When a parameter estimate was not significantly different from zero (i.e., the CI contained zero within the interval), we dropped that parameter and re-estimated the model. Results using the BP_6-9 and BP_12-14, and the CP_6-9 and CP_12-14 measures were slightly different, and will be reported separately.

Using BP_6-9 and BP_12-14, for early menarche, the best-fitting model is shown in Fig. 3a. The ML χ 2 = 29.5 with 15 df’s, RMSEA = .083, and the AIC = −.532 (note that these fit statistics are defined using the Mx definitions; see Neale and Cardon 1992). These are suggestive of adequate fit of this model (compare to the original fits for the overall cholesky model shown in Fig. 2 of χ 2 = 23.2 with 11 df’s, RMSEA = .083 and AIC = 1.21; the original model and the reduced model were statistically indistinguishable using a χ 2 difference test, χ 2 = 6.3 with 4 df’s, p > .15, leading to adoption of the reduced model). This best-fitting model did not include genetic variance but did include common shared environmental variance (like the univariate model). The majority of nonshared environmental variance/measurement error was estimated to be shared between the two BP measures.

Fig. 3
figure 3

a Best submodel, early-menarche subgroup, using overall BPI to measure behavior problems before and concurrently with menarche (bppre is BP6-9, bpcon is BP12-14; path coefficients that begin with a are additive genetic coefficients, with c are shared environmental coefficients, and with e are nonshared environmental/measurement error coefficients). b Best submodel, average/late-menarche subgroup, using overall BPI to measure behavior problems before and concurrently with menarche (bppre is BP6-9, bpcon is BP12-14; path coefficients that begin with a are additive genetic coefficients, with c are shared environmental coefficients, and with e are nonshared environmental/measurement error coefficients)

For the average/late menarche category, the best-fitting model is shown in Fig. 3b. The fit indices for this model were excellent; ML χ 2 = 3.40 with 15 df’s, RMSEA = .000, and AIC = −23.15 (compare to fits for the overall cholesky of χ 2 = 2.32 with 11 df’s, RMSEA = .000, AIC = −17.00; the two models are indistinguishable using a χ 2 differences test—χ 2 = 1.08 with 4 df’s, p > .80, leading to adoption of the reduced model). The best fitting model included common genetic variance, and common shared environmental variance for the two BP variables. This model suggests that essentially all of the genetic and environmental variance in BP is shared in common across childhood and early adolescence. It is notable that, although the model contained within it the ability to account for temporal precedence—i.e., the potential for BP_6-9 to be causal in relation to BP_12-14—the model simply identified common factors underlying these two measures.

This analysis was replicated using CP_6-9 and CP_12-14, and the result was only slightly different, though in interesting ways. In both cases, the model became simpler and more interpretable (though the fit measures were not as good). For the early menarche category, the genetic factors dropped out of the model, and only the common shared environmental factor (and both shared and unique e2) remained in the model (Fig. 4a). The fit was not especially good, χ 2 = 9.38 with 15 df’s, RMSEA = .09 and AIC = 4.16 (compare to χ 2 = 7.59 with 10 df’s, RMSEA = .12 and AIC = 10.16 in the overall model; a χ 2 difference test leads to adoption of the reduced model). For the later menarche category, only the common shared genetic factor remained in the model, along with e2 for the BP_6-9 measure (Fig. 4b). The fit was slightly better, though marginal; χ 2 = 29.80 with 17 df’s, RMSEA = .08, AIC = 18.65 (compare to χ 2 = 17.93 with 10 df’s, RMSEA = .11 and AIC = 24.34 in the overall model; the χ 2 difference test led to adoption of the reduced model).

Fig. 4
figure 4

a Best submodel, early-menarche subgroup, using conduct problems subscale to measure behavior problems before and concurrently with menarche (cppre is CP6-9, cpcon is CP12-14; path coefficients that begin with a are additive genetic coefficients, with c are shared environmental coefficients, and with e are nonshared environmental/measurement error coefficients). b Best submodel, average/late-menarche subgroup, using conduct problems subscale to measure behavior problems before and concurrently with menarche (cppre is CP6-9, cpcon is CP12-14; path coefficients that begin with a are additive genetic coefficients, with c are shared environmental coefficients, and with e are nonshared environmental/measurement error coefficients)

Discussion

Summary of findings

The interpretation of our results is leveraged by a set of two critical empirical findings. First (and consistent with several previous studies), we found no correlation between BP/conduct disorder and timing of menarche. This result should be interpreted in the context of two recent review articles (Mendle et al. 2007; Negriff and Susman 2011), who developed a framework for the role of pubertal timing (in both sexes for Negriff and Sussman) and subsequent outcomes. Even those articles, in reviewing consistent patterns, note inconsistencies in these relationships (e.g., Negriff and Sussman summarized their findings by noting that “The relationship between pubertal timing and psychosocial adjustment is not straightforward, as indicated by the inconsistencies noted in the previous sections” p. 733). Our findings raise doubt as to the legitimacy of an effort to untangle the causal alternatives a–c in our introduction. There is little justification to develop models to explain the absence of correlational patterns. Second (and expanding on previous findings), timing of menarche plays a strong moderating role when developmental variance is partitioned biometrically; alternative d in our introduction is supported, and is even more interesting in the absence of direct causal paths between problem behaviors and MT. Specifically, NLSYC respondents measured at childhood or young adolescence who reach menarche at typical or later ages have the majority (for BP) or all (for conduct disorder) of the variance accounted for by shared genetic variance. NLSYC respondents who reach menarche early have all of that variance (for both BP and CD) accounted for by the shared environment.

Relationship between our results and the Burt results

Our results are a substantial extension of those from Burt et al. (2006). Each set of findings are validated by the close match with one another (though their delayed menarche finding did not replicate in our data). Their design was not longitudinal, and so ours expands the developmental interpretation of the findings. Further, our dataset originated as a national probability sample, with better external validity and a more diverse sample. Finally, the family design of our study, compared to Burt et al.’s twin design, supports expanded interpretation.

Conduct disorder in the Burt et al. (2006) study was measured approximately concurrently with menarche, so that their findings can be compared to those in our Tables 2 and 3 associated with our BP_12-14 and CP_12-14 variables. Our finding of univariate shared environmental variance underlying early menarche and genetic variance underlying typical MT exactly matches their pattern of results. We also replicate the univariate young adolescent results for childhood (but not for late adolescence). We also find shared variance in a multivariate developmental design as well as in a univariate design.

Threats to validity

Before we provide theoretical interpretation, we review several threats to internal validity within our quasi-experimental design. The first threat is the problem of selection bias. Because the NLSYC females in this study necessarily were born to younger mothers than the NLSYC females who are still too young to have their menarche and adolescent delinquency measured, our findings apply to daughters of mothers younger than the population average. In and of itself, this appears to be more of a threat to external than to internal validity (and many studies have made greater sacrifice to obtain a convenience sample; e.g., virtually all recruited twin samples fall into this category). However, because we are studying the distinction between early menarche and average/later menarche, if the selection bias in the NLSYC impacts the distributions of these two groups, internal validity would be threatened. This threat is partially answered by age-standardizing using national norms. In addition, the similarity between the Burt et al. and our findings helps address this concern. In a few years when all of the NLSYC respondents reach late adolescence, the selection bias can be addressed empirically.

The second and third threats to internal validity that we address involve issues of statistical conclusion validity. Correlations are attenuated under range restriction. If the effect of defining the early menarche group creates restriction on the behavior problem distributions, then there simply may not be as much variance to explain, and thus the reduced heritabilities could be an artifact. We offer three responses. First, delayed menarche was defined with approximately the same range restriction as the early group, and did not show a reduction in heritability (though it did in Burt et al. 2006). Second, the shared environmental variances, which would potentially be affected by range-restricted correlations, were higher in the early menarche category. Third, we can observe the phenotypic variances directly. The early menarche group had standard deviations of SDBP_6-9 = 13.6, SDBP_12-14 = 16.3, and SDBP_15-17 = .96. The larger (N = 650 vs. N = 240) and less restricted average/delayed menarche group had standard deviations of SDBP_6-9 = 14.0, SDBP_12-14 = 14.7, and SDBP_15-17 = 1.01. The menarche category definitions do not appear to be creating substantial problems with range restriction.

The second statistical issue is a problem defined in relation to estimating biometrical cholesky models. A challenge has been raised involving boundary conditions, such that the ML χ 2 statistic may not be distributed as a χ 2 under the null (Carey 2005; Dominicus et al. 2006; also Self and Liang 1987). We note that using DF Analysis avoids this problem, because negative variances can be estimated (and inform as to violations of the model), and these models are fit with least squares instead of ML. Our cholesky models were estimated using ML optimization, but we used two additional fit statistics to evaluate model fit. The sequence of model adjustments was equivalent using ML, the AIC, and the RMSEA.

Next, we note a minor concern over the timing of our measures. The measure of BP approximately concurrently with menarche is obviously later in relation to puberty for girls who reached menarche early than those who reached menarche on time or late. This problem, however, is at least partially addressed by our developmental design, and by the 3 year age window. We did not, for example, observe differences in standardized BP/conduct disorder within years across the three-year windows.

Finally, we address the assumptions of the biometrical model as threats to internal validity. We have treated the EEA problem that involved including cousins by evaluating results with the cousins excluded; findings were equivalent. Second, the several c2 estimates that were negative were not large enough negative to arouse concern that dominance or some other nonlinear genetic confound (e.g., epistasis) is a problem. Third, the assumption of vanishing assortative mating, which is probably incorrect (though perhaps not substantially so), has a known effect on patterns of findings. Significant heritabilities in the presence of non-ignorable assortative mating would be partially attributable to this genetic source. However, in our critical comparison, there would have to be differential assortative mating in relation to BP for parents of girls who reached early menarche versus average/delayed menarche. Although this is not fully implausible, we also do not have any information suggesting that this is the case. We urge caution in relation to this issue. The NLSY links between mothers and daughters may be useful to help address this type of cross-generational issue in future research (see Rodgers et al. 2008a).

Implication of these results for theories of early family environment influences

In this last section, we interpret our findings in relation to earlier theories linking the family environment, BP, and timing of menarche. We believe that the literature signals the compelling findings that we report in this paper. The absence of a correlation between MT and child BP obviates any interpretation that the family environment is a direct causal agent in the relationship between the development of problem behaviors and early menarche, because there is no direct relationship. If a common causal source did influence both (family stress, parental discipline, or longer-term family culture), moderate correlations would have been observed. Our results do not appear consistent with the Belsky et al. (1991) direct-causal link hypothesis (though we note that we did not measure family stress directly), and continue a longstanding and growing literature supporting an indirect selection model (also see Comings et al. 2002; Jorm et al. 2004; Mendle et al. 2006; Rowe 2002). It is difficult to develop a behavioral model of influence on MT—evolutionary or otherwise—within which family stress was critical, but which would leave childhood BP and CP entirely out of the explanation. However, there exists an intriguing apparent inconsistency, waiting for an explanation. Father’s absence consistently correlates with MT, whereas childhood BP do not. As one hint in understanding this causal chain, we note that father’s absence—related to divorce—has a more proximal link to MT, because both are related to sexual and reproductive behavior. (We note, however, this link is not the topic of our paper.)

One of our most intriguing findings—and one that required a developmental design to detect—is that the difference between girls who will eventually experience early menarche is already defined in childhood BP. But this difference is not carried through a straightforward link from individual differences in BP to timing of menarche. Rather, it required a biometrical decomposition to detect it, and then it is quite striking. The girls who will eventually go through early menarche are different, at an early age, from those who will not, because their shared family environment affects their BP. For those who will eventually have typical or late MT, their shared genes affect their BP. These differences are still detectable in early adolescence, and have virtually complete shared variance with the childhood patterns. They substantially disappear by later adolescence, when individual differences for both categories are largely genetic.

This finding has a relationship to a theory proposed to explain human fertility patterns. This theory explains why genetic influences, though potentially expressible, lie latent and unexpressed because of the dominance of family environmental influences. Udry (1996); (Rodgers et al. 2007); and (Rodgers et al. 2008a, b) discuss this interpretation in relation to fertility, and research has documented that timing of menarche relates to adolescent sexual and fertility behavior (Flannery et al. 1993; Phinney et al. 1990). In natural fertility settings, in which universal childbearing is socially normative, genetically-based individual differences in fertility motivation cannot be expressed. In contracepting societies, however, these genetically-based sources can emerge.

We posit that this same explanation may exist in relation to behavior/conduct problems in families with both a genetic and cultural tradition of BP. If such BP are normative, consistently expressed, and consistently evoked, typical individual differences emerging from genetic variance might not be easily expressed. In such families, even relatively quiet and docile children may express BP in relation to their volatile social environment, even if only as a protective mechanism against actual physical harm. In contrast, in a less volatile family, their natural genetic tendencies toward problem behaviors could be expressed. Thus, only when children and adolescents have a broader behavioral repertoire—the ability to choose whether to engage in BP or not, with a breadth of both potential behavior and potential responses—can the genetic variance be expressed.

The question of whether such differences arise directly from the specific family environment (Belsky et al. 1991), or through a selection process (e.g., Mendle et al. 2006) is difficult to fully resolve, because there are many confounded features of the two explanations. Mendle et al’s Children-of-Twins design is a powerful method to control many of these confounds, however, and their support for the selection model must be taken seriously. Our findings are also consistent with a selection model—one in which long-term cross-generational patterns are primarily environmental for one type of family, and genetic for another. In fact, we note that these two different models do not necessarily compete. It is both plausible and logical that the selection model could be occurring, and some residual direct influence of the family environment could be occurring as well. However, though plausible, the consistent zero correlations between BP in children/adolescents and timing of menarche is difficult to reconcile with the direct interpretation that emerges from the Belsky et al. theory.

Rather, we propose that a theory in which genetic variability can be potentially expressed in some families, but not others, is consistent with both sets of findings. The theory also provides a mechanistic explanation to at least partially explain the Belsky et al. (2007) differential susceptibility hypothesis. This theory is consistent with our zero correlations, because there is no reason to expect correlations between behavior problem and timing of menarche within the categories. But differences between categories can be explained by the difference in diversity of choice available across the two categories.