Categorical and dimensional perspectives on mental disorders
Accurately classifying mental disorders remains a challenge for studying psychological symptoms and selecting appropriate treatment. Available classification systems, including the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association,
2013), the 10th revision of the International Classification of Diseases (ICD-10; World Health Organization
1993), or the upcoming ICD-11 (World Health Organization,
2019), were developed following a “top-down” approach based on clinician consensus, and mental disorders were conceptualized as categorical concepts with a disorder being defined as either absent or present (reviewed by Achenbach
2020). In comparison, a “bottom-up” approach prioritizes empirical data from which conceptualizations of mental abnormalities are derived (Achenbach,
2020). This bottom-up approach is consistent with a dimensional perspective and has spawned diagnostic instruments which are widely used in child psychopathology (e.g. the Achenbach System of Empirically Based Assessment [ASEBA]; Achenbach
1991; Achenbach & Rescorla,
2001). In addition, ongoing research efforts such as the Hierarchical Taxonomy of Psychopathology (HiTOP; Kotov et al.,
2017) attempt to identify a more accurate and potentially more parsimonious representation of the underlying structure of psychopathology. This system, which is still a work in progress, specifies six hierarchical layers, ranging from super spectra on the highest level to symptoms on the lowest level.
Super spectra are higher-order dimensions, assumed to influence all spectra on the layer below. The HiTOP specifies six
spectra (e.g. antagonistic externalizing), an array of
subfactors (e.g. antisocial behavior),
syndromes and disorders (e.g. conduct disorder [CD]), which are used synonymously with DSM-5 diagnoses at this point to facilitate communication,
components (e.g. maladaptive traits) and
symptoms (e.g. physical aggression) on the lowest level. In recent years, attention has focused on the uppermost level of the hierarchy, leading to the search for a general factor of psychopathology, the so-called
p-factor (Caspi et al.,
2014; Lahey et al.,
2012). Although the precise nature of the
p-factor is not yet understood, it may reflect a broad liability to psychopathology (Caspi & Moffitt,
2018). Overall, existing categorical diagnostic systems are challenged by dimensional models of psychopathology, which may provide a more accurate and potentially more parsimonious representation of psychopathology.
Latent factor analysis as a method to examine the structure of psychopathology
Statistical methods such as latent factor analysis can be employed to identify a common underlying factor of mental disorders. In particular, hierarchical models and bifactor models, which have become very popular over the last decade (Reise,
2012), can help to identify common factors. Hierarchical models can have several levels: in psychological research, they are often limited to two layers, with one first-order factor modeled per domain and all first-order factors influenced by a second-order factor (Eid et al.,
2017). In a bifactor model, the general factor (g-factor) is modeled as another first-order factor, influencing all observed variables. Additionally, specific factors (s-factors) for the individual domains are modeled, which are assumed to influence the observed variables within their domain. As the bifactor model approach is based on the assumption that the g-factor causes all correlations between the s-factors, the correlations between g- and s-factors, and between all the s-factors, are constrained to zero (Eid et al.,
2017), although this latter requirement is often ignored (e.g. Caspi et al.,
2014).
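The orthogonality constraints of the traditional bifactor model can be made concrete with a small numerical sketch. All loading values below are invented for illustration (three domains with three indicators each) and are not estimates from any cited study:

```python
import numpy as np

# Hypothetical loading matrix for 9 items measuring 3 domains (3 items each):
# column 0 = g-factor (loads on ALL items), columns 1-3 = s-factors
# (each loads only on the items of its own domain).
Lambda = np.zeros((9, 4))
Lambda[:, 0] = 0.6                       # g-factor loadings (illustrative)
Lambda[0:3, 1] = 0.4                     # s-factor 1 (domain A)
Lambda[3:6, 2] = 0.4                     # s-factor 2 (domain B)
Lambda[6:9, 3] = 0.4                     # s-factor 3 (domain C)

# Bifactor constraint: g- and s-factors are mutually uncorrelated,
# so the factor covariance matrix Phi is the identity.
Phi = np.eye(4)

# Residual (unique) variances chosen so that each item has unit variance.
Theta = np.diag(1.0 - np.sum(Lambda**2, axis=1))

# Model-implied covariance matrix: Sigma = Lambda Phi Lambda' + Theta
Sigma = Lambda @ Phi @ Lambda.T + Theta

print(Sigma[0, 3])   # cross-domain covariance: 0.6 * 0.6 = 0.36
print(Sigma[0, 1])   # within-domain covariance: 0.36 + 0.4 * 0.4 = 0.52
```

Under these constraints, any covariance between items from different domains is attributable solely to the g-factor, which is exactly the assumption the bifactor model encodes.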
Bifactor models, however, come with a few serious problems, calling their application in psychological research into question. One issue concerns the accurate interpretation of the results of these models. As bifactor models are less restrictive than correlated factor models, they generally result in better global model fit indices, although this does not necessarily indicate that these models really do fit the data better, as bifactor models are prone to overfitting (Bonifay et al.,
2017). While an array of additional bifactor-specific indices exists (Rodriguez et al.,
2016), they are frequently not calculated and reported, and the choice of the best model is based solely on model fit. This often leads to a bifactor model being “undeservingly” chosen as the best model (Watts et al.,
2019).
Some of the major issues inherent in traditional bifactor models cannot be identified when solely examining model fit, but rather require the additional assessment of bifactor-specific indices. Such issues pertain to frequently observed anomalous factor loadings (Burns et al.,
2020a; Eid et al.,
2017; Junghänel et al.,
2020; Rodenacker et al.,
2018; Thöne et al.,
2021) and vanishing s-factors (Burns et al.,
2020a). Anomalous results are any results that are not in line with the general structure of the model, and include negative or non-significant factor variances and/or factor loadings > 1 (Eid et al.,
2017). Vanishing s-factors are defined by a large number of non-significant/negative factor loadings on their respective factor, suggesting that the factor in question might not exist and leaving the defined indicators for that factor to only measure the g-factor (Heinrich et al.,
2021). This is problematic as it changes the meaning of the g-factor and the s-factors. The g-factor is no longer the general factor that was assumed to represent all domains equally well, and is instead now mainly defined by indicators from the vanished s-factor. As a further consequence of this, the meaning of the g-factor is study- and sample-specific and cannot be compared across studies (Burns et al.,
2020a), as it does not represent a general psychopathology factor as intended. It has been suggested (Eid et al.,
2017; Heinrich et al.,
2021) that the mistaken assumption of interchangeable domains could be the reason for the frequently found anomalous results and vanishing s-factors. Although the assumption of a g-factor equally influencing all psychopathological symptom complexes is very parsimonious, easy to understand and indeed tempting, previous research has shown that it is at best questionable whether the g-factor in traditional bifactor models truly represents general psychopathology (Levin-Aspenson et al.,
2021; Watts et al.,
2019).
Apart from bifactor models, there are also alternative factor models to analyze multi-faceted constructs. A frequently used model is the first-order correlated factor model (CFM), which specifies multiple distinct facets of a construct that do not overlap (Eid,
2020). When high correlations between the facets emerge, suggesting commonality, the application of a more theory-driven version of a bifactor model – a bifactor S-1 model – could be the appropriate choice. In a bifactor S-1 model, no s-factor is modeled for one of the domains. The remaining s-factors are statistically contrasted against this domain, which is defined as the general reference domain (Burns et al.,
2020a). S-factors in a bifactor S-1 model represent the part of a domain that cannot be explained by the reference domain (Burns et al.,
2020a). In this case, domains can be structurally different, correlations between s-factors can be meaningfully interpreted, and anomalous results disappear as a result (Heinrich et al.,
2021). With this approach, assumed commonality between the s-factors is accounted for by modeling a general reference domain, the association between all factors can be assessed simultaneously, and interpretation can occur in a straightforward manner. The
a priori definition of the general reference facet, which is chosen based on theoretical assumptions or the specific research question (Eid,
2020), allows for a comparison between studies, irrespective of which s-factors are included. Eid (
2020) summarizes in which situation a CFM should be applied and when a bifactor S-1 model should be specified, as the two yield differential information and both can help to disentangle the association between different domains of psychopathology.
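The structural difference between the two bifactor variants lies entirely in the loading pattern and the factor covariances. The following sketch shows the pattern of a bifactor S-1 model with three domains, domain A chosen a priori as the reference; all numeric values are invented for illustration:

```python
import numpy as np

# Bifactor S-1 loading pattern: the general reference factor loads on ALL
# nine items, but s-factors exist only for the two non-reference domains.
Lambda = np.zeros((9, 3))                # columns: reference, s_B, s_C
Lambda[:, 0] = 0.6                       # general reference factor
Lambda[3:6, 1] = 0.4                     # s-factor for domain B
Lambda[6:9, 2] = 0.4                     # s-factor for domain C
# no s-factor is modeled for the reference domain A (rows 0-2)

# s-factors are orthogonal to the reference factor, but - unlike in the
# traditional bifactor model - they may correlate with each other:
Phi = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.3],         # r(s_B, s_C) = .3, illustrative
                [0.0, 0.3, 1.0]])

Theta = np.diag(1.0 - np.diag(Lambda @ Phi @ Lambda.T))  # unit item variances
Sigma = Lambda @ Phi @ Lambda.T + Theta

# Items of the two non-reference domains covary both through the reference
# factor and through the s-factor correlation: 0.36 + 0.4*0.4*0.3 = 0.408
print(Sigma[3, 6])
```

Because the s-factor correlation is freely estimated rather than fixed to zero, it can be interpreted meaningfully, which is one of the advantages of the S-1 specification noted above.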
Latent factor structure of the externalizing spectrum
As the identification of a general psychopathology factor seems to be challenging at best, in this study, we focus on a more narrowly defined area of psychopathology – the externalizing spectrum – by applying the aforementioned factor analytic models to assess the structure of symptoms within that spectrum.
From an ICD/DSM-based perspective, externalizing symptoms can be categorized into the disorders ADHD, ODD, and CD, which have been shown to be strongly related to each other (Willcutt et al.,
2012). Regarding ADHD and ODD, Willcutt et al. (
2012) found that around 50% of children diagnosed with ADHD also met criteria for an ODD diagnosis. A study by Burns et al. (
2020b), which applied a bifactor S-1 model in a community sample of Spanish children, supported the strong association between the ADHD domains and ODD, while at the same time emphasizing the importance of distinct domains. Specifying HI as the general reference facet, the authors found the IN domain of ADHD and ODD to remain a stable component, which is strongly associated with, but still distinct from, ADHD HI. Junghänel et al. (
2020) and Thöne et al. (
2021) reported similar findings regarding the association of these domains and the stability of the ADHD IN and the ODD factor in clinical samples of German children, thus strengthening the results of Burns et al. (
2020b) with respect to the latent factor structure of ADHD and ODD. Regarding ADHD and CD, around 20% of children diagnosed with ADHD also met criteria for a CD diagnosis (Willcutt et al.,
2012). A study by Beauchaine et al. (
2010) supported the association between ADHD and CD, with the authors reporting that the vast majority of adolescent boys with early-onset CD also meet the criteria for ADHD. While a highly heritable externalizing liability factor, which is expressed as temperamental trait impulsivity, is assumed to represent the predisposing vulnerability to both of these disorders, environmental factors are also assumed to play a major role in the emergence of early-onset CD (Beauchaine et al.,
2017; Beauchaine & McNulty,
2013). Symptoms of ODD can be regarded as middlemen in the development from child impulsivity to CD problems, a pathway which is negatively reinforced by high-risk environments (Beauchaine,
2015).
From a developmental theory-based perspective, disruptive behaviors can be meaningfully described as aggressive (AGG; e.g. having tantrums, arguing, threatening) and rule-breaking (RB; e.g. lying, stealing, skipping school) dimensions, instead of categorizing these behaviors into an ODD or CD diagnosis (Burt,
2012; Burt et al.,
2015). In line with this, these dimensions have been shown to differ in terms of developmental course and etiology (Harden et al.,
2015). Moreover, the factor analytic literature supports the notion that RB and AGG constitute separable, though positively correlated (
\(\bar{r}\) = 0.55), dimensions (Burt,
2012). Critically, however, there is a large variability around this mean correlation (range:
r = .28 − .73), which may be attributable to informant discrepancies (Burt,
2012). Finally, Achenbach’s internationally widely recognized ASEBA instruments also assess disruptive behaviors using the empirically derived AGG and RB syndrome scales (Achenbach & Rescorla,
2001).
In clinical child assessment, integrating multiple informants’ reports (e.g. children, parents, teachers) is considered a key component of best practices in evidence-based assessment, since it is unlikely that one single informant is sufficiently privy to a child’s situation-specific behavior, such as at home or school (Achenbach,
2020; De Los Reyes et al.,
2013; De Los Reyes et al.,
2015; Dirks et al.,
2012). In fact, an early meta-analysis demonstrated moderate correspondence (
\(\bar{r}\) = 0.60) between similar informants in the same context (e.g. pairs of parents), but low correspondence (
\(\bar{r}\) = 0.28) between different types of informants (e.g. parents vs. teachers), and the lowest correspondence (
\(\bar{r}\) = 0.22) between self and other informant reports (Achenbach et al.,
1987). Subsequent meta-analyses have demonstrated similar cross-informant discrepancies in child psychopathology (De Los Reyes et al.,
2015; De Los Reyes et al.,
2019). While these informant discrepancies had been theorized to reflect some kind of invalidity or rater bias (De Los Reyes,
2011), there is growing recognition that such discrepancies may rather reflect how children’s behavior varies meaningfully across contexts (De Los Reyes et al.,
2013; Dirks et al.,
2012).
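Mean correlations such as those reported above are conventionally averaged on Fisher's z scale rather than on the raw r scale, because r is not additive. A minimal sketch of that convention (the input values are illustrative, not the meta-analytic data; an unweighted average is used for simplicity):

```python
import numpy as np

def mean_correlation(rs):
    """Average correlations via Fisher's r-to-z transform:
    transform each r to z, average the z values, back-transform."""
    z = np.arctanh(np.asarray(rs, dtype=float))   # r -> z
    return float(np.tanh(z.mean()))               # mean z -> r

# Illustrative correlations spanning the range reported by Burt (2012):
print(round(mean_correlation([0.28, 0.55, 0.73]), 2))
```

In a real meta-analysis, each z value would additionally be weighted by its sample size before averaging.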
Regarding measurement invariance of the externalizing spectrum, some studies demonstrated that their factor model was generalizable (i.e.
invariant) across clinical and community samples (Rodenacker et al.,
2016). Another study (Junghänel et al.,
2020), which assessed the associations among ADHD and ODD symptomatology in a clinical sample, found individual domains of these diagnoses to be more independent from each other than was reported in a study assessing these symptoms in a community sample (Burns et al.,
2020a). Accordingly, this suggests that the relation between disorders within the externalizing domain might be influenced by the sample setting (clinical vs. community). Moreover, some studies found measurement invariance across mothers and fathers, i.e., across informants who rated their child’s behavior in the same context (Burns et al.,
2014). However, other research noted informant-specific variance across parent and teacher ratings, highlighting meaningful cross-situational variability in child behavior (Burns et al.,
2020a; Thöne et al.,
2021; Vitoratou et al.,
2019). Studies including a self-report and systematically comparing this to teacher and parent reports are still rare.
Finally, there are well-characterized sex differences in the expression of externalizing symptoms, with much higher rates in boys than in girls at a ratio of 3:1 during childhood (Beauchaine et al.,
2009; Copeland et al.,
2011; reviewed by Martel
2013). Despite these differences in prevalence rates, several studies (Lee et al.,
2016; Rodenacker et al.,
2016,
2018) demonstrated measurement invariance across sex (but see King et al.,
2018 for an exception), indicating that the structure of the externalizing spectrum itself is invariant across sex.
Factor-analytic evaluation
We applied four criteria to find the best model for all samples (Table
2).
Table 2
Model Evaluation in all Samples
Criterion | FBB-P-Clinical | SBB-Clinical | FBB-P-Community | SBB-Community | FBB-T-Special Needs School
(1) Model Fit Indices^a | 1. BI Ext S-1, 2. BI Ext, 3. BI Ext S-1*, 4. CFM-3*, 5. CFM-3, 6. CFM-2, 7. Uni | 1. BI Ext S-1, 2. BI Ext, 3. BI Ext S-1*, 4. CFM-3, 5. CFM-3*, 6. CFM-2, 7. Uni | 1. BI Ext, 2. BI Ext S-1, 3. CFM-3, 4. BI Ext S-1*, 5. CFM-3*, 6. CFM-2, 7. Uni | 1. BI Ext, 2. BI Ext S-1, 3. CFM-3, 4. BI Ext S-1*, 5. CFM-3*, 6. CFM-2, 7. Uni | 1. BI Ext, 2. BI Ext S-1, 3. CFM-3*, 4. CFM-2, 5. CFM-3, 6. Uni
Interim Conclusion | Uni model discarded due to worst model fit indices in all five samples.
(2) Factor Loadings | BI Ext: non-significant factor loadings (S04/S05) | BI Ext: non-significant (S05/S06) / negative (S04) factor loadings | BI Ext: non-significant (S05) / negative (S04) factor loadings | BI Ext: non-significant (S05) / negative (S04) factor loadings | BI Ext/BI Ext S-1: all ODD/CD factor loadings anomalous (non-significant/negative/> 1); BI Ext S-1* did not converge
Interim Conclusion | BI Ext model discarded due to negative/non-significant factor loadings in all five samples.
Comparison of model fit CFM-2 vs. CFM-3^b | CFM-3 better than CFM-2 (p < .001) | CFM-3 better than CFM-2 (p < .001) | CFM-3 better than CFM-2 (p < .001) | CFM-3 better than CFM-2 (p < .001) | CFM-3 not better than CFM-2 (p = .101)
Interim Conclusion | CFM-2 excluded due to superiority of CFM-3 in four out of five samples.
Comparison of model fit CFM-3 vs. BI Ext S-1^b | BI Ext S-1 better than CFM-3 (p < .001) | BI Ext S-1 better than CFM-3 (p = .003) | BI Ext S-1 better than CFM-3 (p = .034) | BI Ext S-1 better than CFM-3 (p = .002) | BI Ext S-1 better than CFM-3 (p < .001)
Comparison of model fit CFM-3* vs. BI Ext S-1*^b | BI Ext S-1* not better than CFM-3* (p = .668) | BI Ext S-1* better than CFM-3* (p = .044) | BI Ext S-1* not better than CFM-3* (p = .213) | BI Ext S-1* better than CFM-3* (p < .001) | BI Ext S-1* better than CFM-3* (p < .001)
(3) Omega Statistics | Unstable s-factors for all bifactor models | Unstable s-factors for all bifactor models | Unstable s-factors for all bifactor models | Unstable s-factors for all bifactor models | BI Ext/BI Ext S-1: s-factors not stable/interpretable due to factor loadings > 1; BI Ext S-1* did not converge
(4) Parsimony | CFM-3/CFM-3* more parsimonious than BI Ext S-1/BI Ext S-1* | CFM-3/CFM-3* more parsimonious than BI Ext S-1/BI Ext S-1* | CFM-3/CFM-3* more parsimonious than BI Ext S-1/BI Ext S-1* | CFM-3/CFM-3* more parsimonious than BI Ext S-1/BI Ext S-1* | CFM-3/CFM-3* more parsimonious than BI Ext S-1
Chosen models | CFM-3/CFM-3* | CFM-3/CFM-3* | CFM-3/CFM-3* | CFM-3/CFM-3* | CFM-3/CFM-3*
First, we calculated prominent goodness-of-fit indices (CFI, TLI, RMSEA, and SRMR) to evaluate which of the tested models demonstrated the best model fit in each sample (Table S1). The unidimensional model had the worst model fit in all five samples and was therefore considered inadequate (CFI or TLI < 0.95 or RMSEA or SRMR > 0.08). One exception concerned the SBB-Clinical sample, in which model fit was considered adequate to good (CFI and TLI ≥ 0.95, RMSEA and SRMR ≤ 0.08), while still being the worst of all models tested. The CFM-2 showed a good model fit in the SBB-Clinical and FBB-P-Community samples, an adequate model fit in the FBB-P-Clinical sample, and an inadequate fit in the SBB-Community sample (CFI and TLI < 0.95) and the FBB-T-Special Needs School sample (RMSEA = 0.084). Correlations between the ADHD and ODD/CD dimensions were high and significant (
r = .74 − .81, all
p < .001). The CFM-3 showed good model fit in all samples; only the RMSEA in the FBB-T-Special Needs School sample was slightly above the recommended cut-off of 0.080 (RMSEA = 0.087). All correlations in all samples were high and significant (ODD-ADHD:
r = .75 − .86, CD-ADHD:
r = .55 − .75, CD-ODD:
r = .69 − .97, all
p < .001). The CFM-3* model showed good model fit in all samples; only the TLI in the SBB-Community sample was slightly below the recommended cut-off of 0.950 (TLI = 0.944). Correlations in all samples were high and significant (ADHD-AGG:
r = .75 − .85, ADHD-RB:
r = .53 − .68, AGG-RB:
r = .74 − .85). The traditional bifactor model BI Ext was (one of) the best models in terms of model fit in all samples (Table
2). Similarly, the BI Ext S-1 and the BI Ext S-1* models showed good model fit (CFI and TLI ≥ 0.95, RMSEA and SRMR ≤ 0.05) in most samples. In the SBB-Community sample, the BI Ext S-1* was in an adequate range (TLI = 0.935; RMSEA = 0.066). In the FBB-T-Special Needs School sample, the model fit for the BI Ext S-1 was in an adequate range, with an RMSEA = 0.055. The BI Ext S-1* model was not identified in this sample. In the BI Ext S-1 and the BI Ext S-1* model, we did not find any significant residual correlations between ADHD and CD or between ADHD and RB, respectively. In addition to the goodness-of-fit indices, we calculated the sample-size adjusted BIC (Table S3). In all samples except for the FBB-T-Special Needs School sample, we found the bifactor models and the CFM-3 / CFM-3* models to be among the best models, while the exact order of these three models differed between samples (Table
2). The unidimensional model was the worst model according to BIC values in all five samples. Therefore, we discarded the unidimensional model after applying the first criterion, model fit indices.
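The cut-offs applied in this first step can be condensed into a small screening helper. The thresholds are the conventional ones used above (CFI and TLI ≥ .95, RMSEA and SRMR ≤ .08 for adequate fit); the example values are illustrative, not the full set of fitted statistics:

```python
def fit_adequate(cfi: float, tli: float, rmsea: float, srmr: float) -> bool:
    """Screen global fit against conventional cut-offs:
    CFI and TLI >= .95, RMSEA and SRMR <= .08 count as adequate."""
    return cfi >= 0.95 and tli >= 0.95 and rmsea <= 0.08 and srmr <= 0.08

# Illustrative: a model missing only the RMSEA cut-off (e.g. 0.087, as for
# the CFM-3 in the FBB-T-Special Needs School sample) is flagged.
print(fit_adequate(cfi=0.97, tli=0.96, rmsea=0.087, srmr=0.05))  # False
print(fit_adequate(cfi=0.97, tli=0.96, rmsea=0.050, srmr=0.05))  # True
```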
Second, we took a closer look at the
factor loadings and, for the BI Ext S-1 model in the FBB-T-Special Needs School sample, found an anomalous factor loading > 1 for the item S07
Lies/Steals (Table S2). Such a value is anomalous because standardized factor loadings should not exceed 1 in absolute value. Hence, it did not seem reasonable to interpret omega statistics for the BI Ext S-1 model within the FBB-T-Special Needs School sample. Since the BI Ext S-1* model did not converge in the FBB-T-Special Needs School sample, we refrained from interpreting this model further in this specific sample. Regarding the BI Ext model, we found anomalous factor loadings, such as negative and/or non-significant factor loadings and/or factor loadings > 1, in all five samples. Such statistical anomalies complicate a meaningful interpretation in terms of content. Therefore, we excluded the BI Ext model from our model selection process. We then performed likelihood ratio tests to compare the goodness of fit of the CFM-2 and the CFM-3 models across all samples (Table S6). We found that the CFM-3 model was superior to the CFM-2 model in all samples (
p < .001), except for the FBB-T Special Needs School sample (
p = .101). Therefore, we decided to exclude the CFM-2 model due to the superiority of the CFM-3 model in our selection process (Table
2). Then, we compared the CFM-3 and the BI Ext S-1 models using likelihood ratio tests (Table S6). We found that the BI Ext S-1 model was superior to the CFM-3 model in all five samples (
all p ≤ .034). When comparing the alternative CFM-3* and the BI Ext S-1* models using likelihood ratio tests, we found that the BI Ext S-1* was superior to the CFM-3* model in the three samples SBB-Clinical (
p = .044), SBB-Community (
p < .001), and FBB-T Special Needs School (
p < .001), but not in the FBB-P-Clinical (
p = .668) and the FBB-P-Community (
p = .213) samples (Table
2).
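The nested-model comparisons above rest on the chi-square difference test. Below is the textbook version under plain maximum likelihood with invented input statistics; note that with robust or categorical estimators (as are common for ordinal symptom ratings), a scaled difference test would be required instead:

```python
from scipy.stats import chi2

def lr_test(chisq_restricted, df_restricted, chisq_general, df_general):
    """Chi-square difference test for two nested models: the restricted
    model (e.g. CFM-2) has more degrees of freedom than the more general
    model (e.g. CFM-3); a significant p favors the general model."""
    delta_chisq = chisq_restricted - chisq_general
    delta_df = df_restricted - df_general
    p = chi2.sf(delta_chisq, delta_df)   # upper-tail probability
    return delta_chisq, delta_df, p

# Invented example values, not the study's actual statistics:
d, ddf, p = lr_test(250.0, 87, 210.0, 84)
print(ddf, round(d, 1))
```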
Third, we calculated
omega statistics to evaluate model-based reliability, that is, the stability of the s-factors. The omega statistics revealed that in the BI Ext S-1 / BI Ext S-1* models, none of the s-factors in any sample remained a stable independent component, as the ωHS for all s-factors remained below the recommended cut-off of 0.50 (Reise et al.,
2013). Omega statistics for the FBB-T-Special Needs School sample were not computed, as their interpretation would have been flawed due to anomalous factor loadings. The same held true for the ECV in this sample. The ECV (Table S2) supported the finding that all s-factors in the BI Ext S-1, and the BI Ext S-1* model were weakly defined, as the ECV was small for all s-factors (ADHD: ECV = 0.11 − 0.16, CD: ECV = 0.11 − 0.25, RB: ECV = 0.09 − 0.17) compared to the g-factor (ECV = 0.63 − 0.79). Exploratory analyses for the BI Ext models also showed weak factors and low ECV values (Table S2), reinforcing our decision to exclude the BI Ext model from our selection process.
Fourth, our final criterion, model parsimony, favored the CFM-3 / CFM-3* models over all bifactor S-1 models, as, other things being equal, models with fewer freely estimated parameters are generally preferable.
Considering all four criteria, we discarded models step by step, which culminated in the same models capturing the data best in all five samples – the CFM-3 and the CFM-3* model (Table
2). Overall, model fits were adequate to good; only the RMSEA (0.087) for the CFM-3 model in the FBB-T-Special Needs School sample and the TLI (0.944) for the CFM-3* model in the SBB-Community sample slightly missed the recommended cut-offs. All factor loadings of the two models across all samples were significant, with low standard errors (Tables S4, S5). The factor loadings for items S01 – S08 were all high, ranging from 0.63 to 0.97 for the CFM-3 and from 0.66 to 0.99 for the CFM-3* model. The factor loadings for the item S09
Skips school were considerably lower in the CFM-3 (0.24 − 0.67) and CFM-3* (0.27 − 0.69) models than the other items in most samples but remained significant. We conclude that both the ICD/DSM-based CFM-3 (ADHD, ODD, CD) model and the developmental theory-based CFM-3* (ADHD, AGG, RB) model provided a sound view of externalizing dimensions.
Discussion
The aim of this study was to refine the knowledge about the structure underlying externalizing dimensions. For this purpose, we analyzed items from a screening instrument assessing these symptoms across five large samples, which differed with respect to sample setting (clinical, community, special needs school) and source (i.e. parents, teachers, self-ratings).
A first conclusion drawn from our analyses is that a separation into different dimensions appears to be justifiable, as the unidimensional model was rejected in all samples. The study by Beauchaine et al. (
2010) supports this finding, as the authors stated that in addition to inherent impulsivity, which underlies ADHD, ODD, and CD and often leads to an early presence of ADHD symptomatology, the co-occurrence of ODD and CD additionally depends on environmental influences. As the CFM-3 model yielded better results than the CFM-2 model regarding global model fit and likelihood ratio tests in four out of our five samples, the separation of ODD and CD was supported. In this ICD/DSM-based CFM-3 model, all items except for the item S09
Skips school showed high factor loadings on their respective first-order factor. Following a developmental theory-based perspective, we also specified a model with three correlated factors (CFM-3*; ADHD, AGG, RB), thereby describing disruptive behaviors as aggressive and rule-breaking problems (Burt,
2012; Burt et al.,
2015). This CFM-3* model demonstrated overall good model fit and high factor correlations across samples (AGG-RB:
r = .74 − .85). These correlations are somewhat higher than those reported in the factor analytic literature (cf. Burt,
2012). One explanation for this differing finding might lie in the different informants considered (Burt,
2012).
Due to the high correlations between all first-order factors in the CFM-3 model across all samples, we specified a traditional bifactor model. For this model, we had to combine ODD and CD, as the model was otherwise not identified, with only two indicators loading on the ODD factor. As the two-factor solution was also adequate, we did not expect this to be a major problem; however, it has to be noted as a limitation, as it impedes the comparison with the other models. The model fit for the traditional bifactor model was superior. This in itself should not be overinterpreted, as the fit of traditional bifactor models is generally superior to the fit of first-order correlated factor models given that more free parameters are estimated, making the model less restrictive (Bonifay et al.,
2017). However, upon closer examination of the factor loadings, it became apparent that anomalous factor loadings, such as negative and non-significant factor loadings, were present in all five samples, and we even observed an unreasonable factor loading > 1 in the FBB-T-Special Needs School sample, which further complicated the interpretation. These very small or even negative factor loadings do not concern the ADHD dimension, but only the ODD/CD dimension in all samples. Such statistical anomalies related to traditional bifactor models are consistent with methodological concerns (Eid et al.,
2017) and empirical studies demonstrating a variety of anomalous results associated with the application of traditional bifactor models to externalizing symptoms (Arias et al.,
2018; Burns et al.,
2020a; Rodenacker et al.,
2018; Thöne et al.,
2021). Anomalous loadings might be a result of the generally mistaken assumption of interchangeability of domains, which is a statistical prerequisite for accurately applying a bifactor model (Heinrich et al.,
2021). However, domains in psychopathology are most likely structurally different (Eid et al.,
2017; Heinrich et al.,
2021). The structural differences between domains are apparent when examining the correlations between subdomains in the CFM-3 / CFM-3* models. For interchangeability, these correlations would have to be equal, which is not the case. Anomalous results, which we observed in the FBB-T-Special Needs School sample for the item
Lies/Steals (S07), are a major problem for interpreting the respective model, since test statistics cannot reliably be interpreted. Common reasons for these anomalous results are the extraction of too many factors, a small sample size, small variability in an indicator or a misspecification of the model, potentially through adding or omitting paths and/or posing restrictions based on non-conclusive assumptions (Chen et al.,
2001). As our sample size was quite large (
n = 755), and we only extracted three factors with at least three indicators per factor, a configuration that was identical (i.e., the same factorial configuration) or at least similar to the other samples, we do not believe this to be the reason for these anomalous results. Descriptive statistics regarding the variance and skewness of item S07 in the FBB-T-Special Needs School sample were unremarkable and similar to the other items, eliminating this consideration as a potential cause for these anomalous results. A potential misspecification of the model in this particular case cannot fully be excluded; however, we found item S07 to significantly and highly load on its respective factor in the CFM-3, suggesting that the allocation to the CD factor was justified. Excluding item S07 from the model led to an elimination of the statistical anomalies in both models (BI Ext / BI Ext S-1); however, based on descriptive statistics and as this item is a symptom criterion of CD in the DSM-5, we did not see enough grounds for such a radical decision. Therefore, despite encountering this problem in our bifactor models, we proceeded to analyze further important details. To analyze a bifactor model in a statistically sound manner, it is important to consider additional bifactor-specific indices beyond the global model fit indices (Rodriguez et al.,
2016), especially since bifactor models tend to overfit. Important bifactor-specific indices include, but are not limited to, the omega statistics and ECV. In the FBB-T-Special Needs School sample, it was not possible to calculate the omega statistics and the ECV, as the anomalous factor loading > 1 on item S07 would have led to a misinterpretation of the results. For the other four models, we computed omega statistics and ECV. As expected from the small and/or negative factor loadings on the ODD/CD dimension, this s-factor vanished in all four remaining samples, explaining little remaining variance beyond the variance already explained by the g-factor. The s-factor ADHD was more strongly defined than the ODD/CD one, but was not strong enough to be considered a reliable s-factor (Reise,
2016). The g-factor explains more variance than the s-factors do for almost all items in all samples, with very few sample-specific exceptions. The ECV for the g-factor in all samples lay between 0.62 for the SBB-Community sample and 0.77 for the FBB-P-Community sample. According to Rodriguez et al. (
2016), an ECV for the g-factor > 0.80 supports the idea of one-dimensionality. In our bifactor models, the ECV for the g-factor was below this cut-off for all samples, and additionally, we found the ECV especially for the ADHD s-factor to be quite high (0.33 − 0.44). Despite the superior model fit of the bifactor model and ECV values pointing to potential multidimensionality, we decided to discard this model, as it appears that the remaining s-factor could not be interpreted in a stable, reliable manner. Anomalous factor loadings additionally hamper a straightforward interpretation of g- and s-factors. One likely reason for the unstable s-factors is that the g-factor already explains a large proportion of the common variance, leaving little variance to be explained by the individual s-factors. The strength of the g-factor here is supported by the very high correlation between the factors ADHD and ODD/CD in the CFM-2, ranging from 0.74 to 0.81, and the ωH values between 0.73 and 0.84.
The strong g-factor we found could suggest one-dimensionality, but as already discussed, the unidimensional model did not fit the data well, and the ECV values for the s-factors in the traditional bifactor model suggest that the s-factors still account for a substantial part of reliable variance after partialling out the influence of the g-factor. To retain the concept of s-factors that are all associated with one another, presumably through some common factor, we specified two bifactor S-1 models. In contrast to a traditional bifactor model, an S-1 model takes structural differences among domains into account and does not assume interchangeability (Heinrich et al.,
2021). In the ICD/DSM-based BI Ext S-1 model, we were able to keep the separation between ODD and CD. As we were interested in the relation of ADHD and CD with ODD, and since we observed the highest factor loadings on the ODD items in our BI Ext models across all samples, we specified ODD as the general reference domain, with the ADHD and CD dimensions orthogonal (uncorrelated) to it. Theoretically supporting this choice of reference factor, ODD symptoms have been suggested as intermediaries linking impulsivity (one factor of ADHD) and CD (Beauchaine,
2015). We therefore regarded it as a domain of special interest, which according to Eid (
2020) is a valid basis for this selection. For the ODD dimension, no s-factor was modeled. Analogously, the AGG dimension was specified as the reference factor in the developmental theory-based BI Ext S-1* model. When applying a bifactor S-1 model, one has to dismiss the idea of identifying a common overarching factor; however, as pointed out by Heinrich et al. (
2021), it frequently remains unclear what such a factor actually represents. Due to the
a priori definition of the general reference factor, a bifactor S-1 model allows for a straightforward interpretation of all factors, including their relation to one another. This is supported by the fact that bifactor S-1 models generally avoid anomalous results (Heinrich et al.,
2021). In our case, we found a strong reduction of anomalous factor loadings for the BI Ext S-1 compared to the BI Ext model, but the factor loading > 1 on item S07 in the FBB-T-Special Needs School sample remained, which did not allow for a closer examination of this model in this specific sample. The BI Ext S-1* model did not converge in the FBB-T-Special Needs School sample, further indicating problems in this specific sample. In all other samples, the significant positive factor loadings allowed for a straightforward interpretation of all factors in both bifactor S-1 models. However, the s-factors of the BI Ext S-1 model still accounted for little reliable variance after accounting for the variance explained by the general reference factor ODD. This indicates that the two ODD items already explained so much variance in the ADHD and CD domains that too little variance remained for these s-factors to be considered stable. The ECV values in this model were similar to those in the BI Ext model and ranged between 0.63 and 0.77, again below the recommended cut-off of 0.80 proposed by Rodriguez et al. (
2016). A similar pattern emerged for the BI Ext S-1* model, although the differentiation between AGG and RB may offer a more coherent picture than the ICD/DSM-based distinction between ODD and CD (cf. Burt,
2012).
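In lavaan-style model syntax, a common convention for specifying such CFAs, the ICD/DSM-based BI Ext S-1 model could be sketched as follows; the item names are hypothetical placeholders, not the actual questionnaire items used in this study:

```
# General reference factor anchored in the ODD domain: it loads on ALL
# items, but the ODD items load on no other factor.
G_ODD  =~ odd1 + odd2 + adhd1 + adhd2 + adhd3 + cd1 + cd2 + cd3

# Specific factors only for the non-reference domains
S_ADHD =~ adhd1 + adhd2 + adhd3
S_CD   =~ cd1 + cd2 + cd3

# All factors orthogonal
G_ODD  ~~ 0*S_ADHD
G_ODD  ~~ 0*S_CD
S_ADHD ~~ 0*S_CD
```

Because the reference factor is anchored a priori in the ODD items, its meaning is fixed in advance rather than emerging from the estimation, which is what permits an unambiguous interpretation of the orthogonal s-factors.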
When comparing our factor models, we found that both the ICD/DSM-based CFM-3 model and the developmental theory-based CFM-3* model fit the data best in all samples. This was also supported by likelihood ratio tests and BIC values, as the CFM-3 / CFM-3* models were superior to the Uni and CFM-2 models in all samples (the only exception being the FBB-T-Special Needs School sample, where the CFM-2 performed as well as the CFM-3 model). The specific factors in both the traditional bifactor model and the two bifactor S-1 models explained too little variance to be interpreted in a meaningful way. In addition, the CFM-3 / CFM-3* models explain the data more parsimoniously than all bifactor models. We therefore consider the CFM-3 and CFM-3* to be the models that represent our data best, although the CFM-3* model with the AGG and RB dimensions may offer a more accurate representation of disruptive behavior problems (Burt,
2012; Burt et al.,
2015). However, the high correlations between the different factors, in combination with the strong g-factor in both bifactor models, suggest unmodeled commonality. These high correlations correspond to the frequently reported comorbidities of ADHD, ODD, and CD (Beauchaine et al.,
2010; Willcutt et al.,
2012). Although proposing a general externalizing psychopathology factor (or a
p-factor in general) using bifactor modeling may be tempting, theoretical and statistical considerations, previous work, and the present study indicate that bifactor models may be unsuitable for this purpose. It remains an open question how the associations between these three frequently co-occurring diagnoses can best be modeled, i.e. in a way that captures all specific aspects of each symptom complex while avoiding an excessive number of comorbid diagnoses.
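The model comparison logic described above can be sketched numerically: a chi-square difference (likelihood ratio) test for nested models, plus BIC values. All numbers below are invented placeholders for illustration, not the fit statistics obtained in this study.

```python
import math

# Hypothetical fit results for two nested models (illustrative numbers only):
# the two-factor CFM-2 nested in the three-factor CFM-3.
chisq_cfm2, df_cfm2, loglik_cfm2 = 412.3, 64, -5210.4
chisq_cfm3, df_cfm3, loglik_cfm3 = 351.8, 62, -5180.1
n = 600                      # sample size (placeholder)
k_cfm2, k_cfm3 = 38, 40      # free parameters (placeholder)

# Likelihood ratio test: the chi-square difference follows a chi-square
# distribution with df equal to the difference in degrees of freedom.
delta_chisq = chisq_cfm2 - chisq_cfm3          # 60.5
delta_df = df_cfm2 - df_cfm3                   # 2
# For df = 2, the chi-square survival function has the closed form exp(-x/2).
p_value = math.exp(-delta_chisq / 2)

# BIC = -2 * log-likelihood + k * ln(n); lower values indicate better fit.
bic_cfm2 = -2 * loglik_cfm2 + k_cfm2 * math.log(n)
bic_cfm3 = -2 * loglik_cfm3 + k_cfm3 * math.log(n)

# The less constrained CFM-3 would be preferred here: significant LRT
# and lower BIC despite the two extra parameters.
print(p_value < 0.05, bic_cfm3 < bic_cfm2)     # True True
```

The closed-form survival function is used only because the invented df difference is 2; in general one would use a chi-square distribution routine.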
In a final step, we evaluated our CFM-3 / CFM-3* models’ measurement invariance. When testing for measurement invariance across all five samples, scalar invariance could not be supported. However, scalar invariance was confirmed across sources (parents vs. self-ratings), demonstrating that despite the frequently reported cross-informant discrepancies (De Los Reyes et al.,
2015; De Los Reyes et al.,
2019), the structure of the externalizing spectrum itself is invariant across sources. Furthermore, there was only minor evidence of scalar non-invariance across sample setting (community vs. clinical sample), suggesting that the samples differ with regard to symptom severity on one or more variables. Finally, we found that scalar invariance was supported across sex (males vs. females). These findings are in line with previous studies (Lee et al.,
2016; Rodenacker et al.,
2016,
2018) and provide support for the validity of the symptom dimensions across males and females.
The present study may have implications for model selection when examining the associations between psychological dimensions. Bifactor models of psychopathology have become increasingly popular in recent years and are often applied to search for general factors in psychopathology (Caspi & Moffitt,
2018), despite the statistical and interpretational difficulties outlined above. While our traditional bifactor model demonstrated good model fit, we nevertheless decided to discard it after carefully evaluating additional statistical indices. This finding is in line with previous observations that researchers often mistakenly regard their bifactor model as superior and fail to take into account further statistical indices (Arias et al.,
2018; Watts et al.,
2019). Although our results are limited to the externalizing spectrum in children, our approach of discarding factor models step by step according to several criteria may be adopted in the
p-factor literature as well. Here, similar statistical and interpretational difficulties become apparent, calling a g-factor of a traditional bifactor model into question as the proper candidate for a
p-factor (Levin-Aspenson et al.,
2021; Watts et al.,
2019).
Our findings may also have implications regarding categorical and dimensional perspectives on mental disorders. From an ICD/DSM perspective, mental disorders are conceptualized as categorical concepts, although there is little evidence that the underlying structure of mental disorders is, in fact, categorical in nature (Achenbach,
2020). Instead, empirical research points towards a dimensional perspective (Achenbach,
2020). More specifically, the two dimensions AGG and RB may offer more nuanced indications of clinical significance than the ODD / CD diagnoses (Burt,
2012). For example, the AGG and RB dimensions may outperform CD diagnoses when predicting adult symptoms of antisocial personality disorder (Burt et al.,
2011). The results of our research show that correlated factor models from both an ICD/DSM-based and a developmental theory-based perspective provide a sound view of externalizing dimensions. At this point, therefore, we cannot conclude whether one perspective is truly superior to the other. Below, we provide suggestions for answering this question unambiguously in the future.