Introduction
It has been suggested that the structure of psychiatric phenomena can be reduced to a few dimensions of symptoms. For example, Kendler et al. [
1] assessed a cohort of more than 5600 adult twins from a population-based registry using DSM diagnostic criteria and concluded that genetic risk factors predispose to two broad groups of internalizing and externalizing disorders. In another instance, Wright et al. [
2] compared the fit of categorical, continuous, and hybrid (i.e., combined categorical and continuous) models of syndromes in an adult epidemiological sample (N = 8841), assessed via structured clinical interviews, and found that the best fitting higher-order model of these syndromes grouped them into three broad spectra: internalizing, externalizing, and psychotic experiences. In turn, Lahey et al. [
3] examined the structure of psychopathology (also assessed using structured interviews) in an epidemiological sample of individuals aged 18–65 and found that a bi-factor model, in which every mental disorder loaded on a general factor in addition to the externalizing, distress, and fears factors, provided the best fit to the data. That is, whereas each of the three group factors (externalizing, distress, and fears) accounted well for the correlations among the specific mental disorders that loaded most strongly on it, a general dimension captured what all the examined disorders had in common. Similarly, Caspi and colleagues [
4] explored the structure of psychopathology in a longitudinal study that repeatedly assessed individuals from a birth cohort at 18, 21, 26, 32, and 38 years of age, and concluded that psychiatric disorders (assessed via structured interviews) could be explained by three higher-order factors (internalizing, externalizing, and thought disorders), but also found evidence supporting a bi-factor model with one general overriding dimension, termed the “p factor”, that captured individuals’ propensity to develop any and all forms of common psychopathologies over and above individual dimensions for each psychiatric disorder and the three higher-order factors.
Studies in epidemiological samples of children and adolescents have yielded similar results. For example, in a large population-based sample of young people (N = 18,222), Goodman, Lamping, and Ploubidis [
5] used confirmatory factor analysis (CFA) to compare the relative fit of three alternative factor structures for the Strengths and Difficulties Questionnaire (SDQ) [
6,
7]. The authors concluded that a second-order model with internalizing and externalizing factors (along with a prosocial subscale factor) fitted the data better than a first-order model with the five hypothesized SDQ subscales. These two models had a substantially better fit than the third alternative: a first-order model with internalizing, externalizing, and prosocial factors. These results replicated the classical internalizing-externalizing approach to the structure of child psychopathology [
8]. However, Goodman et al. [
5] suggested that, while this factorial solution is probably justified in low-risk samples, a five-factor model might be more appropriate in high-risk populations and clinical samples. Yet, this remains to be tested.
It is argued that “lumping” of psychiatric symptoms into broader dimensions can generate models with several advantages. First, they seem to conform to the genetic architecture of psychiatric symptoms given that individual and aggregate molecular genetic risk factors have been found to be shared among a range of psychiatric disorders that are treated as distinct categories in clinical practice [
9]. Second, such a parsimonious model of psychopathology could account for the high rates of comorbidity observed among individuals with mental disorders [
10]. However, the above-mentioned findings are based on epidemiological samples which, while offering the benefit of being unbiased by referral practices and being generally larger than clinical samples, may not reflect what is seen in clinical practice.
Indeed, in the rare instance where psychiatric symptoms have been analyzed in clinical samples, it has not been possible to group the data in such a reduced number of dimensions. For example, in adults, Kotov et al. [
11] found that the best-fitting model for their sample of 2900 outpatients seeking psychiatric treatment was a five-factor solution, including internalizing, externalizing, thought disorder, somatoform, and antagonism dimensions, which fit the data better than a seven-factor model based on the DSM-IV, an internalizing-externalizing model, a three-factor model with an additional somatoform dimension, and an alternative four-factor model which included the previous three dimensions and additionally placed the psychosis, manic episode, and cluster A traits from the internalizing group into a thought disorder dimension. In a sample of German children and adolescents, Becker et al. [
12,
13] subjected the items of the SDQ to CFA, and demonstrated a good fit of the original five-factor model both for the parent (N = 543) and the self-reported (N = 214) measure. Interestingly, they also employed exploratory factor analysis (EFA) and the results highly converged with the five original SDQ subscales. The differing evidence between epidemiological and the few studied clinical samples is a clear example of how factorial structure is determined by the type of sample (i.e., the number of ill individuals that these contain and the types of problems that they have) [
14], and in this specific case, it suggests that less parsimonious dimensional structures may better reflect the reality of individuals with psychiatric disorders.
In fact, models based on “splitting”—rather than lumping—also have considerable support, particularly in relation to clinical variables. For example, whilst genetic etiology may be largely shared between various anxiety disorders, their distinction may be important in relation to family history, neurobiology, and treatment response [
15]. Perhaps even more strikingly, whilst many of the most parsimonious models of psychopathology would consider hyperactivity and conduct problems or irritability as a joint entity, their distinction has key implications for treatment and course. Attention-deficit/hyperactivity disorder (ADHD) symptoms do not respond to parenting interventions [
16], whilst conduct and oppositional problems do [
16,
17]; conversely, stimulants show large effect sizes particularly for hyperactivity, impulsivity, and inattention, yet less so for irritability and related behaviors [
18,
19]. It is therefore crucial to examine whether the structure of psychopathology found in epidemiological samples applies to clinical samples.
Additionally, it remains a matter of debate whether distinct structures of symptoms can have an impact on the prediction of psychiatric outcomes, and whether this is influenced by the type of sample. Results from epidemiological samples have yielded a moderate to high level of agreement between SDQ-generated diagnoses and corresponding clinical diagnoses [
20,
21]. More recently, Goodman et al. [
5] found that a second-order structure of the SDQ with internalizing, externalizing, and prosocial factors showed clear convergent and discriminant validity when predicting clinical disorders even at the lowest SDQ scores. By contrast, the five SDQ subscales only showed convergent and discriminant validity in children with high scores on those subscales, especially for behavioral and hyperactivity problems. These findings would also support the hypothesis that less parsimonious models account for symptoms in clinical samples.
Results of predictive validity in clinical samples are mixed. Becker and colleagues [
12,
13] showed that the total difficulties score of the SDQ was a good predictor of any Axis I diagnosis; furthermore, they found that the subscales of the SDQ predicted their matching diagnostic categories well in their clinical sample. However, in a more recent study by Brøndbo et al. [
22] on a Norwegian clinical sample, the SDQ was considered insufficient for clinical purposes. The authors also concluded that the SDQ was better at detecting the presence of “any diagnosis” than more specific ones and, conversely, better at ruling out specific diagnoses than “any diagnosis.” Given these results, larger clinical samples might be needed to ascertain the predictive value of distinct symptom structures.
While previous studies have examined a first-order five-factor model in young clinical samples [
12,
13], no studies to date have offered a comparison of the alternative models described in the literature. In addition, no studies in clinical samples have provided a cross-country validation of these models, which limits the generalizability of previous results. Moreover, while several studies have examined the prediction of psychiatric disorders using different symptom dimensions, no studies have tested whether these predictions hold when these disorders co-occur. The current study seeks to fill these gaps in the literature with the following three aims.
First, using CFA in two independent clinical samples from England and Norway, we examine the relative fit of key alternative models using the SDQ, which is one of the most widely-used instruments to measure child and adolescent psychopathology worldwide [e.g.,
23,
24]. For comparability, we test the models that have been comprehensively tested before in large epidemiological samples [
3‐
5]. These include a first-order five-factor model, a second-order model with the widely-established broad symptom dimensions of internalizing-externalizing, and two bi-factor models capturing a general psychopathology factor.
Second, as we employ two large samples from different countries, we examine the measurement invariance of the best fitting model across countries to see whether the same structure is generalizable. This is particularly relevant since differences have been found in the presentation of psychopathological symptoms between these countries [
25].
Third, we test the external validity of the dimensions. In particular, we test whether each dimension of symptoms—either first- or second-order dimensions—specifically links with psychiatric disorders. Finally, given that comorbidity is typical, we test whether psychopathological symptoms are differently distributed in participants with distinct comorbidities across both samples.
Discussion
In this study, we used data from 14,209 children and adolescents from clinical settings (8343 from England and 5866 from Norway) to explore the structure of their psychiatric symptoms using a widely used psychometric tool, the SDQ. Results showed that a five-factor structure presented the best fit for the data in both samples and was superior to second-order factor models with additional ‘internalizing’ and ‘externalizing’ factors, as well as to two bi-factor models that accounted for a general factor. This finding contrasts with those of previous epidemiological studies [e.g.,
5], suggesting that psychiatric disorders present with unique phenotypic characteristics that need to be taken into account, and that an overly simplified approach may not be appropriate when dealing with patients in real-world settings. As reported by Goodman et al. [
5], discriminating symptom clusters may be easier when focusing on children with more severe mental health problems (i.e., a clinical population), and this differentiation may be more difficult to establish when levels of psychopathology are low (as in epidemiological samples). It is possible that the expression, perception, and report of symptoms in low-risk samples might be unspecific and blurred, whereas in high-risk samples the specificity of symptoms for each disorder increases. Such a pattern would inevitably influence the factorial structure of symptoms. Additionally, epidemiological studies have generally missed participants who suffer from serious mental health problems or are otherwise disadvantaged [
56]. In our study, mean SDQ Total Difficulties scores ranged between 16.3 and 18.3. In contrast, mean scores in epidemiological studies never reach these levels of severity, with average scores ranging around 7.5–11.0 points [e.g.,
57,
58].
It could be argued that our findings, which contrast with those obtained in epidemiological studies, are due to referral biases in our samples. However, even if such biases were operating, it would not take away from the fact that a substantial proportion of severely impaired young people—those who attend clinics—show a structure of psychopathological symptoms that is different from that observed in epidemiological studies. Moreover, it would be expected that at least some of the referral biases would be different between England and Norway—yet, we demonstrate strict factorial invariance across the two samples. This becomes especially relevant when taking into account that the two countries under study have shown differences in the presentation of their psychiatric symptoms, also measured by the SDQ [e.g.,
25,
59]. Measurement invariance (MI) analyses also showed that, at least in clinical samples, psychiatric symptoms cluster in these five factors in boys and girls of different ages. Our results suggest that, given a certain level of severity, groups of symptoms might already be defined from the early stages of development, with no distinctions across genders. This would be in line with previous research showing that psychopathology appears to be differentiated among younger children as much as it is among older children [
60]. Interestingly, in a recent study across five European countries using an epidemiological sample of adolescents (N = 3012) which also used the SDQ [
57], MI across countries was only partial (11 items out of 25 were invariant), suggesting that some items should be considered carefully when used across countries. However, this study used the self-reported version of the questionnaire and participants were older than those in our sample (mean age = 14.20). The authors point out that the developmental changes that occur in adolescence could differ depending on factors such as the geographical area, the culture, and the meaning of the items or the language.
A more granular approach to psychiatric symptoms, where several dimensions of symptoms are considered, also proves helpful when predicting psychopathology. In keeping with our hypotheses, the split between SDQ subscales allowed for distinctions between disorders, even when other disorders were also present. This finding is relevant for etiological studies. Twin studies and risk-factor studies have shown that there are substantial phenotypic correlations among pairs of psychiatric disorders that are influenced by the same genetic factors [e.g.,
61,
62]. This indicates that causes of the disorders may be similar and seems to encourage a transdiagnostic approach to psychiatric disorders [
4]. Additionally, this more parsimonious approach may be helpful when looking at low-risk samples [
5] or when studying correlates, neural mechanisms, outcomes that are common across mental disorders [
3], and factors of resilience to psychiatric disorder [
33]. However, in clinical samples and when looking for models that can inform clinical predictions and treatment choices, a model considering a broader range of symptom dimensions could be more accommodating. The differences in treatment response across seemingly related behaviors/symptoms [
16‐
19] suggest that, in order to understand the genetic or neural substrates of psychopathology, we should probably use more specific models which split patterns of symptoms. Clinicians can benefit from this approach to more sensitively screen patients referred to mental health services. An instrument as short as the SDQ seems to be helpful in making distinctions provided its multi-dimensional structure is retained. However, it is important to note that these multiple dimensions still group together a number of conditions (e.g., the emotional scale of the SDQ may contain a range of anxiety and depressive disorders that may require different treatment approaches) and, hence, an in-depth assessment that takes into account the specificity of psychopathology at the diagnostic level is still preferred if time and resources allow.
The results of this study need to be considered in light of its limitations. First, we only used parent-reported measures in our analyses. However, those are typically collected in pediatric populations, especially in younger children. Second, the structure of psychopathology may be, to some extent, influenced by the type of instrument used. Hence, we cannot rule out the possibility that different instruments could lead to different results. Studies in adults have used different assessment methods and, hence, a head-to-head comparison between pediatric and adult samples may be difficult. However, the pediatric studies with which we are comparing our results have also used the SDQ. Third, each section of the DAWBA uses skip-rules, one component of which is on some occasions the relevant SDQ subscale. For example, in the hyperactivity disorder section, parents who report ‘some problems with hyperactivity or poor concentration’ or whose child has an SDQ hyperactivity score ≥ 6 continue responding to items in the section; otherwise, they are directed to the next section. Therefore, predicting DAWBA diagnoses using the SDQ subscale scores might be somewhat circular. However, in order to test to what extent circularity would have affected our results, we performed additional analyses (see Supplemental Material) using the Avon Longitudinal Study of Parents and Children (ALSPAC) sample [
63], where SDQ skip rules were not employed to define DAWBA diagnoses. The results of these analyses showed that using SDQ skip rules to define diagnoses, as opposed to not using them, did not significantly modify the predictions. Most importantly, the specificity of predictions between SDQ subscales and relevant diagnoses was clear in both approaches.
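To make the source of the potential circularity concrete, the skip rule described above for the hyperactivity section can be sketched as a simple logical OR. This is an illustration only and not the DAWBA's actual implementation: the class, field, and function names are hypothetical, and the only values taken from the text are the parent-report screening question and the SDQ hyperactivity cut-off of 6.

```python
from dataclasses import dataclass

# Cut-off quoted in the text for the hyperactivity section (score >= 6).
SDQ_HYPERACTIVITY_CUTOFF = 6


@dataclass
class ScreeningAnswers:
    """Hypothetical container for the two inputs to the skip rule."""
    parent_reports_problems: bool   # 'some problems with hyperactivity or poor concentration'
    sdq_hyperactivity_score: int    # parent-rated SDQ hyperactivity subscale score


def continues_hyperactivity_section(answers: ScreeningAnswers) -> bool:
    """Return True if the parent goes on to answer the section's items,
    False if they are directed to the next section."""
    return (
        answers.parent_reports_problems
        or answers.sdq_hyperactivity_score >= SDQ_HYPERACTIVITY_CUTOFF
    )


# A high SDQ score alone opens the section, even without a reported problem.
high_score_only = ScreeningAnswers(parent_reports_problems=False, sdq_hyperactivity_score=7)
# A low score with no reported problem skips the section.
low_score_no_report = ScreeningAnswers(parent_reports_problems=False, sdq_hyperactivity_score=3)
```

Under this sketch, the SDQ subscale score partly determines whether the diagnostic items are asked at all, which is why predicting the resulting diagnosis from the same subscale could be somewhat circular.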