Why the factorial structure of the SCL-90-R is unstable: Comparing patient groups with different levels of psychological distress using Mokken Scale Analysis

doi:10.1016/j.psychres.2012.03.012

Psychiatry Research

Volume 200, Issues 2–3, 30 December 2012, Pages 819-826

https://doi.org/10.1016/j.psychres.2012.03.012

Abstract

Since its introduction, there has been a debate about the validity of the factorial structure of the SCL-90-R. In this study we investigate whether the lack of agreement with respect to the dimensionality can be partly explained by important variables that might differ between samples, such as level of psychological distress, the variance of the SCL-90-R scores and sex. Three samples were included: a sample of severely psychiatrically disturbed patients (n=3078), a sample of persons with Gender Incongruence (GI; n=410) and a sample of depressed patients (n=223). A unidimensional pattern of findings was found for the GI sample. For the severely disturbed and depressed samples, a multidimensional pattern was found. In the depressed sample, sex differences were found in dimensionality: we found a unidimensional pattern for the females, and a multidimensional one for the males. Our analyses suggest that previously reported conflicting findings with regard to the dimensional structure of the SCL-90-R may be due to at least two factors: (a) level of self-reported distress, and (b) sex. Subscale scores should be used with care in patient groups with low self-reported level of distress.

Introduction

The Symptom Checklist-90-Revised (SCL-90-R) (Derogatis, 1994) was designed to cover nine different dimensions of psychological distress; the mean item score across all 90 items, with theoretical values ranging from 0 through 4, is referred to as the Global Severity Index (GSI), which is widely used as a global index for psychological distress. Since the introduction of the SCL-90(-R), there has been a debate about the validity of the factorial structure, which was aptly expressed in the title of the paper ‘Factor structure of the SCL-90-R: is there one?’ (Cyr et al., 1985). More than two decades have passed since the publication of that paper; however, the debate has still not abated, as recent publications have demonstrated (Olsen et al., 2004, Arrindell et al., 2006, Elliott et al., 2006, Hafkenscheid et al., 2007). On the one hand, there is a group of researchers who firmly believe in the multidimensionality of the instrument (Arrindell et al., 2004b, Arrindell et al., 2004a, Arrindell et al., 2006), whereas another group has pointed out that alternative models with only one or at most a few factors show an equally good or better fit (Hafkenscheid, 2004, Hafkenscheid et al., 2007). In a recent paper, Paap et al. (2011b) proposed a new scale solution of 7 scales based on a study involving patients referred for a personality disorder (PD); scales were built on two start items that reflected the content of the disorder that corresponded with the specific scale. The new solution included 60 of the 90 items clustered in seven scales: Depression, Agoraphobia, Physical Complaints, Obsessive-Compulsive, Hostility (unchanged), Distrust and Psychoticism. The authors found that most of the new scales discriminated reliably between patients with moderately low scores to moderately high scores. The items forming the GSI showed low scalability, and the authors concluded that their research findings lent support to a multidimensional model of the SCL-90-R. The authors speculated that the lack of agreement between studies might be due to several factors, such as difference in variance, the existence of structure generating factors, differences in the interpretation of the fit indices, and, finally, the chosen analytic strategy (Paap et al., 2011b).
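As a concrete illustration of the GSI described above (the mean item score across all 90 items, each scored 0 through 4), the following minimal Python sketch computes it for one respondent. The function name and the example responses are hypothetical and serve only as illustration; they are not taken from the study.

```python
N_ITEMS = 90                   # number of SCL-90-R items
MIN_SCORE, MAX_SCORE = 0, 4    # theoretical range of each item score


def global_severity_index(item_scores):
    """Return the GSI: the mean item score across all 90 items."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(not (MIN_SCORE <= s <= MAX_SCORE) for s in item_scores):
        raise ValueError("item scores must lie between 0 and 4")
    return sum(item_scores) / N_ITEMS


# Hypothetical respondent: 45 items scored 0, 30 scored 1, 15 scored 3.
responses = [0] * 45 + [1] * 30 + [3] * 15
gsi = global_severity_index(responses)  # (30 * 1 + 15 * 3) / 90
```

Because the GSI is a plain mean, it carries no information about how the distress is distributed over the nine intended dimensions, which is why the dimensionality question has to be addressed separately.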

In the current study, we investigate whether the findings in the study by Paap et al. can be generalized to other patient groups by comparing the dimensionality of the PD sample to that of a sample of persons with Gender Incongruence (GI) and a sample of depressed outpatients. The term ‘GI’ signifies the incongruence between one's gender identity on the one hand, and one's assigned gender and/or one's congenital primary and secondary sex characteristics on the other (Kreukels et al., 2010, Meyer-Bahlburg, 2010). Following Kreukels et al., we use GI when referring to patients who have not yet been diagnosed with GID (APA, 1994) or transsexualism (WHO, 1992). We expect the reported level of psychological distress (estimated by the GSI) to be lower in the GI sample than in the depressed sample and PD sample. Haraldsen and Dahl (2000) showed that patients diagnosed with GID had slightly elevated GSI scores when compared to healthy adults, but did not reach the value of 1.0, which is the cut-off for clinically significant symptoms (GSI_GID=0.6, GSI_controls=0.4). In contrast, depressed outpatients have been found to exceed the cut-off (GSI_DEP=1.4) (Leinonen and Niemi, 2007), and so have the patients in the PD sample used in the study by Paap et al. (GSI_PD=1.5). Our main research questions are:

(1) Is the dimensionality of the SCL-90-R similar for patient groups that differ in level of reported psychological distress?
(2) Are the different factorial solutions found in the literature due to a difference in variance in reported psychological distress?

Following Paap et al. (2011b) and Meijer et al. (2011), Mokken Scale Analysis (MSA; Mokken, 1971) was used to analyze the data. MSA is a nonparametric Item Response Theory (IRT) approach that can be used to explore and test hypotheses about the dimensionality of a data-set, while at the same time resulting in scales adhering to a measurement model.
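To make the notion of scalability in MSA more tangible, the sketch below computes Loevinger's scalability coefficient H for a set of items as the sum of the observed inter-item covariances divided by the sum of the maximum covariances attainable given the item marginals. This is a simplified, hedged illustration in plain Python, not the software or procedure used in the study; the function names and toy data are hypothetical, and it assumes complete data and items with non-zero variance.

```python
from itertools import combinations


def covariance(x, y):
    """Population covariance of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def scale_h(items):
    """Scalability coefficient H for a list of item-score columns.

    The maximum covariance given the marginals is obtained by pairing
    the sorted scores of both items, so that the highest scores on one
    item coincide with the highest scores on the other.
    """
    observed = 0.0
    maximum = 0.0
    for x, y in combinations(items, 2):
        observed += covariance(x, y)
        maximum += covariance(sorted(x), sorted(y))
    return observed / maximum


# Toy data (hypothetical): one list per item, one entry per respondent.
# A perfect Guttman pattern: nobody endorses the harder item without
# also endorsing the easier one.
guttman_items = [[0, 0, 1, 1], [0, 1, 1, 1]]
h_guttman = scale_h(guttman_items)

# Two items that order respondents in exactly opposite ways.
reversed_items = [[0, 0, 1, 1], [1, 1, 0, 0]]
h_reversed = scale_h(reversed_items)
```

In the MSA literature a value of H of at least 0.3 is the commonly cited lower bound for a usable scale, with higher values indicating that the sum score orders respondents more accurately.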

Section snippets

Personality disorder sample: PD_low and PD_high

This sample consisted of 3078 patients admitted to 14 different day hospitals participating in the Norwegian Network of Personality-Focused Treatment Programs (Karterud et al., 1998), treated in the period from January 1993 through July 2007. This sample was also used in the study by Paap et al. (2011b). Sex ratio and age are depicted in Table 1. Seventy-nine percent were diagnosed with at least one personality disorder (PD). Of the PDs, Avoidant PD was most common (39%), followed by Borderline

Missing data: two-way imputation

Less than 1% of the data were missing in each of the data-sets. Following Paap et al. (2011b), we used Two-Way imputation (Bernaards and Sijtsma, 2000), which allows the user to transform an incomplete data-file into a complete one by using all available information about the proficiency of the respondent and the ‘difficulty’ of the item (Sijtsma and van der Ark, 2003). This method is easy to implement using SPSS (SPSS, 2007), using the syntax provided by van Ginkel and van der Ark (2005).
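The idea behind Two-Way imputation can be sketched as follows: a missing score for person i on item j is replaced by the person mean plus the item mean minus the overall mean, each computed over the observed scores only. The Python sketch below shows the deterministic variant with rounding to the nearest admissible score; it is an illustration under stated assumptions, not the SPSS syntax used in the study. The function name, the use of `None` for missing values, and the toy data are hypothetical, and every person and every item is assumed to have at least one observed score.

```python
import math


def two_way_impute(data, min_score=0, max_score=4):
    """Return a completed copy of `data` (rows = persons, columns = items).

    Missing entries are marked with None and replaced by
    person mean + item mean - overall mean, rounded to the nearest
    integer and clipped to the admissible score range.
    """
    observed = [x for row in data for x in row if x is not None]
    overall_mean = sum(observed) / len(observed)

    person_means = []
    for row in data:
        values = [x for x in row if x is not None]
        person_means.append(sum(values) / len(values))

    n_items = len(data[0])
    item_means = []
    for j in range(n_items):
        values = [row[j] for row in data if row[j] is not None]
        item_means.append(sum(values) / len(values))

    completed = []
    for i, row in enumerate(data):
        new_row = []
        for j, x in enumerate(row):
            if x is None:
                estimate = person_means[i] + item_means[j] - overall_mean
                rounded = math.floor(estimate + 0.5)  # round half up
                x = min(max_score, max(min_score, rounded))
            new_row.append(x)
        completed.append(new_row)
    return completed


# Toy data (hypothetical): three respondents, three items, two missing scores.
incomplete = [
    [0, 1, None],
    [2, 3, 4],
    [1, None, 3],
]
complete = two_way_impute(incomplete)
```

The person mean stands in for the respondent's overall level and the item mean for the item's ‘difficulty’, so the imputed value uses both sources of information at once.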

Description of the data

Table

Discussion

Studies reporting on the dimensionality of the SCL-90-R have had very diverse outcomes. To this day, the original 9-scale solution (Derogatis, 1994) remains controversial (Schwarzwald et al., 1991, Holi et al., 1998, Vassend and Skrondal, 1999, Schmitz et al., 2000, Olsen et al., 2004, Arrindell et al., 2006, Elliott et al., 2006, Hafkenscheid et al., 2007, Paap et al., 2011b). Here, we wanted to identify factors that could help explain the inconsistent findings in the literature. The main

Acknowledgments

We thank Jan van Bebber for imputing the missing data, Xi X. Zhao for preparing Fig. 1, and Mitzi Paap, Frøydis Hellem and Thomas Mengshoel for helpful discussions. We thank the patients and staff from the GID clinics in Amsterdam, Oslo, Hamburg and Ghent, as well as from the Department of Neuropsychiatry and Psychosomatic Medicine at Oslo University Hospital for their contribution to this study. Finally, we thank the patients and staff from the following treatment units in the Norwegian

References (42)

C. Armour et al.
Gender differences in the factor structure of posttraumatic stress disorder symptoms in war-exposed adolescents
Journal of Anxiety Disorders
(2011)
M.C.S. Paap et al.
Assessing the utility of diagnostic criteria: a multisite study on gender identity disorder
Journal of Sexual Medicine
(2011)
K. Sijtsma et al.
Mokken scale analysis as time goes by: an update for scaling practitioners
Personality and Individual Differences
(2011)
O. Vassend et al.
The problem of structural indeterminacy in multidimensional symptom report instruments. The case of SCL-90-R.
Behaviour Research and Therapy
(1999)
APA
Diagnostic and Statistical Manual of Mental Disorders (3rd ed., revised) (DSM–III-R)
(1987)
APA
Diagnostic and Statistical Manual of Mental Disorders (4th ed.) (DSM–IV)
(1994)
W.A. Arrindell et al.
Invariance of SCL-90-R dimensions of symptom distress in patients with peri partum pelvic pain (PPPP) syndrome
British Journal of Clinical Psychology
(2006)
W.A. Arrindell et al.
Nog meer steun voor het multidimensionale karakter van de SCL-90-R [Even more support for the multidimensional nature of the SCL-90-R]
De Psycholoog
(2004)
W.A. Arrindell et al.
Verdere steun voor het multidimensionale karakter van de SCL-90-R [Further support for the multidimensional nature of the SCL-90-R]
De Psycholoog
(2004)
C.A. Bernaards et al.
Influence of simple imputation and EM methods on factor analysis when item nonresponse in questionnaire data is nonignorable
Multivariate Behavioral Research
(2000)
J.J. Cyr et al.
Factor structure of the SCL-90-R: is there one?
Journal of Personality Assessment
(1985)
L.R. Derogatis
SCL-90-R: Administration, Scoring and Procedures Manual
(1994)
R. Elliott et al.
Deconstructing therapy outcome measurement with Rasch analysis of a measure of general clinical distress: the Symptom Checklist-90-revised
Psychological Assessment
(2006)
A. Hafkenscheid
Hoe multidimensionaal is de Symptom Checklist (SCL-90) nu eigenlijk? [How multidimensional is the Symptom Checklist (SCL-90) really?]
De Psycholoog
(2004)
A. Hafkenscheid et al.
The dimensions of the Dutch SCL-90: more than one, but how many?
Netherlands Journal of Psychology
(2007)
I.R. Haraldsen et al.
Symptom profiles of gender dysphoric patients of transsexual type compared to patients with personality disorders and healthy adults
Acta Psychiatrica Scandinavica
(2000)
M.M. Holi et al.
A Finnish validation study of the SCL-90
Acta Psychiatrica Scandinavica
(1998)
S. Karterud et al.
Day treatment of patients with personality disorders: experiences from a Norwegian treatment research network
Journal of Personality Disorders
(2003)
S. Karterud et al.
The Norwegian network of psychotherapeutic day hospitals
Therapeutic Communities
(1998)
B.P.C. Kreukels et al.
A European network for the investigation of gender incongruence: the ENIGI initiative
European Psychiatry
(2010)
E. Leinonen et al.
The influence of educational information on depressed outpatients treated with escitalopram: a semi-naturalistic study
Nordic Journal of Psychiatry
(2007)
