Validity evidence for the use of the Pediatric Quality of Life Inventory, the Revised Children’s Anxiety and Depression Scale-25, and the Columbia-Suicide Severity Rating Scale in measurement-based care in intensive outpatient child and adolescent mental health care
We sought to evaluate the validity of the interpretations made on the scores from three patient-reported outcome measures (PROMs), the Pediatric Quality of Life Inventory (PedsQL), the Revised Children’s Anxiety and Depression Scale-25 (RCADS-25), and Columbia-Suicide Severity Rating Scale-short form (C-SSRS), for use in measurement-based care (MBC) in intensive outpatient child and adolescent mental health services.
Methods
A mixed-methods, secondary analysis of interview and survey data from an MBC implementation evaluation, as well as PROMs data collected through MBC in routine clinical practice, was performed. The setting was intensive outpatient mental health services for children 6–18 years of age. The Standards for Educational and Psychological Testing argument-based approach to validation was used, combining qualitative and quantitative data to evaluate validity.
Results
The PROMs appear to comprehensively cover key domains relevant to intensive outpatient child and adolescent mental health treatment. There is preliminary evidence that youth scores accurately reflect their symptoms and functioning and capture change in these constructs; however, potential issues with response processes were identified for the C-SSRS, the school domain of the PedsQL, and caregiver proxy-reports on the PedsQL and RCADS-25, which warrant further investigation.
Conclusion
There is evidence to support the use of these PROMs for MBC in child and adolescent mental health. However, further investigation into response processes and internal structure is needed, and clinically meaningful thresholds must be established, to improve interpretability and ensure the validity of their use.
Why is this study needed? This study was needed because there is not enough evidence about which patient-reported outcome measures (i.e., standardized patient questionnaires) are optimal for tracking mental health progress in children and adolescents receiving intensive outpatient mental health treatment.
What is the key problem this study addresses? The study focuses on whether three widely used mental health tools—the Pediatric Quality of Life Inventory, the Revised Children’s Anxiety and Depression Scale-25, and the Columbia-Suicide Severity Rating Scale—are valid and useful for measuring mental health progress in children and adolescents in intensive outpatient programs.
What is the main point of this study? The study found that these three tools are generally performing well and are viewed as helpful for measuring important aspects of mental health in young people and for tracking changes over time.
What are the main results and what do they mean? The tools covered key mental health areas and showed evidence that they measure what they are supposed to and can track changes over time. However, the study also identified gaps in our knowledge of the tools’ performance across different groups, how they capture different concepts of mental health and quality of life, and how sensitive they are to detecting small changes. This means the tools are useful, but more research is needed to make sure they work the same for everyone and are helpful for clinicians and patients alike.
Background
The youth mental health crisis is marked by an increasing number of help-seeking children and adolescents, with approximately 1 in 5 youth being affected by a mental health condition [1, 2]. Measurement-based care (MBC) is the routine use of standardized clinical measures to monitor symptoms and functioning, and to inform treatment in mental health services [3, 4]. MBC has been shown to improve care efficacy by reducing symptom severity and increasing functioning in children and adolescents with mental health conditions [5]. At the system level, aggregated MBC patient-reported outcome measures (PROM) data are used for service evaluation, research, and informing organizational and government policy-making [4, 6, 7].
Despite its benefits, selecting appropriate PROMs for MBC in child and adolescent mental health (CAMH) is challenging. Currently, there is no consensus on the optimal MBC measures for CAMH. In fact, consensus may not be possible, as the suitability of a measure depends on the goals of a treatment program, its target population and the intended uses of the data (i.e., for individual treatment decision-making or service evaluation). Given this, it is essential to demonstrate validity evidence across different CAMH populations and intended uses to enable informed decisions when selecting PROMs. However, such evidence remains limited in CAMH [8, 9].
A set of PROMs, the acute version of the Pediatric Quality of Life Inventory Generic Core Scales (PedsQL), the Revised Children’s Anxiety and Depression Scale-25 item version (RCADS-25), and the Columbia-Suicide Severity Rating Scale self-report screener (C-SSRS), was selected for an MBC program serving youth (8–18 years of age) with moderate to severe mental health conditions in two intensive outpatient mental health treatment programs [10, 11]. The PROM selection process was informed by standard outcome set recommendations and conducted in collaboration with youth, families, clinicians and administrators, with the aim of identifying a set of PROMs that balanced comprehensive content coverage and the ability to capture constructs of interest against burden of administration [12]. This study appraises the validity of the interpretations made on the scores from this set of PROMs for use in MBC in intensive outpatient CAMH treatment, supporting their future use in CAMH.
Methods
Design
This is a convergent, mixed-methods study evaluating the evidence for the validity of the interpretations made based on the scores from a set of three PROMs for individual decision-making within MBC in intensive outpatient mental health services [13]. This is a secondary analysis of interview and survey data from an MBC implementation evaluation and PROMs data collected through a “real-world” MBC program [11, 13, 14]. Quantitative and qualitative data were collected and analyzed separately, then combined to provide a supplemental, preliminary understanding of the validity evidence for the PROMs’ use [13]. This study was approved by the University of Calgary’s Research Ethics Boards (REB22-1137) and reported using COSMIN’s General Reporting Recommendations [15].
Conceptualization of validity
In this study, we are using an argument-based approach to validation, following the Standards for Educational and Psychological Testing [16, 17]. This is described as a process where evidence is collected to support a proposed interpretation and use of a measure’s scores (in a particular population and context) to provide a sound scientific argument for the validity of that proposed interpretation and use [16]. Formulating a validity argument can be thought of as a three-step process: (1) State the purpose of measurement; (2) State the inferences and assumptions made when the instrument is used; (3) Evaluate the evidence to support those inferences [16]. Instead of prescribing particular types of validity, study designs, or analyses, this approach uses five sources of evidence that a measurement developer or researcher may consider for the validation process, which are evidence based on: (1) content (e.g., measure grounded in theory, expert review of items’ content); (2) response process (e.g., cognitive interviewing), (3) internal structure (e.g., factor analysis); (4) relations to other variables (e.g., correlate scores with an instrument for a related construct); and (5) consequences associated with measure use (e.g., feasibility of use, clinical value). Conceptualizing validation as a process of gathering and evaluating evidence for the validity of an instrument’s intended use allows professional judgment to guide decisions regarding the specific forms of evidence that are needed to support its use [18, 19]. It also brings attention to the examination of the social consequences of using PROMs for a particular use, and prompts researchers to think more deeply about validity and PROMs [17, 19]. The modern, argument-based approach to validation has been recommended in education and psychological testing since 1999 and is beginning to gain traction in PROM research [19, 20].
Measures
PedsQL
The acute version of the PedsQL is a 23-item questionnaire with a 7-day recall period. It has four sub-scales (physical, emotional, social, and school functioning), three summary scores (Physical Health Score, Psychosocial Health Score and Total Scale Score), three self-report forms for children ages 5–7, 8–12, and 13–18 years old, and four proxy-report forms for preschool-aged children and children ages 5–7, 8–12, and 13–18 years old [21]. It has shown preliminary evidence of validity in CAMH populations and is the most widely used pediatric PROM in Alberta, a Canadian province [22, 23]. Respondents are asked to rate how frequently they (or their child) experience difficulties with activities using a 5-point response scale. Items are reverse scored and linearly transformed to a 0–100 scale. The proposed interpretation of scores, for this study’s target population (i.e., children and adolescents accessing intensive outpatient mental health services), is that higher scores indicate the respondent perceives better health-related quality of life [24].
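The scoring rule described above can be sketched as follows. The specific mapping (0 → 100, 1 → 75, 2 → 50, 3 → 25, 4 → 0, with scale scores computed as the mean of transformed items) is the standard PedsQL convention and is assumed here, as the text states only that items are reverse scored and linearly transformed:

```python
def pedsql_item_score(raw: int) -> float:
    """Reverse-score a 0-4 PedsQL item and map it onto a 0-100 scale.

    Assumes the standard PedsQL convention: 0 -> 100, 1 -> 75,
    2 -> 50, 3 -> 25, 4 -> 0 (higher = better functioning).
    """
    if raw not in range(5):
        raise ValueError("PedsQL items use a 0-4 response scale")
    return (4 - raw) * 25.0


def pedsql_scale_score(items: list[int]) -> float:
    """Scale scores are the mean of the transformed item scores."""
    return sum(pedsql_item_score(i) for i in items) / len(items)


# Example: five hypothetical responses spanning the response scale
print(pedsql_scale_score([0, 1, 2, 3, 4]))  # -> 50.0
```

Because higher raw frequencies of difficulty map to lower transformed scores, higher scale scores consistently indicate better perceived health-related quality of life, matching the interpretation proposed above.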
RCADS-25
RCADS-25 is a 25-item questionnaire for children and youth ages 8–18 with a total score and 2 subscales (anxiety and depression), with no recall period, which can be converted to norm-based T scores [25]. It has self-report and proxy-report forms and is a recommended measure from the International Consortium for Health Outcomes Measurement core outcome set for Children and Young People with Anxiety and Depression [12]. However, its sensitivity to change has not been extensively evaluated [25, 26]. Respondents are asked to rate how often they experience symptoms on a 4-point frequency scale. Proposed interpretations of scores are: (1) higher scores indicate the respondent is perceiving higher anxiety or depression symptoms, and (2) a T-score of 65 or higher may correspond to a clinical level of anxiety or depression, suggesting the need for mental health treatment [25, 27].
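As an illustration of the norm-based conversion, a raw scale score can be expressed as a T score (normative mean 50, SD 10). The `norm_mean` and `norm_sd` values below are hypothetical, since the actual RCADS-25 norms are stratified (e.g., by gender and grade) and must be taken from the scoring materials:

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw score to a norm-based T score (mean 50, SD 10)."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical normative values for illustration only
raw_total = 38.0
t = t_score(raw_total, norm_mean=20.0, norm_sd=9.0)
print(t)  # -> 70.0
# A T score of 65 or higher would suggest a clinical level of
# anxiety or depression under the proposed interpretation [25, 27].
```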
C-SSRS
The C-SSRS screener is a 6-item self-report questionnaire for youth ages 10 and older, which measures severity of suicidal ideation and behaviour, and classifies individuals as high, moderate or low risk for suicide [28]. The clinician-administered interview version has shown good internal consistency, inter-rater reliability and sensitivity to change with adolescents; however, the evidence for the use of the self-report C-SSRS in child and adolescent populations has not been evaluated [26, 28]. Items 1–2 ask about suicidal ideation and items 3–6 ask about suicidal behaviours of increasing severity. Items 1–5 have a one-month recall period and item 6 has a 3-month recall period. The screener is scored according to the highest level endorsed across the 6 items. Respondents are categorized as Low (1), Moderate (2) or High Risk (3), and are assigned a score of 0 if they do not endorse any items. The proposed interpretations of scores are that different triage actions should be taken based on each risk score, and that following those actions will decrease suicide risk [29].
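The “highest level endorsed” scoring rule can be sketched as follows. The `ITEM_RISK` mapping from individual items to risk categories is hypothetical, since the source does not specify which items correspond to low, moderate, or high risk; the real triage mapping should be taken from the C-SSRS scoring guide:

```python
# Hypothetical item-to-risk mapping for illustration only.
# Keys are item numbers (1-6); values are risk levels
# (1 = low, 2 = moderate, 3 = high).
ITEM_RISK = {1: 1, 2: 2, 3: 3, 4: 3, 5: 3, 6: 3}


def cssrs_score(endorsed: set[int]) -> int:
    """Return the risk score implied by the highest-severity item endorsed.

    0 = no items endorsed; otherwise the maximum risk level
    across all endorsed items.
    """
    if not endorsed:
        return 0
    return max(ITEM_RISK[i] for i in endorsed)


print(cssrs_score(set()))      # -> 0 (no endorsement)
print(cssrs_score({1}))        # -> 1 (low risk)
print(cssrs_score({1, 2, 4}))  # -> 3 (high risk)
```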
Proposed use
The proposed use of the PedsQL, RCADS-25 and C-SSRS as a set in MBC is to assess mental health symptoms and quality of life, monitor longitudinal changes in those constructs for youth (8–18 years old), and make individual treatment decisions based on scores and changes in scores, in intensive outpatient CAMH services. Table 1 presents the assumptions inherent in using these instruments for the intended use, their sources of evidence, the current evidence available to support each assumption, and the evidence evaluated in this study to support the inferences and assumptions. The assumptions to be evaluated were determined based on an examination of the intended use of the PROMs, and in collaboration with key stakeholders (clinicians, administrators) who identified what would be most important about a PROM for MBC in CAMH.
Table 1
Assumptions, existing validity evidence, evidence evaluated in this study, with data sources and analytic approach
Assumptions
Standards source of evidence
Existing evidence
Data source
Analytic approach
Analyses including hypotheses to be evaluated (where applicable)
1. The item content of the 3 measures, taken together, represents the important concepts to be assessed in intensive mental health treatment programs (no important concepts missing) and does not include constructs irrelevant to this population and setting (no over-representation)
Content
Content validity testing of PedsQL from other populations [24]. Content validity testing of RCADS-47 and C-SSRS in child and adolescent populations [26]
Clinician interviews
Qualitative content analysis
Qualitative content analysis of interview questions asking whether each measure asks questions that are relevant to the clients clinicians see; what the most important concepts covered by these 3 measures are; and whether any important concepts are missing; as well as other references to the measures’ content throughout the interview
Youth and caregivers survey
Descriptive statistics and qualitative content analysis
Descriptive analysis of youth ratings (on a 5-point scale, from not at all appropriate to strongly appropriate) of the appropriateness of each measure to their reason for seeking care
Responses to open-ended survey questions asking about missing items and general comments about the measures were analyzed using content analysis
2. Respondents comprehend and respond to PROMs items in the way that was intended
Content and response process
Cognitive interviews of PedsQL in other populations [24]
Cognitive interviews of RCADS in CAMH populations [26]. No testing of response processes for self-report version of C-SSRS in children or adolescents
Clinician interviews
Qualitative content analysis
Qualitative content analysis of interview questions asking about their perceptions of clients’ difficulties interpreting or completing PROMs
Youth and caregivers survey
Descriptive statistics
Descriptive statistics of responses to “easy to read and understand”
3. The scores of PedsQL, RCADS-25, and C-SSRS correspond with how patients actually perceive their symptoms or function
Relationship with other variables
Alignment with theory (convergent, discriminant validity) of PedsQL in children and adolescents with a diagnosis of anxiety or depression [25]. Alignment with theory of RCADS-25 in CAMH populations (discriminant validity)
The clinician administered C-SSRS has good inter-rater reliability and sensitivity to change [26]
Clinician interviews
Qualitative content analysis
Qualitative content analysis of interview questions asking if the PROM scores seem to represent what you observe clinically about the client’s symptoms and functioning
PROMs data
Hypothesis approach: an examination of whether hypotheses, made based on theory and logic about the relationships between scores and other, external variables, are consistent with the observed relationships in the data
Hypothesis 3a. It is recognized that experiencing depression and anxiety symptoms is associated with lower quality of life [34, 35]. Therefore, we hypothesize that there will be a small to medium inverse correlation between the RCADS-25 and PedsQL total scores (− 0.3 < r < − 0.1)
Hypothesis 3b. The RCADS-25 total score captures anxiety and depression symptoms, and those constructs are covered in the emotional domain of PedsQL, therefore, we hypothesized that there would be a large, inverse correlation between the PedsQL emotional score and the RCADS-25 total score (r < − 0.5)
Hypothesis 3c. It is conceivable that individuals experiencing symptoms of anxiety and depression may also experience some problems with their physical health and functioning, but we would expect that association to be weaker than the relationship between symptoms of anxiety and depression and emotional functioning. Therefore, we hypothesize that the strength of the correlation between the PedsQL physical functioning score and the RCADS-25 total score would be weaker than the correlation between PedsQL emotional functioning and RCADS-25 total score
4. The PedsQL, RCADS-25, and C-SSRS differentiate clients with differing levels of symptoms and functioning
Relationship with other variables
See evidence above
PROMs data
Hypothesis approach
Hypothesis 4a. Youth in Day Hospital have theoretically higher care needs, and presumably greater symptom intensity and lower daily functioning compared with youth in the intensive community treatment services. Therefore, we expect a significant difference between Day Hospital and ICTS scores on PedsQL total and emotional functioning scale, and RCADS total and subscale scores
Hypothesis 4b. Youth with a higher degree of suicidal ideation and behaviours would presumably experience high levels of depressive symptoms. Therefore, we hypothesize there would be a significant difference on the RCADS depression subscale between youth with C-SSRS scores of low to moderate risk (0–2), versus score of high-risk (3)
5. The PedsQL, RCADS-25, and C-SSRS are sufficiently sensitive to changes in clients’ symptoms and functioning
Relationship with other variables
The RCADS-47 (from which RCADS-25 is derived) has some evidence of sensitivity to change in CAMH populations. The PedsQL has evidence of sensitivity to change in other populations [21]. The clinician-administered C-SSRS has evidence of sensitivity to change in CAMH populations [26]
Clinician interview
Qualitative content analysis
Qualitative content analysis of interview questions asking if the measures seem to represent the change in symptoms or functioning that you observe clinically
PROMs data
Hypothesis approach
Hypothesis 5a. We hypothesize that changes in youth mental health symptoms and quality of life will be moderately correlated over the course of treatment. Specifically, we expect a medium, negative correlation (r ≈ − 0.3) between change scores on the RCADS-25 total and PedsQL total scores, such that improvements in mental health symptoms (lower RCADS scores) are associated with increases in quality of life (higher PedsQL scores)
Hypothesis 5b. We also expect that the association will be higher between the PedsQL emotional functioning scale and the RCADS-25 total score. Therefore, we hypothesize the strength of the correlation between PedsQL emotional functioning change scores and the RCADS-25 total change scores would be stronger than for PedsQL total score
Hypothesis 5c. Change in scores on the C-SSRS should be associated with a reduction in depression symptoms. Therefore, we hypothesize there will be a medium association between change in C-SSRS scores and change in RCADS-25 depression scores
6. RCADS-25, PedsQL and C-SSRS are clinically feasible and informative to use for MBC in intensive outpatient treatment for CAMH
Consequences associated with the measure
According to ICHOM recommendations, RCADS-25 and C-SSRS are the optimal PROMs available for these constructs, in terms of balancing the need for reliable and valid scores against clinical feasibility [26]. PedsQL is the most frequently used pediatric PROM in Alberta, suggesting it is feasible [22]
PROMs data
Descriptive statistics
Mean and standard deviation for time to complete were calculated for youth at intake
Clinician interviews
Qualitative content analysis
Qualitative content analysis of clinicians’ references to burden of administration, as well as the utility of the measures for clinical decision-making
Youth and caregivers survey
Descriptive statistics
Descriptive analysis of survey responses to the statement “The time to complete the measures was not too long” using a 5-point agreement scale
7. Suggested interpretations of the RCADS-25, PedsQL and C-SSRS scores, including MCID and clinical cut points, provide accurate information and inform the best possible clinical decision-making
Consequences associated with the measure
Not evaluated in this study
8. The discrepancies between youth and caregivers on the RCADS-25 and PedsQL scores align with true differences in perceptions of the child and proxy
Relationships with other variables and consequences associated with scores (i.e., clinical interpretations)
Evidence of varying degrees of agreement between self- and proxy-report versions of RCADS-25 and PedsQL [24, 36]. No examination of the validity of agreement
Not evaluated in this study
9. Data from PedsQL and RCADS-25 align with the developer’s proposed scaling structure
Internal structure
Limited evidence of RCADS-25 internal structure in CAMH populations in English and cultural translations [26]
Evidence of PedsQL internal structure in other patient populations [21]
Not evaluated in this study
10. Differences in linguistic/cultural backgrounds and/or sociodemographic backgrounds do not lead to substantially different interpretations of the items on the PedsQL, RCADS-25 and C-SSRS
Internal structure and response processes
Evidence of invariance in PedsQL from non-CAMH paediatric populations and across ages [24, 37]
Evidence of non-invariance of factor structure across countries in general child and adolescent populations [27, 38]
Not evaluated in this study
Study setting
PROMs were administered electronically, in person, to all children and adolescents (herein, youth) and their caregivers accessing a Day Hospital or Intensive Community Treatment Services (ICTS) at a CAMH centre in Calgary, Canada. The Day Hospital program supports youth transitioning from an inpatient psychiatric unit to the community, and ICTS supports youth experiencing escalating symptoms in the community. Both programs support clients and families through short-term, intensive, multi-disciplinary therapy provided by allied health professionals and psychiatrists. The programs are described in greater detail elsewhere [11]. Figure 1 shows the PROMs and administration timepoints.
All youth, along with their caregivers, who access services are asked to complete the PROMs as part of routine clinical care. All are asked whether they consent to research use of their PROM data and whether they consent to be contacted for research purposes. Clinicians were invited to participate in interviews through an open invitation, and were also purposively recruited to ensure representation of diverse clinical roles.
Data collection
Youth and caregiver survey
Youth over 14 years of age and caregivers who consented to contact between March and September 2023 were emailed an invitation with an online survey link between July and November 2023. The survey had a mix of open and closed-ended questions to evaluate PROMs, described in more detail in Table 1.
Clinician interviews
Semi-structured interviews were conducted with clinicians from February to May 2024 as part of the implementation evaluation. The questions focused on evaluating the PROMs with respect to (1) content coverage (e.g., Does this measure ask questions that are relevant to your clients? Are all important concepts covered by these 3 measures?), (2) accuracy of scores (e.g., Does it seem to capture change in your clients?), and (3) clinical utility (e.g., Do these measures add value for you?). Clinicians were also asked about their perceptions of feasibility for clients (e.g., Have your clients reported difficulty understanding the measures?). In addition, any information given about PROMs validity evidence or utility for use in MBC throughout the interviews was included in the analysis. Qualitative sample sizes were determined for the original implementation study based on the concept of information power (i.e., the information provided by participants was sufficient to meet the study aims) [30].
PROM data
The following data were extracted from the MBC database, collected from March 2023 to September 2024: PedsQL, RCADS-25 and C-SSRS subscale and total scores of youth at intake and discharge, demographic data (i.e., age, ethnicity), and the start and end times for administration of PROMs. Data were only extracted for youth who consented to have their data used for research (or, for youth under 14 years of age, whose guardian consented on their behalf). PROMs data were collected using a convenience sampling approach (i.e., all available data from the centre), meeting the recommended sample size of > 100 participants for evaluating relationships to other, related variables [31].
Data analysis
Assumptions 1–6 from Table 1 were evaluated by examining quantitative and/or qualitative data.
Qualitative analysis
We used directed content analysis of interview data and open-ended survey responses from youth and caregivers [32]. The initial coding framework was based on the assumptions from Table 1. References were initially coded, with new codes and subcodes created and combined as needed to organize the data. Two researchers (EM, BB) coded 3 transcripts independently, then met to finalize the coding framework. One researcher (EM) coded the remaining data. A data matrix was created, with participant type (clinician, youth, caregiver) as rows and the assumptions in Table 1 as columns. Codes for participant data populated the cells. One researcher described the data in each cell, with a second verifying the descriptions.
Quantitative analysis
Analyses were performed in SPSS [33]. Categorical variables were summarized as counts and percentages, and continuous variables as means and standard deviations (SD). The mean and SD for the PedsQL and RCADS-25 subscales and total scores were calculated. We examined floor and ceiling effects by calculating the percentage of youth scoring the minimum or maximum score, considering 15% or more to be evidence of a floor or ceiling effect, respectively. C-SSRS scores are described as counts and percentages. All hypothesis-testing used youth self-report data at intake, except for hypotheses about change scores, which were calculated for participants who completed both intake and discharge timepoints. We examined the magnitude of association between scale scores by calculating a Pearson product-moment correlation (r) with a 95% confidence interval. We explored differences in scores between groups using an independent samples t-test, with a p-value < 0.05 considered significant and Cohen’s d calculated to examine the magnitude of difference. There was no item non-response because the MBC program forces responses to all items. However, there was loss to follow-up between the intake and discharge timepoints. Because these data were collected through real-world administration of PROMs (rather than clinical research), we are unable to describe reasons for drop-out; however, it is generally thought to be related to youth leaving the programs prior to their anticipated discharge date, or to issues in PROM administrative processes resulting in youth missing the discharge administration timepoint.
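The analyses were run in SPSS; as a minimal, stdlib-only sketch of the same quantities, the following computes the Pearson correlation with a Fisher-z 95% confidence interval, a pooled-SD Cohen's d, and the 15% floor/ceiling rule (the Fisher-z approach for the confidence interval is an assumption, as the specific SPSS procedure is not stated):

```python
import math
from statistics import mean, stdev


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den


def pearson_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """95% confidence interval for r via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def cohens_d(g1: list[float], g2: list[float]) -> float:
    """Cohen's d for two groups using a pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    sp = math.sqrt(((n1 - 1) * stdev(g1) ** 2 + (n2 - 1) * stdev(g2) ** 2)
                   / (n1 + n2 - 2))
    return (mean(g1) - mean(g2)) / sp


def floor_ceiling(scores: list[float], lo: float, hi: float,
                  threshold: float = 0.15) -> tuple[bool, bool]:
    """Flag a floor/ceiling effect if >= 15% of scores sit at a scale bound."""
    n = len(scores)
    return (sum(s == lo for s in scores) / n >= threshold,
            sum(s == hi for s in scores) / n >= threshold)
```

In this sketch the t-test p-value itself is left to the statistical package; the helpers only reproduce the effect-size and distributional checks reported in the Results.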
Results
Participant demographics
There were 10 clinician interview participants, as well as 75 caregiver and 17 youth survey participants (see Table 2). The PROMs database included 295 youth: 175 from Day Hospital and 120 from ICTS, with a mean age of 14.9 years (SD: 1.7; range 10–18). Table 3 describes participants’ self-identified ethnicity.
Table 2
Participant demographics for survey and interviews
Number (%)
Clinician interview participants (n = 10)
Service
Day hospital
4 (40.0)
ICTS
6 (60.0)
Role
Nurse
1 (10.0)
Psychologist
3 (30.0)
Social worker
6 (60.0)
Gender
Woman
10 (100.0)
Caregiver survey participants (n = 75)
Gender
Woman
65 (88.89)
Man
10 (11.11)
Caregiver age
20–35 years
7 (7.33)
36–45 years
32 (42.67)
46–55 years
32 (42.67)
56–65 years
4 (5.33)
Youth survey participants (n = 17)
Gender
Woman
15 (88.23)
Man
2 (11.77)
Age
14 years
1 (5.88)
15 years
4 (23.53)
16 years
5 (29.41)
Table 3
Counts and percentages of youth’s ethnicity
Ethnicity
N (%)
White
114 (38.6%)
Southeast Asian
13 (4.4%)
South Asian
5 (1.7%)
Middle Eastern
2 (0.7%)
Latino, Latina, Latinx
7 (2.4%)
Indigenous
5 (1.7%)
East Asian
7 (2.4%)
Black
4 (1.4%)
Mixed
26 (8.8%)
Undisclosed
112 (38.0%)
Descriptives of scale scores
Tables 4 and 5 describe scores for the PedsQL and RCADS-25, and the C-SSRS, respectively. No floor or ceiling effects were observed. Thirty-eight percent of youth were lost to follow-up, with no significant differences observed between youth who did or did not complete PROMs at discharge (see Supplementary File 1).
Table 4
Descriptive statistics of scale and subscale scores, floor and ceiling effects
Variable
Intake
Discharge
Intake
Discharge
N
Minimum/maximum
Mean (SD)
Median (interquartile range)
N
Minimum/maximum
Mean (SD)
Median (interquartile range)
Floor (%)
Ceiling (%)
Floor (%)
Ceiling (%)
PedsQL total
295
12.0–82.6
49.0 (15.3)
48.9 (22.5)
183
15.22–100
54.1 (16.5)
55.5 (25.0)
0.0
0.0
0.0
0.5
PedsQL physical
295
6.2–100
60.6 (20.9)
62.5 (31.3)
183
12.5–100
62.5 (20.4)
62.5 (32.0)
0.0
1.4
0.0
2.2
PedsQL emotional
295
0–100
37.2 (17.6)
35 (25)
183
0–100
45.9 (19.3)
45.0 (30.0)
0.7
0.7
1.6
0.5
PedsQL social
292
0–100
57.2 (21.9)
55.0 (25.0)
183
0–100
60.5 (23.7)
60.0 (30.0)
1.0
4.5
1.6
7.1
PedsQL school
295
0–90
34.5 (17.6)
35.0 (20.0)
183
0–100
42.7 (21.0)
40.0 (30.0)
3.4
0.0
3.8
1.6
RCADS total
295
5–71
38.2 (14.0)
38.0 (18.0)
183
0–73
31.5 (14.3)
31.0 (20.0)
0.0
0.0
1.1
0.0
RCADS depression
295
2–30
18.2 (6.3)
18.0 (9.0)
183
0–30
15.0 (6.7)
15.0 (9.0)
0.0
0.7
1.1
1.6
RCADS anxiety
295
1–42
20.1 (8.8)
20.0 (14.0)
183
0–43
16.5 (8.7)
16.0 (12.0)
0.0
0.0
1.1
0.0
Floor and ceiling effects were considered to be present if 15% or more of respondents scored at the bottom or top of the scale, respectively
Table 5
Descriptive statistics for C-SSRS
C-SSRS score
Intake
Discharge
N
Count (proportion)
N
Count (Proportion)
0
286
20 (7%)
175
56 (32%)
1
286
12 (4.2%)
175
33 (18.9%)
2
286
33 (11.5%)
175
14 (8%)
3
286
221 (77.3%)
175
72 (41.1%)
Assumption 1. The content of the items
Clinicians described the content of the PedsQL, RCADS-25 and C-SSRS as relevant and comprehensive. Among youth and caregivers, 93.3% and 92.9%, respectively, rated the PedsQL as appropriate or strongly appropriate. The RCADS-25 was rated as appropriate or strongly appropriate by 100% of youth and 97.3% of caregivers, while 66.7% of youth rated the C-SSRS as appropriate or strongly appropriate. Specific to the C-SSRS, 3 of 12 youth (33.3%) commented that it was potentially upsetting as a self-report instrument. Most participants felt the PROMs were comprehensive for youth in these intensive treatment settings; however, clinicians working from a family systems approach reported missing concepts relating to family relationships. Similarly, caregivers identified the child’s relationships with siblings and parents as missing concepts. A minority of youth felt that there should be questions about self-harm and substance use.
Assumption 2. PROM comprehensibility
Overall, clinicians observed youth and their caregivers having minimal difficulties understanding and responding to items on the PedsQL and RCADS-25. Youth (100%) and caregivers (94.7%) agreed or strongly agreed that the PROMs, taken together, were easy to read and understand. Nonetheless, some problems with comprehending or responding were reported. On open-ended survey questions, both youth and caregivers expressed difficulty responding to the school-related items of the PedsQL, since youth were often not attending regular school while in treatment. Caregivers also commented on having difficulty evaluating their child’s functioning and symptoms, due to not observing their child for most of the day, or because they believed their child may not be communicating their daily emotional challenges to them. Clinicians reported incongruities between the C-SSRS self-report score and the findings from their clinical interviews; however, the source of these incongruities was unclear (e.g., whether it was due to comprehensibility of the C-SSRS, a reluctance to report suicidal ideation or behaviours, or a function of the scale itself).
Assumption 3. Correspondence of scores with actual perceptions of symptoms/functioning
Clinicians reported that PedsQL and RCADS-25 scores seem to accurately represent youth’s functioning and symptom intensity, while scores did not always correspond for C-SSRS (see Assumption 2).
Table 6 describes associations between PedsQL and RCADS-25 scores. Hypothesis 3a predicted a small to medium inverse correlation between RCADS-25 and PedsQL total scores; however, a strong correlation was observed. Hypothesis 3b was supported: there was a large, inverse correlation between the PedsQL emotional score and the RCADS-25 total score. Hypothesis 3c was supported: the correlation between the PedsQL physical score and the RCADS-25 total score was weaker than that between the PedsQL emotional score and the RCADS-25 total score.
Table 6
Correlation between PedsQL and RCADS-25 subscale and total scores

Scales | N | Pearson correlation | Significance (2-tailed) | 95% CI (2-tailed)
RCADS total / PedsQL total | 295 | −0.695 | < 0.001 | [−0.750, −0.631]
RCADS total / PedsQL emotional | 295 | −0.757 | < 0.001 | [−0.802, −0.703]
RCADS total / PedsQL physical | 295 | −0.475 | < 0.001 | [−0.559, −0.382]
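The confidence intervals reported in Table 6 follow from the Fisher z-transformation of the Pearson correlation. As a minimal sketch, the bounds can be reproduced from the reported r and N alone; this Python function is illustrative only (the study's analyses were run in SPSS):

```python
import math

def pearson_ci(r, n, crit=1.96):
    """Two-tailed 95% CI for a Pearson correlation via the Fisher z-transformation."""
    z = math.atanh(r)              # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)      # standard error of z
    lo = math.tanh(z - crit * se)  # back-transform bounds to the r scale
    hi = math.tanh(z + crit * se)
    return lo, hi

# Table 6, RCADS total vs. PedsQL total: r = -0.695, N = 295
lo, hi = pearson_ci(-0.695, 295)
print(round(lo, 3), round(hi, 3))  # -0.75 -0.631
```

Applied to the first row of Table 6, this recovers the published interval of [−0.750, −0.631].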
Assumption 4. Scores detect differences in clients
Clinicians felt that the C-SSRS self-report does not differentiate well between youth at high and moderate risk for suicide based on clinical interview (i.e., the C-SSRS is highly sensitive to risk). There were no comments on the other PROMs with regard to differentiating clients. Table 7 describes the relationships between PedsQL and RCADS-25 scores for Day Hospital and ICTS clients. Table 8 describes the relationships between C-SSRS risk scores (high risk vs. low-moderate risk) and RCADS-25 depression scores. Hypothesis 4a was not supported, as no significant difference in scores was detected between the two services. Hypothesis 4b was supported, as a significant difference was observed in RCADS-25 depression scores between youth with low to moderate risk versus high-risk C-SSRS scores (moderate effect size; d = 0.663).
Table 7
RCADS-25 and PedsQL scores by service type

Scale | Day Hospital (N = 175), Mean (SD) | ICTS (N = 120), Mean (SD) | Significance (2-tailed) | Cohen's d
PedsQL total | 50.18 (15.81) | 47.23 (14.39) | 0.103 | 0.194
PedsQL emotional | 38.74 (17.59) | 34.87 (17.34) | 0.063 | 0.221
RCADS total | 38.27 (14.28) | 38.17 (13.65) | 0.955 | 0.007
RCADS depression | 18.05 (6.30) | 18.30 (6.34) | 0.740 | −0.039
RCADS anxiety | 20.22 (9.12) | 19.88 (8.361) | 0.744 | 0.039
Table 8
RCADS-25 depression scores by suicide risk as assessed by C-SSRS

Scale | C-SSRS ≤ 2 (N = 65), Mean (SD) | C-SSRS = 3 (N = 221), Mean (SD) | Significance (2-tailed) | Cohen's d
RCADS depression | 15.20 (7.349) | 19.22 (5.634) | < 0.001 | 0.663
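The effect size in Table 8 can be checked from the reported summary statistics alone, assuming the standard pooled-SD formulation of Cohen's d (an illustrative sketch; the study's analyses were run in SPSS):

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)

# Table 8: RCADS-25 depression, C-SSRS <= 2 (N = 65) vs. C-SSRS = 3 (N = 221)
d = cohens_d(15.20, 7.349, 65, 19.22, 5.634, 221)
print(round(d, 3))  # 0.663
```

The value recovered from the means, SDs, and group sizes matches the published d = 0.663.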
Assumption 5. Detection of change in scores
Clinicians reported that the change in PedsQL and RCADS-25 scores reflected what they observed clinically. No clinicians commented on using the C-SSRS scores from the follow-up time points.
Table 9 describes associations between change scores. Hypothesis 5a was supported, as a medium, negative correlation was observed between PedsQL and RCADS-25 total change scores. Hypothesis 5b was not supported, as the correlations were similar in strength. Hypothesis 5c was supported, as there was a medium association between change in C-SSRS scores and change in RCADS-25 depression scores.
Table 9
Relationships between pre-post differences for measures

Scale | N | Pearson correlation | Significance (2-tailed) | 95% CI (2-tailed)
RCADS total / PedsQL total | 184 | −0.414 | < 0.001 | [−0.527, −0.286]
RCADS total / PedsQL emotional | 184 | −0.401 | < 0.001 | [−0.516, −0.272]
C-SSRS / RCADS depression | 184 | 0.254ᵃ | < 0.001 | [0.109, 0.388]

ᵃ Spearman's rho was used for the strength of association between change scores on the C-SSRS because it is an ordinal variable
Assumption 6. PROMs are feasible and informative
Youth (88.2%) and caregivers (84.0%) agreed or strongly agreed that the time to complete the PROMs was reasonable. When asked to identify drawbacks of administering PROMs for patients, clinicians identified burden of completion as a potential drawback, with the caveat that they felt this potential burden was outweighed by the benefits. The mean time to complete all three PROMs was 6 min 15 s (SD = 3 min 56 s) for youth; for caregivers, the mean time to complete the PedsQL and RCADS-25 was 6 min 43 s (SD = 3 min 19 s).
The PedsQL, as a measure of functioning, was viewed by clinicians as aligning well with the focus of these two short-term, intensive, multidisciplinary programs on improving a youth's functioning. The RCADS-25 was viewed as useful because it provides a quick measure of both depression and anxiety and has norm-based T-scores that help with interpreting scores. Clinicians reported that reviewing individual item responses on both scales provided additional insight into a youth's main areas of distress. Clinicians viewed the C-SSRS as a useful screening tool, despite discrepancies between the PROM and the findings of the clinical interview.
Discussion
In this study, we evaluated the validity of interpretations made from the scores of the PedsQL, RCADS-25, and C-SSRS for use in MBC in intensive outpatient CAMH treatment, using a validity argument approach. These measures showed evidence of accurately capturing their target constructs and of sensitivity to change. While our hypotheses regarding subgroup differences were only partially supported, this may reflect greater similarities than expected in symptom intensity and quality of life between the two programs, rather than being a function of the PROMs themselves; however, further evaluation is warranted. At 6–7 min to complete, the PROMs appear clinically feasible, and clinicians viewed them as useful and, as a set, as providing good coverage of relevant content in their intensive outpatient settings. Overall, in our judgement, there is adequate evidence to support the use of the child self-report versions of the PedsQL and RCADS-25 for MBC for youth (8–18 years old) in intensive outpatient CAMH services. However, caution is needed in interpreting the school domain of the PedsQL for youth who are not in their regular school environment. As well, further investigation of the caregiver proxy-report versions of these measures is needed, due to the potential response process issues identified (Assumption 2). Given these potential issues with validity, it is essential that clinicians interpret PROM scores alongside other information to support clinical decisions.
The validity argument for the use of PedsQL and RCADS-25 could be strengthened by evidence in three key areas: internal structure, the influence of socio-demographic and clinical characteristics on response processes, and sensitivity to change. Internal structure is crucial for valid interpretations of subscale scores, yet there is no evidence available for the internal structure of PedsQL and limited evidence for RCADS-25 in this population. Subscales must provide accurate information to ensure that individual and service level decision-making does not result in unintended consequences.
Additionally, the influence of socio-demographic factors (gender, linguistic/cultural background, educational attainment, and severity of mental disorder) on responses to PedsQL have not been thoroughly investigated. This is important to evaluate for both PedsQL and RCADS-25, because if there are biases within these items, making decisions based on these PROMs risks perpetuating health inequities.
Preliminary evidence from this study also suggests that the RCADS-25 and PedsQL can detect change in scores, but further work is needed to establish and verify clinically meaningful thresholds, such as minimal clinically important differences and patient-acceptable symptom states, to enhance score interpretation for clinicians [8, 39].
With respect to the C-SSRS, there is a need to further investigate the evidence for its validity in CAMH populations. Given concerns about response processes (Assumption 2), in our judgement it should always be used alongside a clinical interview. This finding, along with a lack of prior studies examining response processes for the self-report C-SSRS in CAMH populations, points to a strong need for a thorough evaluation of its content validity.
Finally, further evaluation of the PedsQL and RCADS-25 caregiver proxy-reports is needed. Previous studies have found only moderate correlations between caregiver and youth reports of symptoms and functioning [24, 36], but whether this is due to response process issues or true differences in perceptions of symptoms and functioning between youth and caregivers has not been established. If discrepancies represent true differences in perception, this could provide additional interpretations of PROM scores [40]. This is an important area for future CAMH research.
Strengths and limitations
A limitation of this study was its reliance on secondary data analysis. Since interviews were conducted to evaluate an MBC program, gaps exist in the qualitative data. The quantitative PROM data, drawn from real-world clinical use, also has limitations. Non-response could not be assessed since the administration platform forces responses to all items, and only scale scores were available for analysis. Demographic data was limited to age and ethnicity, and external variables available for examining relationships were limited. However, the argument-based validity approach allowed us to use evidence from secondary analyses to draw some conclusions about the validity of using these PROMs for MBC in CAMH. Importantly, it also allowed us to identify gaps in the validity evidence and clear directions for future PROM research in CAMH populations.
Conclusions
This study provides information about the validity of the interpretations from PedsQL, RCADS-25 and C-SSRS scores in an MBC program for intensive outpatient CAMH settings. Taken together, the PROMs provide adequate content coverage, are clinically feasible and relevant. There is a need for further validity research for all three PROMs.
Acknowledgements
We wish to thank the administrative support, clinical staff and managers at The Summit for their support for this project. This project was made possible by generous community support through the Alberta Children’s Hospital Foundation. Erin McCabe was supported through a Health System Impact Fellowship co-funded by Canadian Institutes of Health Research, Alberta Health Services Provincial Addictions and Mental Health and Mitacs.
Declarations
Competing interests
The authors declare that they have no competing interests.
Ethical approval
This study was approved by The University of Calgary Research Ethics Boards (REB22-1137).
Consent to participate
Verbal, informed consent to participate was obtained from participants in the interviews. Implied, informed consent was used for adult participants responding to surveys (i.e., by responding to the survey, participants consented to participate in research). Youth ages 14–18 with decision-making capacity were asked to read the study information and indicate informed consent by ticking a box. Decision-making capacity of youth ages 14–18 was assessed using four questions evaluating that they understood the purpose of the study, that participation was voluntary, that they could stop participating at any time, and that their participation would not impact the care they would receive. If all four questions were answered "yes," they were considered competent to consent to participate as a mature minor, according to Canadian ethical standards. Youth 14–18 years old who were competent to consent were given information about the use of their de-identified PROMs data for future research and were asked to consent to have their data used for research. For youth 14 years of age or under, or those not considered competent to consent, assent was sought along with guardians' informed consent.
Consent to publish
This manuscript does not contain any individual person’s data in any form.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Authors
Erin McCabe
Whitney Hindmarch
Bishnu Bajgain
Johanna Jacob
Paul D. Arnold
Iliana Ortega
Michele Dyson
Deborah McNeil
Gina Dimitropoulos
Ryan Clements
Maria J. Santana
Jennifer D. Zwicker
Barican, J. L., Yung, D., Schwartz, C., Zheng, Y., Georgiades, K., & Waddell, C. (2022). Prevalence of childhood mental disorders in high-income countries: A systematic review and meta-analysis to inform policymaking. Evidence-Based Mental Health, 25(1), 36–44.
Lewis, C. C., Boyd, M., Puspitasari, A., Navarro, E., Howard, J., Kassab, H., et al. (2019). Implementing measurement-based care in behavioral health: A review. JAMA Psychiatry, 76(3), 324–335.
4. Parikh, A., Fristad, M. A., Axelson, D., & Krishna, R. (2020). Evidence base for measurement-based care in child and adolescent psychiatry. Child and Adolescent Psychiatric Clinics of North America, 29(4), 587–599.
5. Rognstad, K., Wentzel-Larsen, T., Neumer, S. P., & Kjøbli, J. (2023). A systematic review and meta-analysis of measurement feedback systems in treatment for common mental health disorders. Administration and Policy in Mental Health and Mental Health Services Research, 50(2), 269–282. https://doi.org/10.1007/s10488-022-01236-9
6. Childs, A. W., & Connors, E. H. (2022). A roadmap for measurement-based care implementation in intensive outpatient treatment settings for children and adolescents. Evidence-Based Practice in Child and Adolescent Mental Health, 7(4), 419–438. https://doi.org/10.1080/23794925.2021.1975518
7. Connors, E. H., Douglas, S., Jensen-Doss, A., Landes, S. J., Lewis, C. C., McLeod, B. D., et al. (2021). What gets measured gets done: How mental health agencies can leverage measurement-based care for better patient care, clinician supports, and organizational goals. Administration and Policy in Mental Health and Mental Health Services Research, 48(2), 250–265. https://doi.org/10.1007/s10488-020-01063-w
8. Krause, K. R., Hetrick, S. E., Courtney, D. B., Cost, K. T., Butcher, N. J., Offringa, M., et al. (2022). How much is enough? Considering minimally important change in youth mental health outcomes. The Lancet Psychiatry, 9(12), 992–998.
9. Thapa Bajgain, K., Amarbayan, M., Wittevrongel, K., McCabe, E., Naqvi, S. F., Tang, K., et al. (2023). Patient-reported outcome measures used to improve youth mental health services: A systematic review. Journal of Patient-Reported Outcomes, 7(1), 14.
10. Bajgain, K. T., Mendoza, J., Naqvi, F., Aghajafari, F., Tang, K., Zwicker, J., et al. (2024). Prioritizing patient reported outcome measures (PROMs) to use in the clinical care of youth living with mental health concerns: A nominal group technique study. Journal of Patient-Reported Outcomes, 8(1), 20. https://doi.org/10.1186/s41687-024-00694-z
11. McCabe, E., Dyson, M., McNeil, D., Hindmarch, W., Ortega, I., Arnold, P. D., et al. (2024). A protocol for the formative evaluation of the implementation of patient-reported outcome measures in child and adolescent mental health services as part of a learning health system. Health Research Policy and Systems, 22(1), 85. https://doi.org/10.1186/s12961-024-01174-y
12. International Consortium for Health Outcomes Measurement. (2022). Children & young people with anxiety & depression, including OCD & PTSD: Data collection reference guide (p. 71). International Consortium for Health Outcomes Measurement.
13. Creswell, J., & Plano Clark, V. L. (2017). Designing and conducting mixed methods research (3rd ed.). SAGE.
14. McCabe, E., Bajgain, B., Hindmarch, W., Dyson, M., McNeil, D., Ortega, I., Arnold, P. D., Dimitropoulos, G., Clements, R., Zwicker, J. D., & Santana, M. J. (2024). Evaluating the implementation of measurement-based care in child and adolescent mental health services as part of a learning health system. Research Square. https://doi.org/10.21203/rs.3.rs-5390833/v1
15. Gagnier, J. J., Lai, J., Mokkink, L. B., & Terwee, C. B. (2021). COSMIN reporting guideline for studies on measurement properties of patient-reported outcome measures. Quality of Life Research, 30(8), 2197–2218.
16. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
17. Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
18. Cook, D. A., Brydges, R., Ginsburg, S., & Hatala, R. (2015). A contemporary approach to validity arguments: A practical guide to Kane's framework. Medical Education, 49(6), 560–575.
19. Weinfurt, K. P. (2021). Constructing arguments for the interpretation and use of patient-reported outcome measures in research: An application of modern validity theory. Quality of Life Research, 30(6), 1715–1722. https://doi.org/10.1007/s11136-021-02776-7
20. Hawkins, M., Elsworth, G. R., Nolte, S., & Osborne, R. H. (2021). Validity arguments for patient-reported outcomes: Justifying the intended interpretation and use of data. Journal of Patient-Reported Outcomes, 5(1), 64. https://doi.org/10.1186/s41687-021-00332-y
21. Varni, J. W., Burwinkle, T. M., Seid, M., & Skarr, D. (2003). The PedsQL 4.0 as a pediatric population health measure: Feasibility, reliability, and validity. Ambulatory Pediatrics, 3(6), 329–341.
22. Bele, S. (2022). Investigating the implementation of pediatric patient-reported outcome and experience measures in Alberta [Dissertation, University of Calgary].
23. O'Loughlin, R., Jones, R., Chen, G., Mulhern, B., Hiscock, H., Devlin, N., et al. (2024). Comparing the psychometric performance of generic paediatric health-related quality of life instruments in children and adolescents with ADHD, anxiety and/or depression. PharmacoEconomics. https://doi.org/10.1007/s40273-024-01354-2
24. Varni, J. W., Seid, M., & Kurtin, P. S. (2001). PedsQL 4.0: Reliability and validity of the Pediatric Quality of Life Inventory version 4.0 generic core scales in healthy and patient populations. Medical Care, 39(8), 800–812.
25. Ebesutani, C., Reise, S., Chorpita, B., Ale, C., Regan, J., et al. (2012). The Revised Child Anxiety and Depression Scale-Short Version: Scale reduction via exploratory bifactor modeling of the broad anxiety factor. Psychological Assessment, 24(4), 833–845.
26. Krause, K. R., Chung, S., Adewuya, A. O., Albano, A. M., Babins-Wagner, R., Birkinshaw, L., et al. (2021). International consensus on a standard set of outcome measures for child and youth anxiety, depression, obsessive-compulsive disorder, and post-traumatic stress disorder. The Lancet Psychiatry, 8(1), 76–86.
27. Chorpita, B. F., Yim, L., Moffitt, C., Umemoto, L. A., & Francis, S. E. (2000). Assessment of symptoms of DSM-IV anxiety and depression in children: A revised child anxiety and depression scale. Behaviour Research and Therapy, 38(8), 835–855.
28. Posner, K., Brown, G. K., Stanley, B., Brent, D. A., Yershova, K. V., Oquendo, M. A., et al. (2011). The Columbia-Suicide Severity Rating Scale: Initial validity and internal consistency findings from three multisite studies with adolescents and adults. American Journal of Psychiatry, 168(12), 1266–1277.
Malterud, K., Siersma, V. D., & Guassora, A. D. (2016). Sample size in qualitative interview studies: Guided by information power. Qualitative Health Research, 26(13), 1753–1760.
31. De Vet, H. C. W., Terwee, C. B., Mokkink, L. B., & Knol, D. L. (2011). Measurement in medicine: A practical guide (Practical guides to biostatistics and epidemiology). Cambridge University Press.
32. Hsieh, H. F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15(9), 1277–1288.
33. IBM. (2023). SPSS Statistics for Macintosh. IBM Corp.
34. Alemu, W. G., Due, C., Muir-Cochrane, E., Mwanri, L., Azale, T., & Ziersch, A. (2024). Quality of life among people living with mental illness and predictors in Africa: A systematic review and meta-analysis. Quality of Life Research, 33(5), 1191–1209. https://doi.org/10.1007/s11136-023-03525-8
35. Spitzer, R. L., Kroenke, K., Linzer, M., Hahn, S. R., Williams, J. B., deGruy, F. V., et al. (1995). Health-related quality of life in primary care patients with mental disorders: Results from the PRIME-MD 1000 study. JAMA, 274(19), 1511–1517.
36. Ebesutani, C., Korathu-Larson, P., Nakamura, B. J., Higa-McMillan, C., & Chorpita, B. (2017). The Revised Child Anxiety and Depression Scale 25—Parent Version: Scale development and validation in a school-based and clinical sample. Assessment, 24(6), 712–728. https://doi.org/10.1177/1073191115627012
37. Stevanovic, D., Atilola, O., Vostanis, P., Pal Singh Balhara, Y., Avicenna, M., Kandemir, H., et al. (2016). Cross-cultural measurement invariance of adolescent self-report on the Pediatric Quality of Life Inventory™ 4.0. Journal of Research on Adolescence, 26(4), 687–695.
38. Stevanovic, D., Bagheri, Z., Atilola, O., Vostanis, P., Stupar, D., Moreira, P., et al. (2016). Cross-cultural measurement invariance of the Revised Child Anxiety and Depression Scale across 11 world-wide societies. Epidemiology and Psychiatric Sciences, 26(4), 430.
39. Kvien, T. K., Heiberg, T., & Hagen, K. B. (2007). Minimal clinically important improvement/difference (MCII/MCID) and patient acceptable symptom state (PASS): What do these concepts mean? Annals of the Rheumatic Diseases, 66(Suppl 3), iii40.
40. De Los Reyes, A., & Epkins, C. C. (2023). Introduction to the special issue. A dozen years of demonstrating that informant discrepancies are more than measurement error: Toward guidelines for integrating data from multi-informant assessments of youth mental health. Journal of Clinical Child & Adolescent Psychology, 52(1), 1–18. https://doi.org/10.1080/15374416.2022.2158843