A critical review of outcome measures used to evaluate the effectiveness of comprehensive, community based treatment for young children with ASD
Introduction
Autism Spectrum Disorder (ASD) is being diagnosed at younger ages and with increasing frequency, with current estimates that approximately 1% of school aged children (Blumberg et al., 2012) meet the diagnostic criteria of qualitative impairment in communication and socialization skills, as well as the presence of repetitive behavioral mannerisms that interfere with daily functioning (American Psychiatric Association, 2013). As the number of children with ASD has increased, so has public pressure to provide evidence based treatment. However, although evidence based treatments are established within research settings (e.g., Makrygianni & Reed, 2010), the gap between research and practice is large (Dingfelder & Mandell, 2011; Kasari & Smith, 2013), and despite the significant costs associated with treatments in community settings, little is known about how well many treatments developed in research settings generalize into the community. For example, Amendah, Grosse, Peacock, and Mandell (2012) estimate costs of $25,099 to $60,000+ per person, per year for behavioral therapies. In Canada, provincial governments are spending up to $40,000 per child, per year on therapies for children with ASD (Madore & Pare, 2006). Lifetime costs are even greater, with recent estimates of $2.4 million in the United States and £1.5 million in the United Kingdom (Buescher, Cidav, Knapp, & Mandell, 2014).
The distinction between treatment efficacy and effectiveness has important implications for bridging research and clinical practice. Treatment efficacy is demonstrated through completion of replicable studies in highly controlled research settings, whereas treatment effectiveness is the demonstration of the generalizability of efficacious treatments into community settings (Greenberg, 2004). Our understanding of effective treatments for ASD has been consolidated through several extensive and systematic reviews (e.g., National Autism Center, 2009; Wong et al., 2014), yet the evidence base for ASD treatment effectiveness is still emerging. In addition, which intervention will work best for an individual child, the specificity of the intervention targets, and the individual responsiveness of ASD symptomology to treatment remain unknown, particularly as knowledge is generalized out of university contexts into the community (Minjarez, Williams, Mercier, & Hardan, 2011). Moreover, in the context of implementation science, the lag time between the development of an efficacious practice and its eventual adoption is still estimated to be as high as 20 years (Walker, 2004).
One contributing factor to the slow adoption of efficacious practice is the lack of consensus on measurement tools and the wide usage of different instruments (Bolte & Diehl, 2013). Matson (2007) reports that measures of intelligence and adaptive functioning are used most frequently, though a sole focus on these two constructs in the measurement of ASD treatment response can be problematic. For example, regarding cognition, Matson (2007) reports that (1) children often age out of Intelligence Quotient (IQ) measures from pre to post test, forcing substitution of a different IQ instrument normed on an older population, (2) it is not clear whether ASD intervention results in increased scores due to increased compliance, attention, motivation, or ability, (3) IQ tests are less reliable at predicting future performance for children at young ages, (4) comorbid psychopathologies may interfere with measurement of the underlying constructs, and (5) IQ tests are not normed on an ASD population. Adaptive measures, though valuable, are normed primarily on the typically developing population (Sattler, 2006) and are not designed specifically for the ASD population; consequently, they provide only a reference point for identifying delays and strengths in that population.
According to Gould, Dixon, Najdowski, Smith, and Tarbox (2011), measurement outcomes of intensive ASD programs should (1) be comprehensive, (2) target early childhood development, (3) consider behavior function, (4) directly link assessment items to curricula targets, and (5) be used to track child progress over time. Gould et al. (2011) indicate that a combination of direct observation and indirect assessment (e.g., rating scales and checklists) is an ideal manner to track outcomes. However, after reviewing 27 different tools that may be used to measure ASD intervention progress, they were not able to identify any specific tool that met their five criteria. Four tools identified as being ‘of promise’ were the Verbal Behavior Milestones Assessment and Placement Program [VB MAPP] (Sundberg, 2008), the Brigance Diagnostic Inventory of Early Development II [Brigance IED II] (Brigance, 2004), the Vineland Adaptive Behavior Scales Second Edition [VABS II] (Sparrow, Cicchetti, & Balla, 2005), and the Brigance Diagnostic Comprehensive Inventory of Basic Skills Revised [CIBS R] (Brigance, 1999). The VABS was described as “by far the most popular assessment” (p. 998). To strengthen these tools, Gould et al. (2011) recommended simplified administration of the VB MAPP, increased psychometric evaluation of the VB MAPP and Brigance IED II, and content linking of the VABS and CIBS R more clearly to a curriculum.
Bolte and Diehl (2013) reflected the lack of consensus on measurement tool selection to evaluate treatment response in their review of 195 prospective ASD treatment trials from 2001 to 2010. They identified 289 unique measurement tools, of which the majority (61.6%) were used only once. The top five utilized tools reported in this review were the Aberrant Behavior Checklist [5%] (Aman, Singh, Stewart, & Field, 1985), Clinical Global Impressions [4.6%] (Guy, 1976), VABS [3.9%] (Sparrow, Balla, & Cicchetti, 1984), investigator designed video observations (1.9%), and the Bayley Scales of Infant Development [1.7%] (Bayley, 1993). Bolte and Diehl (2013) concluded that “greater consistency in the use of measurement tools in ASD clinical trials is a worthwhile and achievable goal” (p. 2499), as the sheer number of tools and tool symptom combinations makes comparison between studies difficult. Improved consistency in test use is important because it will allow for more nuanced and detailed comparisons across program models and funding jurisdictions, allow for a better understanding of how different treatment models influence different areas of child functioning, and lead to improved efficiency in the test selection and completion process for research participants. In summary, based on this review of the literature, one can conclude that there is currently no single assessment measure that will capture all aspects of an intensive ASD treatment program, and that a combination of consistently used outcome tools is of value.
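The kind of tally that sits behind a figure such as the share of tools used only once can be sketched in a few lines. This is a minimal illustration, not the procedure used by Bolte and Diehl (2013): the study labels, the instrument lists, and the function names are all hypothetical and were invented for this example.

```python
from collections import Counter

# Hypothetical study-to-instrument mapping, invented for illustration only.
# It is NOT data from Bolte and Diehl (2013) or from any other cited review.
studies = {
    "study_a": ["VABS", "ABC", "Bayley"],
    "study_b": ["VABS", "CGI"],
    "study_c": ["VABS", "ABC", "custom_video_coding"],
    "study_d": ["ABC", "Mullen"],
}


def instrument_usage(study_map):
    """Count the number of studies in which each instrument appears."""
    counts = Counter()
    for instruments in study_map.values():
        # set() ensures an instrument is counted at most once per study
        counts.update(set(instruments))
    return counts


def share_used_once(counts):
    """Proportion of distinct instruments that appear in exactly one study."""
    single = sum(1 for n in counts.values() if n == 1)
    return single / len(counts)


def commonly_used(counts, min_studies):
    """Alphabetical list of instruments appearing in at least min_studies studies."""
    return sorted(name for name, n in counts.items() if n >= min_studies)


usage = instrument_usage(studies)
print(f"{share_used_once(usage):.1%} of instruments appear in only one study")
print("Used in 3 or more studies:", commonly_used(usage, min_studies=3))
```

Counting per study rather than per administration is a design choice made here for simplicity; a review that counts every use of a tool within a study would need a different tally.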
Measurement challenges have been documented for decades in the ASD treatment literature. For example, much of the literature for ASD treatment began with a seminal study by Lovaas (1987) who reported that up to 47% of preschool children diagnosed with ASD could achieve average scores on standardized measures of intelligence after receiving 40 hours per week of behaviorally based intensive intervention for at least 2 years. This model has been identified as the UCLA Young Autism Project (UCLA YAP) model and still influences much of the current ASD literature (Reichow & Wolery, 2009).
Lovaas (1987) used multiple instruments to evaluate treatment effectiveness including four different measures of intelligence that were combined into an IQ “estimate” of mental age, direct recording of behavior and language, and post intervention classroom placement at 6 or 7 years of age. The pre measures of intelligence were diverse and the Vineland Social Maturity Scale (Doll, 1953), an earlier version of the VABS, was used to estimate the mental age for participants that were deemed to be untestable. Post treatment measures were equally diverse and included up to six different cognitive measures, with unclear rules for allowable test substitution.
Jordan, Jones, and Murray (1998) and Howlin (1997) identified numerous measurement shortcomings of Lovaas (1987), including (1) the use of different measures before and after treatment, (2) measures that may not reflect important areas of difficulty in addressing ASD, (3) non adherence to standard assessment protocols, (4) lengthy delays between program delivery and outcome assessment, and (5) the use of prorated mental age, a psychometrically weak metric. Eikeseth (2001) responded to these criticisms by identifying that (1) no single IQ test covers the age range needed for lengthy interventions, (2) there is high overlap of up to 75% between ASD and mental retardation (citing Lord & Schopler, 1989), (3) the standardized process used would have penalized higher IQ scores, biasing against the intervention, (4) the lengthy delay between intervention and final testing would also bias against the intervention, and (5) ratio IQ is a conservative measure of this construct. Despite these methodological and practical concerns, the same tools and methodological challenges continue to be present in many current studies, and researchers continue to report benefit for the children with ASD who participate in the UCLA YAP model (e.g., Cohen, Amerine-Dickens, & Smith, 2006; Howard, Sparkman, Cohen, Green, & Stanislaw, 2005).
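For readers unfamiliar with the ratio IQ mentioned in point (5) above, it is conventionally defined as mental age divided by chronological age, multiplied by 100. The worked numbers below are illustrative only; they are not drawn from Lovaas (1987) or Eikeseth (2001), and the exact proration procedure used in those studies is not described here.

```latex
\[
\text{ratio IQ} = \frac{\text{mental age}}{\text{chronological age}} \times 100
\]
% Illustrative values (assumed, not from any cited study):
% mental age = 36 months, chronological age = 48 months
\[
\text{ratio IQ} = \frac{36}{48} \times 100 = 75
\]
```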
In addition to the UCLA YAP behavior based model, developmentally influenced ASD models also exist, though these models are less prominent in the research literature. These models include the Treatment and Education of Autistic and Related Communication Handicapped Children [TEACCH] (Mesibov, Shea, & Schopler, 2005), the Early Start Denver Model [ESDM] (Dawson et al., 2010), the Joint Attention and Symbolic Play/Engagement and Regulation [JASP/ER] (Kasari, Freeman, & Paparella, 2006), and inclusive classroom models such as Learning Experiences and Alternative Program for Preschoolers and Their Parents [LEAP] (Strain & Bovey, 2011) or the Children’s Toddler School [CTS] (Stahmer & Ingersoll, 2004). No review or critique of the outcome measures used to evaluate these models has identified whether measurement concerns parallel those found in the behaviorally oriented intervention research.
This review builds on the previous body of work related to measurement issues in the ASD treatment effectiveness literature by systematically identifying and recording the degree to which documented measurement concerns continue in published ASD literature evaluating treatment effects. Ethical practice guidelines established by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA, APA, & NCME, 1999) are used to identify strengths and weaknesses in clinical assessment and reporting requirements. We focus specifically on comprehensive, community based (i.e., referencing a preschool, nursery, home, or other community setting), outcome effectiveness, group based design ASD studies for young children, as this is where treatment effectiveness is ultimately demonstrated. Given the high volume of ASD studies, single subject designs, instrument related reviews, parent education programs, diagnostic studies, and general program descriptions are excluded. Specific aims are to: (1) identify whether the earlier criticisms of instrument usage (Howlin, 1997; Jordan et al., 1998) in initial ASD efficacy studies have been resolved, (2) identify the dominant instruments used to measure ASD treatment gains and evaluate their construct validity as primary measures for ASD, and (3) develop a standardized checklist to evaluate strengths and weaknesses in test administration, use, and reporting requirements for ASD program effectiveness based on a best practice framework for evaluating tool selection, the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999).
Best practices framework for evaluating tool selection and use
In order to guide the measurement evaluation process, a best practice framework that guides psychological and educational instrument selection and utility for program outcomes was adopted. This framework was guided by the fourth version of the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). The Standards provides minimum guidelines around test development and test use to ensure ethical assessments, jointly published since 1966, to improve ethical test use and
Instrument classifications
For the first level of analysis, each study was reviewed and all of the outcome instruments were identified, resulting in a total of 53 different standardized outcome instruments. However, similar to the findings of Bolte and Diehl (2013), 24/53 (45.28%) instruments appeared in only one study. As the purpose of this review is to provide an overview of those tools commonly being used in the ASD effectiveness literature, only the 21 instruments that appeared in a minimum of three studies are listed in
Discussion
The purpose of this paper was to provide a critical measurement review of recent community effectiveness outcome studies of treatments for young children with ASD in order to (1) identify whether early criticisms of instrument usage in ASD efficacy studies have been resolved, (2) report on the dominant instruments used to measure ASD treatment gains and evaluate their construct validity as primary measures for ASD, and (3) develop a standardized checklist to evaluate strengths and weaknesses in ethical
References (66)
- et al.
A review of assessments for determining the content of early intensive behavioral intervention programs for autism spectrum disorders
Research in Autism Spectrum Disorders
(2011)
- et al.
A comparison of intensive behavior analytic and eclectic treatments for young children with autism
Research in Developmental Disabilities
(2005)
- et al.
A meta-analytic review of the effectiveness of behavioural early intervention programs for children with autism spectrum disorders
Research in Autism Spectrum Disorders
(2010)
Determining treatment outcome in early intervention programs for autism spectrum disorders: a critical analysis of measurement issues in learning based interventions
Research in Developmental Disabilities
(2007)
- et al.
Comorbid psychopathology with autism spectrum disorder in children: an overview
Research in Developmental Disabilities
(2007)
Parenting stress index
(1995)
Manual for the child behavior checklist
(1991)
- et al.
The aberrant behavior checklist: a behavior rating scale for the assessment of treatment effects
American Journal of Mental Deficiency
(1985)
- et al.
Outcome measures for clinical drug trials in autism
CNS Spectrums
(2004)
- et al.
The economic costs of autism: a review
Standards for educational and psychological testing
Diagnostic and statistical manual of mental disorders
Bayley scales of infant development
Bayley scales of infant development
Administration manual
Changes in prevalence of parent-reported autism spectrum disorder in school-aged U.S. children: 2007 to 2011–2012
National health statistics reports, 65
Measurement tools and target symptoms/skills used to assess treatment response for individuals with autism spectrum disorder
Journal of Autism and Developmental Disorders
Clinical assessment of behavior
Brigance diagnostic comprehensive inventory of basic skills revised
Brigance diagnostic inventory of early development-II
Scales of independent behavior-revised comprehensive manual
Costs of autism spectrum disorders in the United Kingdom and United States
Journal of the American Medical Association Pediatrics
The Vineland Adaptive Behavior Scales: Supplementary norms for individuals with autism
Journal of Autism and Developmental Disorders
Early intensive behavioral treatment: replication of the UCLA model in a community setting
Developmental & Behavioral Pediatrics
The PDD behavior inventory
Social responsiveness scale-2nd edition (SRS 2)
Randomized, controlled trial of an intervention for toddlers with autism: the early start Denver model
Pediatrics
Bridging the research-to-practice gap in autism intervention: an application of diffusion of innovation theory
Journal of Autism and Developmental Disorders
The measurement of social competence
Recent critiques of the UCLA young autism project
Behavioral Interventions
The MacArthur communicative development inventories: user’s guide and technical manual
Current and future challenges in school-based prevention: the researcher perspective
Prevention Science
The clinical global impression scale
ECDEU assessment manual for psychopharmacology-revised
Parenting challenges in families of children with autism: a pilot study
Issues in Comprehensive Pediatric Nursing