The term autism spectrum disorder (ASD) refers to a class of pervasive developmental disorders characterized by impairments in social interaction, deficits in speech/language and communication development, and restricted, repetitive, and stereotyped behaviors (American Psychiatric Association 2013). The number of children diagnosed with ASD has increased in recent years (Baird et al. 2006; Baron-Cohen et al. 2009; Fombonne 2009), and this increase is associated with growing demands for effective educational services (Kogan et al. 2008). There is thus an increasing need for effective and cost-efficient educational interventions for children with ASD.

Currently, numerous interventions are claimed to be effective for educating children with ASD, including various medications, speech/language therapy, assistive technology interventions, sensory integration therapy, music therapy, visual schedules, gentle teaching, holding therapy, special diets, and vitamin supplements (e.g., Goin-Kochel et al. 2007; Green et al. 2006; Hess et al. 2008; Howlin 2005; Simpson 2005). There is insufficient evidence to support the use of most of these interventions (e.g., Howlin 2005; Lang et al. 2012; Mulloy et al. 2010; Simpson 2005; Simpson and Keen 2011). However, a large body of research has demonstrated positive effects of interventions based on the principles of applied behavior analysis (ABA), especially for teaching functional skills and reducing problem behavior in children with ASD (e.g., Matson et al. 1996; Matson and Smith 2008; National Research Council [NRC] 2001; Smith et al. 2007; Vismara and Rogers 2010).

ABA-based approaches often involve teaching single responses in a structured one-to-one teaching paradigm (Duker et al. 2004). This approach, sometimes referred to as discrete-trial training (DTT), has been associated with gains in intellectual functioning, language, and social skills and with reductions in problem behavior in children with ASD (e.g., Eldevik et al. 2009; Lovaas 1987; Peters-Scheffer et al. 2011; Smith 2001; Vismara and Rogers 2010). However, the DTT approach also has some potential disadvantages. First, it has been noted to be relatively time-consuming and costly (Koegel et al. 2003b, 1999c; Smith 2001; Vismara and Rogers 2010). Second, stimulus and response generalization may not occur without additional generalization programming (Lovaas et al. 1973; Smith 2001; Steege et al. 2007; Stokes and Baer 1977; Vismara and Rogers 2010).

To address these potential limitations of DTT, more naturalistic interventions have been developed (Allen and Cowan 2008). These approaches are generally considered naturalistic in the sense that they (a) are typically conducted in a variety of natural settings, (b) tend to be more loosely structured than interventions following a DTT format, (c) involve the use of a variety of motivational strategies, such as following the child's lead, (d) incorporate a variety of stimuli, prompts, and natural reinforcers, and (e) target clusters of responses rather than teaching skills involving a single response (Allen and Cowan 2008; Delprato 2001; Koegel et al. 1987a, 1999c). Naturalistic approaches typically comprise a package of teaching procedures and are variously referred to as (a) incidental teaching (e.g., McGee et al. 1983, 1985), (b) milieu teaching (e.g., Hancock and Kaiser 2006), (c) the Natural Language Paradigm, or (d) Pivotal Response Treatment (e.g., Koegel et al. 1987b; Koegel and Koegel 2006).

Pivotal Response Treatment (PRT), which evolved from the Natural Language Paradigm (NLP), is described as a comprehensive naturalistic intervention model based on ABA. PRT aims to teach pivotal behaviors to children with ASD in order to achieve generalized improvements in their functioning (Koegel et al. 2006). Pivotal behaviors are described as behaviors that, when targeted, lead to collateral improvements in other—often untargeted—aspects of functioning. Pivotal responses are conceptually related to behavioral cusps. Rosales-Ruiz and Baer (1997) describe behavioral cusps as behaviors in which changes have far-reaching consequences, because those behavior changes expose the individual to new reinforcers, contingencies, and environments. The concepts of pivotal responses and behavioral cusps are similar in that they both aim to facilitate further development by prioritizing target behaviors that lead to widespread behavior change.

So far, research has focused on four aspects of functioning that appear to be pivotal: (a) motivation, (b) self-initiations, (c) responding to multiple cues, and (d) self-management (Koegel et al. 1999a, c, 2001). Motivational procedures are incorporated to teach pivotal behaviors and include: (a) following the child's lead and offering choices, (b) gaining the child's attention, (c) providing clear opportunities to respond, including shared control and turn taking, (d) varying tasks and interspersing maintenance and acquisition tasks, (e) using contingent and natural reinforcement, and (f) reinforcing attempts at target skills (e.g., Dunlap and Koegel 1980; Koegel et al. 1999a, c, 1987a; Koegel and Koegel 2006; Koegel et al. 2001, 1988). A critical feature of PRT is implementation of the intervention in the child's natural environment to promote generalization (Stokes and Baer 1977). Family involvement, in the form of teaching parents and other caregivers to implement the motivational procedures, is also emphasized (Koegel and Koegel 2006).

Several reviews have examined the extent to which PRT can be considered an evidence-based practice. For example, Simpson (2005) evaluated 33 treatments for children with ASD and concluded that PRT is a scientifically based practice for the education of children with autism. In 2009, the National Autism Center (NAC) also concluded that PRT is an established intervention. Another synthesis of research on PRT concluded that PRT effectively improved social and emotional behaviors of young children with ASD (Masiello 2003). A comparative review, involving studies that compared naturalistic interventions (including NLP and PRT) with DTT, concluded that naturalistic interventions were more effective in teaching language to young children with ASD (Delprato 2001).

Surprisingly, none of these reviews addressed the claim that PRT leads to improvements in untargeted behaviors via the targeting of pivotal behaviors. Thus, it remains unclear whether pivotal behaviors are in fact pivotal (Koegel et al. 2001). It is also unclear whether the research on PRT supports the theoretical model of PRT. Furthermore, none of the previous reviews referenced above systematically considered caregiver or staff variables that might affect PRT implementation (e.g., the extent to which parents can learn to use the techniques and the effects on parental affect or stress). This is a limitation because such variables could influence PRT's effectiveness (Koegel and Koegel 2006; Schreibman et al. 1991; Steiner 2011). In recent years, a large number of studies on the effectiveness of PRT have been conducted that have not yet been included in previous systematic reviews. Given the limitations of previous reviews and the recent growth in the number of PRT studies, a systematic review on PRT was considered important and timely.

The purpose of this systematic review was to analyze the research on PRT in order to (a) document the range of skills that have been targeted for improvement with PRT, (b) assess the success of PRT for improving the skills of children with ASD (i.e., pivotal skills and untargeted skills), (c) assess the success of PRT for improving the skills of caregivers and staff, (d) evaluate the certainty of evidence arising from these studies, (e) identify limitations of the existing evidence base, and (f) suggest directions for future research.

Method

Search Procedures

To identify studies for inclusion in this review, we searched five electronic databases: Education Resources Information Center (ERIC), Linguistics and Language Behavior Abstracts, Medline, PubMed, and PsycINFO. Publication year was not restricted, but searches were limited to peer-reviewed studies. Within each database, the following parenthetical terms were entered as free text into the keywords field (PRT or pivotal response treatment or pivotal response training or pivotal response therapy or pivotal response intervention or pivotal response teaching or pivotal response or NLP or natural language paradigm) and combined with autis* or ASD or pervasive developmental disorder or PDD-NOS or Asperger.

The abstracts of the studies returned from the electronic database searches were reviewed to determine if the study met the inclusion criteria (see Inclusion and Exclusion Criteria). In addition, following the database searches, hand searches—covering December 2012 to June 2013—were conducted on the journals that had published at least two studies identified for the review from the electronic database searches. Finally, the reference lists of the studies meeting the inclusion criteria were reviewed to identify additional studies for inclusion. Searches of databases, journals, and reference lists occurred from February to June 2013. A total of 441 abstracts were screened for inclusion (see Reliability of Search and Coding Procedures).

Inclusion and Exclusion Criteria

To be included in this review, studies had to meet the following predetermined criteria. First, at least one of the participants had to have been diagnosed with Autistic Disorder, Asperger's Disorder, or Pervasive Developmental Disorder Not Otherwise Specified. Second, the study had to include an empirical evaluation of either PRT or NLP. To meet this criterion, the study had to involve implementation of at least one antecedent motivational technique (i.e., following the child's lead, getting the child's attention, providing a clear opportunity for responding, or interspersing maintenance and acquisition tasks) and one consequent motivational technique (i.e., contingent and natural reinforcement or reinforcement of attempts), and the study had to refer to the intervention as PRT or NLP or explicitly state the specific motivational techniques that were implemented (Koegel and Koegel 2006; Koegel et al. 2010c, 1987b). Third, the study had to have been written in English, Dutch, or German (i.e., languages understood by the authors of this review). Studies were excluded if the motivational techniques of PRT or NLP were implemented but the intervention evaluated was not referred to as PRT or NLP. For example, Hancock and Kaiser (2002) examined the effects of Enhanced Milieu Teaching (EMT) for developing social communication skills of preschool children with ASD. The milieu teaching procedures included following the child's lead and giving the child access to requested objects (i.e., natural reinforcement). The approach thus shared some of the motivational techniques associated with PRT and NLP. However, the Hancock and Kaiser study was excluded because it did not specifically evaluate either PRT or NLP and because EMT includes additional intervention components not commonly considered inherent to PRT. Studies were also excluded if the motivational techniques of PRT or NLP were implemented but the purpose of the study was not to evaluate PRT or NLP. For example, Sherer and Schreibman (2005) investigated whether a behavioral profile predicted children's response to PRT. Although PRT was implemented, the purpose of the study was not to evaluate PRT, so the study was excluded. Ultimately, 43 studies met the inclusion criteria.

Data Extraction

Included studies were summarized in terms of (a) participant characteristics (i.e., characteristics of the children with ASD and characteristics of the parents or staff who implemented PRT), (b) dependent variables, (c) intervention procedures, (d) intervention outcomes, including measures of follow-up, generalization, and social validity, and (e) certainty of evidence. Various procedural aspects were also noted, including method of data collection, implementer, experimental design, inter-observer agreement, and treatment fidelity.

Intervention outcomes of PRT were first summarized as reported by each study's authors. The outcomes were then classified as positive, mixed, or negative (e.g., Lang et al. 2012; Machalicek et al. 2008; Palmen et al. 2012). Results were classified as positive in single-case design studies if visual analysis of graphed data revealed that all participants improved on all dependent variables. In studies using a group design, results were classified as positive if the PRT group made statistically significant improvements on all dependent variables. Results were classified as mixed in single-case design studies if some, but not all, participants or dependent variables improved. In studies using a group design, results were classified as mixed if the PRT group improved significantly on some, but not all, dependent variables. Results were classified as negative in single-case design studies if none of the participants improved on any dependent variable. In studies using a group design, results were classified as negative if the PRT group did not make statistically significant improvements on any dependent variable.
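The positive/mixed/negative rule above can be expressed as a small decision function. The sketch below is a minimal illustration, not a tool used in this review: it assumes each visual-analysis judgment (single-case designs) or significance test (group designs) has already been reduced to a yes/no value, and the function names are hypothetical.

```python
from typing import Iterable


def classify_outcome(improved: Iterable[bool]) -> str:
    """Apply the positive/mixed/negative rule to a set of yes/no judgments.

    Each value is one judgment: for a single-case design, whether one
    participant improved on one dependent variable (visual analysis);
    for a group design, whether the PRT group improved significantly
    on one dependent variable.
    """
    flags = list(improved)
    if not flags:
        raise ValueError("at least one judgment is required")
    if all(flags):
        return "positive"   # every participant/variable improved
    if any(flags):
        return "mixed"      # some, but not all, improved
    return "negative"       # nothing improved


def classify_single_case(results: dict[str, dict[str, bool]]) -> str:
    """results maps participant -> {dependent variable -> improved?}."""
    return classify_outcome(
        flag for per_variable in results.values() for flag in per_variable.values()
    )


def classify_group(significant: dict[str, bool]) -> str:
    """significant maps dependent variable -> significant improvement?"""
    return classify_outcome(significant.values())


if __name__ == "__main__":
    # Hypothetical two-participant study: P2 did not improve on one variable.
    example = {
        "P1": {"initiations": True, "utterances": True},
        "P2": {"initiations": True, "utterances": False},
    }
    print(classify_single_case(example))
```

In the example, one of four participant-by-variable judgments is negative, so the study is classified as mixed. Flattening participants and variables into one list reflects the rule as stated: a single non-improvement anywhere is enough to move a study from positive to mixed.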

Certainty of evidence was evaluated for each study by considering several methodological characteristics (e.g., research design) in order to provide an overview of the quality of evidence of research on PRT (Schlosser and Sigafoos 2007). The certainty of evidence for each study was rated as “suggestive”, “preponderant”, or “conclusive”, using the classification system described by Lang et al. (2012), Palmen et al. (2012), Ramdoss et al. (2011), and Ramdoss et al. (2012). The lowest level of certainty was suggestive evidence. Studies classified as “suggestive” did not evaluate the intervention with an experimental design (e.g., AB design or intervention-only design). The second level was preponderant evidence. Studies classified as “preponderant” had the following qualities: (a) the study used an experimental design (e.g., group design with random assignment, ABAB design, or multiple baseline design), (b) adequate inter-observer agreement and treatment fidelity were reported (i.e., measured during at least 20 % of the sessions with at least 80 % agreement and fidelity), (c) operational definitions for dependent variables were provided, and (d) sufficient details for replication of intervention procedures were provided. However, studies at the preponderant level were limited in their ability to control for alternative explanations for treatment outcomes. For example, if two coinciding interventions (e.g., PRT and DTT) targeted the same dependent variable and no design feature controlled for the effect of DTT, the study was classified as “preponderant”. The highest level was conclusive evidence. Studies classified as “conclusive” contained all the attributes of the preponderant level, but the study's design also provided at least some control for alternative explanations for treatment outcomes (e.g., a group design with appropriate randomization and blinding or a concurrent multiple baseline design).
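The three-level scheme can be read as a sequence of gates. The following Python sketch encodes only a literal reading of the stated criteria, not the reviewers' case-by-case judgments; the record type `StudyAppraisal`, its field names, and the helper functions are hypothetical, and a value that was not reported is represented as `None` and treated as not adequate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyAppraisal:
    """Hypothetical record of the methodological features rated for one study."""
    experimental_design: bool               # e.g., randomized group, ABAB, multiple baseline
    ioa_sessions_pct: Optional[float]       # % of sessions with IOA data (None = not reported)
    ioa_pct: Optional[float]                # inter-observer agreement (%)
    fidelity_sessions_pct: Optional[float]  # % of sessions with fidelity data
    fidelity_pct: Optional[float]           # treatment fidelity (%)
    operational_definitions: bool           # all dependent variables operationally defined
    replicable_procedures: bool             # enough procedural detail for replication
    controls_alternatives: bool             # e.g., concurrent multiple baseline, blinded RCT


def adequate(sessions_pct: Optional[float], score_pct: Optional[float],
             min_sessions: float = 20.0, min_score: float = 80.0) -> bool:
    """True if measured in >= 20 % of sessions with >= 80 % agreement or fidelity."""
    if sessions_pct is None or score_pct is None:
        return False  # not reported counts as not adequate
    return sessions_pct >= min_sessions and score_pct >= min_score


def certainty_of_evidence(s: StudyAppraisal) -> str:
    """Return 'suggestive', 'preponderant', or 'conclusive'."""
    if not s.experimental_design:
        return "suggestive"
    meets_preponderant = (
        adequate(s.ioa_sessions_pct, s.ioa_pct)
        and adequate(s.fidelity_sessions_pct, s.fidelity_pct)
        and s.operational_definitions
        and s.replicable_procedures
    )
    if not meets_preponderant:
        return "suggestive"
    # Conclusive = all preponderant attributes plus control for alternative explanations.
    return "conclusive" if s.controls_alternatives else "preponderant"


if __name__ == "__main__":
    # Hypothetical non-concurrent multiple baseline study with adequate IOA and fidelity.
    study = StudyAppraisal(True, 33.0, 92.0, 25.0, 95.0, True, True,
                           controls_alternatives=False)
    print(certainty_of_evidence(study))
```

The example study meets every preponderant criterion but does not control for alternative explanations, so it is rated “preponderant”. Ordering the checks from the weakest to the strongest level keeps each gate a direct restatement of one sentence in the criteria.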

Reliability of Search and Coding Procedures

The first and last authors of this review independently conducted the database search to check agreement. The reliability of the database search was determined by calculating the percentage of articles identified by both authors out of the total number of identified articles (99 % initial agreement on the database search). A total of 436 articles were identified during the initial database search. The first and last authors then independently screened the abstracts of the 436 articles for possible inclusion, and the resulting lists of abstracts were compared. Agreement as to whether a study should be considered for inclusion was 90 % (i.e., agreement was obtained on 393 of the 436 studies). A total of 136 studies were then screened further for possible inclusion by applying the inclusion and exclusion criteria. Agreement as to whether a study should be included or excluded was obtained on 114 of the 136 studies (i.e., agreement was 84 %). The disputed articles were then discussed by the co-authors until 100 % agreement was achieved. Next, hand searches covering December 2012 to June 2013 were conducted for journals that had published at least two included studies. This journal search identified one additional study for inclusion. Finally, the reference lists of the included studies were searched, and another four studies were identified for inclusion. Agreement on the inclusion of the studies identified via hand searches and reference list searches was 100 %. Ultimately, 43 studies were included in this review.

After the list of included studies was agreed upon, the first author extracted information to develop an initial summary of the 43 included studies. Where two studies presented results from the same group of participants, the data from both studies were consolidated into one summary (e.g., Pierce and Schreibman 1997a, b). A total of 39 summaries were developed. To ensure the accuracy of these summaries and to calculate inter-coder agreement on the extraction of data, the last author used a checklist containing five questions: (a) Is this an accurate description of the participants? (b) Is this an accurate description of the dependent variables? (c) Is this an accurate description of the intervention procedures? (d) Is this an accurate description of the intervention outcomes? and (e) Is this an accurate description of the certainty of evidence? There were 195 items on which there could be agreement or disagreement (i.e., 39 summaries with five items per summary). Initial agreement was obtained on 184 items (94 %). If a summary was considered inaccurate, the co-authors discussed the study and the summary and made changes. This process was continued until consensus was achieved.
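The agreement figures in this section are simple percentage agreement: the number of items on which both coders made the same decision, divided by the number of items rated, times 100. The sketch below illustrates that arithmetic with the counts reported above; the function names are hypothetical. Note that simple percentage agreement does not correct for agreement expected by chance, as a statistic such as Cohen's kappa would.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[bool], coder_b: Sequence[bool]) -> float:
    """Percentage of items on which two independent coders made the same decision."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same items")
    if not coder_a:
        raise ValueError("no items to compare")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)


def agreement_from_counts(agreements: int, total: int) -> float:
    """The same quantity, computed from reported counts."""
    if not 0 <= agreements <= total or total == 0:
        raise ValueError("need 0 <= agreements <= total and total > 0")
    return 100.0 * agreements / total


if __name__ == "__main__":
    # Counts reported in the Reliability of Search and Coding Procedures section.
    for label, agree, total in [
        ("abstract screening", 393, 436),
        ("inclusion decision", 114, 136),
        ("data-extraction checklist", 184, 195),
    ]:
        print(f"{label}: {agreement_from_counts(agree, total):.1f} %")
```

Rounded to whole numbers, these give the 90 %, 84 %, and 94 % reported above.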

Results

Table 1 summarizes each of the included studies in terms of (a) participant characteristics, (b) dependent variables, (c) intervention procedures, (d) intervention outcomes, and (e) certainty of evidence.

Table 1 Summary of included studies

Participant Characteristics

In 37 of the summarized studies, data on child characteristics were reported. A total of 420 children participated in these studies. Sample sizes ranged from 2 to 158 children, with 14 studies involving more than six children. Of the 420 children, 298 (71.0 %) were male, 65 (15.4 %) were female, and the sex of 57 children (13.6 %) was not reported. Children ranged in age from 1;0 to 12;7 (years;months), with a mean age of 4;7. The majority of the children (n = 221; 52.6 %) were identified as having ASD, but a specific diagnosis was not stated. One hundred eighty-one children were diagnosed with autism (43.1 %), six with PDD-NOS (1.4 %), and two with Asperger's syndrome (0.5 %). Ten children (2.4 %) did not have a formal diagnosis of ASD but met the cutoff score for ASD on the Autism Diagnostic Observation Schedule or the Autism Diagnostic Interview-Revised. In addition to ASD, one child also had developmental delays and mental retardation.

Nine studies reported data on caregiver characteristics. A total of 121 caregivers participated in these studies. Of the 121 caregivers, 22 (18.2 %) were male, 75 (62.0 %) were female, and the sex of 24 caregivers (19.8 %) was not reported. The caregivers were mainly the children's parents, but three studies also included a grandparent or one-to-one interventionist (Koegel et al. 2002; Randolph et al. 2011; Symon 2005). Caregiver education level was reported in six studies and ranged from high school to a graduate degree.

In six studies, data on staff member characteristics were reported. A total of 45 staff members participated in these studies. Of the 45 staff members, 1 (2.2 %) was male and 44 (97.8 %) were female. Staff members' years of experience working in this field were reported in five studies and ranged from 3 months to 17 years. Staff members worked in an educational (n = 40; 88.9 %) or clinical (n = 5; 11.1 %) setting.

Four studies reported data on peer characteristics. A total of 21 peers participated in these studies. Of the 21 peers, 8 (38.1 %) were male, 5 (23.8 %) were female, and the sex of 8 peers (38.1 %) was not reported. Peers were most often typically developing children, but five peers were diagnosed with a specific learning disability, mental retardation, or a developmental disability (Kuhn et al. 2008).

Dependent Variables

In 35 studies, child behaviors were targeted. Of these 35 studies, 18 targeted a pivotal skill: 17 targeted self-initiations (e.g., Koegel et al. 2012) and one targeted motivation (Koegel et al. 2010b). Across studies, a variety of untargeted skills or collateral changes were measured. Thirty-one studies evaluated the effects of PRT on communication and language skills, such as functional verbal utterances (e.g., Minjarez et al. 2011), receptive and expressive language (e.g., Coolican et al. 2010), responding to others (e.g., Kuhn et al. 2008), and maintaining interactions (e.g., Pierce and Schreibman 1997a). Six studies evaluated collateral changes in play skills as a result of PRT (Gillet and LeBlanc 2007; Lydon et al. 2011; Randolph et al. 2011; Stahmer 1995; Pierce and Schreibman 1997b; Thorp et al. 1995). For example, Lydon et al. measured the duration of interaction with toys and the number of play actions and verbalizations. Five studies evaluated the effects of PRT on adaptive functioning (Baker-Ericzén et al. 2007; Koegel et al. 1999b; Randolph et al. 2011; Smith et al. 2010; Voos et al. 2013), using the Vineland Adaptive Behavior Scales (Sparrow et al. 1984, 2005). Five studies evaluated collateral changes in maladaptive behavior as a result of PRT (Coolican et al. 2010; Gianoumis et al. 2012; Koegel et al. 1992, 2010b; Smith et al. 2010). For example, Gianoumis et al. measured the percentage of trials with maladaptive behaviors (e.g., screaming, crying, and hitting), and Smith et al. used the Child Behavior Checklist (Achenbach and Rescorla 2000) to measure problem behavior. Four studies evaluated the effects of PRT on autism symptoms (Bernard-Opitz et al. 2004; Smith et al. 2010; Steiner et al. 2013; Voos et al. 2013). For example, Smith et al. used the Social Responsiveness Scale (Constantino and Gruber 2005) to identify changes in autism symptoms. Three studies evaluated collateral changes in affect as a result of PRT using rating scales (Koegel et al. 2012; Robinson 2011; Vismara and Lyons 2007). Two studies evaluated collateral changes in cognitive functioning as a result of PRT (Smith et al. 2010; Steiner et al. 2013), using the Mullen Scales of Early Learning or the Merrill–Palmer-Revised Scales of Development (Mullen 1995; Roid and Sampers 2004). Two studies evaluated the effects of PRT on academic functioning (Koegel et al. 2010b, 1999b). For example, Koegel et al. (2010b) measured the children's productivity (i.e., rate of assignment units completed) and latency (i.e., number of minutes it took children to begin a task) during writing or math activities. Finally, one study evaluated the effects of PRT on face processing and neural response (Voos et al. 2013), and another study evaluated the effects of PRT on attendance and compliance (Bernard-Opitz et al. 2004).

In 13 studies, caregiver behaviors were targeted. Of these studies, nine evaluated the effects of caregiver training on caregivers' fidelity of implementation of PRT or NLP (Coolican et al. 2010; Gillet and LeBlanc 2007; Koegel et al. 2002; Minjarez et al. 2013; Nefdt et al. 2010; Randolph et al. 2011; Stahmer and Gist 2001; Steiner et al. 2013; Symon 2005). Additionally, two studies evaluated collateral changes in parental stress as a result of PRT (Minjarez et al. 2013; Smith et al. 2010), using the Parenting Stress Index/Short Form (Abidin 1995). Two studies evaluated the effects of PRT on parental affect using rating scales (Koegel et al. 2002; Schreibman et al. 1991). Two studies evaluated collateral changes in self-efficacy as a result of PRT (Coolican et al. 2010; Nefdt et al. 2010), and one study measured empowerment (Minjarez et al. 2013). Finally, one study evaluated the effects of PRT on interactional patterns (Koegel et al. 1996), and another study evaluated the effects of PRT on parent verbalizations (Laski et al. 1988).

Staff behaviors were targeted in seven studies. Of these studies, six evaluated the effects of staff training on staff members' fidelity of implementation of PRT or NLP (Gianoumis et al. 2012; Huskens et al. 2012; Robinson 2011; Seiverling et al. 2010; Suhrheinrich 2011; Suhrheinrich et al. 2007). Additionally, Gianoumis et al. (2012) evaluated the effect of staff training on staff members' ability to conduct a stimulus preference assessment. Robinson (2011) measured the duration of staff training and staff members' level of involvement, and Koegel et al. (1992) evaluated the instruction and reinforcement provided by a clinician. Kuhn et al. (2008) measured the effects of peer training on the number of interaction opportunities created by peers.

Intervention Procedures

PRT was implemented in 25 studies and NLP in seven studies. In two studies, other interventions were implemented; however, these interventions included PRT techniques. Specifically, Koegel et al. (2012) used facilitated social play training and Thorp et al. (1995) implemented socio-dramatic play training. Five studies did not indicate whether PRT or NLP was implemented, but these studies explicitly stated that the specific motivational techniques inherent to PRT were implemented (Koegel et al. 1998a, 2003a, 2010a, b, 1998b).

In 26 studies, caregivers, staff members, or peers were taught to implement PRT or NLP. The total duration of their training ranged from 66 min to 60 h. In six studies, training continued until a mastery criterion was met (e.g., Gillet and LeBlanc 2007). Two studies did not report the duration of training (Schreibman et al. 1991; Suhrheinrich et al. 2007). Caregivers, staff members, or peers were taught individually in 15 studies and in a group in seven studies. Three studies combined group and individual training (e.g., Huskens et al. 2012). In one study, the training format was not reported (Suhrheinrich et al. 2007). The training was implemented by a clinician (i.e., psychologist or therapist) in 16 studies and by an experimenter in six studies. Nefdt et al. (2010) used a self-directed learning program consisting of an interactive DVD to teach parents to implement PRT. Three studies did not report who implemented training. Caregiver, staff, or peer training involved a variety of instructional strategies. In 14 studies, a manual was incorporated (e.g., Minjarez et al. 2011), and 15 studies reported using didactic instruction (e.g., Coolican et al. 2010). Eight studies incorporated video modeling as an instructional strategy, and 16 studies incorporated in vivo modeling. Nineteen studies reported using some form of practice, such as assignments (e.g., Minjarez et al. 2011), role-play (e.g., Pierce and Schreibman 1995), and guided practice (e.g., Randolph et al. 2011). Video feedback was used in four studies and in vivo feedback in 18 studies. Several studies incorporated additional instructional strategies, such as small group discussions (Smith et al. 2010), assessments (e.g., Seiverling et al. 2010), picture prompts (e.g., Harper et al. 2008), and reinforcement (Kuhn et al. 2008). Stahmer and Gist (2001) investigated the addition of a parent information support group to PRT parent training.

In 23 studies, the PRT or NLP intervention was implemented by caregivers, staff members, or peers. In ten studies, a clinician implemented the intervention, and in two studies, an experimenter did so. In three studies, the intervention was implemented by both a parent and a clinician. One study did not report the implementer (Koegel et al. 2010b). Across studies, a variety of PRT techniques were used. Following the child's choice was incorporated in 35 studies. Nine studies incorporated getting the child's attention. In 29 studies, providing clear opportunities for responding was used. Twenty-four studies used task variation and interspersal of maintenance and acquisition tasks. Natural reinforcement was incorporated in 33 studies. Of these studies, 18 also incorporated contingent reinforcement, and 15 did not report whether contingent reinforcement was used. In 29 studies, reinforcement of attempts at target behaviors was used. Two studies incorporated all seven PRT techniques (Minjarez et al. 2011; Suhrheinrich 2011). Several studies incorporated additional intervention strategies, such as multiple cues (e.g., Pierce and Schreibman 1997b), modeling of the target response (Stahmer 1995), prompting (e.g., Koegel et al. 2012), prompt fading (e.g., Koegel et al. 2010a), time delay (Koegel et al. 1998a), and narrative play (e.g., Harper et al. 2008).

Intervention Outcomes

Of the 35 studies targeting child behaviors, 15 studies (42.9 %) reported positive outcomes and 20 studies (57.1 %) reported mixed outcomes. Of the 13 studies targeting caregiver behaviors, 7 studies (53.8 %) reported positive outcomes and 5 studies (38.5 %) reported mixed outcomes. One study did not report intervention outcomes concerning caregivers (Smith et al. 2010). Of the seven studies targeting staff behavior, four studies (57.1 %) reported positive outcomes and three studies (42.9 %) reported mixed outcomes. The study targeting peer behaviors reported positive outcomes. None of the included studies reported negative outcomes.

Thirteen of the 39 studies (33.3 %) included data on follow-up. The length of the period between intervention and follow-up ranged from 2 weeks to 11 months. Generalization of intervention outcomes was measured in 22 studies (56.4 %). Generalization was measured across stimuli in eight studies (e.g., Thorp et al. 1995), across persons in eight studies (e.g., Robinson 2011), across conditions in three studies (e.g., Koegel et al. 2012) and across settings in 13 studies (e.g., Symon 2005). In ten studies (25.6 %) measures of social validity were conducted. All studies used a questionnaire to measure social validity (e.g., Huskens et al. 2012).

Certainty of Evidence

Six studies (15.4 %) were classified as providing a conclusive level of certainty (Gianoumis et al. 2012; Huskens et al. 2012; Laski et al. 1988; Randolph et al. 2011; Robinson 2011; Seiverling et al. 2010). All six studies reported mixed intervention outcomes for children. These studies targeted self-initiations (n = 3), communication and language skills (n = 5), play skills (n = 1), adaptive functioning (n = 1), maladaptive behavior (n = 1), and affect (n = 1). Adaptive functioning did not improve and only one child improved on affect, but improvements on the other targeted skills were reported for the majority of the children across studies. The two studies targeting caregiver behaviors also reported mixed intervention outcomes. These studies targeted fidelity of implementation and parent verbalizations. Of the four studies targeting staff behavior, three studies reported positive intervention outcomes and one study reported mixed outcomes with regard to fidelity of implementation. One study reported positive intervention outcomes with regard to level of involvement.

Eleven studies (28.2 %) were rated as providing a preponderant level of certainty (Coolican et al. 2010; Gillet and LeBlanc 2007; Koegel et al. 1998a, b, 2010a, 2012, 2002; Nefdt et al. 2010; Pierce and Schreibman 1997a; Symon 2005; Thorp et al. 1995). Of these studies, seven studies were classified as “preponderant”, because they provided limited control for alternative explanations of intervention outcomes. Specifically, five of these studies did not control for history due to use of a non-concurrent multiple baseline design (Carr 2005). One study did not control for interaction effects due to the small number of baseline probes between treatment conditions (Koegel et al. 1998b) and one study did not control for several threats to internal validity due to unstable baselines (Pierce and Schreibman 1997a). Four studies were classified as “preponderant”, because treatment fidelity was not reported or operational definitions for some dependent variables were not provided, although the study's design controlled for alternative explanations (e.g., Nefdt et al. 2010). Of the 11 studies classified as “preponderant”, six studies reported positive intervention outcomes for children and five studies reported mixed intervention outcomes for children. The studies reporting positive outcomes targeted self-initiations (n = 3), communication and language skills (n = 6), and affect (n = 1). The studies reporting mixed outcomes targeted self-initiations (n = 4), communication and language skills (n = 5), play skills (n = 3), and maladaptive behavior (n = 1). Of the five studies classified at this level targeting caregiver behaviors, four studies reported positive intervention outcomes. These studies targeted fidelity of implementation (n = 4), self-efficacy (n = 1), and parental affect (n = 1). One study reported mixed intervention outcomes and targeted fidelity of implementation and self-efficacy.

Twenty-two studies (56.4 %) were classified as providing a suggestive level of certainty. Nineteen of these studies were classified as “suggestive” because they used a pre-experimental (n = 9) or quasi-experimental (n = 10) design. For example, Harper et al. (2008) used a multiple baseline design across only two participants, whereas a multiple baseline design should include at least three participants to demonstrate experimental control (Horner et al. 2005). Therefore, the design was rated as “quasi-experimental” and the study was classified as “suggestive”. Three studies used an experimental design but were nevertheless classified as “suggestive” because some dependent variables were not operationally defined, details on intervention procedures were insufficient to enable replication, treatment fidelity was not reported, and/or inter-observer agreement was not adequate (Koegel et al. 2010b; Stahmer 1995; Stahmer and Gist 2001). Among the 22 studies classified as “suggestive”, nine of those that targeted child behaviors reported positive intervention outcomes and nine reported mixed outcomes. Three of the five studies that were classified as “suggestive” and targeted caregiver behaviors reported positive intervention outcomes and two studies reported mixed outcomes. Of the three studies classified as “suggestive” that targeted staff behaviors, one study reported positive intervention outcomes and two studies reported mixed outcomes. The study targeting peer behaviors reported mixed outcomes.

Discussion

This systematic review aimed to evaluate the evidence base of PRT for improving the skills of children with ASD, caregivers, and staff members, to identify limitations of the existing evidence base, and to suggest directions for future research. A systematic search identified 43 studies, indicating that the effectiveness of PRT has been extensively investigated. The majority of these studies were classified as providing a suggestive level of evidence. Below, the results of this systematic review are discussed separately for children with ASD and for caregivers and staff members.

Children with ASD

The results of this systematic review indicate that the majority of children with ASD who were included in the reviewed studies were taught to self-initiate through PRT. However, there is as yet insufficient evidence to conclude that PRT results in improvements in non-targeted pivotal skills, because motivation was evaluated in only one study, which provided a suggestive level of evidence (i.e., Koegel et al. 2010b), and responding to multiple cues and self-management were not evaluated in any of the included studies. Furthermore, the results of this systematic review suggest that PRT results in collateral improvements in language and communication skills (e.g., functional verbal utterances, language, and maintaining interactions) and play skills for the majority of children with ASD. Moreover, for some children, PRT also resulted in changes in affect and reductions in maladaptive behavior. However, there is insufficient evidence to conclude that adaptive functioning, autism symptoms, cognitive functioning, and academic functioning improve as a result of PRT, because none of the studies that were classified as providing conclusive or preponderant evidence reported improvements in these skills.

The results of this systematic review provide insight into the extent to which research supports the theoretical model of PRT (i.e., that targeting pivotal skills using PRT techniques results in widespread improvements in other aspects of functioning). Of the four skills that are considered to be pivotal, only self-initiations have been studied in detail. This systematic review indicates that for a number of children with ASD, increases in self-initiations as a result of PRT are accompanied by collateral changes (i.e., improvements in communication and language skills, play skills, and affect, and reductions in maladaptive behavior). Thus, the research reviewed here does provide some support for the theoretical model of PRT. However, because motivation, responding to multiple cues, and self-management were rarely measured in the studies included in this review, it is not clear whether these skills improve as a result of PRT, whether improvements in these skills are accompanied by collateral changes, and thus whether these skills can be considered pivotal.

It should be noted that motivation itself is difficult to define operationally, which could explain why it was rarely measured. Koegel et al. (2001) defined motivation in terms of the effects of improved motivation (i.e., increased responsiveness to social and environmental stimuli), such as increases in the number of responses to teaching stimuli, decreases in response latency, and changes in affect. However, none of the studies that evaluated these behaviors interpreted them as an effect of improved motivation. There is no clear explanation for the lack of studies that evaluated responding to multiple cues. However, some studies implemented “using multiple cues” as a PRT technique (e.g., Pierce and Schreibman 1997b), suggesting that this pivotal skill was targeted but apparently not measured. The lack of studies that evaluated self-management can be explained by the fact that the studies involving self-management that were identified during the database search neither referred to their intervention as PRT or NLP nor implemented the PRT techniques (e.g., Koegel and Frea 1993; Loftin et al. 2008). It could be considered a limitation of this systematic review that the inclusion criteria did not encompass studies of self-management. However, self-management is also considered a separate intervention that incorporates its own specific techniques (e.g., NAC 2009), suggesting that self-management is not a distinguishing component of PRT.

Although the skills of many children improved as a result of PRT, it should be noted that a considerable number of children did not improve significantly, as indicated by the large number of studies that reported mixed results. This variability in outcomes is not unique to PRT and is consistent with results of evaluations of behavioral interventions more generally (Peters-Scheffer et al. 2011; Reichow 2012). Research on predictors of outcomes from behavioral interventions suggests that outcomes are related to children's age (e.g., Granpeesheh et al. 2009; Perry et al. 2013), language proficiency (e.g., Sallows and Graupner 2005), pre-treatment IQ (Perry et al. 2013), severity of autism symptoms (e.g., Ben-Itzchak and Zachor 2011), parental stress (Osborne et al. 2008; Strauss et al. 2012), and parental treatment fidelity (Strauss et al. 2012). Research concerning predictors of outcomes of PRT is limited, but a study by Sherer and Schreibman (2005) suggested that response to PRT was predicted by toy contact, approach, and avoidant behaviors, and verbal and nonverbal self-stimulatory behaviors. However, in order to estimate whether a child is likely to benefit from PRT, additional research is warranted to confirm the influence of these potential predictor variables and to identify other predictors of PRT outcomes.

The results of this systematic review further demonstrated considerable variability in the PRT techniques that were implemented across studies and revealed that only two studies incorporated all PRT techniques (i.e., Minjarez et al. 2011; Suhrheinrich 2011). In particular, “gaining the child's attention” and “using contingent reinforcement” were often not incorporated or not specifically reported. This could be explained by the fact that researchers often consider these techniques to be implemented automatically when a clear opportunity to respond or natural reinforcement is provided (e.g., Koegel et al. 2002; Symon 2005). However, even when assuming that the studies that provided clear opportunities to respond also gained the child's attention, and that the studies that used natural reinforcement also used contingent reinforcement, the number of studies that incorporated all PRT techniques increases only slightly, to five. This suggests that there is notable variability and/or flexibility in the combination of intervention components that constitute PRT.

Overall, with respect to the effect of PRT on children's behavior, we found evidence that supports the effectiveness of PRT and its theoretical model. However, future research should strengthen and extend the existing evidence base and provide additional support for the theoretical model of PRT. There are several specific directions for future research. First, studies should use true experimental designs to improve the certainty of evidence. Specifically, researchers should ensure that single-case designs replicate intervention effects across at least three participants and that group designs include a control group and randomly assign participants to groups to demonstrate experimental control (Black 1999; Horner et al. 2005). Second, pivotal skills should be defined operationally and measured systematically across studies. Third, future research should rigorously evaluate collateral changes in skills that are currently either not investigated or investigated without true experimental designs. Evidence for changes in these skills would extend the evidence base of PRT and support the claim that PRT results in widespread improvements in children (Koegel and Koegel 2006). Fourth, future research should investigate which characteristics predict the effectiveness of PRT. Finally, and possibly most importantly, future research should seek to determine the components that define PRT and distinguish it from other interventions (e.g., EMT), given the variability in the combination of PRT techniques across studies and the overlap between PRT and other interventions.

Caregivers and Staff Members

The results of this systematic review suggest that caregivers and staff members can be taught to implement PRT techniques effectively using an individualized training approach that combines several widely used instructional strategies (e.g., modeling, guided practice, reinforcement/feedback). This finding is consistent with the results of previous reviews on caregiver and staff training (e.g., Lang et al. 2009; Patterson et al. 2012; Rispoli et al. 2011). However, the results of this systematic review also indicate a number of gaps in the current evidence base. First, the duration of training varied greatly across studies, so it is unclear how much training caregivers and staff members need to implement PRT techniques correctly. Second, because studies incorporated a combination of instructional strategies or demonstrated mixed results with regard to the effectiveness of a single strategy (Huskens et al. 2012), it is not clear whether certain instructional strategies are more effective than others for teaching PRT techniques. Finally, it is not clear whether group training is effective, because the studies that evaluated the effectiveness of group training separately provided only a suggestive level of evidence (Minjarez et al. 2013; Stahmer and Gist 2001). To increase the effectiveness and cost efficiency of caregiver and staff training in PRT, future research should seek to determine which training format, instructional strategies, and duration of training are most effective and efficient for teaching caregivers and staff members to implement PRT techniques correctly.

Although most caregivers and staff members were able to implement PRT techniques correctly, some caregivers and staff members in several studies did not meet the criterion for fidelity of PRT implementation or did not maintain the use of PRT techniques (Coolican et al. 2010; Huskens et al. 2012; Randolph et al. 2011). These mixed results cannot be explained by training characteristics, because these characteristics did not vary within studies. Research shows that fidelity of intervention implementation can be affected by certain staff characteristics, such as personality, attitude towards an intervention and towards individuals with disabilities, and the perceived child–staff member relationship (Durlak and DuPre 2008; Peters-Scheffer et al. 2013), but it is not clear whether these staff characteristics also predict the fidelity of PRT implementation. Research regarding the influence of parent characteristics on treatment fidelity is currently limited (Randolph et al. 2011). Research demonstrates that parents' level of education, family income or socioeconomic status, and parental stress affect children's intervention outcomes (e.g., Osborne et al. 2008; Reyno and McGrath 2006; Strauss et al. 2012), but it is unclear whether these caregiver characteristics also affect caregivers' fidelity of implementation. Therefore, future research should investigate whether certain caregiver and staff member characteristics predict the fidelity of PRT implementation.

The results of this systematic review indicate that there is limited evidence for collateral changes in caregivers' affect, verbalizations, and self-efficacy, and in staff members' level of involvement as a result of PRT. There is as yet insufficient evidence to conclude that PRT results in collateral changes in caregivers' stress, empowerment, and interactional patterns. Because the current evidence base is limited, additional research regarding collateral changes in caregiver and staff behavior is warranted.

Conclusion

This systematic review found evidence to support the use of PRT for increasing self-initiations. Collateral improvements in communication and language, play skills, and affect, as well as reductions in maladaptive behavior, were found for a number of children. The overall results of this review provide some support for the claimed effectiveness of PRT and for the theoretical model of PRT. However, the majority of studies (56.4 %) provided only suggestive evidence due to methodological limitations. Also, while this systematic review suggests that caregivers and staff members were able to implement PRT techniques, evidence for collateral improvements in caregivers' and staff members' behaviors remains sparse. Future research that uses true experimental designs is necessary to strengthen and extend the evidence base for PRT, to determine the child, caregiver, and staff characteristics that predict the effectiveness of PRT and the fidelity of its implementation, and to determine the components that define PRT and distinguish it from other interventions.