Social skills training (SST) is one of the most common interventions to help address social deficits in individuals with autism spectrum disorder (Interactive Autism Network Research Findings 2011). Traditional SST teaches children with ASD to interact with their peers by providing face-to-face, in vivo instruction on conversation, friendship, and problem-solving skills. Programs aiming to improve social skills rely primarily on child-facilitator interaction and the need for trained facilitators is one of the primary barriers to treatment (Gordon-Lipkin et al. 2016). Novel methods of SST delivery include the use of Behavioral Intervention Technologies (BITs), technology-based interventions aimed at producing positive behavioral and psychological changes (Mohr et al. 2013) as either an adjunct to or a replacement for face-to-face interventions. Limited information is available regarding how these programs perform compared to traditional face-to-face social skills training (F2F-SST). The goal of the present research is to use meta-analytic methods to compare preliminary evidence for BITs social skills training (BITs-SST) to F2F-SST.

F2F-SST groups provide a structured environment to learn and practice social skills with peers (e.g., Laugeson et al. 2009). Previous meta-analyses and systematic reviews found that traditional F2F-SST can be effective in improving social competence and friendship quality and decreasing loneliness (Bellini et al. 2007; Gates et al. 2017; Matson et al. 2007; Reichow et al. 2012; Spain and Blainey 2015; White et al. 2007). While research suggests that F2F-SST improves social deficits in children with ASD with effect sizes in the medium range (ES = 0.47–0.51; Gates et al. 2017; Reichow et al. 2012), some systematic reviews have critiqued the empirical support of SST for individuals with ASD due to the absence of large-scale group studies (Cappadocia and Weiss 2011; Matson et al. 2007; Rao et al. 2008).

There are a number of barriers to accessing F2F-SST for families with children and adolescents with ASD. F2F treatments require a clinician, transportation, and time-intensive training. Difficulties in accessibility are exacerbated by a national shortage of providers for ASD services, a problem which disproportionately impacts minorities and individuals of lower socioeconomic status (Gordon-Lipkin et al. 2016; Liptak et al. 2008; Magaña et al. 2013; Thomas et al. 2007). Technology can potentially alleviate financial stress by increasing access to treatment at more convenient times and for a lower cost (Horlin et al. 2014). Technology may help mitigate barriers to accessing comprehensive ASD services as an alternative or adjunct treatment for families who are uninsured or under-insured (Young et al. 2009). Remote access to treatment may also provide a practical solution for parents with financial difficulties or those who cannot afford to miss work to attend treatment sessions. Finally, technology for delivering treatments has become increasingly important during the COVID-19 pandemic, as social distancing requirements have restricted the opportunity for in-person therapeutic interactions.

In recent years, BITs have been used to teach social skills to children and adolescents with ASD (Hopkins et al. 2011; Rice et al. 2015; Thomeer et al. 2015; Wieckowski and White 2017; Yun et al. 2017). Social skills training delivered by BITs relies primarily on child-technology interactions (e.g., engaging with a smartphone application game, reading material, pointing and clicking on interactive features) to teach desired social behaviors, and typically employs at least some human support to monitor participants (Wieckowski and White 2017). Research suggests that BITs-SST decreases social deficits in children with ASD, with effect sizes in the trivial to large range (ES = 0.29–1.0; Hopkins et al. 2011; Rice et al. 2015; Thomeer et al. 2015; Yun et al. 2017). Some possible advantages of BITs-SST include the reduction of anxiety caused by social interactions, the ability to have minimal distractions, the opportunity to utilize multiple virtual contexts to practice a variety of social skills, and the ability to reduce instructor fatigue (Strickland 1996; Parsons and Mitchell 2002; Wieckowski and White 2017). While there have been concerns that gains may not generalize to in-person social settings (Latash 1998) or that BITs may promote social avoidance (Wieckowski and White 2017), two comprehensive reviews reported that the interactive nature of BITs provides flexible and realistic role-play scenarios, allows participants to practice social skills in a safe setting, and supports generalizability (Parsons and Mitchell 2002; Wieckowski and White 2017).

There are also barriers to developing and delivering BITs for SST. Developing BITs-SST can be expensive because of programming costs, and many programs require some level of human support, which further increases the cost (Schueller et al. 2017; Wieckowski and White 2017). Although BITs-SST have the potential to deliver wide-scale interventions (Diehl et al. 2012; Ploog et al. 2013; Reed et al. 2011; Wieckowski and White 2017), research to date has consisted largely of pilot studies, with few large-scale randomized controlled trials (RCTs). Many of these studies have also contained methodological weaknesses, such as small sample sizes and a lack of standardized assessment measures and follow-up measurement (Ploog et al. 2013; Reed et al. 2011). Thus, cross-trial comparisons and the generalizability of the outcomes are limited, and BITs-SST demand additional research.

To our knowledge, no systematic comparison of F2F-SST and BITs-SST using meta-analytic methods has yet been published. The aim of this study is to conduct a meta-analysis of RCTs comparing the efficacy of F2F-SST and BITs-SST interventions for children and adolescents with ASD.

Methods

Selection Process of Studies

Studies were selected using an electronic database search conducted in July of 2020. An “all text” comprehensive electronic database search was conducted of the APA PsycINFO and MEDLINE databases using the following search terms: social* AND (skills or training or intervention) AND (child* OR presch* OR pre-sch* OR toddler* OR youth OR teen* OR adolescen*) AND (autis* OR ASD OR asperger* OR PDD*) AND (RCT OR random* control* trial) AND (SSRS or Social Social Skills Rating System or SSIS or Social Skills Improvement System or SRS or SRS-2 or Social Responsiveness Scale). A second search in “all text” was conducted of the Cochrane and EMBASE databases using the same search terms listed above. Neither search was restricted to a date range, in an effort to obtain the most comprehensive body of literature to analyze; the search of PsycINFO and MEDLINE produced studies from 2013 to 2020 and the search of the Cochrane and EMBASE databases produced studies from 2016 to 2020. A third and final ancillary search conducted on Google Scholar added three more studies. Details can be found in Appendix Table 3.

Due to the study aims, the search was restricted to include only child and adolescent participants; the final study pool included participants aged 3 to 19 years old. These searches yielded an initial pool of 783 articles (77 from the Cochrane/EMBASE search, 703 from the PsycINFO and MEDLINE search, and three from ancillary searches; Fig. 1). A total of 172 duplicate studies were removed between the three searches. Next, the title and abstract of each article were examined and 516 studies were removed based on the inclusion and exclusion criteria of the present study (see below). This resulted in a pool of 92 studies screened for potential inclusion. Of these, 19 were not RCTs, 18 were not SSTs, 10 did not include the target outcome measures, 10 more were duplicates, seven were not full-length articles (e.g., posters), three were focused on adults, two were not focused on a population with ASD, one provided insufficient data, and one did not include a waitlist control or treatment-as-usual group as comparison (e.g., compared two interventions). The final review included 18 relevant RCTs: 14 F2F and 4 BITs studies. One study included more than one independent group; thus, the final analysis included 19 different observed comparisons published between 2009 and 2020.

Fig. 1 PRISMA Flow Diagram of Data Acquisition

Inclusion and Exclusion Criteria

Studies were considered for inclusion in this meta-analysis based on the following criteria: (1) participant age (between 3 and 19 years old), (2) formal diagnosis of ASD, (3) social skills training intervention, (4) one of the four following social skills measures as a primary outcome measure (either the Social Responsiveness Scale (SRS), Social Responsiveness Scale-Second Edition (SRS-2), Social Skills Rating System (SSRS), or Social Skills Improvement System (SSIS)), and (5) study was an RCT. Studies including participants between the ages of three and 19 who had a DSM-IV or DSM-5 diagnosis of ASD or a related diagnosis (e.g., high-functioning ASD, Asperger’s syndrome, pervasive developmental disorder [PDD], and pervasive developmental disorder-not otherwise specified [PDD-NOS]) by a licensed professional were included in the analysis. This study includes individuals on the autism spectrum, including both high- and low-functioning ASD. Studies that included ASD and comorbid diagnoses (e.g., intellectual disability [ID]) were included if the target of the intervention was related to the treatment of social skills in children with ASD. Studies were excluded if the target of the intervention was a disorder other than ASD (e.g., a study on Fragile X or ID that included some participants with comorbid ASD) or was not related to the development of social skills (e.g., anxiety, depression). The present meta-analysis focuses on psychosocial skills training, and thus, studies were required to include an intervention intended to improve the social skills of participants based on the SRS, SRS-2, SSRS, or the SSIS as a primary outcome measure of social skills. Studies were only included if they used parent-report forms of the measures. While some studies included self- and collateral-reports (i.e., parent and teacher), the use of collateral reports was less consistent in the literature than parent reports. 
Therefore, any studies that only utilized self- or teacher-report results were excluded. Studies in which the main treatment approach was medical (e.g., medication) were also excluded in an effort to isolate the effects of social skills training (SST) on the social skills of individuals with ASD. Studies were included if participants were taking medication that was not altered directly before or during the intervention and was not part of the intervention. Only RCTs were included; case studies, literature reviews, and qualitative studies were excluded from the analysis.

An article was considered an F2F-SST study when the intervention was implemented in person (often in group settings) without the use of electronic interventions. An article was considered a BITs-SST study when the intervention was implemented through interactive technological means (e.g., computer-based, tablets, applications). Passive use of technology, such as viewing video clips to supplement SST, was not considered BITs-SST because of the lack of interactive technology use.

Methodological Quality

The 18 studies included in this meta-analysis were evaluated for study quality using the Revised Cochrane Risk-of-Bias Tool for Randomized Trials (Sterne et al. 2019). The studies were evaluated to determine whether they met risk-of-bias standards in five domains: (1) the randomization process, (2) deviations from the intended interventions (effect of assignment to intervention), (3) missing outcome data, (4) measurement of the outcome, and (5) selection of the reported result. These domains were rated individually and studies were categorized as Low, Some, or High Risk-of-Bias. No studies fell into the High Risk-of-Bias category. Two blinded, independent raters (ES and KB) evaluated the methodological quality of each article. Raters determined the level of risk of bias for each article and assigned a score of 0 (low), 1 (some), or 2 (high). Interrater reliability was analyzed in Statistical Package for the Social Sciences (SPSS) version 23.0 (IBM Corporation 2015), and a kappa coefficient was derived from the independent ratings.
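The kappa coefficient described above can be illustrated with a minimal sketch. It assumes an unweighted Cohen's kappa for two raters; the source does not state which variant SPSS produced, and the ratings below are hypothetical, not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of articles given identical ratings.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings (0 = low, 1 = some, 2 = high risk of bias).
rater_1 = [1, 1, 0, 1, 0, 1, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters here because most articles fall into a single category.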

Data Synthesis

A pretest-posttest-control group design (PPC; Morris 2008) was conducted using Comprehensive Meta-Analysis Version 3.3.07 (Borenstein et al. 2014). The PPC design evaluates the pretest and posttest assessment value across groups of participants who received SST or participated in a control condition. This allows for researchers to evaluate the overall change in comparison to a non-treatment group. For the overall effect, a version of the standardized mean difference of group differences in the change between pre- and post-assessments was calculated using a corrected inverse-variance weighted effect size (Hedges’ g; Hedges and Olkin 1985). Effects were standardized by the change score standard deviation.
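As a rough illustration of this effect size, the sketch below computes Hedges' g for the between-group difference in mean change, standardized by the pooled change-score standard deviation. It is an assumption-laden approximation rather than the exact routine used by the software: the change-score SD is reconstructed from the pre and post SDs and an assumed pre-post correlation `r`, and all input values are hypothetical.

```python
import math


def change_sd(sd_pre, sd_post, r):
    """SD of pre-to-post change scores given the pre-post correlation r."""
    return math.sqrt(sd_pre**2 + sd_post**2 - 2 * r * sd_pre * sd_post)


def hedges_g_change(treat, control, r):
    """Hedges' g for the difference in mean change between two groups.

    treat and control are dicts with keys m_pre, m_post, sd_pre, sd_post, n.
    """
    delta_t = treat["m_post"] - treat["m_pre"]
    delta_c = control["m_post"] - control["m_pre"]
    sd_t = change_sd(treat["sd_pre"], treat["sd_post"], r)
    sd_c = change_sd(control["sd_pre"], control["sd_post"], r)
    n_t, n_c = treat["n"], control["n"]
    df = n_t + n_c - 2
    # Pool the two groups' change-score SDs.
    pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (delta_t - delta_c) / pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return j * d


# Hypothetical groups; r = 0.5 is chosen only to keep the arithmetic simple.
treatment = {"m_pre": 80, "m_post": 90, "sd_pre": 10, "sd_post": 10, "n": 20}
waitlist = {"m_pre": 80, "m_post": 82, "sd_pre": 10, "sd_post": 10, "n": 20}
g = hedges_g_change(treatment, waitlist, r=0.5)
```

The sign of g depends on the scoring direction of the measure, so scales on which lower scores indicate improvement would need to be reversed before pooling.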

Where possible, pre/posttest means, standard deviations, and sample sizes were included for each group. These were gathered from direct report in the study results. Standard deviations were derived from confidence intervals where they were not available in the article (Choque Olsson et al. 2017). Where different subgroups were present, pooled data were provided (Dekker et al. 2019). In studies where there were two observed groups that were evaluated with independent, matched control groups (i.e., Hopkins et al. 2011), the observed groups were analyzed independently rather than pooled based on statistical recommendations (Borenstein et al. 2009) and designs used in previous meta-analyses (Cuijpers et al. 2009; Andersson and Cuijpers 2009). Two studies (Matthews et al. 2018; Vernon et al. 2018) included more than one of the selected measures for social skills (i.e., SRS and SSIS). For these studies, the grand mean was utilized in order to derive a single overall effect for each sample. Mean effect sizes were calculated using the statistical software, accounting for the test-retest performance of each measure. One article (Yun et al. 2017) did not include information for the posttest standard deviation, thus necessitating the use of change statistics (F-value) and sample size. Additionally, another article (Matthews et al. 2018) did not include the posttest mean or standard deviation for the control group; therefore, the mean change and standard deviation of the difference in each group were used. In addition to means, standard deviations, and sample sizes, pre-post test correlations for each measure were used in order to account for change due to measurement reliability (Hunter and Schmidt 2004). These values were derived from information regarding test-retest performance, resulting in values of 0.74, 0.88, 0.87, and 0.84 for the SRS, SRS-2, SSRS, and SSIS, respectively (SRS, McConachie et al. 2015; SRS-2, Constantino and Gruber 2012; SSRS, Gresham and Elliott 1990; SSIS, Gresham et al. 2010).
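Two of the conversions mentioned above, recovering a standard deviation from a confidence interval and deriving an effect size from an F-value, can be sketched as follows. These are standard textbook conversions given as an illustration; the source does not report the exact formulas applied, and the numbers are hypothetical.

```python
import math


def sd_from_ci(lower, upper, n, z=1.96):
    """Recover a standard deviation from a 95% CI around a single mean.

    Uses the normal critical value z; for small samples a t critical
    value would usually be more appropriate.
    """
    standard_error = (upper - lower) / (2 * z)
    return standard_error * math.sqrt(n)


def d_from_f(f_value, n_treat, n_control):
    """Absolute standardized mean difference from a two-group F statistic.

    Valid only when F has 1 numerator degree of freedom, so F = t**2.
    The direction of the effect must be taken from the reported means.
    """
    t = math.sqrt(f_value)
    return t * math.sqrt(1 / n_treat + 1 / n_control)


# Hypothetical inputs for illustration only.
sd = sd_from_ci(lower=46.08, upper=53.92, n=25)
d = d_from_f(f_value=4.0, n_treat=20, n_control=20)
```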

Outcome Assessment Measures

Several types of outcome measures exist which specifically detect changes in social functioning as a result of treatment. Two of the most widely used measures for assessing social skills (Bölte et al. 2008; Epp 2008; White et al. 2007, 2015) are the Social Responsiveness Scale (SRS; Constantino and Gruber 2005; updated to the SRS-2) and the Social Skills Rating System (SSRS; Gresham and Elliott 1990; updated to the Social Skills Improvement System, SSIS; Gresham and Elliott 2008). While such questionnaires can be criticized because they are not direct observational metrics, the SRS, SRS-2, SSRS, and SSIS are well validated and commonly used, measure overlapping constructs, and produce overall scores that allow for statistical interpretation (McConachie et al. 2015; Reichow et al. 2012).

The SRS, SRS-2, SSRS, and SSIS were used in the present meta-analysis as they are comparable outcome measures which are most commonly used to measure changes in social functioning with this population (Bölte et al. 2008; Constantino and Gruber 2005, 2012; Epp 2008; Gresham and Elliott 1990, 2008; White et al. 2007, 2015). These outcome measures share a similar parent-report questionnaire format. Using all four measures broadened the search, because restricting it to a single measure (either the SRS or the SSRS) yielded very few articles.

Social Responsiveness Scale and Social Responsiveness Scale-Second Edition

The SRS-2 (Constantino and Gruber 2012) is an updated version of the SRS (Constantino and Gruber 2005). The SRS-2 has the same 65-item parent- and teacher-report measures for preschool and school age children, and has added parent-, friend-, relative-, spouse-, and self-report forms for the adult form (Constantino and Gruber 2012). The SRS-2 is an internally consistent (.94 to .96 across three age groups) measure that evaluates social skills based on five subscales: Social Awareness, Social Cognition, Social Communication, Social Motivation, and Restricted Interests and Repetitive Behavior (Constantino and Gruber 2012). These five subscales are combined to create an overall measure of social behavior deficits. While strong consistency exists across items, internal consistency is not reported for specific subscales, posing a limitation in interpreting subscales. Test-retest reliability for the SRS-2 was not collected; for the SRS, this ranged from .88 to .95 with test-retest intervals between three and six months (Constantino and Gruber 2012).

The SRS is a 65-item parent- and/or teacher-report measure which obtains a ranking of ASD symptom severity (Constantino and Gruber 2005). As mentioned above, test-retest reliability ranges from .88 to .95. The SRS measures five social skills (social awareness, social information processing, capacity for reciprocal social responses, social anxiety/avoidance, and characteristic autistic preoccupations/traits) and combines these categories to form a comprehensive score of the severity of social deficits in children (Constantino and Gruber 2005). Researchers included only composite scores from the parent-report SRS results in the analyses.

Social Skills Rating System and Social Skills Improvement System

The SSRS is a student-, parent-, or teacher-report measure with good reliability (0.87; Gresham and Elliott 1990). The number of items on the SSRS varies between 34 and 57 depending on the informant and age level. The SSRS gathers behavior ratings from parents, teachers, and students (for third-grade students and above) on cooperation, assertion, responsibility, empathy, and self-control (Gresham and Elliott 1990). The SSRS has been updated and replaced with the Social Skills Improvement System (SSIS; Gresham and Elliott 2008). The SSIS is a revision of the SSRS which includes updated norms and four additional subscales (communication, engagement, bullying, and autism spectrum) with high reliability (0.84). The SSIS produces an overall composite social skills score as well as subscale scores. While not used in the present study, the subscale scores’ test-retest coefficients range between the .70s and .80s. To account for the varying reliability of the measures, researchers applied the respective pre-post correlation for each measure. For both the SSRS and SSIS, researchers utilized only the parent-report composite scores from the studies.

Results

F2F-SST Studies

Fourteen of the identified studies utilized F2F-SST to teach social skills to youth with ASD (Choque Olsson et al. 2017; Dekker et al. 2019; Freitag et al. 2016; Jonsson et al. 2019; Laugeson et al. 2009; Lopata et al. 2010; Marshall et al. 2016; Matthews et al. 2018; Rabin et al. 2018; Schohl et al. 2014; Shum et al. 2019; Thomeer et al. 2019; Vernon et al. 2018; White et al. 2013). Specifically, Dekker et al. (2019) conducted an RCT of a manualized treatment that utilizes behavioral principles and social learning theory to teach social skills. They observed that children’s SSRS scores significantly improved on the cooperation subscale. Freitag et al. (2016) examined the effects of Social Skills Training Autism-Frankfurt (SOSTA-FRA), which is a manualized, structured, cognitive-behavioral, group-based social skills training program for youth with high-functioning ASD. They found that both immediately after treatment and at 3 months of follow-up, there was a significant reduction in SRS scores compared to treatment as usual (Freitag et al. 2016). Laugeson et al. (2009) determined that the Program for the Education and Enrichment of Relational Skills (PEERS; Laugeson and Frankel 2006), which is a manualized treatment that includes both separate and concurrent social skills groups for youth with ASD and their parents, demonstrated significant social improvement. Similarly, Matthews et al. (2018) utilized the PEERS curriculum to compare traditional PEERS to a peer-mediated PEERS curriculum (i.e., each participant with ASD has a typically developing peer mentor). Both groups experienced significant gains in social skills, and those in the peer-mediated group improved more, with gains maintained at a 4-month follow-up. Additionally, Rabin et al. (2018) examined a Hebrew version of PEERS and determined that there was a significant improvement in social skills, which was maintained 16 weeks post-treatment. Schohl et al. (2014) also found that PEERS significantly improved social skills in youth with higher-functioning ASD. A study conducted by Shum et al. (2019) in Hong Kong determined that a Chinese translation and adaptation of PEERS resulted in significant gains in social skills for adolescents with ASD.

Thomeer and colleagues (2019) evaluated a comprehensive 5-week-long social skills group intervention called summerMAX. SummerMAX significantly improved social skills in children with ASD from pre- to post-treatment. Vernon et al. (2018) examined the effects of Social Tools And Rules for Teens (START), a manualized curriculum that includes free play, a learned social topic and practice, and a structured social activity. They found that there were significant post-treatment Group × Time differences between START and waitlist control. Lopata et al. (2010) also found that group social skills training resulted in statistically significant improvement in social skills for children with high-functioning ASD. Results from Marshall et al. (2016) suggested that individualized Social Stories are effective for improving social skills in youth with ASD. Choque Olsson et al. (2017) reported significant treatment effects for structured, manualized “KONTAKT” social skills group training for adolescents with ASD based on parent ratings both immediately following treatment and at 3-month follow-up; however, there were no significant group differences in child or teacher ratings. Jonsson and colleagues (2019) also used KONTAKT to improve communication skills, social awareness and navigation, and self-confidence. They reported a large effect size between pre- and post-treatment that continued through a 3-month follow-up. Lastly, White and colleagues (2013) determined that MASSI (White et al. 2009), a manual-based modular treatment that is delivered in individual therapy, group therapy, parent education, and coaching, significantly improves social skills in youth with ASD.

F2F-SST studies included manualized treatments delivered in group settings and an individualized Social Stories intervention. All studies had significant treatment gains from pre- to post-treatment. For the three manualized treatments with follow-up measures, results were maintained at 3 to 4 months post-treatment.

BITs-SST Studies

Four studies utilizing BITs-SST were identified (Hopkins et al. 2011; Rice et al. 2015; Thomeer et al. 2015; Yun et al. 2017). Hopkins et al. (2011) conducted a study of FaceSay, a computer program which utilizes human-like avatars and interactive games to teach facial processing, recognition, eye gaze, and joint attention. This intervention occurred entirely online and facilitators only served in a behavior monitoring capacity, such as giving praise or rewards to children who appropriately used their mouse or touch screen or who remained seated during the intervention (Hopkins et al. 2011). Children with both low- and high-functioning ASD who received BITs-SST had a significant positive change in their parent-reported social skills (Hopkins et al. 2011). Thomeer et al. (2015) found that a computerized program, Mind Reader, which uses facial video and vocal stimuli to teach simple and complex emotions, yielded a significant improvement in the social skills of children with high-functioning ASD from pre- to post-treatment. Social skills instruction and practice occurred online and were reinforced through in vivo practice with a staff clinician twice during each of the five treatment intervals (Thomeer et al. 2015). Rice and colleagues (2015) used FaceSay and found that, after controlling for pretest scores, the experimental group showed significantly more improvement in reported social skills post-intervention than the control group. Yun et al. (2017) studied the Robotic Intervention System (i.e., iRobiQ and CARO), which utilizes an interactive robot to teach facial emotion recognition and eye contact, with a human facilitator present but serving only to ensure the intervention was being performed correctly. However, no statistically significant improvement in social skills was found (Yun et al. 2017). Of the four studies, three reported statistically significant improvement in social skills and none included follow-up measures.

Of the 18 studies (Table 1; 1266 participants) included in the analysis, 17 studies were entered as individual studies (F2F-SST = 14 studies, BITs-SST = 3 studies) and one study that provided independent samples (i.e., different control groups matched to condition) was entered as two separate observations. Specifically for BITs-SST, one study (Hopkins et al. 2011) provided two samples, high- and low-functioning ASD, with independent matched control groups. Where studies had multiple outcome measures, a grand mean effect size was derived in order to get an overall effect. Based on these criteria, the final analysis included 19 observations. The F2F-SST studies included a total of 1128 participants. Most F2F-SST studies utilized manualized protocols (e.g., KONTAKT, Skillstreaming, START, SOSTA-FRA, PEERS, Social Stories). The interventions ranged in length from 2 weeks to one school year. The four BITs-SST studies included a total of 138 participants. Treatment was provided in various formats, including utilization of computer-based software programs (e.g., CARO, FaceSay, iRobiQ, and Mind Reader), computer-based avatars, and a therapeutic robot. These interventions ranged in length from 8 to 12 weeks.

Table 1 Selected studies for social skills training for youth with ASD

Methodological Quality

Based on Cochrane Risk-of-Bias methodological quality rating guidelines, most articles (15 of 18) were categorized as Some Risk-of-Bias (Choque Olsson et al. 2017; Dekker et al. 2019; Freitag et al. 2016; Jonsson et al. 2019; Laugeson et al. 2009; Marshall et al. 2016; Matthews et al. 2018; Rabin et al. 2018; Schohl et al. 2014; Shum et al. 2019; Thomeer et al. 2015, 2019; Vernon et al. 2018; White et al. 2013; Yun et al. 2017). Only two articles (Hopkins et al. 2011; Rice et al. 2015) were categorized as Low Risk-of-Bias. One article (Lopata et al. 2010) received a rating of Low Risk-of-Bias from one rater and Some Risk-of-Bias from the other rater. None met criteria for High Risk-of-Bias, so all articles were included in the analysis. The scores therefore ranged from 0 to 1, with an average rating of 0.84. Interrater reliability indicated acceptable agreement (kappa = 0.77).

Data Synthesis

Generally, data synthesis was similar across the studies, using pre/posttest means and standard deviations for the control and treatment groups. A forest plot of studies, including the grand mean effect sizes, is presented in Fig. 2.

Fig. 2 Forest plot for BITs and F2F subgroups. BITs, Behavioral Intervention Technologies; F2F, face-to-face; HFA, high-functioning autism observed group; LFA, low-functioning autism observed group; SRS, Social Responsiveness Scale; SSRS, Social Skills Rating System; SSIS, Social Skills Improvement System; g, Hedges’ g; p, p value for individual study and overall effect; diamond, overall effect size

The overall model indicated significant heterogeneity (χ² (19) = 59.14, p < 0.01), suggesting significant variability in effect across studies. Heterogeneity is used to determine whether use of a fixed effects or random effects model is most appropriate, based on assumptions about how studies match the target population. When significant heterogeneity is present, a random effects model is suggested. Thus, for the overall effect size, a random effects model was employed. Additionally, both the F2F-SST and the BITs-SST estimates were found to have significant heterogeneity (χ² (11) = 51.17, p < 0.01; and χ² (4) = 4.32, p < 0.01, respectively; Fig. 2). The inconsistency estimate (I²) was calculated in order to gather information about the inconsistency in effect estimates due to heterogeneity between studies (Borenstein et al. 2009; Card 2012). This estimate was medium to high for both the overall model (I² = 69.57) and the F2F-SST-only studies (I² = 74.59), and low for BITs-SST studies (I² = 7.31; Higgins et al. 2003).
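The relationship between the heterogeneity statistic, the inconsistency estimate, and the random effects estimate can be sketched as below. The sketch assumes the DerSimonian-Laird method-of-moments estimator for the between-study variance, which the source does not name, and uses hypothetical effect sizes and variances rather than the study data.

```python
def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling with Q and I-squared."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    # I-squared: percentage of variability beyond sampling error.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = (1 / sum(w_star)) ** 0.5
    return {"q": q, "df": df, "i2": i2, "tau2": tau2, "pooled": pooled, "se": se}


# Hypothetical Hedges' g values and sampling variances for two studies.
result = random_effects(effects=[0.2, 0.8], variances=[0.04, 0.04])
```

Because I² is computed directly from Q and its degrees of freedom, the two statistics can be checked against each other whenever both are reported.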

Additionally, risk of bias based on study precision was reviewed by assessing funnel plot symmetry (Fig. 3) and calculating Rosenthal’s Fail-Safe N (Rosenthal 1979). The funnel plot offers a visual representation of the effect based on study precision, where more precise studies (i.e., those with smaller standard errors) are expected to cluster around the central line, which is the overall effect size of the meta-analysis. After collaborative review of the plot, which compared study precision (standard error; y-axis) and effect (Hedges’ g; x-axis) with the effect size derived from the overall analysis (vertical line; Borenstein 2005), the funnel plot was determined to be asymmetric (i.e., studies did not cluster evenly around the midline). Though many of the studies did cluster around the overall effect size as expected, four studies with relatively low standard errors (Choque Olsson et al. 2017; Dekker et al. 2019; Freitag et al. 2016; Rabin et al. 2018) were pulled to the left of the mean effect size, indicating asymmetry toward a smaller effect. These results suggest that the resultant effect may be biased, as these more precise estimates demonstrated a smaller effect. Funnel plot asymmetry was also evaluated using Egger’s regression (Egger et al. 1997), which confirmed asymmetry (t (17) = 4.73; p < 0.001). The studies that fell outside of the funnel plot were examined for common characteristics that may have introduced systematic error (i.e., severity of ASD, type of intervention). However, investigators did not find any explanatory variables that may have contributed to a smaller effect. Rosenthal’s Fail-Safe N was calculated in order to determine the number of missing (“file drawer”) studies needed to nullify the overall effect, resulting in 659 studies. Overall, the funnel-plot analysis suggests a non-trivial amount of bias across studies.
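Both statistics can be illustrated with a short sketch. It assumes the common formulations, namely Rosenthal's formula with a one-tailed critical value of 1.645 and Egger's test as an ordinary least squares regression of the standardized effect on precision; the source does not report which settings the software used, and the inputs below are hypothetical.

```python
import math


def rosenthal_fail_safe_n(z_scores, z_alpha=1.645):
    """Number of null studies needed to pull the combined Z below z_alpha."""
    k = len(z_scores)
    return (sum(z_scores) ** 2) / z_alpha**2 - k


def egger_intercept(effects, std_errors):
    """Egger's test: OLS of (effect / SE) on (1 / SE).

    Returns the intercept, its t statistic, and the degrees of freedom.
    An intercept far from zero suggests funnel plot asymmetry.
    """
    y = [g / se for g, se in zip(effects, std_errors)]
    x = [1 / se for se in std_errors]
    k = len(x)
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = k - 2
    sigma2 = residual_ss / df  # residual variance
    se_intercept = math.sqrt(sigma2 * (1 / k + mean_x**2 / sxx))
    return intercept, intercept / se_intercept, df


# Hypothetical per-study Z scores, effects, and standard errors.
n_fs = rosenthal_fail_safe_n([2.0, 2.0, 2.0, 2.0])
intercept, t_stat, df = egger_intercept(
    effects=[1.6, 0.95, 0.8, 0.775],
    std_errors=[1.0, 0.5, 1 / 3, 0.25],
)
```

With 19 observations, the regression has 17 degrees of freedom, which matches the t statistic reported for Egger's test.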

Fig. 3 Funnel plot for risk of bias. Asymmetric funnel plot has evidence of some risk of bias. Precise studies (i.e., those with larger samples) are indicated at the top of the chart and deviate from the grand mean effect. Mid-size studies cluster around the mean effect size

Primary Comparison

All studies combined indicated a medium to large overall effect of 0.83 (95% CI 0.60, 1.07) under the random effects model (Cohen 1988). Effect estimates for the overall model are included in Table 2. The overall sample was divided into F2F-SST (14 observed groups) and BITs-SST (5 observed groups) studies, and these subgroups were also compared using a mixed effects analysis. In isolation, both BITs-SST and F2F-SST studies demonstrated medium to large effect sizes (g = 0.93 and 0.81, respectively), indicating a significant improvement in social skills for both types of treatment when compared to control groups. Lastly, results of the mixed effects analysis did not demonstrate significant differences between F2F-SST and BITs-SST subgroups (overall between group heterogeneity, χ2 (1) = 0.28, p = 0.59), indicating comparable effects across treatment type. Taken all together, the present analyses preliminarily suggest comparable effectiveness of F2F-SST and BITs-SST when their treatment effects are compared to control groups.
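The pooling and subgroup comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: it assumes a DerSimonian-Laird estimate of between-study variance, a separate variance estimate within each subgroup, and a between-subgroup Q statistic compared against a chi-square distribution with one degree of freedom. All function names are hypothetical, and no study-level data from the paper are used.

```python
import math


def dl_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled effect, variance of the pooled effect, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sw
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # tau^2 is truncated at zero; a single-study subgroup gets tau^2 = 0.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    return pooled, 1.0 / sum(w_star), tau2


def subgroup_q_between(subgroups):
    """Between-subgroup heterogeneity for a mixed effects analysis.

    `subgroups` is a list of (effects, variances) pairs, one per subgroup.
    Returns (Q_between, degrees of freedom).
    """
    pooled = [dl_random_effects(e, v) for e, v in subgroups]
    w = [1.0 / var for _, var, _ in pooled]
    grand = sum(wi * m for wi, (m, _, _) in zip(w, pooled)) / sum(w)
    q_between = sum(wi * (m - grand) ** 2 for wi, (m, _, _) in zip(w, pooled))
    return q_between, len(subgroups) - 1


def chi2_sf_df1(q):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


def ci95(pooled, var):
    """Normal-approximation 95% confidence interval for a pooled effect."""
    half = 1.96 * math.sqrt(var)
    return pooled - half, pooled + half
```

With two subgroups the comparison has one degree of freedom, and `chi2_sf_df1(0.28)` evaluates to roughly 0.60, in line with the reported p = 0.59 given that the reported χ2 value is rounded.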

Table 2 Effect size outcomes for combined social skills measures

Discussion

Social skills training for children and adolescents with ASD has traditionally focused on face-to-face interventions, and, in recent years, BITs-SST have also been developed and are beginning to be tested. Previous meta-analyses (Bellini et al. 2007; Gates et al. 2017; Matson et al. 2007; Reichow et al. 2012; Spain and Blainey 2015; White et al. 2007) have focused almost exclusively on F2F-SST and did not compare them with BITs-SST. The present meta-analysis compared RCTs of F2F-SST with BITs-SST for children and adolescents with ASD in an effort to further assess the preliminary support of BITs-SST and to inform future directions in the field of SST.

A total of 18 RCTs met inclusion criteria: 14 F2F-SST and four BITs-SST. Of these studies, one BITs-SST included two observed groups (Hopkins et al. 2011), resulting in 14 F2F-SST and five BITs-SST observed groups, totaling 1266 participants. BITs-SST studies included computer software (Thomeer et al. 2015), human-like avatars (Hopkins et al. 2011; Rice et al. 2015), and therapeutic robots (Yun et al. 2017). After methodological quality review based on Cochrane Risk-of-Bias guidelines, 15 articles were categorized as Some Risk-of-Bias, two were categorized as Low Risk-of-Bias, and only one article (Lopata et al. 2010) received a rating of Low Risk-of-Bias from one rater and Some Risk-of-Bias from the other. None met criteria for High Risk-of-Bias; therefore, all studies were included in our analysis.

The combined effect size of all 19 observed groups included in the analysis indicated improvement in social skills in children and adolescents who participated in either F2F-SST or BITs-SST compared to the control groups (g = 0.83), with effect sizes consistently in the large range regardless of treatment type (g = 0.81 and g = 0.93, respectively). The overall effect for F2F-SST is consistent with previous F2F-SST meta-analyses (Bellini et al. 2007; Gates et al. 2017; Matson et al. 2007; Reichow et al. 2012; Spain and Blainey 2015; White et al. 2007). Additionally, these preliminary analyses did not indicate significant differences between F2F-SST and BITs-SST. In other words, both F2F-SST and BITs-SST interventions improved social skills from pre- to post-treatment. These findings provide initial support for the continued investigation of BITs-SST with children and adolescents with ASD. However, there are only four BITs-SST studies included in this meta-analysis, and thus, recommendations for clinical practice would be premature given the small number of studies reviewed.

While previous BITs-SST studies have shown positive outcomes (Hopkins et al. 2011; Ploog et al. 2013; Reed et al. 2011; Rice et al. 2015; Thomeer et al. 2015; Wieckowski and White 2017; Yun et al. 2017), this is the first meta-analysis to compare the efficacy of BITs-SST with more traditional, evidence-based approaches to social skills training. The lack of significant differences in efficacy for F2F-SST and BITs-SST provides initial support for new approaches that may expand the options for social skills training for children and adolescents with ASD. If rigorous empirical evaluations of BITs continue to yield comparable results, these new approaches (e.g., BITs-SST) could potentially increase accessibility of social skills training, as technology-based intervention can serve large numbers of children with minimal reliance on the availability of mental health professionals. Although some critics have stated that improvements in a child’s behavior via technological interventions may not translate to real social interactions (Latash 1998), the preliminary analysis of four BITs studies conducted in this paper suggested that social improvements may be reported by parents for behaviors such as eye contact, non-verbal communication, and conversational skills (e.g., Yun et al. 2017). Furthermore, the BITs-SST included in this meta-analysis included participants with a range of functioning levels; thus, BITs-SST could have promise for addressing the diverse presentation of ASD symptoms. This meta-analysis provides preliminary support for BITs-SST and indicates that further investigation of their efficacy may be justified, especially considering that BITs-SST may be an appealing medium for children and adolescents.

Limitations

There are several limitations of this study. First, this meta-analysis examined 18 studies (19 observed groups), and only four included BITs-SST (five observed BITs-SST groups). Comparisons of pre- and post-treatment functioning for SST and control groups were an important criterion in order to reduce variability, and thus, a large number of the available BITs-SST studies were not included in the study due to the lack of control groups or well-validated measures of social skills. Additionally, there exists a large discrepancy between the number of participants in the F2F-SST studies (n = 1128) and the number in the BITs-SST studies (n = 138), which is likely due to the nascent literature related to BITs and its position as a relatively new field of study in the ASD population. Results from this study must be interpreted with caution due to the small number of BITs-SST studies included and their small overall sample size compared to that of the F2F-SST studies. In the 18 studies that were included in the overall model, there was also inconsistent reporting of follow-up data. Five of the 18 studies did not include follow-up data collection which met a threshold for meaningful analysis. Therefore, follow-up data were not analyzed (though reported, when available, in Table 1) and the maintenance of the treatment impact remains unknown. Future research would be strengthened by including follow-up assessment to better support the efficacy of such interventions. Future research should also examine how BITs-SST can impact social skills using larger samples to bolster this body of research.

Similarly, many of the treatments varied in length (from 2 weeks to an entire school year); thus, the necessary amount of intervention time is unclear. Additionally, because studies occurred in various settings, such as classrooms, computer labs, clinics, or participants’ homes, it is not clear which is the most appropriate setting for social skills training. While the increased flexibility in method of delivery may increase accessibility to treatment, the variations in delivery and environment make comparisons more challenging. The many differences in these settings may have contributed to some heterogeneity in the overall model. Additionally, the studies in this analysis represent individuals on the autism spectrum, ranging from low to high functioning. The clinical presentation and symptomology of the subjects included in the study may vary greatly and contribute to overall heterogeneity. Also, the study analyses were limited to parent-report measures and did not include direct behavioral observation of social skills improvement or self- and teacher-report measures. Finally, the studies were all written in English and conducted largely in the USA (four were conducted in the following countries: Australia, Hong Kong, Israel, Netherlands, and Sweden). Future research would be strengthened by representing a more global literature from a variety of countries and published in a variety of languages to best compare and contrast these groups.

The majority of the included studies fell within the Some Risk-of-Bias category, and bias was also evident at the study level through funnel plot analysis (Borenstein et al. 2009). Notably, the three largest studies (Choque Olsson et al. 2017; Dekker et al. 2019; Freitag et al. 2016) had relatively smaller effects, though they demonstrated the highest precision, resulting in an asymmetric funnel plot. This suggests that the magnitude of the overall effect may be positively biased and that further studies with similarly large samples should be conducted in order to gain more accurate and precise estimates of the overall effects.

Future Directions

Additional research is needed to fully understand the impact of BITs-SST as there were only four BITs-SST studies (five observed groups) included in the present analysis. This is largely because, while there are many BITs for ASD, there are few randomized controlled trials that measure the efficacy of BITs for ASD using well-validated measures (Kim et al. 2018). Among these four studies, approaches to BITs-SST varied widely, as did the time frames of intervention. Future research should focus on understanding the impact of each type of BITs-SST and on comparing specific types of BITs-SST to F2F-SST. As the body of research grows, it will be important to include follow-up data to compare the lasting effects and maintained benefits of both F2F-SST and BITs-SST. Further, it is important to analyze whether length of social skills intervention significantly impacts the effect size of social skills interventions. In congruence with Weisz et al. (2015), it is recommended that future studies utilize a deployment-focused model and analyze these interventions in clinical contexts to improve the ecological validity of findings.

Future research should also aim to compare several combinations of F2F-SST and BITs-SST (e.g., studies assessing the effect of simultaneously combining F2F-SST and BITs-SST or the implementation of BITs-SST first and then F2F-SST, or vice versa). For instance, previous literature suggests that practicing social skills online may reduce the level of distress and anxiety that occurs during in vivo face-to-face interactions (Kandalaft et al. 2013; Maskey et al. 2014; Parsons and Mitchell 2002). It could be hypothesized that administering BITs-SST first might help the individual to practice social skills in a setting with minimal anxiety, and that then utilizing F2F-SST, where anxiety may be slightly increased, could provide the opportunity to generalize skills to realistic settings. Additionally, research could examine BITs-SST to assist with the maintenance of gains obtained through traditional methods.

Finally, future studies should examine the efficacy in populations who may particularly benefit from BITs. For example, individuals with few resources or those who do not live close to a social skills group might be able to access similar services online. Research shows that online applications and both synchronous (i.e., live) and asynchronous (i.e., live recordings which are then reviewed by a provider) social communication interventions significantly improve social communication skills in youth with ASD in rural locations and in a variety of countries (Simacek et al. 2020). Future research should examine the efficacy of social skills training delivered through telehealth and online applications, including BITs which have parents facilitate the social skills training with the help of a remote provider. This is particularly pertinent during the COVID-19 pandemic and the resultant reduction of F2F intervention services. Because BITs-SST treatments are still emerging, more methodologically rigorous RCTs that examine BITs-SST are needed as the body of research grows.

Conclusions

Results of the analysis showed a medium to large effect size for BITs-SST and F2F-SST in improving the social skills of children and adolescents, and no significant differences between modalities. This is the first meta-analysis demonstrating the efficacy of BITs-SST versus a control group using a pretest-posttest-control design for teaching social skills to children and adolescents with ASD. The results provide initial support that the use of technology platforms may hold promise for delivering SST to youth with ASD. This possibility warrants future study as the ability to expand intervention resources and increase access to services could reduce some of the costs of obtaining services for individuals with ASD and their families. Although some providers may be cautious about the utilization of BITs in a clinical setting with children and adolescents with ASD (Stallard et al. 2010), there is an increased acceptability of BITs by clients and providers (Topooco et al. 2017) and new resources are needed as there is a shortage of providers compared to an increasing population of individuals with ASD (Gordon-Lipkin et al. 2016; Liptak et al. 2008; Magaña et al. 2013; Thomas et al. 2007). As technology becomes more prevalent in the lives of youth, further empirical support for effective ways to harness these tools for social skills treatment could be of great value to individuals seeking services for ASD.