Introduction

Considerable research has been conducted over the past 20 years to understand the causes and correlates of adolescent offending, including several longitudinal studies following children well into adulthood (Koops and Orobio de Castro 2004). These have allowed for a detailed and invaluable insight into associations between general offending and both risk and protective factors across several domains, including individual, family, peer, school and community factors (see reviews by Lipsey 1995; Loeber and Farrington 1998; Shader 2002; Rennie and Dolan 2010). Despite this, adolescent offending continues to be a complex and persistent societal problem with significant consequences for individuals, families and communities (Blackburn 2003).

Preventative efforts are based on the assumption that the life course trajectories of adolescents can be changed by actively reducing those risk factors associated with antisocial behavior and building on the strengths and protective factors that support desistance. Youth justice and social care agencies are committed to empirically supported interventions that reduce persistent patterns of adolescent antisocial behavior. Some promising results have been obtained with cognitive or behavioral approaches, parent management training, pharmacological approaches and multimodal therapies (Walton 2012). Very few interventions have adopted a structured multimodal approach despite research indicating that serious and violent adolescent offending is multidetermined (Shader 2002). This systematic review focuses on one such intervention, namely, Multisystemic Therapy (Henggeler and Borduin 1990) (for a critique of the model, see Markham 2016).

Multisystemic Therapy is quite possibly one of the most empirically investigated interventions for antisocial behavior by adolescents and is currently delivered by more than 500 teams across 16 countries worldwide. Given the rapid widespread implementation, it might be supposed that the effectiveness of Multisystemic Therapy has been consistently empirically demonstrated. A preliminary search of the Cochrane Library, Campbell Collaboration and Google Scholar undertaken in December 2013 yielded a systematic review of the effectiveness of Multisystemic Therapy undertaken by Littell et al. (2005), a meta-analysis (Curtis et al. 2004) and several narrative literature reviews (e.g., LaFavor and Randall 2013). While narrative reviews can provide an informative overview, they are frequently vulnerable to sources of error and bias, and little attempt is made to sample all of the available literature or to critically consider study design and quality (Petticrew and Roberts 2006). By contrast, systematic reviews adopt a more rigorous, comprehensive and transparent process to synthesize large bodies of information (Petticrew and Roberts 2006).

The systematic review undertaken by Littell et al. (2005) examined 21 randomized control trials of Multisystemic Therapy from eight independent samples (total participants = 1230) including non-published studies (see Table 3 in “Appendix” for studies). A range of different outcome measures, from official records of antisocial behavior to caregiver report of child problem behavior and self-report involvement in delinquency, were identified across included studies. The authors concluded that whilst there was no evidence suggesting that Multisystemic Therapy had harmful effects, it remained unclear whether the treatment program had clinically significant advantages compared with other interventions. One critique of the previous research trials by Littell and colleagues was the involvement of the program developers in all but one of the included studies. The possible effect of developers-as-evaluators has been investigated by Petrosino and Soydan (2005) in a meta-analysis of 300 randomized control trials of interventions targeting recidivism and a review of 12 meta-analyses of offender treatment. In both cases larger mean effect sizes were found when evaluators were influential in the design and delivery of treatment. While highly involved researchers may be unduly influenced at various stages, it is also possible this finding is explained by an increased attention to integrity and delivery. The only independent study reviewed by Littell et al. (2005) did not find Multisystemic Therapy to be superior to usual services in reducing adolescent antisocial behavior (Canada, Leschied and Cunningham 2002), highlighting the need for further independent investigation of the effectiveness of Multisystemic Therapy.

Additional critiques by Littell et al. (2005) involved poor descriptions of “usual services”, incomplete information about randomization procedures and, in at least three studies, unexplained attrition between the number of participants who had agreed to be assessed, the number who were then randomly assigned and the number reported in the results. Follow-up periods also varied considerably within studies, meaning that one participant could be followed up for twice as long as another, thus making between group comparisons difficult.

The finding by Littell et al. (2005) that the effectiveness of Multisystemic Therapy was not convincingly demonstrated contradicted a meta-analysis undertaken by the program developers at a similar time, which included six of the same eight studies (total participants = 708) (Curtis et al. 2004). An overall average effect size of d = 0.55 for criminal behavior (based on official records) was reported, and Multisystemic Therapy participants and their families were found to be functioning better than 70% of participants treated by usual methods. The different conclusions reached by these two reviews sparked debate about quality assessment, inclusion criteria, allegiance effects and the estimation of effect sizes (see Henggeler et al. 2006; Littell 2006).
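The link between an effect size of d = 0.55 and the "better than 70%" statement can be illustrated with Cohen's U3, the proportion of the comparison group falling below the mean of the treated group. The sketch below assumes normally distributed outcomes with equal variance in both groups; whether Curtis et al. (2004) derived their percentage in exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def proportion_below_treated_mean(d: float) -> float:
    """Cohen's U3: share of the comparison group scoring below the treated-group mean.

    Assumes normal outcome distributions with equal variance in both groups,
    so U3 is the standard normal CDF evaluated at d.
    """
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


# d = 0.55 is the overall effect size for criminal behavior quoted in the text.
u3 = proportion_below_treated_mean(0.55)
print(f"U3 for d = 0.55: {u3:.1%}")
```

Under these assumptions the result is approximately 71%, which is in line with the figure of 70% quoted from the meta-analysis.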

It is worth noting that the meta-analysis found a clear difference between efficacy and effectiveness trials. The former are typically conducted under optimal circumstances, i.e., closely supervised by developers and often university based, with therapists who are graduate students. Larger effect sizes were observed under these conditions (d = 0.81, CI 95% = ± 0.33) than in effectiveness studies involving treatment delivered by therapists in community settings (d = 0.26, CI 95% = ± 0.06). The significance of the study condition variable highlights possible challenges in the dissemination of Multisystemic Therapy to the real world.
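The size of the gap between efficacy and effectiveness trials can be made concrete by turning each reported point estimate and half-width into interval bounds. This is a minimal sketch that assumes the "± " values are symmetric 95% confidence half-widths around d; the helper names are hypothetical.

```python
def interval(d: float, half_width: float) -> tuple[float, float]:
    """Return (lower, upper) bounds for an estimate reported as d ± half_width."""
    return (d - half_width, d + half_width)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


efficacy = interval(0.81, 0.33)       # developer-supervised efficacy trials
effectiveness = interval(0.26, 0.06)  # community-based effectiveness trials

print(f"efficacy:      {efficacy[0]:.2f} to {efficacy[1]:.2f}")
print(f"effectiveness: {effectiveness[0]:.2f} to {effectiveness[1]:.2f}")
print("intervals overlap:", intervals_overlap(efficacy, effectiveness))
```

On this reading the two intervals do not overlap, which is consistent with the study condition variable being reported as significant.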

Current Study

The systematic review undertaken by Littell et al. (2005) made an important contribution in bringing together previous findings and highlighting methodological limitations. However, it was based on research literature from 1985 up to January 2003 and can therefore be considered somewhat dated. This is particularly relevant given the increased body of both international and independent research in line with the global spread of Multisystemic Therapy. Littell et al. (2005) were only able to include studies across three countries (America, Canada and Norway) and gaps in knowledge about international transportability remain. It was reported that there were 13 “ongoing” possibly randomized studies, and a sufficient number of new primary studies may alter previous conclusions.

Given the mixed findings, continued efforts are needed to determine the effectiveness of Multisystemic Therapy. This current review aims to determine whether Multisystemic Therapy is more effective than usual services or no treatment for adolescents who are at risk of serious antisocial behavior and/or out-of-home placement. The primary focus is further offending as measured by official data which, despite its problems, remains the most significant test of any intervention designed to reduce antisocial behavior. The review followed the methodology outlined in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins and Green 2011), with the exception that non-English studies were not included owing to time and resource constraints.

The first objective was to identify the highest quality experimental studies that have measured the effectiveness of Multisystemic Therapy since the search undertaken by Littell et al. (2005) and provide a detailed description of their methods. The second objective was to determine whether Multisystemic Therapy (including adaptations for problem sexual behavior and contingency management/substance abuse) was more effective than treatment as usual/no treatment in addressing outcomes (primary outcomes: antisocial behavior and out-of-home placement; secondary outcomes: substance use, adolescent functioning, family functioning, peer relations and school performance) in adolescents aged 10–17 years.

Methods

Search Strategy

The current study searched several data sources (current as of 21 June 2014) including seven online bibliographic general reference databases [Cochrane; PsycINFO; Medline; EMBASE; Applied Social Sciences Index and Abstracts (ASSIA); National Criminal Justice Reference Service (NCJRS) and Web of Science]; two dissertation and thesis portals (ProQuest Dissertations and Theses Global and British Library EThOS); Government policy sources (United States Office of Juvenile Justice and Delinquency Prevention; United States Department of Health and Human Services; United Kingdom Ministry of Justice; United Kingdom Department of Health and NHS Evidence) and four Multisystemic Therapy related websites (http://msti.org, http://mstuk.org, http://www.mstservices.com and the Family Services Research Centre of the Medical University of South Carolina).

The following search terms were modified where appropriate to meet the search requirements of each database: “Multisystemic Therapy OR Multisystemic OR Multi-systemic” AND “therap* or treat* or interven* or program*” AND “Outcome* OR evaluat* OR effect* OR experiment* OR trial OR compare* OR impact OR consequen* OR recidiv* OR reoffen* OR relapse OR reconvict* OR research” AND “youth* or adolesc* or young* or teen* or juvenile* or child* or minors* or boys* or girls*.”
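The structure of the search string (terms joined by OR within each concept, concepts joined by AND) can be sketched as follows. This is an illustration only: the grouping labels in the comments and the generic Boolean syntax are assumptions, and each database would require its own field codes and truncation rules.

```python
# Term groups copied from the search string above; one list per concept.
TERM_GROUPS = [
    # intervention name
    ["Multisystemic Therapy", "Multisystemic", "Multi-systemic"],
    # intervention type
    ["therap*", "treat*", "interven*", "program*"],
    # outcome and evaluation terms
    ["Outcome*", "evaluat*", "effect*", "experiment*", "trial", "compare*",
     "impact", "consequen*", "recidiv*", "reoffen*", "relapse",
     "reconvict*", "research"],
    # population
    ["youth*", "adolesc*", "young*", "teen*", "juvenile*", "child*",
     "minors*", "boys*", "girls*"],
]


def or_group(terms: list[str]) -> str:
    """Join the terms of one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(groups: list[list[str]]) -> str:
    """Combine the concept groups with AND so that every concept must match."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(TERM_GROUPS)
print(query)
```

Keeping the groups as data makes it straightforward to adapt the string for a database with different syntax by changing only the two joining functions.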

The search was restricted to studies published after January 2002 in accordance with guidelines by Petticrew and Roberts (2006) to allow for an overlap of approximately 1 year before the end date for review updates. To broaden the search, reference lists of shortlisted articles as well as relevant book chapters were hand searched to identify potentially eligible studies. Additionally, a number of prominent authors in the field were contacted directly to identify unpublished or on-going research that might meet the inclusion criteria (Dr Littell, Dr Fonagy, Professor Henggeler, Professor Borduin, Professor Leschied and Professor Ogden). All but one responded within the time frame of the study.

Study Selection

Inclusion/Exclusion Criteria

A population, intervention, comparators and outcome (PICO) framework was used to support a robust search strategy and identify potential studies to be included. The inclusion criteria are laid out in Table 1.

Table 1 The PICO guide to identify relevant literature

The aim was to evaluate the effectiveness of Multisystemic Therapy as an intervention for reducing antisocial behavior. Adapted versions for problem sexual behavior and contingency management/substance abuse were therefore included, the latter given the highly elevated rates of alcohol and drug use among those involved in the youth justice system (Tripodi and Bender 2011). Only programs licensed by Multisystemic Therapy Services Inc. were considered due to the stringent training and ongoing supervision/consultation processes. Where possible the methods employed in a review update should mimic those of the original review (Higgins and Green 2011). One of the studies included in the review by Littell et al. (2005) involved an adaptation for psychiatric emergencies. The validity of including this adaptation when examining the treatment effect for offending is questionable given the potential differences in clinical presentation. This was further indicated by the substantially different mean treatment lengths reported for the psychiatric emergency adaptation and standard Multisystemic Therapy (90 h versus 23–40 h, respectively).

Eligible study designs involved the random assignment of participants to treatment and comparison/control groups. Although it is recognized that there are challenges with random allocation, the randomized control trial is recognized as the optimal design for minimizing possible pre-existing differences between treatment and comparison groups as well as risk of inadvertent researcher bias. As highlighted by Peto, Collins and Gray (1995) “There is simply no serious scientific alternative to the generation of large-scale randomized evidence…. [randomized control trials] have a central role to play in the development of rational criteria for the planning of health care throughout the world” (p. 39).

The initial systematic search of reference databases provided a total of 2400 potentially relevant hits. An additional 184 articles were identified from other sources including hand searching reference lists, dissertation portals or government policy sources. The majority of hits were excluded by the first author during the initial screening of titles and abstracts only, due to clear irrelevance to the current review, duplication or non-English language. For the 172 remaining studies, attempts were made to retrieve full copies of the articles via the University of Birmingham e-library, on site library, interlibrary loans or direct contact with authors. Five articles could not be located.

A predefined form was used to assess study eligibility of those articles deemed potentially relevant during initial sifting. The search process is presented in Fig. 1. This resulted in 18 articles including three updates to two studies included in the systematic review by Littell et al. (2005) and 15 cross referenced publications of nine new trials (see Table 4 in “Appendix” for studies).
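The counts reported for the search and selection process can be tied together with simple bookkeeping. In the sketch below only the five input figures come from the text; the derived quantities (total identified, excluded at screening, full texts assessed, excluded at eligibility) are arithmetic consequences under the assumption that each stage follows directly from the previous one, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningFlow:
    """Counts reported in the text for each stage of study selection."""
    database_hits: int             # hits from reference databases
    other_sources: int             # hand searching, portals, policy sources
    retained_after_screening: int  # left after title and abstract screening
    not_located: int               # full texts that could not be retrieved
    included_articles: int         # articles meeting the eligibility form

    @property
    def total_identified(self) -> int:
        return self.database_hits + self.other_sources

    @property
    def excluded_at_screening(self) -> int:
        return self.total_identified - self.retained_after_screening

    @property
    def full_texts_assessed(self) -> int:
        return self.retained_after_screening - self.not_located

    @property
    def excluded_at_eligibility(self) -> int:
        return self.full_texts_assessed - self.included_articles


flow = ScreeningFlow(
    database_hits=2400,
    other_sources=184,
    retained_after_screening=172,
    not_located=5,
    included_articles=18,
)
print(flow.total_identified, flow.excluded_at_screening,
      flow.full_texts_assessed, flow.excluded_at_eligibility)
```

The derived figures are not reported in the text and should be checked against Fig. 1 before being relied upon.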

Fig. 1 Search results and study selection

Quality Assessment

The 11 studies meeting inclusion criteria were each rigorously quality assured using a pre-defined checklist. Information was cross referenced across publications as recommended by Littell, Corcoran and Pillai (2008). A checklist developed by the United Kingdom’s National Institute of Clinical Evidence (2012) of the most common and well documented biases relevant to randomized control trials was adapted for this purpose. There are no summary scores or specific numerical algorithms within the tool, the use of which is questionable given the lack of standard techniques establishing reliability and validity of quality assessment scales (Juni et al. 1999). Furthermore, there is an explicit emphasis on considering the likely magnitude and direction of any possible bias, supporting the critical evaluation of the implications for interpreting findings (Centre for Review and Dissemination 2009). A structured judgment process was used to combine the overall appraisal of bias and confidence in the findings into four possible categories: strong, good, weak and rejected. All of the included studies were critically appraised and a random selection of four papers (36.4%) was quality checked by a second independent reviewer to ensure reliable ratings. The consensus ratings, where applicable, are presented in Table 5 in “Appendix”.

Data Extraction

Data were extracted from the included studies by the primary author using a predetermined form and any absent or unclear information was marked next to the relevant item.

Results

Quality of Included Studies in Review

The included studies ranged in quality (see Table 5 in “Appendix”). Four of these were rated “strong” (Asscher et al. 2013; Butler et al. 2011; Sawyer and Borduin 2011; Sundell et al. 2008); four others were rated as “good” (Borduin et al. 2009; Letourneau et al. 2009; Ogden and Hagen 2006; Weiss et al. 2013) and three were rated “weak” (Glisson et al. 2010; Henggeler et al. 2006; Timmons-Mitchell et al. 2006).

Referral pathways were frequently poorly described in that a discrepancy existed between eligible participants and those included for randomization. For example, in the United Kingdom, about a quarter of those referred could not be contacted or refused to consent to assessment. This was a similar proportion in the referrals of both Weiss et al. (2013) and Letourneau et al. (2009). It is possible that this proportion may represent those families with more chaotic lives who are less willing to cooperate with services and whose profiles could be substantially different from those who do agree. Furthermore, other staff frequently carried out the initial screening (e.g., social workers, Sundell et al. 2008; probation staff; Henggeler et al. 2006). Various local agencies may have made different judgments on possible eligibility introducing a level of selection bias among and across sites or referrers. It is therefore not known how representative the samples within the trials may be of the general target population of Multisystemic Therapy.

Appropriate randomization was undertaken in some studies, although there was variety in the method, the point at which this occurred and concealment of allocation. The method of randomization was not reported in three studies (Henggeler et al. 2006; Ogden and Hagen 2006; Weiss et al. 2013). A further two studies used the coin toss (Sawyer and Borduin 2011; Timmons-Mitchell et al. 2006), the validity of which is questionable (Clark and Westerberg 2009). Almost half of the studies did not explicitly state who undertook randomization and how this remained concealed (e.g., Weiss et al. 2013).

It was positive that almost all of the studies explored group differences at baseline on demographic, criminal histories and/or psychosocial characteristics. No statistically significant differences were found for most studies indicating that randomization had been successful (e.g., Asscher et al. 2013). Exceptions to this included Henggeler et al. (2006) and Borduin et al. (2009) where the likely direction of the effect of the bias was assessed as favoring the comparison group. Furthermore, in the Norway randomized control trial (Ogden and Hagen 2006), differences in baseline scores could have overinflated the estimates of treatment effects.

Sample sizes were relatively small, with three studies having fewer than 100 participants (Borduin et al. 2009; Ogden and Hagen 2006; Timmons-Mitchell et al. 2006). Few studies specifically reported having undertaken power calculations (only Butler et al. 2011; Glisson et al. 2010), making it difficult to assess whether the sample size was adequate to detect a true effect.

Almost half of the included studies provided inadequate descriptions of the comparison condition, making it difficult to know what Multisystemic Therapy was being compared with. In almost all of the studies, it was unclear whether the groups had received the same care apart from the intervention being studied.

Some studies reported low rates of missing data (e.g., at 2 year follow-up 94% of participants completed assessment measures, Letourneau et al. 2009); whereas other studies had relatively high rates of non-completion (e.g., in Glisson et al. 2010, just over half of participants (57.0%) completed the 18 month assessment measures). Only one study provided no information on drop outs (Timmons-Mitchell et al. 2006). There was variation across studies in whether analyses had been undertaken to examine the missing and completed data and any possible group differences. This is potentially problematic as participants who drop out may have more significant difficulties and as a consequence treatment effects could be overinflated.

It was positive that all of the studies had follow-up periods of over 1 year. Recidivism rates increase with time, indicating the need for long term observation strategies. However, the degree of difference between studies in follow-up periods is potentially problematic. It may be that, for the studies with shorter follow-up periods, recidivism rates would increase as time since assessment/discharge increases. For the two problem sexual behavior studies, the low base rate of sexual recidivism is an inherent difficulty in outcome research. Letourneau et al. (2009) reported only four sexual offences across the sample at the 2 year follow-up.

Most of the studies considered a wide range of outcomes, used reliable measures for their assessment and multiple sources of information. The exception to this was Timmons-Mitchell et al. (2006) where research assistants completed the Child and Adolescent Functional Assessment Scale solely from court records. No inter rater reliability information was reported for the coding process and it is not clear how comprehensive the file information was. Four studies relied on caregivers to report out-of-home placement (Glisson et al. 2010; Henggeler et al. 2006; Letourneau et al. 2009; Ogden and Hagen 2006) which is potentially less reliable as parents may be motivated in various different ways. In America, the vast majority of the studies only examined official data on antisocial behavior within the state (e.g., Borduin et al. 2009) potentially missing a proportion of those adolescents who moved to another state during the follow-up period.

It is positive that seven studies used other sources of information for involvement in antisocial behavior, given that many offences are not reported and even those that are brought to the attention of police may not be officially recorded. For the majority, this involved the Self-Report Delinquency Scale (Elliott et al. 1985), which has demonstrable reliability and validity across a range of settings. However, this tool focuses on serious criminal behaviors and it is possible that adolescents may incorrectly label certain behaviors, thus overstating seriousness. As with all self-report measures, there are issues related to social desirability. This may be particularly relevant given that most of the populations under examination were involved with the justice system and adolescents may be reluctant to provide accurate accounts for fear of further consequences.

In some studies the treatment and comparison groups were assessed on the same schedule (e.g., Weiss et al. 2013); however, in other trials, data were collected at different points (e.g., Timmons-Mitchell et al. 2006). This can be problematic because the outcome data for cases may not be comparable due to the differences in the length of observation periods.

In some studies, there were good descriptions of the care taken to ensure that those collecting the data were blind to the condition (e.g., teachers were informed that questionnaires were for a study on teen socialization in Borduin et al. 2009). However, in other studies, blinding was unclear (e.g., Timmons-Mitchell et al. 2006) or it was stated that those collecting the data were aware of the assignment (e.g., Letourneau et al. 2009). This potentially introduces an element of bias due to people’s preconceived beliefs about Multisystemic Therapy and how these may consciously or unconsciously influence their behavior.

Just over half of the studies were assessed as independent of the program developers and their associates (Asscher et al. 2013; Butler et al. 2011; Ogden and Hagen 2006; Sundell et al. 2008; Timmons-Mitchell et al. 2006; Weiss et al. 2013).

Characteristics of Included Studies

Population

Table 2 provides an overview of the characteristics of the 11 included studies. The majority were undertaken in America (Borduin et al. 2009; Glisson et al. 2010; Henggeler et al. 2006; Letourneau et al. 2009; Sawyer and Borduin 2011; Timmons-Mitchell et al. 2006; Weiss et al. 2013) and Europe including the Netherlands (Asscher et al. 2013); Sweden (Sundell et al. 2008); the United Kingdom (Butler et al. 2011) and an update to the Norway randomized control trial (Ogden and Hagen 2006).

Table 2 Characteristics of included studies

The participants’ characteristics were well described in most studies and it was possible to quantitatively synthesize some demographic information. The sample size ranged from 48 (Borduin et al. 2009) to 674 (Glisson et al. 2010), with over 100 participants in eight of the studies. The size of the sample across all studies was 2042 adolescents (mean 185.6, SD = 171.5). The mean age for the whole sample was 14.9 years (SD = 0.5), with Asscher et al. (2013) reporting the highest mean age (16.0 years) and Borduin et al. (2009) reporting the lowest (14.0 years). The samples were predominantly male; percentages of females ranged from 2.4% (Letourneau et al. 2009) to 39% (Sundell et al. 2008). The ethnicity of participants varied and the use of ethnic categories was not consistent across studies.
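The pooled sample figures can be cross-checked with basic arithmetic. The sketch below uses only the totals quoted in the text; the variable names are hypothetical, and the reported standard deviation cannot be reproduced here because it depends on the individual study sample sizes listed in Table 2.

```python
# Figures quoted in the text for the pooled sample.
total_participants = 2042
n_studies = 11
smallest_sample, largest_sample = 48, 674

# Mean sample size per study, rounded as in the text.
mean_sample_size = total_participants / n_studies
print(f"mean sample size: {mean_sample_size:.1f}")

# A mean must lie between the smallest and largest study.
assert smallest_sample <= mean_sample_size <= largest_sample
```

The large standard deviation relative to the mean reflects how far the largest trial (674 participants) sits from the rest.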

Samples were recruited from various sources; the majority from youth justice (Borduin et al. 2009; Butler et al. 2011; Glisson et al. 2010; Henggeler et al. 2006; Letourneau et al. 2009; Sawyer and Borduin 2011; Timmons-Mitchell et al. 2006); social care (Ogden and Hagen 2006; Sundell et al. 2008); alternative education (Weiss et al. 2013) or a combination (Asscher et al. 2013). Participant offending history, where reported, varied. The offending histories of participants were typically more severe in the American studies compared with Europe (e.g., Timmons-Mitchell et al. 2006 reported that the average number of all pre-treatment offences was 6.87, SD = 4.4). The exception to this was the problem sexual behavior sample in Letourneau et al. (2009) which appeared less delinquent (62% had no prior general offences).

Intervention

The majority (8) of the included studies examined standard Multisystemic Therapy; a further two studies examined the adaptation for problem sexual behavior and one the adaptation for contingency management/substance abuse. The average length of treatment was provided by eight studies (excluding Asscher et al. 2013; Ogden and Hagen 2006; Weiss et al. 2013) and this was converted to weeks where possible. Glisson et al. (2010) had the lowest mean treatment length (15.0 weeks, SD = 6.3) and Multisystemic Therapy-problem sexual behavior had the longest (approximately 31 weeks) (Borduin et al. 2009; Letourneau et al. 2009). Only two studies provided information on actual therapy contact. Henggeler et al. (2006) reported that approximately half of therapist contacts were with family, 13% with school, 5% with peers, 14% with the youth and 19% with the community. By contrast, Weiss et al. (2013) reported increased involvement with adolescents (95% individual sessions) and school (94%).

The proportion of participants who successfully completed Multisystemic Therapy was frequently unreported (e.g., Butler et al. 2011; Glisson et al. 2010). Only Borduin et al. (2009) reported 100% treatment completion. Few studies provided clear information about how many of those randomly allocated actually started treatment, were discharged on the mutual agreement with caregivers or prematurely terminated (e.g., only Letourneau et al. 2009; Sundell et al. 2008 gave clear descriptions).

With regard to the therapists, most studies provided information about professional background and demographic characteristics. This ranged from qualification details (e.g., 86% had Masters degrees in Asscher et al. 2013) to specific information about professional background, additional training and years of clinical experience (e.g., Sundell et al. 2008; Butler et al. 2011). Given that this review only included licensed Multisystemic Therapy programs, as standard all therapists would have received the 5 day orientation training, weekly group supervision/consultation and quarterly boosters (Schoenwald 2008).

Every study in this review attempted to measure treatment integrity, most frequently using the Therapist Adherence Measure, which is a 26 item Likert scale developed through Multisystemic Therapy expert consensus and validated in two studies (Henggeler et al. 1997, 1999). The measure is collected regularly with each family and is said to measure those therapist features which are specific to the Multisystemic Therapy principles. However, not all of the studies reported the actual mean adherence score (e.g., Borduin et al. 2009; Butler et al. 2011) and those which did provided little referential information about what the score signified. Only Weiss et al. (2013) reported the mean therapist score to be in the moderately high to high range for adherence to treatment. In many studies, the Therapist Adherence Measure was not available from all families for reasons unreported (e.g., Sundell et al. 2008; Timmons-Mitchell et al. 2006).

Other measures of treatment integrity included independent coding of adherence to the nine Multisystemic Therapy principles from audio taped sessions by consultants (Glisson et al. 2010; Weiss et al. 2013). However, it was not clear whether those involved in the coding were aware of the ongoing research trial. Given that there continues to be a lack of clarity as to how the principles reflect the theory of change and are operationalized in clinical practice, it remains unclear as to whether such codings provide a reliable measure of fidelity to Multisystemic Therapy specifically. Glisson and colleagues (2010) used therapist logs to detail time spent addressing sub-systems (e.g., individual child, family with primary caregiver, school with caregiver). While this may provide a good indication of the multi-modal approach, it does not provide a measure of treatment integrity per se.

Comparison

All of the studies compared Multisystemic Therapy with another approach apart from Weiss et al. (2013). In this trial, the participants were all recruited from a self-contained intervention classroom and so were involved with a behavioral change program at some level.

There was variation in the information provided about comparison groups. The most comprehensive reports were provided by the two problem sexual behavior studies which included the theoretical orientation and content, format, supervision processes and therapist characteristics (Borduin et al. 2009; Letourneau et al. 2009). Other studies provided a general overview (e.g., in the Netherlands usual services consisted of counseling, family based treatments, detention or no treatment, Asscher et al. 2013). Some studies also reported relatively high proportions of comparison group participants to be placed out-of-home (e.g., about a quarter, Glisson et al. 2010; Ogden and Hagen 2006).

Data Synthesis

The majority of the studies examined outcomes across a range of domains including antisocial behavior, drug and alcohol use, adolescent functioning, family functioning, peer relations and school.

Antisocial Behavior

A standard operationalization of antisocial behavior is lacking and outcomes were reported in different ways including official data: arrests/charges (yes or no; rates, time to first arrest and type) and out-of-home placement (yes/no; number of days/years; combining various types such as prison, treatment setting, foster parents, institution or supervised living facility; official and caregiver report). Information was typically obtained from a range of official sources including correctional, probation and police services. In America, only Letourneau et al. (2009) accessed national sources as well as within state data. The follow-up periods for the official recidivism data ranged widely between 1 year (Henggeler et al. 2006) and 21.9 years (Sawyer and Borduin 2011) making it difficult to directly compare rates across studies.

About half of the studies found no significant differences on official measures of antisocial behavior (e.g., Asscher et al. 2014; Henggeler et al. 2006; Letourneau et al. 2013; Löfholm et al. 2009; Weiss et al. 2013) (see Table 6 in “Appendix” for findings from official sources). The study with the longest follow-up period (21.9 years) found significant differences on arrests for both violent and non-violent felonies and on years incarcerated, but not on misdemeanor offences (Sawyer and Borduin 2011). In the United Kingdom, Butler et al. (2011) found significant between group differences in favor of Multisystemic Therapy for violent and non-violent offences at 18 months follow-up. The charges for which adolescents were formally arraigned were also significantly fewer for the Multisystemic Therapy group in Timmons-Mitchell et al. (2006). Survival analyses indicated that Multisystemic Therapy participants survived for longer without any type of arrest than the comparison group [χ2 (1) = 6.06, p = 0.01]. Lastly, Borduin et al. (2009) found that the Multisystemic Therapy-problem sexual behavior group had significantly fewer arrests for sexual crimes, non-sexual crimes and fewer days in custody (effect sizes reported between 0.086 and 0.155, which can be considered small). Survival analysis examining the proportion of participants who were not arrested in each group by time favored treatment efficacy [χ2 (1, n = 48) = 8.17, p < 0.01]. Three studies reported a significant difference in out-of-home placement in favor of Multisystemic Therapy based on caregiver report (Glisson et al. 2010; Letourneau et al. 2013; Ogden and Hagen 2006).
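The two survival analysis statistics reported above can be sanity checked against their stated p values. With one degree of freedom, a chi-square statistic is the square of a standard normal variable, so the upper tail probability can be computed with the complementary error function from the standard library. This is a sketch under that one-degree-of-freedom assumption; the function name is hypothetical and the original analyses may have used a different test variant.

```python
import math


def chi2_p_value_df1(statistic: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-square(1).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistics as reported in the text.
p_any_arrest = chi2_p_value_df1(6.06)
p_sexual_behavior_trial = chi2_p_value_df1(8.17)

print(f"chi2(1) = 6.06 -> p = {p_any_arrest:.4f}")
print(f"chi2(1) = 8.17 -> p = {p_sexual_behavior_trial:.4f}")
```

Both values fall where the text places them: the first rounds to 0.01 and the second lies below 0.01.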

Seven studies used the Self-Report Delinquency scale (Elliott et al. 1985). Various subscales were used and scores were reported differently (e.g., raw score/t score), making comparisons difficult. One trial could not be reported upon as no analysis was undertaken between the Multisystemic Therapy and comparison conditions (Henggeler et al. 2006). Of the remaining six, between-group differences in scores were significant for four studies (Asscher et al. 2014 for property but not violent offences; Borduin et al. 2009; Letourneau et al. 2013; Ogden and Hagen 2006). Asscher et al. (2014) and Ogden and Hagen (2006) also reported effect sizes, which were 0.37 and 0.26, respectively, and could both be considered small.

Alcohol and Substance Use

For the study investigating Multisystemic Therapy-contingency management, a significant group by time interaction effect on the self-report Form 90 for alcohol consumption was found (p = 0.049) (Henggeler et al. 2006). This indicated that adolescents under drug court in both the Multisystemic Therapy and Multisystemic Therapy-contingency management conditions reported less alcohol use at follow-up (12 months), controlling for baseline scores. The same pattern was indicated for heavy alcohol, marijuana and polydrug use. There was also a significant difference between urine screens in favor of the two Multisystemic Therapy conditions (percentage positive: drug court only, 45%; drug court and Multisystemic Therapy, 7%; drug court and Multisystemic Therapy-contingency management, 17%; p < 0.001).

Very few other trials used measures to assess substance use directly. The exception was Löfholm et al. (2009), who found that drug-related problems reduced over time whilst risky alcohol use increased for both groups. At the longest follow-up (2 years), Letourneau et al. (2009) also found no significant between-group differences on self-reported substance use as measured by a subscale of the Personal Experience Inventory.

Adolescent Functioning

Adolescent outcomes were typically gathered from multiple informants including caregivers, adolescents and school staff. A range of measures were used, the most common being the Child Behavior Checklist (Achenbach 1991). Psychiatric symptomology (e.g., Borduin et al. 2009), antisocial beliefs (e.g., Butler et al. 2011), self-esteem (Asscher et al. 2013) and psychopathy (e.g., Asscher et al. 2013) were also examined. Only one study included a measure to assess a protective factor (Sense of Coherence Scale, Sundell et al. 2008).

The caregiver-reported Child Behavior Checklist was used by eight studies and significant between-group differences in favor of Multisystemic Therapy were reported for half of these (Asscher et al. 2014 (externalizing); Butler et al. 2011 (aggression and delinquency subscales); Ogden and Hagen 2006 (total) and Weiss et al. 2013 (externalizing)). Findings could not be directly compared because studies differed in the scales and subscales used and in how scores were reported (raw or t scores). Four studies reported effect sizes, which ranged from 0.35 (Weiss et al. 2013) to 0.53 (Asscher et al. 2014) and are therefore in the small to medium range.
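The verbal labels attached to effect sizes in this section ("small", "small to medium") can be made explicit with a short sketch. It assumes, without confirmation from the included studies, that the quoted values are standardized mean differences (Cohen's d) and applies Cohen's conventional benchmarks of 0.2, 0.5 and 0.8. The function name `cohen_label` and the "negligible" band below 0.2 are illustrative choices, and the sketch is not meant for effect sizes on other metrics.

```python
def cohen_label(d: float) -> str:
    """Label a standardized mean difference using Cohen's conventional cut-offs."""
    size = abs(d)  # the sign only indicates direction, not magnitude
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Effect sizes as quoted in this review; the metric is ASSUMED to be Cohen's d.
reported = {
    "Weiss et al. 2013, caregiver Child Behavior Checklist": 0.35,
    "Asscher et al. 2014, caregiver Child Behavior Checklist": 0.53,
    "Asscher et al. 2014, Self-Report Delinquency": 0.37,
    "Ogden and Hagen 2006, Self-Report Delinquency": 0.26,
}

for source, d in reported.items():
    print(f"{source}: d = {d:.2f} -> {cohen_label(d)}")
```

Under these assumed cut-offs, only the 0.53 value reaches the medium band; the others fall in the small band.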

The adolescent report version of the Child Behavior Checklist (Youth Self-Report, Achenbach 1991) was used by seven studies. For the Multisystemic Therapy-contingency management trial, the scores were not reported in the “results” section as the t scores for the total were almost the same as the mean for the normative sample (Henggeler et al. 2006). Three of the remaining six studies found a significant positive treatment effect, all on the externalizing scale (Asscher et al. 2014; Letourneau et al. 2009; Weiss et al. 2013). Two studies provided effect sizes that could be considered small (0.39 in Asscher et al. 2014; 0.26 in Weiss et al. 2013).

Family Functioning

Six of the included studies examined the family domain, and this was the domain in which the measures used varied the most. Areas of parenting (e.g., Butler et al. 2011), quality of parent–youth relationships (e.g., Asscher et al. 2013) and parental mental health (e.g., Borduin et al. 2009) were all explored. Some researchers adapted scales from various parent and family assessments and combined them (e.g., Asscher et al. 2013). One study also included an observer assessment of the family examining positive parenting, inept discipline and relationship quality (Asscher et al. 2013).

Findings were mixed within and between studies, and the variety in the measures used makes it challenging to draw any generalizations. For example, Butler et al. (2011) found that positive parenting increased in the Multisystemic Therapy group and decreased in the control group [ES time(multisystemic therapy) = 0.29, 95% CI = −0.12, 0.72], although this was only for caregiver and not adolescent report. Löfholm et al. (2009) in Sweden found no treatment effect on parenting skills; across both conditions, parenting skill increased significantly over time as reported by caregivers but not by adolescents. In contrast, positive findings were reported in Henggeler et al. (2009b) for youth report of Lax Discipline, involving increased follow-through on adolescent misbehavior by caregivers in the Multisystemic Therapy group (p < 0.05). Only Asscher et al. (2014) found significant improvement in positive discipline for the Multisystemic Therapy group as rated by caregivers, adolescents and observers.

Peers

Peer relationships, most typically association with procriminal peers, were assessed in five of the studies mostly from youth and/or caregiver reports. Some studies found no significant differences in the association with delinquent peers between Multisystemic Therapy and comparison groups (e.g., Butler et al. 2011; Sundell et al. 2008) while others reported favorable treatment effects. This included Henggeler et al. (2009b) who reported that the scores for “Bad Friends” for Multisystemic Therapy participants decreased significantly more over time than for usual services (p < 0.05). Borduin et al. (2009) also reported favorable treatment effects for both the parent and teacher composite measure and adolescent report of emotional bonding and social maturity for the Multisystemic Therapy group. Asscher et al. (2014) found significant between group differences on increased contact with prosocial peers in favor of Multisystemic Therapy but not decreased affiliation with deviant peers.

School

Education was the area least considered and was assessed by only three studies involving information about grades, suspensions and attendance (Borduin et al. 2009; Sundell et al. 2008; Weiss et al. 2013). In the alternative education sample, no treatment effects were found on the Teacher externalizing scale, school grades or days suspended (Weiss et al. 2013). Borduin et al. (2009) reported a significant effect for parent and teacher report of grades in favor of Multisystemic Therapy. Lastly, Löfholm et al. (2009) reported no treatment effect for school attendance.

After Multisystemic Therapy

The vast majority of the studies provided no information regarding aftercare services. Only Sundell et al. (2008) reported that, at the 7-month follow-up, two-thirds (66%) of Multisystemic Therapy adolescents were still receiving services and 39% had been re-referred for investigations resulting in new services. This was similar to the comparison group. At the 24-month follow-up, a third of the Multisystemic Therapy group was still receiving services (Löfholm et al. 2009).

Discussion

The current study aimed to investigate the evidence for Multisystemic Therapy, which is an intensive family and community based intervention for young people aged 10–17 years old who are at high risk of offending behavior and/or out-of-home placement. Multisystemic Therapy is currently delivered worldwide across 16 countries and is a recommended intervention for the prevention of serious adolescent offending by a number of governmental departments including the Office of Juvenile Justice and Delinquency Prevention, United States Department of Justice and the National Institute of Clinical Excellence, Department of Health in the United Kingdom. It might be supposed, therefore, that the effectiveness of Multisystemic Therapy has been consistently empirically demonstrated. However, the most recent independent systematic review found inconclusive evidence for the program’s effectiveness in comparison to other services (Littell et al. 2005).

The systematic review undertaken by Littell et al. (2005) included trials conducted over 10 years ago, almost all of which had involvement from the program developers in the United States. An updated review of the evidence was therefore considered pertinent, particularly in light of the probable increase in both international and independent research. Developing understanding of interventions that effectively reduce persistent patterns of antisocial behavior among adolescents continues to be an ongoing priority in delinquency prevention policy. The current study, therefore, aimed to identify the highest quality and most recently conducted experimental studies to determine whether Multisystemic Therapy was more effective than treatment as usual/no treatment in addressing both primary and secondary outcomes. The findings of the 11 independent samples (3 updates and 9 new trials) will be discussed in the context of previous research and arranged by program type.

Multisystemic Therapy: Standard

Six new randomized control trials and two updates to previous trials have been conducted to evaluate the effectiveness of Multisystemic Therapy adding to the previous five randomized control trials (four published, one unpublished) reported by Littell et al. (2005). The six most recent studies were all conducted in real world settings with community practitioners, by research teams not associated with the original developers (excluding Glisson et al. 2010 where the second author is on the Board of Directors). Multisystemic Therapy has now been evaluated in six countries (America, Canada, Netherlands, Norway, Sweden and the United Kingdom) potentially increasing the external validity of findings.

International research is not, however, without challenges. There continues to be debate about the level of adaptation needed to transport Multisystemic Therapy to local contexts and systems (Ogden et al. 2009; Kiddy 2014). Furthermore, the interpretation of findings is complicated by the social, cultural and ethnic factors that are unique to a particular country and influence comparative “usual services” for managing adolescents with antisocial behavior (Epping-Jordan 2004). Aggregating findings can mask real differences in contributing contextual factors. In both the Netherlands and Sweden, adolescents (up to 20 years of age) with antisocial behavior are primarily managed through the child welfare system. Services frequently adopt an in-home and family orientated therapeutic approach. By contrast, out-of-home placements are often the primary intervention in America, where young offenders are managed through the justice system. Whilst the legal system in the United Kingdom is potentially more comparable to that of the United States, placement in secure settings is limited to the most serious young offenders, and youth offending services deliver a comprehensive package of support to target those individual and system factors that put an adolescent at risk of (re)offending and to build upon protective factors.

Norway was the first European country to implement Multisystemic Therapy and the most recent report at 24-month follow-up showed some positive treatment effects on out-of-home placement, self-reported delinquency and parent/teacher reported adolescent difficult behaviors (Ogden and Hagen 2006). However, in the update, one of the four original sites was removed from the analyses because Therapist Adherence Measure data were lacking (for unreported reasons), which prevented assessment of treatment integrity at that site and led to a substantial reduction in the sample size.

By contrast, the Swedish trial found that usual services performed as well as Multisystemic Therapy. Decreases in adolescent problem behaviors and improvements in family and social skills were observed for both groups (Sundell et al. 2008). The researchers suggest that these findings may be explained by the difference in implementation (Löfholm et al. 2009). In Sweden, the program was guided by local initiatives, whereas in Norway there was a national strategy, thus potentially increasing the support given to teams, demonstrating commitment, and increasing both the acceptability of the intervention to practitioners and their level of accountability for outcomes.

An alternative explanation is that, in Norway, about 50% of participants in the comparison condition were in residential settings compared with 18% in Sweden. The proportion of participants out-of-home is a complicating factor given that increased contact with other adolescents with risky behaviors may well increase the chance of iatrogenic effects (Dishion et al. 1999), thereby potentially disfavoring the Norwegian comparison group. A further possibility is that usual services in Sweden may be of a higher quality and, therefore, the comparison group experienced more positive outcomes. It is also possible that Multisystemic Therapy was not delivered with adequate fidelity in Sweden, where mean Therapist Adherence Measure scores were 1 SD lower than in American studies. Previous research undertaken by the program developers has linked Therapist Adherence Measure scores with positive outcomes (Schoenwald et al. 2005), although this correlation has not been a consistent finding (e.g., studies in the United Kingdom, Sweden and Canada have found no such associations).

Whether the Therapist Adherence Measure itself actually provides a measure of adherence remains contentious (Littell 2006). Sample items such as “My family and the therapist worked together effectively” may be related to constructs such as client satisfaction and therapeutic alliance rather than adherence to Multisystemic Therapy principles per se. Furthermore, the Therapist Adherence Measure is a family-rated measure, which arguably provides little independent assessment of adherence. Families who experience positive outcomes may quite simply give better feedback. Treatment integrity can be defined as the extent to which an intervention is implemented as intended, involving therapist adherence, competence and treatment differentiation (Perepletchikova et al. 2007). In this respect, the Therapist Adherence Measure itself provides little indication about the level of competence with which therapists deliver the multiple components of Multisystemic Therapy. This remains unexplored within the research literature but is important given that therapists are required to be expert in several different therapeutic approaches. It is not clear how this is achieved in clinical practice, especially given that therapists come from a broad range of backgrounds including applied psychology, social work, youth justice, family therapy and nursing (Fox and Ashmore 2014).

No significant differences between Multisystemic Therapy and usual services on a range of primary outcomes were also found in other countries, including for convictions or out-of-home placement in Canada (Leschied and Cunningham 2002) and for frequency, timing and type of rearrest in the Netherlands (Asscher et al. 2014). In the latter case, small positive treatment effects (ranging from 0.25 to 0.36) were found for parent and adolescent reported externalizing behavior.

In this review, the three studies with favorable treatment effects on official data were the United Kingdom study (Butler et al. 2011), an American family court study (Timmons-Mitchell et al. 2006) and the 21.9 year follow-up by the program developers (Sawyer and Borduin 2011).

In the United Kingdom, Multisystemic Therapy was investigated in an ethnically diverse urban sample and compared with existing youth offending protocols. While in both conditions there were reduced re-offenses and out-of-home placements, there was a significant between group difference in the number of non-violent offences at 18 month follow-up. Consistent with this finding, post treatment adolescent and caregiver reported externalizing behavioral problems showed significantly greater reduction in the Multisystemic Therapy group. No group differences immediately post treatment were found for any of the secondary outcomes (e.g., parental supervision or association with deviant peers). The researchers suggest that changes within these domains may occur later, as with the official data where between-group differences only emerged at the 18 month follow-up. However, given that the assessment measures were only completed at intervention end, it is difficult to draw any conclusions about a possible delay in the effects of treatment.

Butler and colleagues concluded that Multisystemic Therapy adds value to current services in the United Kingdom; however, these findings do need to be interpreted with some caution. The sample size was relatively small (N = 108) and the study was underpowered to explore any mechanisms of change contributing to outcomes. The trial was also conducted in two North London boroughs, limiting the external validity of the findings for other parts of the United Kingdom.

A positive treatment effect on official antisocial behavior was also found by Timmons-Mitchell et al. (2006), which was the first independent replication in America with serious juvenile offenders. Worryingly, however, two-thirds of the adolescents in the Multisystemic Therapy condition still went on to be arrested within the 18-month follow-up period. There were some substantial methodological limitations with the study, including the randomization method (coin toss by court personnel), poor description of usual services, collection of data from a single secondary source (court records) and limited examination of the 11% of participants who dropped out. Treatment effects are likely to be inflated when drop-outs are excluded from analyses, as these cases tend to have more negative outcomes; thus, the direction of bias would likely have been in favor of Multisystemic Therapy.

Lastly, in the longest follow-up trial, Sawyer and Borduin (2011) found significant differences in favor of Multisystemic Therapy in arrests for both violent and non-violent felonies and in years incarcerated, but not in misdemeanor offences. It is positive that these participants continue to be followed up, as such lengthy periods of observation are rare in interventions aimed at reducing antisocial behavior. However, Multisystemic Therapy was compared with individual therapies underpinned by psychodynamic, client centered or behavioral approaches, which are unlikely to represent current practices among health, youth justice and social care agencies.

Multisystemic Therapy: Problem Sexual Behavior

This review found two new randomized control trials for Multisystemic Therapy-problem sexual behavior adding to one previously reported upon in Littell et al. (2005). Multisystemic Therapy is quite possibly one of the only programs for adolescents with sexually harmful behavior which has been investigated using a randomized control trial research design (Langstrom et al. 2013). The worthwhile efforts by researchers to overcome some of the significant logistical, legal, and ethical challenges in the pursuit of conducting randomized control trials with this specific population should be noted.

Multisystemic Therapy-problem sexual behavior has only been investigated in America, with oversight by the developers either as the main researchers (e.g., Borduin et al. 1990, 2009) or as expert consultants (Letourneau et al. 2009), thereby limiting the external validity of findings. The study reported in Littell et al. (2005) had a very small sample size (n = 16; 8 multisystemic therapy and 8 individual therapy with no sex offender treatment component) (Borduin et al. 1990). Recidivism rates (arrest data from court and police records) at the 3 year (range 21–49 months) follow-up were considerably lower for Multisystemic Therapy adolescents than for the comparison group (sexual offences: 12.5 vs. 75%; non-sexual offences: 25 vs. 50%).

In this updated review, one of the few studies that found a significant treatment effect of Multisystemic Therapy on antisocial behavior from both official data and self-report was for problem sexual behavior (Borduin et al. 2009). Findings from the assessment measures pre and post treatment indicated that Multisystemic Therapy was more effective in decreasing problem behaviors in youth and in improving family relations (cohesion and adaptability), peer relations (emotional bonding and social maturity) and academic performance (improved grades). There was a long observation period (average 8.9 years), which is particularly important given the relatively low base rate for sexual recidivism, although the sample size was relatively small (N = 48). It is worth noting that the treatment length was the longest among the 11 included studies in this review (approximately 31 weeks) and almost twice the average treatment length as stated on the Multisystemic Therapy Services website (approximately 17.4 weeks). Any possible relationship between treatment dose and outcomes is unknown given the highly individualized approach in Multisystemic Therapy.

The second included study investigating Multisystemic Therapy for problem sexual behavior did not report any treatment effect on officially recorded offending (general arrests) but did on self-reported delinquency and out-of-home placement (Letourneau et al. 2013). The 2 year follow-up period is not considered long enough for investigating sex offender treatment (Collaborative Outcome Data Committee 2007). However, this was a larger clinical trial (N = 128) and the only one to use community practitioners. Multisystemic Therapy was also reported to have been more effective in decreasing problematic sexual behaviors and externalizing behaviors. It is worth noting that participants were substantially less “delinquent” than in other trials. At baseline, scores on the Child Behavior Checklist were in the normal range and various measures were dichotomized due to low incidence, for example, the Self-Report Delinquency Scale and Adolescent Sexual Behavior Inventory. How generalizable these findings may be to the chronic and versatile juvenile offenders whom Multisystemic Therapy purports to target is therefore questionable.

Multisystemic Therapy: Contingency Management/Substance Abuse

This updated review found one randomized control trial conducted specifically for substance abusing adolescents (Henggeler et al. 2006), which adds to one previous study in the review by Littell et al. (2005). Previous research by Henggeler, Pickrel and Brondino (1999) involved a sample of adolescents randomly assigned to Multisystemic Therapy or a community program (N = 118). Six months following completion, Multisystemic Therapy participants reported less use of alcohol, marijuana and other drugs than those accessing usual services. Fewer out-of-home placements were also observed. A smaller proportion of the participants (n = 80) were followed up for an average of 4 years. Multisystemic Therapy adolescents had fewer convictions and higher levels of abstinence from marijuana as indicated from self-report and urine analysis (abstinence: 55% for multisystemic therapy vs. 28% treatment as usual) (Henggeler et al. 2002).

The randomized control trial in this current review was relatively complex, with four conditions: three involved drug court (with usual services, with Multisystemic Therapy, or with Multisystemic Therapy-contingency management) and one involved family court (Henggeler et al. 2006). The three drug court conditions all appeared to be more effective in decreasing substance use and criminal behavior than family court. Data from urine screens and self-report indicated that adding Multisystemic Therapy or Multisystemic Therapy-contingency management further improved the substance abuse related outcomes for adolescents but not the criminal or placement outcomes.

The Multisystemic Therapy-contingency management study was part efficacy and part effectiveness trial in that all of the therapists were employed by the research center and had supervision from the program developers (Henggeler et al. 2006). It remains to be seen whether these findings can be replicated in real world settings. Furthermore, it is highly concerning that about three quarters of adolescents were placed out-of-home in the drug court conditions given that “at home” is one of the key measures of program success (drug court 87%; drug court and multisystemic therapy 71%; drug court and multisystemic therapy-contingency management 74%). The researchers argue that the high level of supervision and weekly review involved with the drug court conditions contributed to this finding. The full resource implications of this intensive approach need to be considered, and a 5 year follow-up is planned which will help in understanding the sustainability of the findings.

Limitations of Existing Research

The current study identified nine new trials and three updates to previous trials investigating the effectiveness of Multisystemic Therapy since the systematic review undertaken by Littell et al. (2005). All of the studies used comparison groups; therefore, any treatment effects must be considered relative to those groups rather than absolute (Löfholm et al. 2013). What has emerged from the findings is the complexity involved when synthesizing data and drawing conclusions across international contexts. Usual services are heavily influenced by social, legal and political systems; they consist of changing and active approaches influenced by new theory and methods (Löfholm et al. 2013). Relative effects of treatment may also vary over time as community agencies adopt key features of Multisystemic Therapy, most likely an increased emphasis on systemic and community approaches within services for adolescent offenders.

Most studies continue to use relatively small sample sizes; the largest randomized control trial was undertaken in the Appalachian counties (N = 674, Glisson et al. 2010). Sample size is important for sufficient statistical power to more reliably detect any possible group differences and begin to explore some of the possible moderating factors, such as gender, age and ethnicity. This would help with identifying those who may benefit the most from Multisystemic Therapy. Further to this, the majority of studies have been undertaken with predominantly male samples. Female delinquents have consistently been found to have specific difficulties including greater levels of sexual abuse victimization and mental health problems (Emeka and Sorensen 2009). It is important for future trials to establish the effectiveness of Multisystemic Therapy with females, explore their experience of the program and make adaptations as necessary. Age as a moderating factor also requires further examination given that Multisystemic Therapy views the primary caregiver as the main conduit of change (Henggeler et al. 2009a) and that adolescence is a critical stage of development where the influence of peers and school factors may become stronger. There is a large developmental gap in the target age range for Multisystemic Therapy (10–17 years) and it is unlikely that models can be uniformly applied to this group as a whole. Greater use of split samples on the basis of participant age may help to identify discrepancies between adolescents of differing ages.
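The link between sample size and statistical power described above can be illustrated with a minimal sketch. It uses a normal approximation to a two-sided, two-sample comparison and rests on several assumptions that are not taken from the included trials: two equally sized groups, a significance level of 0.05, effect sizes expressed as standardized mean differences, and no clustering or attrition. The function name `approx_power` is hypothetical, and the total sample sizes are simply those mentioned in this review.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the test statistic
    # Probability of exceeding the critical value in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Total sample sizes mentioned in this review, ASSUMED to split evenly in two.
for n_total in (48, 108, 674):
    for d in (0.26, 0.53):
        power = approx_power(d, n_total // 2)
        print(f"N = {n_total}, d = {d:.2f}: approximate power = {power:.2f}")
```

Under these simplifying assumptions, a trial of about 108 participants would fall well short of the conventional 0.80 power target for an effect of 0.26, whereas a trial of several hundred would exceed it. Subgroup or moderator analyses split each sample further and therefore need larger samples still.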

The outcomes for adolescents and their families who drop out of treatment do not appear to have been investigated within Multisystemic Therapy research. The average completion rate has been reported to be 74% (as cited in Sundell et al. 2008), which is comparable to other offending behavior treatment programs (Olver et al. 2011). A consistent finding is that those who do not complete treatment tend to fare worse, which highlights the value of exploring key predictors of attrition.

The Multisystemic Therapy Services website states that the average length of treatment is up to 60 h of contact provided during a 4-month period (approximately 17.4 weeks). Where stated in the included studies, average treatment length was generally higher than this reported figure. Multisystemic Therapy adopts an individualized approach to meet the needs of young people and families and as such there is no set treatment manual. Treatment length within and across studies varied considerably and the multicomponent nature of Multisystemic Therapy makes it difficult to know what exactly is being delivered and evaluated, and how replicable this is in practice. It is difficult to know whether there is any relationship between treatment dose and outcomes, and whether a particular number of sessions over a period of time may be associated with successful outcomes. The most widely cited measure to examine treatment integrity was the Therapist Adherence Measure, which is arguably a poor indicator given that it is completed by family members, neglects to consider the competence with which interventions are delivered and has not been established to measure any knowledge or clinical skill specific to Multisystemic Therapy.

It was positive that many of the studies used a number of outcome measures across domains and from multiple sources (caregiver, official, adolescent, teacher, social worker) given that Multisystemic Therapy adopts a social ecological approach. Furthermore, examining the depth and breadth of treatment effects can help to form better decisions. A general critique of the way in which outcomes are assessed in Multisystemic Therapy is that the specific referral problem behaviors (e.g., nonattendance at school; family aggression), the treatment goals for each individual case and progress towards these are not reported. This adds to the challenge of how “success” is defined and how changes in the various systems contribute to treatment progress. Given that building upon individual and system strengths is inherent within the program, it was surprising that only Sundell et al. (2008) used a measure to explore a protective factor. Furthermore, research would benefit from wider consideration of education given its relevance to desistance (Lösel and Bender 2003; Payne et al. 2003), the value of school in measuring adolescent progress and the fact that “in work or school” is one of the routine outcomes gathered by Multisystemic Therapy Services Inc. (http://www.mstservices.com).

A failure to carefully define and track other services accessed by participants in the conditions was evident across studies. Given that most samples were referred from justice and/or social care agencies, there are likely to have been contacts with other services (e.g., probation officers) but what this involved was often left unreported. Only Weiss et al. (2013) reported that three quarters of participants had received some form of mental health service outside of the project. While this was similar between conditions and assessed as being not significantly related to the primary outcome, access to other services is an important area to explore because Multisystemic Therapy is an intensive, costly resource and efforts should be made to reduce any possible duplication. Furthermore, whether other services contribute to the outcomes achieved, and to what extent, is left unexamined.

Surprisingly, a near-complete absence of data about the arrangements for aftercare services at treatment end was found across the studies. In Sweden, around one-third of Multisystemic Therapy participants were still receiving services at the 2 year follow-up. The chronicity of conduct disorder is well recognized, as are relapse rates for substance misuse. Despite the assertion by Multisystemic Therapy Services that most cases need minimal formal aftercare services (Multisystemic Therapy Services 2008, p. 1), from the authors' own clinical experience it is highly unlikely that adolescents and their families, who often have intergenerational dysfunction, trauma and abuse histories and long standing contact with social care or justice services, will not need some form of formal aftercare package.

The advantages of Multisystemic Therapy do need recognition. It is one of the most widely evaluated and internationally transported interventions. Over and above effectiveness, the program addresses known risk factors for reoffending across multiple domains within a structured framework. Multisystemic Therapy is delivered within the adolescents’ natural ecology, thus potentially reducing barriers to accessing services and increasing the generalisability of the skills taught. Assessing and promoting treatment fidelity as part of the outcome literature, as well as focusing on clinician accountability, are all highly valued features.

Study’s Limitations and Strengths

This review has a number of strengths and limitations. Published outcome research has been well documented by Multisystemic Therapy Services (e.g., Multisystemic therapy: research at a glance, 2015) and it is unlikely that any published studies would have been missed. Comprehensive search terms were used and substantial attempts were made to find unpublished research from several sources (including dissertation and thesis portals, government websites, searching relevant reference lists and contact with experts). None of the experts who responded was aware of any further research which might be relevant.

It is recognized that the search process is not without bias. Firstly, due to constraints on time and resources, only research in English was included. With a movement toward publishing research in English, the risk of language bias is likely to be less of an issue (Higgins and Green 2011). During the search, reference lists were scanned and it is acknowledged that the sole use of titles to identify articles of potential relevance involved some level of subjectivity. The expansion of the search in this way relied on the reference lists of the shortlisted articles, regardless of how the article itself had been identified.

Bias is also evident in the use of pre-defined criteria to establish which studies to include within this review. It could be argued that high criteria were set for inclusion (i.e., randomized control trials), which may be to the detriment of considering equally valuable studies. These include a number of quasi-experimental designs investigating Multisystemic Therapy (e.g., Painter 2009) as well as benchmarking studies (e.g., Curtis et al. 2009). Furthermore, there are a few published and unpublished qualitative studies that make important contributions to the knowledge base, for example, investigating client or therapist experience of Multisystemic Therapy (e.g., Tighe et al. 2012; Markham 2016).

This review focused on randomized control trials because nonrandom allocation can be affected by various factors, potentially predisposing the treatment group to better or worse outcomes. One good example of this is a trial conducted in Washington State where allocation to Multisystemic Therapy was left to the discretion of court personnel or inappropriately based on case numbers (Barnoski 2004, sample size N = 145). The review found no treatment effect for recidivism data at 18-month follow-up. However, on examination of the data, participants who had been allocated to Multisystemic Therapy scored significantly higher on the risk assessment tool used at baseline. The conclusions were that the validity of the trial had been compromised and a re-evaluation was recommended. The stringent inclusion criteria, therefore, served to minimize the risk of any inadvertent bias in allocation and increase the chances that included studies were appropriate and measured similar concepts.

Conclusion

The current study provided an update to a previous systematic review by Littell et al. (2005), covering the most recently conducted randomized control trials of the effectiveness of Multisystemic Therapy. Despite the rapid international expansion of Multisystemic Therapy, it would seem that the evidence base continues to present conflicting findings. Some studies found that Multisystemic Therapy had a positive treatment effect on official measures of antisocial behavior, self-reported involvement in delinquency, caregiver report of externalizing behavior problems and affiliation with antisocial peers. However, this review found that the findings were consistent neither across studies nor within studies on the various measures used to assess outcomes.

Empirical investigations of Multisystemic Therapy using randomized control trials have now been undertaken in six countries. All of the studies used comparison groups; therefore, any treatment effects must be considered relative to those rather than absolute (Löfholm et al. 2013). This review has highlighted that the impact of the cultural, legal and political differences across countries upon the usual treatment condition and outcome measures needs to be taken into account when synthesizing data and drawing generalised conclusions. This is especially relevant for the Scandinavian countries, where adolescents are typically managed through the child welfare system.

A further confounding factor is that sample sizes continue to be relatively small (8 of the 11 included studies had just over 100 participants and this exceeded 200 in only 2 studies), resulting in a lack of power in findings in the literature. Four studies made some attempt to examine subgroups of participants (e.g., age, gender and ethnicity) to explore who may benefit the most from Multisystemic Therapy, but subgroup sample sizes were very small. The majority of studies have been undertaken with predominantly male samples and are often too small to be able to fully explore any interactions between participant characteristics and outcome. Exploring such interactions would help to answer more specifically the question of who may benefit the most, which is essential to support the prioritization of this relatively expensive and intensive program. Greater consistency in the descriptions of samples, their demographic factors and specific cultural information about the populations from which they are drawn is recommended and would enable comparisons to be more easily made.

In conclusion, there is a considerable amount of research literature available on Multisystemic Therapy, which perhaps leads some to erroneously presume that the empirical support has been consistently demonstrated and that the families for whom the program may be more successful are well known. Multisystemic Therapy research has been driven by examining effectiveness and the current study indicates that greater effort needs to be made to investigate both moderating factors and the mechanisms of change. This review has demonstrated the complexity of comparing randomized control trials across international contexts and identified that there is much work to be done in terms of understanding why Multisystemic Therapy might work and under which conditions it may be most successful.