Introduction

High-quality implementation of prevention programs has commonly been assumed to be a precondition of their effectiveness. The idea is that the recipients of a program can change and get better only if they are provided with exactly what the program promises. Nevertheless, very few studies have empirically examined this assumption. Existing studies are limited regarding evidence on which components of the implementation process are most important in ensuring the success of parenting programs delivered in ordinary service settings. In addition, the studies to date have focused on only one program at a time, providing insight into the role of implementation fidelity, but with limited generalizability across programs. Using data from an effectiveness trial encompassing four different programs, the present study attempts to examine which aspects of implementation integrity are associated with changes in parents’ and children’s behaviors. Also, the study attempts to elucidate the factors likely to be associated with implementation integrity.

Implementation integrity refers to “the degree to which treatment is delivered as intended” (Yeaton and Sechrest 1981). We adopted a definition of implementation integrity based on Dane and Schneider’s (1998) conceptualization, according to which it has four components: (1) adherence or fidelity (the degree to which program components are delivered as prescribed); (2) dose (the frequency and quantity of program administration); (3) quality of delivery (the extent to which facilitators approach a theoretical ideal in transmitting the core components); and (4) participant involvement (the levels of participation and enthusiasm). Together, the four components ensure that a program is implemented as intended, which reduces the risk of what has been called Type-III error (Dobson and Cook 1980): attributing the failure of a program to its components or theory when the failure is actually due to defective implementation.

Parenting programs aim at improving parenting in order to ameliorate the relationship between parents and children and are widely recognized as effective in reducing children’s problems (for a review see Furlong et al. 2012). However, very few studies have investigated the effects of the components of implementation integrity on parenting program effectiveness. Among the few available, adherence to the program manual has been associated with intervention effectiveness (for a review, see Durlak and DuPre 2008), but not always (Breitenstein et al. 2010). Quality of delivery and participant involvement have been associated with better outcomes in parents and children (Eames et al. 2009; Forgatch et al. 2005), while dose has shown contradictory results (Dane and Schneider 1998). Overall, there is evidence that the different aspects of implementation integrity are related to parenting program effects to varying extents.

There are some major limitations in the field. First, there are few studies that have investigated all the components of implementation integrity together. As Domitrovich and Greenberg (2000), Dusenbury et al. (2003), and Durlak and DuPre (2008) have pointed out in their reviews, few intervention studies adopt more than two components of implementation integrity (usually adherence and dose), and even fewer have linked these components to program effects. Berkel et al. (2011) have called for an integrative approach that accounts for the components in order to understand which are most important for program effectiveness. Thus, studies distinguishing and evaluating the effects of each aspect of implementation integrity on program outcomes are needed.

Moreover, the factors predicting high implementation integrity are not completely clear. Some authors have investigated the association between adherence and group leaders’ training, concluding that the more precisely group leaders are trained, the more likely they will implement a program with high fidelity (Rohrbach et al. 2010; Seng et al. 2006). Also, some characteristics of group leaders have been shown to be related to participants’ attendance (dose). For instance, racial and socioeconomic similarity between participants and group leaders is associated with leaders’ therapeutic engagement (Orrell-Valente et al. 1999), which in turn is related to higher rates of retention. Finally, although some studies have associated parental involvement with program adherence (Breitenstein et al. 2010), they have considered the influence of just one component at a time. A narrow approach limits knowledge of the relative predictive role of each factor, so a more comprehensive approach is required.

The Current Study

The aims of this study are twofold. First, we aimed to understand whether the components of implementation integrity (adherence, quality of implementation, dose, and participant involvement) affect the effectiveness of parenting programs. To achieve this goal, we used the four most common programs in Sweden when the present evaluation started. Three of these programs are, to some extent, behaviorally based (Comet, Cope, Incredible Years), while one is non-behavioral (Connect). Finding effects of implementation integrity across different types of programs permits drawing conclusions that are not limited to a specific program but are applicable to parenting programs in general. In contrast to previous studies, we assessed the dimensions of implementation integrity at both group and individual level. Moreover, we adopted a multi-informant approach, combining observational data, team-leader reports, and parent reports, rather than focusing solely on leader reports, which have been shown to overestimate implementation integrity (Dusenbury et al. 2005). Next, we examined the role of implementation integrity for the program outcomes using data from an effectiveness trial. Some researchers have argued that the programs found to be effective in efficacy trials may fail to function as well when they are delivered in regular service settings (effectiveness trials) due to poor implementation integrity (Bumbarger and Perkins 2008). Because most studies focusing on implementation integrity have been based on efficacy trials, we chose to test our research questions in an effectiveness trial. Second, we investigated factors thought to be related to good implementation integrity. In keeping with Berkel et al.’s propositions (2011), we examined both leader-related aspects (adherence, quality of delivery) and participant-related aspects (participant involvement, dose). The former are likely to be dependent on features of the facilitators, such as gender, age, education, and experience, whereas the latter may be primarily dependent on participants’ perceptions of their facilitators and program.

Method

Design and Procedure

The present study is part of a larger project, The National Comparison of Parenting Programs, which aims to evaluate the effects on disruptive child behaviors of the most commonly used, manual-based parenting programs in Sweden. The study was designed as a randomized controlled effectiveness trial with pre- and post-test, and one two-year follow-up after post-test. Given the focus on implementation, the current paper is based on the measurements at pre- and post-test. The behavioral parenting programs considered were Cope (Cunningham 2006), Incredible Years (Webster-Stratton et al. 2004), and Comet (Kling et al. 2006), a Swedish program similar to Patterson’s Parent Management Training–Oregon Model. One non-behavioral, attachment-based program, Connect (Moretti and Obsuth 2009), was also included (see Table 1). Parents were randomly assigned to a program or a control condition, and they were unaware that different programs were available.

Table 1 Description of the parenting programs’ aims and format

In order to reduce barriers to participation, in each administrative region, the programs were offered by the human services units (e.g., schools, social welfare agencies, and child and adolescent psychiatry clinics) to all the parents in need. Most parents had contacted a unit on their own, but a few were recruited through advertisements about the availability of parenting programs in their communities (which was also a part of normal routine in these communities). However, fewer parents started the Incredible Years program (75.4%) than the other programs. This was due to organizational problems: some of the parents recruited for Incredible Years had to travel long distances to take part in the program, and as a result, many chose not to attend. The procedures have been described in detail elsewhere (blind for review). A total of 104 parenting groups were run by 76 pairs of team leaders. Parents completed a questionnaire before and immediately after the intervention. After program completion, they were asked questions concerning their commitment, their satisfaction, and the competence of the group leaders.

Participants

Parents of 749 children participated. The children’s ages ranged from 3 to 12 years, with an average age of 7.70 years (SD = 2.60). Parents were randomly assigned to one of the four parenting programs or to a control condition (for the randomization procedures, see Stattin et al. 2015). Only parents who participated in one of the programs were included, thereby excluding the parents in a waitlist control condition or in a self-help condition (where parents read a book). For the present study, we used the report of one parent for each family. If both parents attended the meetings, we selected the parent who had participated in the most sessions of the program as the primary reporter. If the number of attendances was equal between parents, we chose the mother. Overall, mothers were the primary reporters in most families (85%). The final sample comprised 535 parents, with an average age of 37.7 years (SD = 7.51), ranging from 20 to 60 years. About three out of four were married or cohabiting (74%), and the rest were single parents. In most cases (89%), both parents were born in one of the Scandinavian countries. The average monthly household income after tax was 30,000–40,000 SEK ($3500–$4700). There were 6.1% whose monthly incomes were as low as 0–10,000 SEK ($0–$1200), and 24.9% had an income higher than 50,000 SEK ($5900). Only 6.3% of the parents acknowledged that their monthly income was not fully adequate. Finally, 45.5% of the parents had completed some university-level education, and 9% had only a compulsory-school education.
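
The rule for choosing one reporter per family can be written as a small selection function. The following is only an illustrative sketch: the `ParentReport` record, its field names, and the role labels are hypothetical and do not come from the study's data files.

```python
from dataclasses import dataclass


@dataclass
class ParentReport:
    role: str               # hypothetical label: "mother" or "father"
    sessions_attended: int  # number of program sessions this parent attended


def select_primary_reporter(parents):
    """Return the parent with the most sessions; on a tie, prefer the mother."""
    if not parents:
        raise ValueError("a family needs at least one participating parent")
    # Tuples compare element by element: attendance first, then the tie-break.
    return max(parents, key=lambda p: (p.sessions_attended, p.role == "mother"))
```

With equal attendance the mother is returned; otherwise the parent with higher attendance is returned, whatever their role.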

Parents attending the parenting programs did not differ with regard to marital status, monthly income, economic strain, or educational level. Because Connect was only provided for parents of children older than 9, parents participating in Connect were older and had older children than parents participating in the other programs (see Stattin et al. 2015).

One hundred and eleven team leaders, in 76 team-leader pairs, delivered the programs. All leaders received specific pre-project training. Their mean age was 49 (SD = 8.5), and 80% (N = 94) were women. The majority had a university degree (95%, N = 106), the rest a high-school diploma.

Measures

Outcomes of the Program

Parenting Competence

The 17-item Parenting Sense of Competence Scale (PSOC, Johnston and Mash 1989) was used to assess competence in parenting. Higher scores indicate higher competence. Cronbach’s alphas for subscales were .81 and .95 at T1 and T2, respectively.

Parents’ Reactions

Parents’ reactions to child misbehavior were assessed on five scales: Attempted to understand (5-item), Angry outbursts (5-item) (Stattin et al. 2011), Harsh parenting (7-item), Rewarding (2-item), and Praising (2-item) (Webster-Stratton et al. 2001). Higher scores indicate higher frequency of parenting reactions. Cronbach’s alphas were .69 for Attempted to understand, .79 for Angry outbursts, and .63 for Harsh parenting at T1, and .68, .76, and .72 at T2, respectively. Correlations between the two items measuring Praising were .64 (p < .001) at T1, and .58 (p < .001) at T2, while correlations between the two items measuring Rewarding were .69 (p < .001) at T1, and .64 (p < .001) at T2.

Children’s Externalizing Problems

Eyberg’s Child Behavior Inventory (ECBI) (Eyberg and Ross 1978), which comprises an Intensity and a Problem Scale, was used to assess children’s externalizing problems. The Intensity Scale assesses the frequency of 36 externalizing behaviors, and the Problem Scale the extent to which parents consider each of the externalizing behaviors to be problematic. The alphas for the Intensity Scale were .93 at T1 and .94 at T2, and for the Problem Scale .91 on both occasions. The Swanson, Nolan and Pelham Rating Scale (SNAP-IV) (Swanson et al. 1992) was used to assess inattention, hyperactivity/impulsivity and oppositional defiant disorder (ODD) (Kazdin et al. 1989). Cronbach’s alpha was .91 for inattention, .92 for hyperactivity/impulsivity, and .91 for ODD at T1, and .92, .91, and .91, respectively, at T2. For all measures, higher scores indicate greater child problems.

Dimensions of Implementation Integrity

Participants’ involvement was assessed through parent reports after program completion. In keeping with Dane and Schneider’s suggestion that involvement represents “levels of participation and enthusiasm”, we used an item indicating parents’ satisfaction with the program and an item assessing the quantity of homework the parents completed at home (see Table 2 for a description). These two items were analyzed separately because they represent two different aspects of participants’ involvement, as confirmed by their moderate correlation (r = .40).

Table 2 Measures of implementation integrity

Dose was assessed using the records of attendance kept by group leaders (see Table 2). Because there were variations in the number of sessions for each program, we converted attendance rates into an ordinal scale referring to the percentage of sessions attended by parents (1 = less than 25%, 2 = 26–50%, 3 = 51–75%, 4 = more than 75%).
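
The conversion from attendance records to the four-point dose scale can be sketched as follows. This is an assumption-laden illustration rather than the study's scoring code: the function name is hypothetical, and because the published categories leave exact boundary values open (e.g., exactly 25%), the sketch assumes that each upper bound is inclusive.

```python
def dose_category(sessions_attended, sessions_offered):
    """Map attendance onto the ordinal dose scale (1-4) described in the text."""
    if sessions_offered <= 0:
        raise ValueError("sessions_offered must be positive")
    if not 0 <= sessions_attended <= sessions_offered:
        raise ValueError("sessions_attended must lie between 0 and sessions_offered")
    pct = 100.0 * sessions_attended / sessions_offered
    # Assumption: upper bounds are inclusive (25% -> 1, 50% -> 2, 75% -> 3).
    if pct <= 25:
        return 1
    if pct <= 50:
        return 2
    if pct <= 75:
        return 3
    return 4
```

Expressing attendance as a percentage before binning is what makes programs with different numbers of sessions comparable.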

Adherence and quality of delivery were assessed through observations made by independent raters following the definition of Dane and Schneider (1998). Three sessions per group were randomly selected and video-recorded, resulting in 228 videotaped group sessions. Of these, 56 (25%) were randomly extracted, stratified by program, and coded by independent experts with extensive experience as group leaders and as trainers of other leaders. The expert raters were trained as follows. First, two raters for each program rated five videotaped sessions together until they approached consensus (these five sessions were not part of the tapes that were finally rated). Then, they independently rated about half of the sessions. Next, to avoid drift, the experts together rated another five videotapes to maintain consensus in their ratings. Finally, they independently rated the rest of the videotapes. We used averaged scores across raters. The items used to rate the video-recorded sessions are shown in Table 2.

Adherence (i.e., the extent to which the group leader followed the program manual) was measured with one item, while quality of delivery was assessed with four items (see Table 2 for a description), each rated from 1 (not at all) to 10 (totally). Interrater agreement, as indicated by the correlation between the ratings of the independent assessors, was high (r = .84). The number of coded sessions for Comet was 17, for Cope 14, for Incredible Years 7, and for Connect 18. As the numbers of parents who started were 172 for Comet, 175 for Cope, 92 for Incredible Years, and 196 for Connect, the proportion of coded sessions was about equal across the programs.

Factors Related to Implementation Integrity

Parents’ Perceptions of Group Leaders

Parents rated their leaders at the end of the program. They reported the extent to which the leaders could lead the group, support parents, and understand parents’ problems, using one item for each behavior. Responses were rated on a 5-point scale ranging from 1 (not at all) to 5 (fully).

Team Leaders’ Characteristics

The team leaders were asked to state their gender, age, and level of education, and whether they were specialized in a relevant area, such as psychotherapy. Because each group had two team leaders, we used gender composition (both females, both males, or mixed gender), average age, average education, and aggregate specialization (i.e., none specialized, only one specialized, or both specialized) to represent the team-leader pairs.
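
The pair-level variables can be derived from the two leaders' individual answers as in the sketch below. The `Leader` record, the numeric education code, and the output keys are hypothetical choices made for illustration; only the category labels follow the text.

```python
from dataclasses import dataclass


@dataclass
class Leader:
    gender: str        # "female" or "male"
    age: float
    education: int     # hypothetical ordinal code, e.g. 1 = high school, 2 = university
    specialized: bool  # specialized in a relevant area, such as psychotherapy


def summarize_pair(a, b):
    """Collapse two leaders into the pair-level descriptors named in the text."""
    genders = {a.gender, b.gender}
    if genders == {"female"}:
        composition = "both females"
    elif genders == {"male"}:
        composition = "both males"
    else:
        # Any other combination is treated as a mixed-gender pair.
        composition = "mixed gender"
    n_specialized = int(a.specialized) + int(b.specialized)
    labels = ("none specialized", "only one specialized", "both specialized")
    return {
        "gender_composition": composition,
        "mean_age": (a.age + b.age) / 2,
        "mean_education": (a.education + b.education) / 2,
        "specialization": labels[n_specialized],
    }
```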

Statistical Analyses

First, we investigated whether adherence and quality of delivery represent two different dimensions. The dimensions were highly correlated, and a confirmatory factor analysis (CFA) showed that adherence and quality of delivery were parts of the same construct [χ²(4) = 3.62, p > .05; CFI = 1.00; RMSEA = .00; SRMR = .01]. Therefore, we combined these two dimensions into an implementation quality aggregate score by computing the mean of the ratings of the five items.
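
For a single coded session, the aggregate score can be sketched as the mean of the five items (one adherence item plus four quality-of-delivery items) after averaging the two raters. The function name and input layout are hypothetical, and the text does not describe how coded sessions were combined within a group, so that step is not shown.

```python
from statistics import mean


def implementation_quality(rater_a, rater_b):
    """Average two raters item by item, then take the mean of the five items."""
    if len(rater_a) != 5 or len(rater_b) != 5:
        raise ValueError("expected five ratings per rater (1 adherence + 4 quality)")
    # Step 1: averaged score across raters for each item.
    item_means = [(x + y) / 2 for x, y in zip(rater_a, rater_b)]
    # Step 2: aggregate score as the mean of the five item scores.
    return mean(item_means)
```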

We used two-level multilevel regression models to address the first study question, namely how implementation fidelity is related to changes in parents’ behaviors and competence, and children’s behavior problems. The observations are nested in parenting groups and include the group-level implementation-quality measure. Clustering may lead to inflated Type-I error if not treated properly (Duncan et al. 2006). Thus, we used multilevel modeling in Mplus with the maximum likelihood robust (MLR) estimator (Muthén and Muthén 1998–2012), specifying two levels: group level (Level 2) and individual level (Level 1). In all models, we controlled for pre-test levels of child and parent outcomes, type of program, and child and parents’ age.
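
One way to write such a model is as a random-intercept specification, sketched below for parent i in group j. The notation is ours and purely illustrative: we assume a random intercept only, and the covariate vector c_ij stands for the remaining controls (program dummies, child and parent age, other pre-test levels). The paper reports no equations, so the exact specification may differ.

```latex
% Level 1 (individual): post-test outcome of parent i in group j
Y^{T2}_{ij} = \beta_{0j} + \beta_{1}\, Y^{T1}_{ij}
            + \beta_{2}\,\mathrm{Attendance}_{ij}
            + \beta_{3}\,\mathrm{Homework}_{ij}
            + \beta_{4}\,\mathrm{Satisfaction}_{ij}
            + \boldsymbol{\beta}_{5}^{\top}\mathbf{c}_{ij} + e_{ij}

% Level 2 (group): the intercept depends on group-level implementation quality
\beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{Quality}_{j} + u_{0j}
```

Because the pre-test score enters as a covariate, the remaining coefficients describe associations with change from pre- to post-test, and γ01 carries the group-level implementation-quality effect.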

When an observation is missing at group level in nested data, the observations of the individual are also considered missing for the cluster in question. Therefore, we imputed missing data at group level with a multiple-imputation technique that drew on all available data external to the study models (Enders 2010). Implementation quality and team leader characteristics were group level data. Because implementation quality was assessed by the ratings of a subset of video recordings, data were available for 58% of the groups. Overall, 70% of the participants were attending these groups. Thus, 70% of the individual level observations also had valid group level data on implementation quality. The main source of missing data for individual level observations was longitudinal attrition. The rate of longitudinal attrition was between 14 and 18%. We imputed five data sets, and merged them with the individual-level data. To examine the predictors of the dimensions of implementation fidelity, we fitted linear regression models using the TYPE = COMPLEX option in Mplus and the MLR estimator (Muthén and Muthén 1998–2012). The TYPE = COMPLEX option provides corrected standard-error estimates, reducing potential bias in test statistics due to clustering. In these models, we entered dummy-coded variables to control for differences across the programs.
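
Estimates from several imputed data sets are conventionally combined with Rubin's rules. The text does not state how results were combined across the five data sets, so the sketch below only illustrates that standard procedure; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across imputed data sets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors from >= 2 imputations")
    pooled = mean(estimates)                     # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between       # total variance
    return pooled, sqrt(total)
```

The between-imputation term widens the pooled standard error to reflect uncertainty about the imputed values; when all imputations give the same estimate, the pooled standard error reduces to the within-imputation one.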

Results

Descriptive Analyses

All programs were implemented with relatively high quality, with the lowest mean rating, on a 10-point scale, of M = 7.03 for Incredible Years. Despite the high quality of implementation, there were some differences between the programs. Cope and Comet had the highest quality, while Incredible Years had the lowest (see Table 3). Parents in Connect, followed by Comet, showed less absenteeism than parents in Cope and Incredible Years. Also, parents attending Comet completed their homework more often than those attending the other programs. Finally, Comet parents were more satisfied than parents attending Cope, Connect and Incredible Years. In sum, implementation integrity was generally high for all the programs. However, Cope and Comet were implemented to a higher standard than the other programs, and Comet was most appreciated by parents.

Table 3 Comparisons between the parent-training programs on the dimensions of implementation integrity

Finally, we computed the correlations between attendance at program sessions and family structure (1 = married or cohabiting, 0 = single parent), and between attendance at program sessions and child age. The correlations were r = .015, n.s., and r = − .009, n.s., respectively, suggesting that attendance was not associated with family structure or child age.

Is Implementation Integrity Associated with Changes in Parenting and Child Problem Behaviors?

We examined the associations between the components of implementation integrity and changes in the parent and child outcomes in multilevel models. Implementation quality was entered as a group-level variable (Level 2), while parents’ involvement and attendance were entered at individual level (Level 1).

Effects of Implementation Integrity on Parent Outcomes

Implementation quality at group level and attendance at individual level were not significantly related to changes in parenting behaviors or parents’ sense of competence (Table 4). By contrast, parents’ involvement was significantly related to positive changes in parenting. Specifically, parents who completed their homework showed greater decreases in angry outbursts (B = − .04, p < .05), and greater increases in their use of praise (B = .14, p < .01) and reward (B = .18, p < .01), and in sense of parenting competence (B = .08, p < .05). Finally, parents’ satisfaction with the program significantly predicted decreases in harsh parenting (B = − .07, p < .01), and increases in sense of parenting competence (B = .16, p < .01). In sum, parents’ involvement (i.e., satisfaction and homework completion) was related to changes in parent behaviors and competence, whereas group-level implementation quality was not.

Table 4 Predicting changes in parental outcomes from implementation quality, dose, parental responsiveness (completion of homework and satisfaction): multilevel models

Effects of Implementation Integrity on Child Outcomes

Implementation quality was not associated with changes in child problem behaviors and ADHD symptoms (see Table 5). However, the more parents were satisfied with their program, the more they reported reductions in their children’s ECBI intensity (B = − .15, p < .001) and problem (B = − .04, p < .01) scores, inattention (B = − .10, p < .001), and ODD symptoms (B = − .09, p < .05). Neither attendance nor homework completion predicted changes in child problem behaviors or ADHD symptoms. In sum, parents’ satisfaction with their program predicted changes in child outcomes, whereas dose, homework completion, and group-level implementation quality did not.

Table 5 Predicting changes in child outcomes from implementation quality, dose, parental responsiveness (completion of homework and satisfaction). Multilevel models

Which are the Factors Associated with the Components of Implementation Integrity?

We examined predictors of the dimensions of implementation integrity. Specifically, we investigated whether leaders’ characteristics (age, gender, education, specialization), and parents’ perception of the leaders (leaders with good group management skills, supportive leaders, leaders that understand their problems) were predictors of dose, homework completion, and satisfaction with the program (Table 6). To account for differences across the programs, we entered dummy-coded variables into the models as controls. Thus, the unique effect of each predictor variable refers to its impact beyond differences due to the programs.

Table 6 Predictors of dose, homework completion, and satisfaction with the program

After controlling for differences across the programs, implementation quality seemed to be higher when team leaders had specialized training relevant to prevention (β = .30, p < .05), and lower when leaders were older (β = − .31, p < .001) and when the leadership pair comprised two women (β = − .12, p < .05) rather than being of mixed gender. Also, dose was related to parents’ perceptions of their leaders. Specifically, parents perceiving leaders as understanding of their problems (β = .15, p < .01) attended more. Homework completion was positively predicted by parents perceiving their group leaders as understanding their problems (β = .13, p < .01). Finally, parents’ program satisfaction was predicted by having supportive team leaders (β = .30, p < .001), and leaders with good group management skills (β = .34, p < .001). In sum, parents’ perceptions of leaders as competent, supportive, and sensitive to their problems seem to promote implementation integrity.

Discussion

The aim of this study was to examine whether parents in parenting programs benefited more when their programs were well implemented. Specifically, we investigated the effects of all aspects of implementation integrity, namely implementation quality (adherence and quality of delivery), participant involvement (homework and satisfaction), and dose (attendance). In general, we found no significant effect of group-level implementation quality or of dose (i.e., attendance), but we did find an effect of parents’ involvement (i.e., satisfaction with the program and homework completion). Independent of the number of sessions they attended, the more participants were satisfied and practiced what they learned during the sessions, the more they positively changed their way of parenting. Thus, our study suggests that the key component for the success of a program is parents’ involvement during the sessions rather than simple attendance.

While the lack of effects of dose has been confirmed in other studies (e.g. Dane and Schneider 1998), the lack of effect of implementation quality is quite unexpected. However, this finding should be interpreted with caution. Group-level implementation quality was rated very highly across all the programs, and there was low variability in these ratings, which may have resulted in a non-significant effect of group-level implementation quality on how much parents and children changed. Therefore, we are cautious about stating that implementation quality does not matter. Further studies are needed to test the role of group-level implementation quality on program outcomes using data with greater variability.

Contrary to the findings related to implementation quality, it emerges clearly that participants’ involvement, which consists of homework completion and satisfaction with the program, might influence how much parents and children benefit. Independent of the quality of implementation, the parents who actively committed to and were satisfied with their program displayed more changes, such as increased feelings of being competent in parenting, improved parenting strategies, and decreased child problem behaviors. It does not come as a surprise that active participation is associated with the benefits of participating in a parenting program. Scholars have widely demonstrated, through reviews and meta-analyses, that interactive delivery methods are one of the principal elements in effective prevention (Nation et al. 2003; Tobler et al. 2000). The underlying assumption is that interactive methods are effective because they favor active participation. However, the assumption that active participation in a parenting program is related to higher program effectiveness has rarely, if ever, been tested. This study contributes to the literature by demonstrating empirically that parents need to be actively involved, for example by continuing to practice at home the techniques they learned during the program, if they want to obtain the maximum benefit.

How do parents become actively involved in a program? In our study, it emerged that they were more likely to be satisfied, do their homework, and attend when they perceived their group leaders as supportive and understanding. These qualities are among the requirements for a group leader to build up a “therapeutic alliance” with parents (for a review, see Ackerman and Hilsenroth 2003). In both individual and family-therapy settings, therapeutic alliance is an important predictor of both attendance (e.g. Orrell-Valente et al. 1999) and improvements in clients (e.g. Hogue et al. 2006). Our results are in line with this, but it is not clear why and how in a group of parents, some develop these views on their group leaders while others do not. In the current study, socio-cultural characteristics of the group leaders, such as sex and experience, did not seem to influence the association, as has been found in some other studies (e.g., Orrell-Valente et al. 1999). Future studies should investigate further the reasons why some parents perceive their leaders as supportive and others do not.

We also found some predictors of implementation quality. Our study suggests that mixed gender pairs of team leaders, and those with a specialization, such as therapist training, implement programs better than female pairs, and team leaders without a specialization. Leaders’ age was negatively associated with quality of implementation. This result is in contrast with a recent study showing that older leaders are more likely to understand reasons for not changing the content of a program than younger and inexperienced leaders (Hill et al. 2007). However, in this study, only attitudes toward implementation were assessed. In our trial, in the majority of cases, group leaders delivered their programs in the manner to which they were accustomed. Consequently, older group leaders might have received training several years ago, by contrast with younger group leaders. For that reason, younger leaders might have been more sensitive to the importance of delivery of the programs without deviations from the program manual than the older leaders. However, this hypothesis cannot be confirmed in our study, and should be tested in future studies.

This study has some limitations. The first is related to the timing of assessments. Participants’ involvement was assessed by parent reports at post-test. Parents were also asked to report on their and their children’s behaviors. It is possible that perceptions of changes from pre- to post-test affected their satisfaction with, attendance at, and commitment to the program, rather than the opposite. In other words, it is equally possible that the parents of children who benefited were more likely to be satisfied with, keep attending, and be actively involved in the programs. It is not possible to test the direction of effects in the current study. An ideal design to assess directionality would encompass measurements about parents and leaders following each program session. Nevertheless, such a measurement-intensive design would be difficult to implement in an effectiveness trial. Future studies may overcome this difficulty by using automated feedback technologies, with smart phones or tablet computers.

Another limitation is that we investigated only a limited set of the factors that might explain implementation integrity. Durlak and DuPre (2008) point out that implementation might be affected by many factors at both macro level (community factors, organizational factors) and micro level (characteristics of the innovation and the providers). We focused solely on micro-level factors, i.e., provider characteristics. Moreover, some of the measures, i.e. parents’ satisfaction and homework completion, were single items and, as such, not ideal for the measurement of complex constructs, such as parents’ involvement. Finally, because participation was on a voluntary basis and there were some organizational problems with the implementation of one of the programs (i.e. Incredible Years), generalizability of the results to all parents (e.g. high and low-income families) cannot be guaranteed. However, we limited this problem by offering the programs to all the parents and contacting some of them directly, as previously described. Future studies should use better measures for micro-level factors and account for the influences of macro-level factors on the different aspects of implementation integrity.

As well as limitations, this study has some strengths. It represents one of the first attempts to assess the impact of each component of program integrity simultaneously, which allowed us to understand the relative impact of each component. Moreover, the components were examined across different types of parenting programs, which makes our results likely to apply to parenting programs in general. Finally, it is one of the few studies that have assessed the role of implementation fidelity within an effectiveness trial, which allows us to draw conclusions that are applicable in real-life settings.

To conclude, our study highlights the importance of the active participation of parents in maximizing the positive effects of parenting programs. Group leaders with good training and empathic skills may be the key to promoting parents’ involvement.