The KiVa antibullying curriculum and outcome: Does fidelity matter?

https://doi.org/10.1016/j.jsp.2014.07.001

Abstract

Research on school-based prevention suggests that the success of prevention programs depends on whether they are implemented as intended. In antibullying program evaluations, however, limited attention has been paid to implementation fidelity. The present study fills this gap by examining the link between the implementation of the KiVa antibullying program and its outcomes. With a large sample of 7413 students (7–12 years) from 417 classrooms within 76 elementary schools, we tested whether the degree of implementation of the student lessons in the KiVa curriculum was related to the effectiveness of the program in reducing bullying problems in classrooms. Results of multilevel structural equation modeling revealed that after nine months of implementation, lesson adherence as well as lesson preparation time (but not lesson duration) were associated with reductions in victimization at the classroom level. No statistically significant effects, however, were found for classroom-level bullying. The different outcomes for victimization and bullying, as well as the importance of documenting program fidelity, are discussed.

Introduction

To achieve their desired aims, school-based prevention and intervention programs should be implemented as intended. In fact, a growing body of empirical evidence suggests that the more closely the implementation of an intervention adheres to its original design, the more likely the desired outcomes are to occur (Durlak and DuPre, 2008, Dusenbury et al., 2003, Weare and Nind, 2011, Wilson and Lipsey, 2007). However, about half of prevention studies do not monitor implementation in any way (for a meta-analysis, see Durlak, Weissberg, Dymnicki, Taylor, & Schellinger, 2011). With respect to antibullying program evaluations in particular, it has been pointed out repeatedly that more attention needs to be paid to implementation fidelity (Ryan and Smith, 2009, Smith et al., 2004, Ttofi and Farrington, 2010, Vreeman and Carroll, 2007).

Collecting implementation data is important for several reasons. First, the overall degree of program delivery informs program developers of whether the program is feasible and thus likely to be implemented with fidelity in the future. Second, when implementation is monitored, support can be adjusted for schools that are likely to fall short of full implementation. Finally, a significant association between the level of implementation and outcome (e.g., reduction of victimization) provides further support that the effects obtained were actually caused by the program rather than by other factors such as developmental changes or selection effects (i.e., initial differences between intervention and control schools).

The KiVa antibullying program (Kärnä, Voeten, Little, Poskiparta, Kaljonen, et al., 2011) has been identified as one of the most promising whole-school programs to prevent bullying (Ttofi & Farrington, 2010). Evidence of the effectiveness of KiVa has been acquired both in a randomized controlled trial (Kärnä, Voeten, Little, Poskiparta, Kaljonen, et al., 2011, Kärnä et al., 2013) and during broad roll-out of the program in Finnish schools (Kärnä, Voeten, Little, Poskiparta, Alanen, et al., 2011). In both studies, KiVa was found to reduce bullying and victimization significantly after only nine months (one school year) of implementation, especially at the elementary school level (Grades 1–6). However, we know relatively little about teachers' fidelity to the program across classrooms. Therefore, the purpose of the present study is (a) to test whether stronger effects of the KiVa program can be obtained with a higher degree of implementation of the curriculum and (b) to provide insight into the preconditions of success of evidence-based antibullying programs more generally—for instance, which aspects of implementation are most strongly associated with positive outcomes.

Implementation fidelity refers to the degree to which individuals delivering a prevention program implement it as intended by program developers (Dane and Schneider, 1998, Durlak and DuPre, 2008). It can be conceptualized and operationalized in terms of quantity (how much was done) or quality (how well it was done) of program implementation.

In many studies, the analysis of implementation data has focused on the quantity of program content delivered to targeted individuals (e.g., children) or groups (e.g., classrooms). This aspect of implementation has been referred to interchangeably as lesson adherence or dosage, and it has been operationalized, for instance, as the number of sessions completed, the percentage of tasks covered, or the time spent on program delivery (Cross et al., 2004, Dane and Schneider, 1998, Durlak and DuPre, 2008, Dusenbury et al., 2003, Elliott and Mihalic, 2004, Ennett et al., 2011, Jones et al., 2011).

Evidently, there is natural variation in the amount of program content covered in school-based prevention programs (Dusenbury et al., 2005, Lillehoj et al., 2004), and a level of 100% adherence is rarely reached (Dane and Schneider, 1998, Durlak and DuPre, 2008, Dusenbury et al., 2003) regardless of assessment methods, targeted students, or types of programs. This phenomenon has been referred to as adaptation (Durlak & DuPre, 2008). For instance, teachers may intentionally eliminate some program aspects if they find them too challenging (Fagan and Mihalic, 2003, Hahn et al., 2002) or if students are not interested or responsive (Martens, van Assema, Paulussen, Schaalma, & Brug, 2006). Some degree of program adaptation (e.g., choosing between program elements) is usually taken for granted, and successful outcomes have been reported at levels such as the delivery of 60% of program content (Fagan and Mihalic, 2003, Ferrer-Wreder et al., 2010, Hahn et al., 2002).

Findings from recent reviews have demonstrated that the proportion of program elements delivered (Durlak and DuPre, 2008, Wilson and Lipsey, 2007) is positively associated with the outcomes obtained (Durlak et al., 2011). However, nonsignificant dose–outcome associations have also been reported; the explanations offered for such findings include minimal variability in the degree of implementation (Domitrovich et al., 2010, Lillehoj et al., 2004, Low et al., 2014, Resnicow et al., 1998), teachers' lack of prior experience with prevention programs (Lillehoj et al., 2004), and the use of implementation ratings provided by teachers rather than trained observers (Dane and Schneider, 1998, Lillehoj et al., 2004).

Besides adherence, or the dosage of program content covered, another aspect of fidelity is the quality of implementation. Observational measures are often used to assess implementation quality, such as the interaction between students and the teacher delivering the program (Hahn et al., 2002, Hirschstein et al., 2007, Melde et al., 2006, Mihalic et al., 2008, Resnicow et al., 1998), the degree of program content taught clearly and correctly (Fagan and Mihalic, 2003, Kam et al., 2003, Lillehoj et al., 2004, Melde et al., 2006, Spoth et al., 2002), or the extent to which teachers encouraged and coached students to apply intervention concepts beyond formal lessons (Domitrovich et al., 2010, Tortu and Botvin, 1989).

The quality of implementation may be equally or even more important than the quantity. However, the use of quality as a measure of implementation is rare. In the review by Durlak and DuPre (2008), only 6 of 59 studies paid attention to the quality of delivery, whereas quantity (dosage) was assessed in 29 studies. This lack of attention to quality may be due to the fact that collecting observational data is time-consuming and expensive. Moreover, the studies that have utilized observations have produced mixed findings, with some reporting a positive association between the quality of delivery and student outcomes (Domitrovich et al., 2010, Kam et al., 2003, Resnicow et al., 1998) and others finding the opposite (Hirschstein et al., 2007).

Besides observational data, teachers' self-reports of their preparedness and knowledge of the intervention model can be used as indicators of the quality with which a program was delivered (Dane & Schneider, 1998). Better prepared teachers, who are more comfortable with a program's methods and more strongly support its purpose, are more likely to implement the program competently (Rohrbach, Graham, & Hansen, 1993). For instance, Kallestad and Olweus (2003) found that reading the program manual was one of the strongest predictors of antibullying lesson implementation.

School bullying has been defined as repeated aggression toward a relatively powerless peer (Olweus, 1993, Smith and Sharp, 1994). A substantial number of antibullying programs have been evaluated (Merrell et al., 2008, Ryan and Smith, 2009, Smith et al., 2004, Ttofi and Farrington, 2010, Vreeman and Carroll, 2007). The findings have been somewhat inconsistent, with some programs showing small to moderate positive effects and some showing unexpected negative effects, that is, increases rather than decreases in bullying problems. In a review by Vreeman and Carroll (2007), only 4 of 10 curriculum studies showed decreases in bullying problems. Although a recent meta-analysis based on 44 studies (Ttofi & Farrington, 2010) provided more encouraging findings, with average reductions of 17–23% for bullying others and 17–20% for being bullied, even these effects are relatively modest in size. One reason might be less-than-optimal implementation of the programs. It can be assumed that a higher degree of program delivery generates larger effects in antibullying trials, as evaluations of other types of school-based prevention programs have indicated (Dane and Schneider, 1998, Durlak and DuPre, 2008, Kam et al., 2003, Lillehoj et al., 2004, Rohrbach et al., 1993).

Implementation fidelity of antibullying programs has rarely been documented and hardly ever related to program outcomes. In general, the content of such programs has varied from standardized curricula to rather flexible guidelines. Concrete manuals describing program content and learning objectives promote systematic implementation (Dane and Schneider, 1998, Ryan and Smith, 2009), but they also provide a good starting point for the measurement of implementation fidelity (Brown et al., 2011, Cross et al., 2004, Hirschstein et al., 2007, Kallestad and Olweus, 2003). Cross et al. (2004), for instance, found that during the Friendly Schools intervention, an average of 67% of classroom activities were implemented. Brown et al. (2011) examined the implementation of the Steps to Respect program, reporting an exceptionally high level of classroom implementation; on average, more than 90% of activities were covered.

When the intervention content is loosely defined, evaluating the degree of implementation (at least in quantitative terms) is difficult. In some trials, schools have been encouraged to develop their own written antibullying policies, to choose from among alternative intervention activities the ones they wish to implement, or to do both (Eslea and Smith, 1998, Fekkes et al., 2006). In such cases, the average degree of implementation across schools (or classrooms) cannot be defined, and more importantly, between-school or between-classroom differences in implementation can hardly be related to the outcomes obtained. The identification of more versus less effective activities, on the other hand, would demand a large sample size and preferably random assignment of schools to conditions implementing different activities.

In some studies, the documentation of implementation has been based on interviews with a single respondent per school (Eslea and Smith, 1998, Stevens et al., 2001), or implementation data provided by several teachers have been aggregated at the school level (Salmivalli, Kaukiainen, & Voeten, 2005). It is thus assumed that each classroom in a school received the intervention content with the same degree of adherence. Such uniform coverage is unlikely, however, especially if the intervention includes activities delivered by teachers in classrooms.

Another threat to the reliability of implementation data is the assessment of implementation through teacher reports collected only after the intervention has ended (Fekkes et al., 2006, Kallestad and Olweus, 2003). Teachers' recall of their implementation actions during an intervention phase lasting several months may not be accurate. To obtain comprehensive information regarding implementation and to increase the validity of self-reports, ratings should be collected at several time points during the delivery of the intervention.

Keeping in mind the above limitations in measuring implementation, it may not be surprising that only a few studies have reported findings regarding the association between the implementation of antibullying programs and the outcomes obtained. Salmivalli et al. (2005) found a positive association between the overall degree of implementation and reductions in bullying and victimization at the school level. Kallestad and Olweus (2003) reported positive effects of some program aspects but not others. Still other studies have revealed complicated patterns of findings that leave much room for interpretation. For instance, Hirschstein et al. (2007) assessed two aspects of classroom implementation of the Steps to Respect bullying prevention program: lesson adherence and the quality of instruction. Although greater lesson adherence was related to better teacher-reported interpersonal skills among students, it was unrelated to observed or student-reported outcomes, including bullying and victimization. Unexpectedly, the quality of instruction (e.g., teachers' emotional tone and classroom management as rated by observers) was associated with increasing levels of student-reported victimization and difficulties in responding assertively to bullying. It should be noted that two implementation measures not directly related to the classroom curriculum (supporting skill generalization and coaching of students involved in bullying) were related to positive observed outcomes, but only for older students in the sample. Another study examining the implementation of the same program (Low et al., 2014) likewise found no effects of lesson adherence on student outcomes. However, students' engagement (as perceived by teachers), averaged across several lessons, predicted classroom-level reductions in student-reported victimization but not in bullying perpetration.

In summary, studies examining the association between the implementation of antibullying programs and the outcomes obtained are few, and they have produced mixed findings. There is clearly a need for more studies assessing several aspects of implementation fidelity, at multiple time points, in sufficiently large samples. In addition, modeling the effects of implementation at the level at which the intervention takes place (e.g., the classroom) has rarely been employed.

KiVa is a research-based antibullying program developed at the University of Turku, Finland, with funding from the Ministry of Education and Culture. The KiVa program involves both universal actions targeted at all students and indicated actions targeted at students who have been identified as targets or perpetrators of bullying (Salmivalli, Poskiparta, Ahtola, & Haataja, 2013). The name of the program, “KiVa”, is a Finnish word meaning kind or nice, but it is also an acronym for “Kiusaamista Vastaan” (against bullying).

Although KiVa shares several features with other existing antibullying programs, such as student lessons delivered in classrooms, it has some unique aspects differentiating it from them. Most importantly, KiVa is based on the participant role approach to bullying (Salmivalli, Lagerspetz, Björkqvist, Österman, & Kaukiainen, 1996), and thus it focuses on influencing the responses of peer bystanders who witness bullying. The assumption is that when bystanders no longer provide social rewards (such as laughing or cheering) to perpetrators, but instead support and defend victimized peers and express disapproval of bullying, an important motivation to bully others is lost. It should be noted that once victims perceive the social environment as more supportive, their experience of the situation may be altered even if the bullies do not immediately stop their mean behaviors (Sainio et al., 2011, Salmivalli, Poskiparta, Ahtola and Haataja, 2013). Thus, an important aim of the KiVa curriculum delivered in classrooms is to change bullying-related norms and consequently reduce both bullying perpetration and experienced victimization.

As for the classroom curricula, at the elementary school level there are two different developmentally appropriate sets of student lessons (10 student lessons × 90 min each)—one for Grades 1–3 and the other for Grades 4–6. The main aims of the student lessons are to raise awareness of the role bystanders play in the bullying process, to increase empathic understanding of the victim's plight, and to provide students with safe strategies to support and defend their victimized peers. The topics of the lessons proceed from more general ones, such as the importance of respect in relationships, group communication, and group pressure, to bullying and its mechanisms and consequences. Each lesson includes various types of activities, such as teacher-led discussion, small group discussion, learning-by-doing exercises, adopting class rules, as well as individual tasks such as playing an antibullying computer game. Classroom teachers deliver the lessons during the school year (in Finland, from August to May, i.e., one lesson per month) according to the guidelines provided in the teacher's manuals.

The effects of the KiVa antibullying program have been evaluated in several studies based on a randomized controlled trial that took place from 2007 to 2009 (Kärnä, Voeten, Little, Poskiparta, Kaljonen, et al., 2011, Kärnä et al., 2013) and on the nationwide roll-out of the program (Kärnä, Voeten, Little, Poskiparta, Alanen, et al., 2011). There is some initial evidence that the degree of implementation of KiVa (at the school level) is associated with program outcomes, as reported by Kärnä, Voeten, Little, Poskiparta, Alanen, et al. (2011). Despite these promising findings, there is so far a lack of evidence about the effects obtained at varying levels of implementation and about the relative importance of different implementation aspects for outcomes (i.e., classroom-level reductions in bullying and victimization).

Although authors of prevention studies have argued that the assessment of fidelity is an essential feature of program evaluations (Durlak and DuPre, 2008, Durlak et al., 2011, Dusenbury et al., 2003, Ennett et al., 2011, Gingiss et al., 2006), it has been uncommon in antibullying program trials (for meta-analyses, see Ryan and Smith, 2009, Ttofi and Farrington, 2010). When data on implementation have been gathered, they have been assessed at a very general level (e.g., Eslea and Smith, 1998, Fekkes et al., 2006, Kallestad and Olweus, 2003, Salmivalli et al., 2005), collected in small samples (Eslea and Smith, 1998, Stevens et al., 2001), and typically reported as descriptive information rather than related to the outcomes obtained.

Although the KiVa program has been found to reduce bullying and victimization (Kärnä, Voeten, Little, Poskiparta, Alanen, et al., 2011, Kärnä, Voeten, Little, Poskiparta, Kaljonen, et al., 2011, Kärnä et al., 2013), much remains unknown about program implementation and whether it explains variation in program outcomes. We therefore focused on the extent to which teachers delivered the classroom curriculum (the quantity of their efforts) and how well they did so (the quality of their efforts) throughout a school year. The current study thus extends the effectiveness studies of the KiVa program by examining the implementation fidelity of the program curriculum and relating it to reductions in bullying perpetration as well as victimization. From a broader perspective, it is one of the first studies in the field (a) assessing several aspects of lesson implementation, (b) utilizing longitudinal multilevel modeling of hierarchical data, and (c) linking implementation aspects with outcomes at the classroom level.

Implementation fidelity was systematically measured by collecting monthly teacher reports on the quantity as well as the quality of implementation of the KiVa program during the nine months of the trial. The measures of lesson adherence and lesson duration were chosen to reflect the quantity of implementation; they represent the core features of the lesson plans provided in the structured KiVa teacher manuals. Both measures have also been used in previous research in the field (Dane and Schneider, 1998, Durlak and DuPre, 2008). The third measure, lesson preparation, was used as an indicator of the quality of implementation. All fidelity aspects were measured on continuous scales, providing more statistical power to detect effects on the outcome variables.
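As a purely illustrative sketch (not the study's actual scoring procedure), monthly teacher reports of this kind can be averaged into one continuous classroom-level score per fidelity aspect. All variable names, values, and scales below are hypothetical:

```python
from statistics import mean

# Hypothetical monthly teacher reports for one classroom; each tuple is
# (fraction of the lesson's core tasks covered, lesson duration in minutes,
# preparation time in minutes). The values are invented, not KiVa data.
monthly_reports = [
    (0.9, 85, 30),
    (0.8, 90, 25),
    (1.0, 80, 40),
]

def classroom_fidelity(reports):
    """Average each fidelity aspect across the monthly reports, yielding
    one continuous classroom-level score per aspect."""
    return {
        "adherence": mean(r[0] for r in reports),
        "duration": mean(r[1] for r in reports),
        "preparation": mean(r[2] for r in reports),
    }

scores = classroom_fidelity(monthly_reports)
```

Continuous classroom-level scores of this kind can then serve as predictors in the outcome models.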

Besides exploring the overall degree of implementation of the KiVa curriculum, we tested the hypotheses that classroom-level reductions of victimization would be larger in classrooms where teachers reported more lesson adherence (Hypothesis 1a), longer duration of lessons (Hypothesis 1b), and more lesson preparation (Hypothesis 1c). Similarly, we hypothesized larger classroom-level reductions of bullying in classrooms with higher lesson adherence (Hypothesis 2a), longer lesson duration (Hypothesis 2b), and more lesson preparation (Hypothesis 2c). We had no hypotheses regarding more or less effective aspects of implementation. However, based on previous findings (Merrell et al., 2008, Vreeman and Carroll, 2007) we expected that effects on victimization might be larger than effects on bullying perpetration.
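As a greatly simplified illustration of the direction these hypotheses predict (the study itself used multilevel structural equation modeling, not the bivariate slope below), one could relate classroom-level fidelity scores to changes in victimization. All data here are invented:

```python
# Invented classroom-level data (not from the study): adherence is the mean
# fraction of lesson content covered; change is the pre- to posttest change
# in classroom victimization, where negative values mean a reduction.
adherence = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
change = [-0.1, -0.1, -0.2, -0.3, -0.3, -0.4]

n = len(adherence)
mean_x = sum(adherence) / n
mean_y = sum(change) / n

# Ordinary least-squares slope, cov(x, y) / var(x): a negative slope means
# higher adherence goes with larger reductions, as Hypothesis 1a predicts.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(adherence, change)) \
    / sum((x - mean_x) ** 2 for x in adherence)
```

In the actual analysis, such associations are estimated with random effects for classrooms nested in schools rather than a single regression line.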

Section snippets

Participants

The sample comprises the intervention schools that took part in the first evaluation studies of the KiVa antibullying program (Kärnä, Voeten, Little, Poskiparta, Kaljonen, et al., 2011, Kärnä et al., 2013). In the present study, we included only the elementary schools (Grades 1 through 6) that were in the intervention condition. In total, 77 elementary schools took part: 39 intervention schools from the first phase of evaluation in 2007–2008, and 38 intervention schools from the second phase in

Descriptive statistics

Means and standard deviations for and correlations between student-level variables are shown in Table 1. Moderate stability from pretest to posttest (across 12 months) was observed for victimization (r = .52) and bullying (r = .44). Victimization and bullying were positively correlated within measurement occasions (r = .54 and .50) for pretest and posttest as well as across occasions (r = .24 and .34). Boys were victimized more and bullied other students more at both time points. Older students were
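The stability coefficients reported above are ordinary Pearson correlations between the two measurement occasions. A minimal sketch with invented scores (not the study's data):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented pretest and posttest victimization scores for six students.
pretest = [2.0, 0.0, 1.0, 3.0, 1.0, 2.0]
posttest = [1.0, 0.0, 1.0, 2.0, 2.0, 1.0]

# Pretest-posttest stability is the correlation across the two occasions.
stability = pearson(pretest, posttest)
```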

Discussion

Scholars in the field of prevention have clearly demonstrated that programs are often not implemented to the same standard (e.g., Dane and Schneider, 1998, Durlak and DuPre, 2008). In school-based effectiveness studies, an important question is to what extent teachers, who are often responsible for program delivery, have actually put the program into practice, with good or poor fidelity. Prior to the present study, very little research on bullying interventions focused on this issue. More

References (93)

  • P.D. Bliese. Within-group agreement, non-independence, and reliability: Implications for data aggregation and analyses
  • C.P. Bradshaw et al. Bullying and peer victimization at school: Perceptual differences between students and school staff. School Psychology Review (2007)
  • E.C. Brown et al. Outcomes from a school-randomized controlled trial of Steps to Respect: A bullying prevention program. School Psychology Review (2011)
  • S. Caravita et al. Unique and interactive effects of empathy and social status on involvement in bullying. Social Development (2009)
  • N.A. Card et al. Direct and indirect aggression during childhood and adolescence: A meta-analytic review of gender differences, intercorrelations, and relations to maladjustment. Child Development (2008)
  • G.W. Cheung et al. Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling (2002)
  • W.M. Craig et al. Prospective teachers' attitudes toward bullying and victimization. School Psychology International (2000)
  • D. Cross et al. Australia: The Friendly Schools project
  • E.A. Davis et al. Designing educative curriculum materials to promote teacher learning. Educational Researcher (2005)
  • R. Dunn et al. No light at the end of tunnel vision: Steps for improving lesson plans. Clearing House (2010)
  • J.A. Durlak et al. Implementation matters: A review of research on the influence of implementation on program outcomes and the factors affecting implementation. American Journal of Community Psychology (2008)
  • J.A. Durlak et al. The impact of enhancing students' social and emotional learning: A meta-analysis of school-based universal interventions. Child Development (2011)
  • L. Dusenbury et al. A review of research on fidelity of implementation: Implications for drug abuse prevention in school settings. Health Education Research (2003)
  • L. Dusenbury et al. Quality of implementation: Developing measures crucial to understanding the diffusion of preventive interventions. Health Education Research (2005)
  • D. Elliott et al. Issues in disseminating and replicating effective prevention programs. Prevention Science (2004)
  • C.K. Enders. Applied missing data analysis (2010)
  • S.T. Ennett et al. Evidence-based practice in school substance use prevention: Fidelity of implementation under real-world conditions. Health Education Research (2011)
  • M. Eslea et al. The long-term effectiveness of anti-bullying work in primary schools. Educational Research (1998)
  • A.A. Fagan et al. Strategies for enhancing the adoption of school-based prevention programs: Lessons learned from the Blueprints for violence prevention replications of the Life Skills Training program. Journal of Community Psychology (2003)
  • M. Fekkes et al. Effects of antibullying school program on bullying and health complaints. Archives of Pediatrics and Adolescent Medicine (2006)
  • L. Ferrer-Wreder et al. Is more better? Outcome and dose of a universal drug prevention effectiveness trial. The Journal of Primary Prevention (2010)
  • A.M. Gadermann et al. Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research and Evaluation (2012)
  • P. Gingiss et al. Bridge-It: A system for predicting implementation fidelity for school-based tobacco prevention programs. Prevention Science (2006)
  • S. Goenka et al. Process evaluation of a tobacco prevention program in Indian schools: Methods, results and lessons learnt. Health Education Research (2010)
  • J.W. Graham. Missing data analysis: Making it work in the real world. Annual Review of Psychology (2009)
  • E.J. Hahn et al. Efficacy of training and fidelity of implementation of the Life Skills Training program. Journal of School Health (2002)
  • M. Hirschstein et al. Walking the talk in bullying prevention: Teacher implementation variables related to initial impact of the Steps to Respect program. School Psychology Review (2007)
  • E.V.E. Hodges et al. Individual risk and social risk as interacting determinants of victimization in the peer group. Developmental Psychology (1997)
  • L. Hu et al. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal (1999)
  • S.M. Jones et al. Two-year impacts of a universal school-based social-emotional and literacy intervention: An experiment in translational developmental research. Child Development (2011)
  • J.H. Kallestad et al. Predicting teachers' and schools' implementation of the Olweus bullying intervention program: A multilevel study. Prevention and Treatment (2003)
  • C.-M. Kam et al. Examining the role of implementation quality in school-based prevention using the PATHS (Promoting Alternative Thinking Skills) curriculum. Prevention Science (2003)
  • A. Kärnä et al. Going to scale: A nonrandomized nationwide trial of the KiVa antibullying program for Grades 1–9. Journal of Consulting and Clinical Psychology (2011)
  • A. Kärnä et al. Effectiveness of the KiVa anti-bullying program: Grades 1–3 and 7–9. Journal of Educational Psychology (2013)
  • A. Kärnä et al. A large-scale evaluation of the KiVa antibullying program: Grades 4–6. Child Development (2011)
  • L. Kyriakides et al. An analysis of the revised Olweus Bully/Victim Questionnaire using the Rasch measurement model. British Journal of Educational Psychology (2006)

    The development of the KiVa program was financed by the Finnish Ministry of Education. The current study is supported by Academy of Finland grant 134843 to the last author. The authors thank the whole KiVa research team for support. Portions of this study were presented at the biennial Meeting of the Society for Research on Adolescence, Vancouver, March, 2012.
