On the Importance of Treatment Effect Heterogeneity in Experimentally-Evaluated Criminal Justice Interventions

  • Original Paper
Journal of Quantitative Criminology

Abstract

Objectives

This paper proposes a practical framework for assessing the broader impact of a program intervention beyond its average effect by considering treatment effect heterogeneity: how the same intervention may produce different effects for different subgroups of individuals.

Methods

Using data on an experimental intervention from the Johns Hopkins Prevention Intervention Research Center, the current study demonstrates the contribution of more general growth mixture modeling approaches, such as the group-based trajectory model (GBTM; Nagin 2005) and growth mixture modeling (GMM; Muthén 2001), for assessing meaningful heterogeneous treatment effects across clusters or classes of individuals who follow distinct patterns of development over time.

Results

The findings demonstrate how population-averaged treatment effects might underestimate substantively meaningful localized effects among subgroups that are more relevant to theory and policy, such as individuals with non-normative growth (high–low) and those with more room for improvement (low–low) in the development of self-control.

Conclusions

We call for assessing programs in terms of both average and localized effects. Otherwise, we might wrongly conclude that a given program is ineffective when it in fact has a great impact, but only on the segments of the population that need it the most.
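The masking argument above can be illustrated with simple arithmetic. The following is a minimal sketch, assuming three latent classes whose shares and class-specific effects are entirely hypothetical (they are not estimates from the study, and the "normative" class label is an assumption added for illustration). It treats the population-averaged effect as the share-weighted mean of the class-specific effects.

```python
# Hypothetical latent classes: (population share, class-specific treatment effect).
# All numbers are illustrative assumptions, not results from the study.
CLASSES = {
    "normative": (0.70, 0.00),  # hypothetical majority class with no effect
    "high-low": (0.10, 0.15),   # non-normative growth
    "low-low": (0.20, 0.10),    # more room for improvement
}


def average_treatment_effect(classes):
    """Return the share-weighted mean of the class-specific effects."""
    total_share = sum(share for share, _ in classes.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("class shares must sum to 1")
    return sum(share * effect for share, effect in classes.values())


ate = average_treatment_effect(CLASSES)
# Class with the largest localized effect.
largest_class, (_, largest_effect) = max(CLASSES.items(), key=lambda kv: kv[1][1])

print(f"population-averaged effect = {ate:.3f}")
print(f"largest localized effect   = {largest_effect:.3f} ({largest_class})")
```

Under these assumed numbers the averaged effect is much smaller than the effect in either affected class, which is the pattern the abstract warns can lead to a program being judged ineffective.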

Notes

  1. To wit, Sherman (2009: 7) notes that such high standards for policy evaluation “could help make a world in which governments can refuse to waste money on ineffective criminal sanctions despite populist pressures; a world in which citizens can demand that government must test policies with well-controlled experiments before spending vast sums in the name of crime prevention.”

  2. While the estimation of average treatment effects has been given great prominence recently in criminology in large part due to the “evidence-based” movement (Sherman et al. 1998), even medical scholars, from whom criminologists expropriated the term, have become increasingly critical of the average treatment effect in favor of a formal recognition of the critical importance of heterogeneous treatment effects (Kravitz et al. 2004).

  3. For a detailed overview of the Rubin Causal Model (RCM), see Holland (1986). For a more detailed discussion of the potential outcomes framework and how various quantities relate to criminological examples, see Loughran and Mulvey (2010).

  4. In fact, perceptual deterrence studies in criminology have long recognized that there is substantial heterogeneity in the way sanction threats influence behavior and that sanction threats are consequential only for a subgroup of the general population (Parker and Grasmick 1979; Tittle and Logan 1973; Zimring and Hawkins 1968, 1971, 1973). Given that the deterrent effect of “formal sanction can be effective only if reinforced by informal sanctions” (Tittle and Logan 1973: 386), the effects of formal sanctions are hypothesized to be augmented for those with a relatively high “stake in conformity” (Toby 1957; Briar and Piliavin 1965). In this vein, Pogarsky (2002) hypothesized that individuals can be classified into three offending categories (acute conformists, incorrigibles, and deterrables) and that such subgroup heterogeneity affects the strength of the effect of the certainty and severity of punishment.

  5. The biggest challenge in studying treatment effect heterogeneity is how to identify and define subpopulations, especially when there are no strong substantive justifications for specifying relevant subgroups of interest in advance. Conventional subgroup analysis is followed by formal tests of interaction between treatment and the covariates used to define the subgroups of interest. While it is strongly recommended that these covariates be pre-specified based on explicit rationales, the utility of such subgroup-specific analysis is often limited because too many known (and even unknown) characteristics can potentially moderate the effect of treatment, and these multiple covariates often simultaneously condition the relation between treatment and outcome. Although they are determined empirically and are thus subject to change under different model specifications, we believe distinct developmental trajectories as a function of multiple covariates are useful points of interest when assessing treatment effect heterogeneity. Nagin (2005) in fact suggests that there are many disadvantages of using a priori rules for specifying group membership (e.g., simply assuming a highly untenable population homogeneity without empirically verifying the existence of heterogeneous subpopulations, or incorrectly specifying the number of subpopulations and the probability that an individual is a member of a certain subpopulation).

  6. For the purpose of illustration, these models will be estimated by distinct statistical software packages that are more commonly utilized in the field when analyzing each model (HLM and SAS proc Traj), although Mplus can estimate all of these models by allowing or restricting latent classes and variances within those classes.

  7. In fact, Haviland et al. have utilized GBTM to draw causal inferences in the context of a non-experimental, observational study by first examining the developmental patterns prior to the treatment and testing how such prior growth trajectories condition the effect of treatment (Haviland and Nagin 2005). Later, they used the estimated prior developmental trajectories of outcome measures in combination with propensity scores for experiencing treatment to better achieve balance between treatment and control groups (Haviland and Nagin 2007; Haviland et al. 2008). In doing so, they demonstrated how GBTM can also be used to identify potentially heterogeneous treatment effects that may meaningfully vary depending on the prior developmental trajectories. The salient difference in the current study is that GBTM is estimated after the treatment in the context of a randomized experiment. Since randomization creates an ideal counterfactual situation where we can observe normative developmental patterns in the absence of treatment, GBTM can still be used to explore (not to identify) potentially heterogeneous treatment effects by directly comparing the growth patterns between treatment and control groups.

  8. Gottfredson and Hirschi clearly deny the possibility of a decreasing level of self-control over time by claiming that:

    “… we briefly reconcile the fact of stability with the idea that desocialization is rare. Combining little or no movement from high self-control to low self-control with the fact that socialization continues to occur throughout life produces the conclusion that the proportion of the population in the potential offender pool should tend to decline as cohorts age … the documented number of ‘late-comers’ to crime, or ‘good boys gone bad,’ is sufficiently small to suggest that they may be accounted for in large part by misidentification or measurement error.” (Gottfredson and Hirschi 1990: 107–108, emphasis added).

    “The idea is that the child is taught “self-control” by parents or other responsible adults at an early age, and that this trait is subsequently highly resistant to extinction” (Hirschi 2004: 541).

  9. Although more complicated functional forms better capture meaningful patterns of variation, a simpler functional form can still provide an easy-to-understand, good approximation of the general pattern of the growth trajectories of interest. Because the primary interest of this analysis is to investigate differences in the rate of change between individuals in order to assess whether the relative rankings of self-control between individuals remain stable over time, a simplified model with only a linear growth parameter in the level-1 model is employed hereafter.

  10. It should be noted that the correlation between initial status and rate of change may vary depending on the specific time point selected for initial status (Raudenbush and Bryk 2002: 167).

  11. It should be noted that we cannot stratify the sample into subgroups of individuals with a similar developmental pattern according to the joint-group GBTM results and then estimate subgroup-specific treatment effects using HLM, because the utility of such a subgroup-only analysis is hampered by the increased risk of ‘false positive’ results and insufficient power due to multiple comparisons. Above all, it is a typical case of “statistical inference after model selection” (Berk et al. 2010).

  12. Nonetheless, we cannot interpret this negative sign as a ‘harmful effect’ because program participation also increased the initial level (intercept) of self-control from ‘low’ to ‘high,’ which in fact is a beneficial effect. Thus, caution is needed in interpreting the direction of the slope parameter because a complete assessment of the treatment effect requires consideration of the changes in both intercept and slope. In this study, we are focusing exclusively on the slope parameters to demonstrate how the average treatment effect might mask some substantively important heterogeneous treatment effects for different subgroups of individuals following distinct developmental trajectories.

  13. While it is possible to estimate simultaneously how the probability of group membership varies as a function of treatment status (which was the primary focus of the GBTM analysis), the current GMM is estimated under the assumption that each individual has a certain trajectory that does not change over time, in order to better investigate how the program produces a change in the within-class trajectory (Muthén et al. 2002: 461), which is the primary goal of the current study.

  14. Traditional subgroup analyses based on a priori known characteristics of individuals suggest that the treatment has a much stronger and statistically significant effect for the male, black, and especially black male groups than for their counterparts. It should be noted that such conventional approaches involve multiple comparisons after dividing the sample into smaller sub-samples (especially when there are many covariates that can potentially influence treatment effects), which leads to the well-recognized problems resulting from multiple comparisons (e.g., false-positive and false-negative findings).

    Group                      All        Male       Black      Black and Male
    Slope parameter            0.027      0.045      0.039      0.066
    Statistical significance   0.080      0.051      0.023      0.008
                               n = 399    n = 213    n = 345    n = 181
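The multiple-comparison concern raised in notes 11 and 14 can be made concrete with the subgroup table in note 14. The sketch below applies Bonferroni and Holm adjustments to the three subgroup tests. It rests on assumptions the source does not state: that the "Statistical significance" row reports unadjusted p-values, and that the three subgroup tests form a single family (the subgroups overlap, and the authors report no such adjustment). The helper names `bonferroni` and `holm` are defined here for illustration only.

```python
# Values from the "Statistical significance" row of the note 14 table,
# read here as unadjusted p-values (an assumption).
P_VALUES = {"Male": 0.051, "Black": 0.023, "Black and Male": 0.008}
ALPHA = 0.05  # conventional threshold, assumed for illustration


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return {group: min(1.0, p * m) for group, p in pvals.items()}


def holm(pvals):
    """Holm step-down adjustment: the i-th smallest p-value (i = 0, 1, ...)
    is multiplied by (m - i), and monotonicity is enforced."""
    m = len(pvals)
    ordered = sorted(pvals.items(), key=lambda item: item[1])
    adjusted, running_max = {}, 0.0
    for rank, (group, p) in enumerate(ordered):
        running_max = max(running_max, min(1.0, (m - rank) * p))
        adjusted[group] = running_max
    return adjusted


bonf = bonferroni(P_VALUES)
holm_adj = holm(P_VALUES)
for group, raw in P_VALUES.items():
    print(f"{group:15s} raw={raw:.3f} "
          f"bonferroni={bonf[group]:.3f} holm={holm_adj[group]:.3f}")
```

Under these assumptions, only the "Black and Male" test stays below 0.05 after the Bonferroni adjustment, while the Holm procedure also retains "Black". This shows how conclusions from conventional subgroup analysis can depend on how the multiplicity is handled.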

References

  • Abadie A, Angrist JD, Imbens GW (2002) Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica 70(1):91–117

  • Angrist JD (2004) Treatment effect heterogeneity in theory and practice. Econ J 114(494):C52–C83

  • Berk R, Brown L, Zhao L (2010) Statistical inference after model selection. J Quant Criminol 26:217–236

  • Bitler MP, Gelbach JB, Hoynes HW (2005) Welfare reform and health. J Hum Resour 40(2):309–334

  • Bitler MP, Gelbach JB, Hoynes HW (2006) What mean impacts miss: distributional effects of welfare reform experiments. Am Econ Rev 96(4):988–1012

  • Briar S, Piliavin I (1965) Delinquency, situation, inducement, and commitment to conformity. Soc Probl 13:33–45

  • Burt C, Sweeten G, Simons R (2014) Self-control through emerging adulthood: Instability, multidimensionality, and criminological significance. Criminology 52(3):450–487

  • Byar DP (1985) Assessing apparent treatment-covariate interactions in randomized clinical trials. Stat Med 4(3):255–263

  • Chernozhukov V, Hansen C (2004) The effects of 401(K) participation on the wealth distribution: an instrumental quantile regression analysis. Rev Econ Stat 86(3):735–751

  • Clear TR (2010) Policy and evidence: the challenge to the American Society of Criminology: 2009 presidential address to the American Society of Criminology. Criminology 48(1):1–25

  • Cook P (2012) Calibrating effect size. In: 12th Annual Jerry Lee crime prevention symposium. http://gemini.gmu.edu/cebcp/JerryLeePresentations.html

  • Dehejia R (2005) Program evaluation as a decision problem. J Econom 125(1–2):141–173

  • Farrington D, Welsh B (2005) Randomized experiments in criminology: What have we learned in the last two decades? J Exp Criminol 1:9–38

  • Gottfredson MR (2006) The empirical status of control theory in criminology. In: Cullen FT, Wright JP, Blevins K (eds) Taking stock: the status of criminological theory. Transaction Publishers, New Brunswick, NJ, pp 77–100

  • Gottfredson MR, Hirschi T (1990) A general theory of crime. Stanford University Press, Stanford, CA

  • Haviland A, Nagin DS (2005) Causal inference with group-based trajectory models. Psychometrika 70:1–22

  • Haviland A, Nagin DS (2007) Using group-based trajectory modeling in conjunction with propensity scores to improve balance. J Exp Criminol 3:65–82

  • Haviland A, Nagin DS, Rosenbaum PR, Tremblay RE (2008) Combining group-based trajectory modeling and propensity score matching for causal inferences in nonexperimental longitudinal data. Dev Psychol 44(2):422–436

  • Hay C, Forrest W (2006) The development of self-control: examining self-control theory’s stability thesis. Criminology 44:739–774

  • Heckman JJ (1992) Haavelmo and the birth of modern econometrics: a review of the history of econometric ideas by Mary Morgan. J Econ Lit 30(2):876–886

  • Heckman JJ (2005) Haavelmo and the birth of modern econometrics: a review of the history of econometric ideas by Mary Morgan. J Econ Lit 30(2):876–886

  • Heckman JJ, Smith JA (1995) Assessing the case for social experiments. J Econ Perspect 9(2):85–110

  • Heckman JJ, Smith JA, Clements N (1997) Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Rev Econ Stud 64:487–535

  • Heckman JJ, Urzua S, Vytlacil E (2006) Understanding instrumental variables in models with essential heterogeneity. Rev Econ Stat 88(3):389–432

  • Hedeker D, Gibbons RD (1994) A random-effects ordinal regression model for multilevel analysis. Biometrics 50:933–944

  • Hirschi T (2004) Self-control and crime. In: Baumeister RF, Vohs KD (eds) Handbook of self-regulation: research, theory, and applications. Guilford Press, New York, pp 537–552

  • Holland PW (1986) Statistics and causal inference. J Am Stat Assoc 81:945–960

  • Kilmer B (2008) Does parolee drug testing influence employment and education outcomes? Evidence from a randomized experiment with noncompliance. J Quant Criminol 24:93–123

  • Kravitz RL, Duan N, Braslow J (2004) Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q 82:661–687

  • Kreuter F, Muthén B (2008) Analyzing criminal trajectory profiles: bridging multilevel and group-based approaches using growth mixture modeling. J Quant Criminol 24:1–31

  • Loughran TA, Mulvey EP (2010) Estimating treatment effects: matching quantification to the question. In: Piquero AR, Weisburd D (eds) Handbook of quantitative criminology. Springer, New York, pp 163–180

  • Manski CF (2007) Partial identification of counterfactual choice probabilities. Int Econ Rev 48(4):1393–1410

  • McGarrell EF, Hipple NK (2007) Family group conferencing and re-offending among first-time juvenile offenders: the Indianapolis experiment. Justice Q 24(2):221–246

  • Moffitt TE (1993) Adolescence-limited and life-course persistent antisocial behavior: a developmental taxonomy. Psychol Rev 100:674–701

  • Muthén BO (2001) Latent variable mixture modeling. In: Marcoulides GA, Schumacker RE (eds) New developments and techniques in structural equation modeling. Lawrence Erlbaum Associates, Mahwah, NJ, pp 1–33

  • Muthén B, Muthén L (2010) Mplus (version 6.1). Los Angeles

  • Muthén B, Shedden K (1999) Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics 55:463–469

  • Muthén B, Brown H, Masyn K, Jo B, Khoo S-T, Yang C-C, Wang C-P, Kellam SG, Carlin JB, Liao J (2002) General growth mixture modeling for randomized preventive interventions. Biostatistics 3(4):459–475

  • Na C, Paternoster R (2012) Can self-control change substantially over time? Rethinking the relationship between self- and social control. Criminology 50(2):427–462

  • Nagin DS (2005) Group-based modeling of development. Harvard University Press, Cambridge, MA

  • Nagin DS, Paternoster R (1993) Enduring individual differences and rational choice theories of crime. Law Soc Rev 27:467–496

  • Nagin DS, Piquero AR (2010) Using the group-based trajectory modeling to study crime over the life course. J Crim Justice Educ 21:105–116

  • Nagin DS, Tremblay R (2005) What has been learned from group-based trajectory modeling? Examples from physical aggression and other problem behaviors. Ann Am Acad Polit Soc Sci 602:82–117

  • Neyman J (1923) Statistical problems in agricultural experiments. J R Stat Soc 2(Supplement 2):107–180

  • Parker J, Grasmick HG (1979) Linking actual and perceived certainty of punishment. Criminology 17:366–379

  • Pate AM, Hamilton EE (1992) Formal and informal deterrents to domestic violence: the Dade County spouse assault experiment. Am Sociol Rev 57:691–698

  • Pocock SJ, Assmann SE, Enos LE, Kasten LE (2002) Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med 21:2917–2930

  • Pogarsky G (2002) Identifying “deterrable” offenders: implications for research on deterrence. Justice Q 19(3):431–451

  • Raudenbush SW (2005) How do we study “what happens next”? Ann Am Acad Polit Soc Sci 602:131–144

  • Raudenbush SW, Bryk AS (2002) Hierarchical linear models, 2nd edn. Sage Publications, Thousand Oaks

  • Rothwell PM (2005) External validity of randomised controlled trials: “To whom do the results of this trial apply?”. Lancet 365(9453):82–93

  • Rubin DB (1974) Estimating causal effects of treatment in randomized and nonrandomized studies. J Educ Psychol 66(5):688–701

  • Rubin DB (1977) Assignment to treatment groups on the basis of a covariate. J Educ Stat 2:1–26

  • Rubin DB (1978) Bayesian inference for causal effects: the role of randomization. Ann Stat 6(1):34–58

  • Sampson RJ, Laub JH (2005) When prediction fails: from crime-prone boys to heterogeneity in adulthood. Ann Am Acad Polit Soc Sci 602:73–81

  • Sherman LW (2007) The power few: experimental criminology and the reduction of harm. J Exp Criminol 3:299–321

  • Sherman LW (2009) Evidence and liberty: the promise of experimental criminology. Criminol Crim Justice 9(1):5–28

  • Sherman LW, Berk RA (1984) The specific deterrent effects of arrest for domestic assault. Am Sociol Rev 49:261–272

  • Sherman LW, Smith DA (1992) Crime, punishment, and stake in conformity: legal and informal control of domestic violence. Am Sociol Rev 57:680–690

  • Sherman LW, Gottfredson D, MacKenzie D, Eck J, Reuter P, Bushway S (1998) Preventing crime: what works, what doesn’t, what’s promising. https://www.ncjrs.gov/works/index.htm

  • Sherman LW, Strang H, Angel C, Woods D, Barnes GC, Bennett S, Inkpen N (2005) Effects of face-to-face restorative justice on victims of crime in four randomized, controlled trials. J Exp Criminol 1:367–395

  • Tittle CR, Logan CH (1973) Sanctions and deviance: evidence and remaining questions. Law Soc Rev 7(3):371–392

  • Toby J (1957) Social disorganization and stake in conformity: complementary factors in the predatory behavior of hoodlums. J Crim Law Criminol Police Sci 48:12–17

  • Ttofi MM, Farrington DP (2011) Effectiveness of school-based programs to reduce bullying: a systematic and meta-analytic review. J Exp Criminol 7:27–56

  • Ttofi MM, Farrington DP (2012) Bullying prevention programs: the importance of peer intervention, disciplinary methods and age variations. J Exp Criminol 8:443–462

  • Zimring F, Hawkins G (1968) Deterrence and marginal groups. J Res Crime Delinq 2:100–114

  • Zimring F, Hawkins G (1971) The legal threat as an instrument of social change. J Soc Issues 27(2):33–48

  • Zimring F, Hawkins G (1973) Deterrence: the legal threat in crime control. University of Chicago Press, Chicago

Acknowledgments

We are grateful to the Johns Hopkins Prevention Intervention Research Center for providing the data necessary to undertake this study.

Author information

Correspondence to Chongmin Na.

Cite this article

Na, C., Loughran, T.A. & Paternoster, R. On the Importance of Treatment Effect Heterogeneity in Experimentally-Evaluated Criminal Justice Interventions. J Quant Criminol 31, 289–310 (2015). https://doi.org/10.1007/s10940-014-9245-2

  • DOI: https://doi.org/10.1007/s10940-014-9245-2
