Propensity score analysis: promise, reality and irrational exuberance

Abstract

Objectives

The aim of this work is to examine the promise that propensity scores can yield accurate effect estimates in nonrandomized experiments, to review research on the realities of the conditions needed to meet that promise, and to caution against irrational exuberance about their capacity to do so.

Methods

A review of selected experimental work that illustrates both the promise and realities of propensity score analysis.

Results

Propensity score analysis of nonrandomized experiments can yield the same results as randomized experiments. Those estimates depend on meeting the strong ignorability assumption, namely that the available covariates adequately describe the selection process, and on using comparison groups drawn from the same location with very similar focal characteristics. When those assumptions are not met, propensity scores may not yield accurate estimates.
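
To make these conditions concrete, the sketch below shows the basic logic of a propensity score analysis on simulated data: a logistic regression models selection into treatment from observed covariates, and inverse-probability weights are used to estimate the treatment effect. This is a minimal illustration with hypothetical variable names (x1, x2, treated, y), not the analysis or data from any study reviewed here; the estimate is trustworthy only if the covariates in the selection model really capture selection (strong ignorability) and the groups overlap.

```python
# Minimal propensity score sketch on simulated data (hypothetical names).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})

# Selection into treatment depends on observed covariates; in a real
# observational study this selection process is unknown.
p_select = 1 / (1 + np.exp(-(0.5 * df["x1"] - 0.5 * df["x2"])))
df["treated"] = rng.binomial(1, p_select)
df["y"] = 2.0 * df["treated"] + df["x1"] + df["x2"] + rng.normal(size=n)

# Step 1: model the probability of treatment given the observed covariates.
ps_model = LogisticRegression().fit(df[["x1", "x2"]], df["treated"])
df["pscore"] = ps_model.predict_proba(df[["x1", "x2"]])[:, 1]

# Step 2: inverse-probability-of-treatment weights. These remove bias only if
# the covariates capture selection (strong ignorability) and overlap holds.
w = np.where(df["treated"] == 1, 1 / df["pscore"], 1 / (1 - df["pscore"]))

# Step 3: weighted difference in mean outcomes as the effect estimate.
is_t = df["treated"] == 1
ate = (np.average(df.loc[is_t, "y"], weights=w[is_t])
       - np.average(df.loc[~is_t, "y"], weights=w[~is_t]))
print(f"IPW estimate of the treatment effect: {ate:.2f} (true value: 2.0)")
```

In this simulation the selection model is correct by construction, so the weighted estimate recovers the true effect; in practice the selection process is unknown, and omitting covariates that drive selection leaves the estimate biased no matter how the propensity score is used.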

Conclusions

The use of propensity score analysis has proliferated, especially in the last decade, but careful attention to its assumptions appears rare in practice. Researchers and policymakers who rely on these extensive propensity score applications may be using evidence of largely unknown validity. All stakeholders should devote far more empirical attention to demonstrating that each study has met these assumptions.
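
One concrete form that empirical attention can take is a covariate balance diagnostic reported alongside the effect estimate. The sketch below, again using simulated data and hypothetical names rather than any procedure from the studies reviewed here, computes standardized mean differences (SMDs) for a covariate before and after inverse-probability weighting; a common rule of thumb flags |SMD| > 0.1 as problematic.

```python
# Covariate balance check via standardized mean differences (hypothetical data).
import numpy as np

def smd(x, treated, weights=None):
    """Weighted standardized mean difference of covariate x between groups."""
    w = np.ones(len(x)) if weights is None else np.asarray(weights, dtype=float)
    m1 = np.average(x[treated], weights=w[treated])
    m0 = np.average(x[~treated], weights=w[~treated])
    v1 = np.average((x[treated] - m1) ** 2, weights=w[treated])
    v0 = np.average((x[~treated] - m0) ** 2, weights=w[~treated])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                    # selection depends on x, so groups differ
treated = rng.binomial(1, p).astype(bool)
w = np.where(treated, 1 / p, 1 / (1 - p))   # inverse-probability weights

print(f"SMD before weighting: {smd(x, treated):.3f}")
print(f"SMD after weighting:  {smd(x, treated, weights=w):.3f}")
```

Good balance on the observed covariates is necessary but not sufficient: it says nothing about covariates that were never measured.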

Author information

Correspondence to William R. Shadish.

Additional information

This research was supported in part by grant R305D100033 from the Institute of Education Sciences, U.S. Department of Education, and by a grant from the University of California Office of the President to the University of California Educational Evaluation Consortium. The opinions expressed are those of the author and do not represent views of the University of California, the Institute of Education Sciences, or the U.S. Department of Education.

About this article

Cite this article

Shadish, W.R. Propensity score analysis: promise, reality and irrational exuberance. J Exp Criminol 9, 129–144 (2013). https://doi.org/10.1007/s11292-012-9166-8
