Matching in R

Chapter in: Design of Observational Studies

Part of the book series: Springer Series in Statistics (SSS)

Abstract

Simple calculations in the statistical language R illustrate the computations involved in one basic form of multivariate matching. The focus is on how matching is done, not on the many other aspects of the design of an observational study. The process is made tangible by describing it in detail, step by step, and by closely inspecting intermediate results; in essence, however, only three steps are illustrated: (1) creating a distance matrix, (2) adding a propensity score caliper to the distance matrix, and (3) finding an optimal match. In practice, matching involves bookkeeping and efficient use of computer memory, which are best handled by dedicated matching software. Sect. 14.10 describes currently available software.
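The three steps can be sketched in a few lines of Python; the chapter itself works in R, and this is not its code. It is a minimal illustration under stated assumptions: the distance is a simplified, diagonal version of the Mahalanobis distance (each covariate scaled by its pooled standard deviation, correlations ignored); the propensity scores are taken as given rather than estimated; the penalty weight of 1000 and the caliper of 0.1 are hypothetical choices; and the optimal pair match is found by exhaustive search over assignments, which is feasible only for a handful of subjects. Dedicated software instead solves the same problem with network-flow or assignment algorithms.

```python
from itertools import permutations
from statistics import pstdev


def distance_matrix(treated, controls):
    """Step 1: squared distance between every treated and control subject.
    Each covariate is divided by its pooled standard deviation, a diagonal
    simplification of the Mahalanobis distance (assumption)."""
    pooled = treated + controls
    p = len(pooled[0])
    # Guard against a constant covariate, whose standard deviation is 0.
    sds = [pstdev([row[k] for row in pooled]) or 1.0 for k in range(p)]
    return [[sum(((t[k] - c[k]) / sds[k]) ** 2 for k in range(p))
             for c in controls] for t in treated]


def add_caliper(dist, ps_treated, ps_controls, caliper, penalty=1000.0):
    """Step 2: penalize pairs whose propensity scores differ by more than
    the caliper; the penalty grows with the size of the violation.
    The penalty weight is a hypothetical choice."""
    out = []
    for i, et in enumerate(ps_treated):
        row = []
        for j, ec in enumerate(ps_controls):
            gap = abs(et - ec)
            extra = penalty * (gap - caliper) if gap > caliper else 0.0
            row.append(dist[i][j] + extra)
        out.append(row)
    return out


def optimal_pair_match(dist):
    """Step 3: pair each treated subject with a distinct control so that the
    total distance is minimized. Exhaustive search: exponential time, so
    only suitable for tiny illustrations."""
    n_t, n_c = len(dist), len(dist[0])
    best_cost, best = float("inf"), None
    for perm in permutations(range(n_c), n_t):
        cost = sum(dist[i][j] for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best), best_cost


# Hypothetical data: two covariates, 2 treated and 3 potential controls.
treated = [[1.0, 0.0], [3.0, 1.0]]
controls = [[1.1, 0.0], [2.9, 1.0], [10.0, 0.0]]
ps_t = [0.6, 0.7]          # hypothetical propensity scores, treated
ps_c = [0.55, 0.2, 0.65]   # hypothetical propensity scores, controls

dist = distance_matrix(treated, controls)
print(optimal_pair_match(dist))      # pairs [0, 1]: nearest covariate twins
dist_cal = add_caliper(dist, ps_t, ps_c, caliper=0.1)
print(optimal_pair_match(dist_cal))  # pairs [0, 2]: caliper rules out control 1
```

In this toy example, control 1 is the closest covariate match for treated subject 1, but its propensity score differs by 0.5, so once the caliper penalty is added the optimal match switches to control 2, whose propensity score is within the caliper.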

Notes

  1. Dynarski [15, Table 2] presents two analyses, one with no covariates and the other with many more covariates, and obtains similar estimates of effect from both. That is a reasonable approach in the context of her paper. In constructing a matched control group in this chapter, I have omitted a few of the covariates that Dynarski used, including “single-parent household” and “father attended college,” from a comparison involving a group defined by a deceased father. In general, if one wants to present parallel analyses with and without adjustment for a particular covariate, say “single-parent household,” then one should not match on that covariate, but should instead control for it in one of the parallel analyses using analytical techniques; see, e.g., Sect. 22.2 or [16, 32, 37, 42] and [36, §3.6].

  2. In detail, the covariates used in the match are: (1) faminc: family income in units of $10,000; (2) incmiss: income missing (incmiss=1 if family income is missing, incmiss=0 otherwise); (3) black (black=1 if Black, black=0 otherwise); (4) hispanic (hispanic=1 if Hispanic, hispanic=0 otherwise); (5) afqtpct: Armed Forces Qualification Test (AFQT); (6) edmissm: mother’s education missing (edmissm=1 if missing, edmissm=0 otherwise); (7) edm: mother’s education (edm=1 for less than high school, edm=2 for high school, edm=3 for some college, edm=4 for BA degree or more); (8) female (female=1 for female, female=0 for male).

  3. Because of this restriction, the counts of seniors in various groups differ slightly from those in [15].

  4. Technically, the fitted probabilities in logit regression are invariant under nonsingular affine transformations of the predictors, provided the model includes an intercept.
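The invariance in Note 4 can be checked numerically. The following minimal Python sketch (the chapter itself uses R) fits a logit model with an intercept and one predictor by Newton-Raphson, once on a predictor x and once on 10x + 3, and compares the fitted probabilities. The data are hypothetical, and the hand-written fitter `fit_logit` is an illustrative stand-in for a standard logit fit such as R's `glm` with `family = binomial`. The intercept is what makes the invariance hold: a' + b'(10x + 3) can reproduce any a + bx.

```python
import math


def fit_logit(x, y, iters=50):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        # Gradient (ga, gb) and information matrix [[haa, hab], [hab, hbb]]
        # of the log-likelihood at the current (a, b).
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += yi - p
            gb += (yi - p) * xi
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        # Newton step: (a, b) += inverse(information) @ gradient.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


def fitted(x, a, b):
    """Fitted probabilities for predictor values x."""
    return [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]


# Hypothetical data with overlapping outcomes, so the MLE exists.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
x_new = [10.0 * xi + 3.0 for xi in x]  # an affine change of the predictor

a_old, b_old = fit_logit(x, y)
a_new, b_new = fit_logit(x_new, y)
p_old = fitted(x, a_old, b_old)
p_new = fitted(x_new, a_new, b_new)

print(b_old, b_new)  # the slope on x_new is one tenth of the slope on x
print(max(abs(u - v) for u, v in zip(p_old, p_new)))  # rounding error only
```

The slope on the transformed predictor is b/10 and the intercept absorbs the shift (a' = a - 0.3b), so the linear predictor, and hence every fitted probability, is unchanged.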

Bibliography

  1. Aitkin, M., Francis, B., Hinde, J., Darnell, R.: Statistical Modelling in R. Oxford University Press, New York (2009)

  2. Angrist, J.D., Lavy, V.: Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Q. J. Econ. 114, 533–575 (1999)

  3. Baiocchi, M., Small, D.S., Lorch, S., Rosenbaum, P.R.: Building a stronger instrument in an observational study of perinatal care for premature infants. J. Am. Stat. Assoc. 105, 1285–1296 (2010)

  4. Baiocchi, M., Small, D.S., Yang, L., Polsky, D., Groeneveld, P.W.: Near/far matching: a study design approach to instrumental variables. Health Serv. Outcomes Res. Methodol. 12, 237–253 (2012)

  5. Bertsekas, D.P.: A new algorithm for the assignment problem. Math. Program. 21, 152–171 (1981)

  6. Bertsekas, D.P.: The auction algorithm for assignment and other network flow problems: a tutorial. Interfaces 20, 133–149 (1990)

  7. Bertsekas, D.P.: Linear Network Optimization. MIT Press, Cambridge (1991)

  8. Bertsekas, D.P.: Network Optimization: Continuous and Discrete Models. Athena Scientific, Belmont (1998)

  9. Bertsekas, D.P., Tseng, P.: The relax codes for linear minimum cost network flow problems. Ann. Oper. Res. 13, 125–190 (1988)

  10. Campbell, D.T.: Factors relevant to the validity of experiments in social settings. Psychol. Bull. 54, 297–312 (1957)

  11. Card, D., Krueger, A.: Minimum wages and employment: a case study of the fast-food industry in New Jersey and Pennsylvania. Am. Econ. Rev. 84, 772–793 (1994). http://www.irs.princeton.edu/

  12. Chambers, J.: Software for Data Analysis: Programming with R. Springer, New York (2008)

  13. Dalgaard, P.: Introductory Statistics with R. Springer, New York (2002)

  14. Derigs, U.: Solving nonbipartite matching problems by shortest path techniques. Ann. Oper. Res. 13, 225–261 (1988)

  15. Dynarski, S.M.: Does aid matter? Measuring the effect of student aid on college attendance and completion. Am. Econ. Rev. 93, 279–288 (2003)

  16. Fleiss, J.L., Levin, B., Paik, M.C.: Statistical Methods for Rates and Proportions. Wiley, New York (2001)

  17. Hansen, B.B.: Optmatch: flexible, optimal matching for observational studies. R News 7, 18–24 (2007)

  18. Hansen, B.B., Klopfer, S.O.: Optimal full matching and related designs via network flows. J. Comput. Graph. Stat. 15, 609–627 (2006)

  19. Ho, D., Imai, K., King, G., Stuart, E.A.: Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15, 199–236 (2007)

  20. Kelz, R.R., Sellers, M.M., Niknam, B.A., Sharpe, J.E., Rosenbaum, P.R., Hill, A.S., Zhou, H., Hochman, L.L., Bilimoria, K.Y., Itani, K., Romano, P.S., Silber, J.H.: A national comparison of operative outcomes of new and experienced surgeons. Ann. Surg. (2020). https://doi.org/10.1097/SLA.0000000000003388

  21. Kilcioglu, C., Zubizarreta, J.R.: Maximizing the information content of a balanced matched sample in a study of the economic performance of green buildings. Ann. Appl. Stat. 10, 1997–2020 (2016)

  22. LaLonde, R.J.: Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. 76, 604–620 (1986)

  23. Lu, B., Greevy, R., Xu, X., Beck, C.: Optimal nonbipartite matching and its statistical applications. Am. Stat. 65, 21–30 (2011)

  24. Maindonald, J., Braun, J.: Data Analysis and Graphics Using R. Cambridge University Press, New York (2001)

  25. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman and Hall/CRC, New York (1989)

  26. Ming, K., Rosenbaum, P.R.: A note on optimal matching with variable controls using the assignment algorithm. J. Comput. Graph. Stat. 10, 455–463 (2001)

  27. Pimentel, S.D.: Large, sparse optimal matching with R package rcbalance. Obs. Stud. 2, 4–23 (2016)

  28. Pimentel, S.D., Kelz, R.R., Silber, J.H., Rosenbaum, P.R.: Large, sparse optimal matching with refined covariate balance in an observational study of the health outcomes produced by new surgeons. J. Am. Stat. Assoc. 110, 515–527 (2015)

  29. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation, Vienna (2019). http://www.R-project.org

  30. Rigdon, J., Baiocchi, M., Basu, S.: Near-far matching in R: the nearfar package. J. Stat. Soft. 86, 5. https://doi.org/10.18637/jss.v086.c05

  31. Rosenbaum, P.R.: From association to causation in observational studies. J. Am. Stat. Assoc. 79, 41–48 (1984)

  32. Rosenbaum, P.R.: Permutation tests for matched pairs with adjustments for covariates. Appl. Stat. 37, 401–411 (1988) (Correction: [36, §3])

  33. Rosenbaum, P.R.: Optimal matching in observational studies. J. Am. Stat. Assoc. 84, 1024–1032 (1989)

  34. Rosenbaum, P.R.: A characterization of optimal designs for observational studies. J. R. Stat. Soc. B 53, 597–610 (1991)

  35. Rosenbaum, P.R.: Stability in the absence of treatment. J. Am. Stat. Assoc. 96, 210–219 (2001)

  36. Rosenbaum, P.R.: Observational Studies (2nd ed.). Springer, New York (2002)

  37. Rosenbaum, P.R.: Covariance adjustment in randomized experiments and observational studies (with Discussion). Stat. Sci. 17, 286–327 (2002)

  38. Rosenbaum, P.R.: Modern algorithms for matching in observational studies. Annu. Rev. Stat. Appl. 7, 143–176 (2020). https://doi.org/10.1146/annurev-statistics-031219-041058

  39. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55 (1983)

  40. Rosenbaum, P.R., Rubin, D.B.: Reducing bias in observational studies using subclassification on the propensity score. J. Am. Stat. Assoc. 79, 516–524 (1984)

  41. Rosenbaum, P.R., Rubin, D.B.: Constructing a control group by multivariate matched sampling methods that incorporate the propensity score. Am. Stat. 39, 33–38 (1985)

  42. Rubin, D.B.: Using multivariate matched sampling and regression adjustment to control bias in observational studies. J. Am. Stat. Assoc. 74, 318–328 (1979)

  43. Shadish, W.R., Cook, T.D., Campbell, D.T.: Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin, Boston (2002)

  44. Silber, J.H., Rosenbaum, P.R., McHugh, M.D., Ludwig, J.M., Smith, H.L., Niknam, B.A., Even-Shoshan, O., Fleisher, L.A., Kelz, R.R., Aiken, L.H.: Comparison of the value of nursing work environments in hospitals across different levels of patient risk. JAMA Surg. 151, 527–536 (2016)

  45. Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge (2002)

  46. Yu, R., Rosenbaum, P.R.: Directional penalties for optimal matching in observational studies. Biometrics 75(4), 1380–1390 (2019). https://doi.org/10.1111/biom.13098

  47. Yu, R., Silber, J.H., Rosenbaum, P.R.: Matching methods for observational studies derived from large administrative databases. Stat. Sci. (2019)

  48. Zubizarreta, J.R.: Using mixed integer programming for matching in an observational study of kidney failure after surgery. J. Am. Stat. Assoc. 107, 1360–1371 (2012)

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Rosenbaum, P.R. (2020). Matching in R. In: Design of Observational Studies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46405-9_14