Abstract
Simple calculations in the statistical language R illustrate the computations involved in one simple form of multivariate matching. The focus is on how matching is done, not on the many aspects of the design of an observational study. The process is made tangible by describing it step by step and closely inspecting intermediate results. In essence, three steps are illustrated: (1) creating a distance matrix, (2) adding a propensity score caliper to the distance matrix, and (3) finding an optimal match. In practice, matching involves bookkeeping and efficient use of computer memory that are best handled by dedicated matching software. Sect. 14.10 describes currently available software.
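The three steps named in the abstract can be sketched in base R on a tiny invented data set. This is only an illustration under stated assumptions, not the chapter's own code: the data, the variable names, the use of a squared Mahalanobis distance, the caliper width of 0.2 standard deviations of the logit propensity score, the penalty multiplier of 1000, and the exhaustive search in step 3 are all choices made here for the sake of a self-contained example. Dedicated matching software replaces the exhaustive search with an efficient optimization algorithm.

```r
# Toy data, invented for illustration: 3 treated units (z = 1), 7 controls (z = 0).
d <- data.frame(
  z  = c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
  x1 = c(4, 6, 5, 1, 9, 1, 9, 4, 6, 5),
  x2 = c(5, 3, 6, 1, 1, 9, 9, 4, 4, 7)
)
X  <- as.matrix(d[, c("x1", "x2")])
Xt <- X[d$z == 1, , drop = FALSE]
Xc <- X[d$z == 0, , drop = FALSE]

# Step 1: treated-by-control distance matrix (squared Mahalanobis distance,
# using the covariance of all units; an assumed choice of distance).
S    <- cov(X)
dmat <- t(sapply(seq_len(nrow(Xt)), function(i) mahalanobis(Xc, Xt[i, ], S)))

# Step 2: propensity score caliper, implemented here as a penalty that grows
# with the amount by which a pair violates the caliper.
fit   <- glm(z ~ x1 + x2, family = binomial, data = d)
lp    <- predict(fit)                         # logit of the estimated propensity score
gap   <- abs(outer(lp[d$z == 1], lp[d$z == 0], "-"))
width <- 0.2 * sd(lp)                         # assumed caliper width
dcal  <- dmat + 1000 * pmax(gap - width, 0)   # assumed penalty multiplier

# Step 3: optimal pair match by exhaustive search (feasible only for toy data).
# Each candidate assigns one distinct control (column) to each treated unit (row).
cands <- as.matrix(expand.grid(rep(list(seq_len(ncol(dcal))), nrow(dcal))))
cands <- cands[apply(cands, 1, anyDuplicated) == 0, , drop = FALSE]
total <- apply(cands, 1, function(j) sum(dcal[cbind(seq_along(j), j)]))
best  <- unname(cands[which.min(total), ])

# Row i pairs treated unit i with its matched control (row numbers in d).
matched <- data.frame(treated = which(d$z == 1),
                      control = which(d$z == 0)[best])
print(matched)
```

Inspecting `dmat`, `gap`, and `dcal` side by side shows which pairs the caliper penalizes before the match is chosen.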
Notes
1. Dynarski [15, Table 2] presents two analyses, one with no covariates and the other with many more covariates, obtaining similar estimates of effect from both analyses. That is a reasonable approach in the context of her paper. In constructing a matched control group in this chapter, I have omitted a few of the covariates that Dynarski used, including “single-parent household” and “father attended college,” from a comparison involving a group defined by a deceased father. In general, if one wants to present parallel analyses with and without adjustment for a particular covariate, say “single-parent household,” then one should not match on that covariate, but should control for it in one of the parallel analyses using analytical techniques; e.g., Sect. 22.2 or [16, 32, 37, 42] and [36, §3.6].
2. In detail, the covariates used in the match are: (1) faminc: family income in units of $10,000; (2) incmiss: income missing (incmiss=1 if family income is missing, incmiss=0 otherwise); (3) black (black=1 if black, black=0 otherwise); (4) hispanic (hispanic=1 if Hispanic, hispanic=0 otherwise); (5) afqtpct: Armed Forces Qualification Test (AFQT); (6) edmissm: mother’s education missing (edmissm=1 if missing, edmissm=0 otherwise); (7) edm: mother’s education (edm=1 for less than high school, edm=2 for high school, edm=3 for some college, edm=4 for BA degree or more); (8) female (female=1 for female, female=0 for male).
3. Because of this restriction, the counts of seniors in various groups are slightly different from those in [15].
4. Technically, the fitted probabilities in logit regression are invariant under invertible affine transformations of the predictors.
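The invariance stated in the last note can be checked numerically. The sketch below uses invented data and an arbitrary affine transformation (both are assumptions made for illustration), fits the same logit model to the original and the transformed predictor with base R's `glm`, and compares the fitted probabilities.

```r
# Toy data, invented for illustration; treated and control values of x overlap,
# so the maximum likelihood fit exists.
z <- c(0, 0, 1, 0, 1, 0, 1, 1)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)        # e.g., income in units of $10,000
x_affine <- 10000 * x - 5000          # an arbitrary affine transformation of x

fit_a <- glm(z ~ x, family = binomial)
fit_b <- glm(z ~ x_affine, family = binomial)

# The fitted probabilities agree up to numerical error.
same_fit <- all.equal(unname(fitted(fit_a)), unname(fitted(fit_b)),
                      tolerance = 1e-6)
print(same_fit)

# The coefficients do differ: the slope is rescaled by the factor 10000.
ratio <- unname(coef(fit_a)["x"] / coef(fit_b)["x_affine"])
print(ratio)
```

The practical consequence is that a propensity score estimated by logit regression does not depend on the units in which a covariate such as family income is recorded.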
Bibliography
Aitkin, M., Francis, B., Hinde, J., Darnell, R.: Statistical Modelling in R. Oxford University Press, New York (2009)
Angrist, J.D., Lavy, V.: Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Q. J. Econ. 114, 533–575 (1999)
Baiocchi, M., Small, D.S., Lorch, S., Rosenbaum, P.R.: Building a stronger instrument in an observational study of perinatal care for premature infants. J. Am. Stat. Assoc. 105, 1285–1296 (2010)
Baiocchi, M., Small, D.S., Yang, L., Polsky, D., Groeneveld, P.W.: Near/far matching: a study design approach to instrumental variables. Health Serv. Outcomes Res. Method. 12, 237–253 (2012)
Bertsekas, D.P.: A new algorithm for the assignment problem. Math. Program. 21, 152–171 (1981)
Bertsekas, D.P.: The auction algorithm for assignment and other network flow problems: a tutorial. Interfaces 20, 133–149 (1990)
Bertsekas, D.P.: Linear Network Optimization. MIT Press, Cambridge (1991)
Bertsekas, D.P.: Network Optimization: Continuous and Discrete Models. Athena Scientific, Belmont (1998)
Bertsekas, D.P., Tseng, P.: The relax codes for linear minimum cost network flow problems. Ann. Oper. Res. 13, 125–190 (1988)
Campbell, D.T.: Factors relevant to the validity of experiments in social settings. Psychol. Bull. 54, 297–312 (1957)
Card, D., Krueger, A.: Minimum wages and employment: a case study of the fast-food industry in New Jersey and Pennsylvania. Am. Econ. Rev. 84, 772–793 (1994). http://www.irs.princeton.edu/
Chambers, J.: Software for Data Analysis: Programming with R. Springer, New York (2008)
Dalgaard, P.: Introductory Statistics with R. Springer, New York (2002)
Derigs, U.: Solving nonbipartite matching problems by shortest path techniques. Ann. Oper. Res. 13, 225–261 (1988)
Dynarski, S.M.: Does aid matter? Measuring the effect of student aid on college attendance and completion. Am. Econ. Rev. 93, 279–288 (2003)
Fleiss, J.L., Levin, B., Paik, M.C.: Statistical Methods for Rates and Proportions. Wiley, New York (2001)
Hansen, B.B.: Optmatch: flexible, optimal matching for observational studies. R News 7, 18–24 (2007)
Hansen, B.B., Klopfer, S.O.: Optimal full matching and related designs via network flows. J. Comput. Graph. Stat. 15, 609–627 (2006)
Ho, D., Imai, K., King, G., Stuart, E.A.: Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15, 199–236 (2007)
Kelz, R.R., Sellers, M.M., Niknam, B.A., Sharpe, J.E., Rosenbaum, P.R., Hill, A.S., Zhou, H., Hochman, L.L., Bilimoria, K.Y., Itani, K., Romano, P.S., Silber, J.H.: A national comparison of operative outcomes of new and experienced surgeons. Ann. Surgery (2020). https://doi.org/10.1097/SLA.0000000000003388
Kilcioglu, C., Zubizarreta, J.R.: Maximizing the information content of a balanced matched sample in a study of the economic performance of green buildings. Ann. Appl. Stat. 10, 1997–2020 (2016)
LaLonde, R.J.: Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. 76, 604–620 (1986)
Lu, B., Greevy, R., Xu, X., Beck, C.: Optimal nonbipartite matching and its statistical applications. Am. Stat. 65, 21–30 (2011)
Maindonald, J., Braun, J.: Data Analysis and Graphics Using R. Cambridge University Press, New York (2001)
McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman and Hall/CRC, New York (1989)
Ming, K., Rosenbaum, P.R.: A note on optimal matching with variable controls using the assignment algorithm. J. Comput. Graph. Stat. 10, 455–463 (2001)
Pimentel, S.D.: Large, sparse optimal matching with R package rcbalance. Obs. Stud. 2, 4–23 (2016)
Pimentel, S.D., Kelz, R.R., Silber, J.H., Rosenbaum, P.R.: Large, sparse optimal matching with refined covariate balance in an observational study of the health outcomes produced by new surgeons. J. Am. Stat. Assoc. 110, 515–527 (2015)
R Development Core Team: R: a Language and Environment for Statistical Computing. R Foundation, Vienna (2019). http://www.R-project.org
Rigdon, J., Baiocchi, M., Basu, S.: Near-far matching in R: the nearfar package. J. Stat. Soft. 86, 5. https://doi.org/10.18637/jss.v086.c05
Rosenbaum, P.R.: From association to causation in observational studies. J. Am. Stat. Assoc. 79, 41–48 (1984)
Rosenbaum, P.R.: Permutation tests for matched pairs with adjustments for covariates. Appl. Stat. 37, 401–411 (1988) (Correction: [36, §3])
Rosenbaum, P.R.: Optimal matching in observational studies. J. Am. Stat. Assoc. 84, 1024–1032 (1989)
Rosenbaum, P.R.: A characterization of optimal designs for observational studies. J. R. Stat. Soc. B 53, 597–610 (1991)
Rosenbaum, P.R.: Stability in the absence of treatment. J. Am. Stat. Assoc. 96, 210–219 (2001)
Rosenbaum, P.R.: Observational Studies (2nd ed.). Springer, New York (2002)
Rosenbaum, P.R.: Covariance adjustment in randomized experiments and observational studies (with Discussion). Stat. Sci. 17, 286–327 (2002)
Rosenbaum, P.R.: Modern algorithms for matching in observational studies. Annu. Rev. Stat. Appl. 7, 143–176 (2020). https://doi.org/10.1146/annurev-statistics-031219-041058
Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55 (1983)
Rosenbaum, P.R., Rubin, D.B.: Reducing bias in observational studies using subclassification on the propensity score. J. Am. Stat. Assoc. 79, 516–524 (1984)
Rosenbaum, P.R., Rubin, D.B.: Constructing a control group by multivariate matched sampling methods that incorporate the propensity score. Am. Stat. 39, 33–38 (1985)
Rubin, D.B.: Using multivariate matched sampling and regression adjustment to control bias in observational studies. J. Am. Stat. Assoc. 74, 318–328 (1979)
Shadish, W.R., Cook, T.D., Campbell, D.T.: Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston (2002)
Silber, J.H., Rosenbaum, P.R., McHugh, M.D., Ludwig, J.M., Smith, H.L., Niknam, B.A., Even-Shoshan, O., Fleisher, L.A., Kelz, R.R., Aiken, L.H.: Comparison of the value of nursing work environments in hospitals across different levels of patient risk. JAMA Surg. 151, 527–536 (2016)
Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge (2002)
Yu, R., Rosenbaum, P.R.: Directional penalties for optimal matching in observational studies. Biometrics 75(4), 1380–1390 (2019). https://doi.org/10.1111/biom.13098
Yu, R., Silber, J.H., Rosenbaum, P.R.: Matching methods for observational studies derived from large administrative databases. Stat. Sci. (2019)
Zubizarreta, J.R.: Using mixed integer programming for matching in an observational study of kidney failure after surgery. J. Am. Stat. Assoc. 107, 1360–1371 (2012)
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rosenbaum, P.R. (2020). Matching in R. In: Design of Observational Studies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46405-9_14
DOI: https://doi.org/10.1007/978-3-030-46405-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46404-2
Online ISBN: 978-3-030-46405-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)