Matching in R

Paul R. Rosenbaum

doi:10.1007/978-3-030-46405-9_14

Part of the book series: Springer Series in Statistics (SSS)

Abstract

Simple calculations in the statistical language R illustrate the computations involved in one simple form of multivariate matching. The focus is on how matching is done, not on the many aspects of the design of an observational study. The process is made tangible by describing it in detail, step-by-step, closely inspecting intermediate results; however, essentially, three steps are illustrated: (1) creating a distance matrix, (2) adding a propensity score caliper to the distance matrix, and (3) finding an optimal match. In practice, matching involves bookkeeping and efficient use of computer memory that are best handled by dedicated software for matching. Sect. 14.10 describes currently available software.
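
The three steps named in the abstract can be sketched on a toy example. The sketch below is written in Python purely for illustration (the chapter itself works in R), and everything in it is an assumption rather than the chapter's own code: the covariates and propensity scores are made-up numbers, the scaled squared Euclidean distance is a simple stand-in for whatever distance one prefers, the linear penalty is one plausible way to add a caliper, and the brute-force search stands in for the assignment or network-flow algorithms that dedicated matching software would use.

```python
from itertools import permutations
from statistics import pstdev

# Hypothetical toy data: two covariates per unit and made-up propensity scores.
treated = [(30, 4.0), (45, 6.5)]
controls = [(29, 4.2), (50, 6.0), (44, 6.4), (31, 9.0)]
ps_treated = [0.60, 0.40]
ps_controls = [0.58, 0.35, 0.10, 0.62]


def distance_matrix(treated, controls):
    """Step 1: scaled squared Euclidean distances (rows: treated, columns: controls)."""
    pooled = treated + controls
    sds = [pstdev(column) for column in zip(*pooled)]  # pooled SD of each covariate
    return [[sum(((t - c) / s) ** 2 for t, c, s in zip(tr, co, sds))
             for co in controls]
            for tr in treated]


def add_caliper(dist, ps_t, ps_c, width, penalty=1000.0):
    """Step 2: penalize pairs whose propensity scores differ by more than width."""
    return [[d + penalty * max(0.0, abs(pt - pc) - width)
             for d, pc in zip(row, ps_c)]
            for row, pt in zip(dist, ps_t)]


def optimal_pair_match(dist):
    """Step 3: brute-force the pairing that minimizes the total distance.

    Entry i of the result is the index of the control matched to treated unit i.
    Only feasible for tiny problems; real software uses assignment algorithms.
    """
    n_treated, n_controls = len(dist), len(dist[0])
    best = min(permutations(range(n_controls), n_treated),
               key=lambda cols: sum(dist[i][j] for i, j in enumerate(cols)))
    return list(best)


dist = distance_matrix(treated, controls)
penalized = add_caliper(dist, ps_treated, ps_controls, width=0.1)
match_without_caliper = optimal_pair_match(dist)
match_with_caliper = optimal_pair_match(penalized)
print(match_without_caliper, match_with_caliper)
```

In this toy data the second treated unit is closest on covariates to the third control, but their propensity scores differ by 0.30, so the caliper penalty steers the match to the second control instead.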

Notes

1. Dynarski [15, Table 2] presents two analyses, one with no covariates, the other with many more covariates, obtaining similar estimates of effect from both analyses. That is a reasonable approach in the context of her paper. In constructing a matched control group in this chapter, I have omitted a few of the covariates that Dynarski used, including “single-parent household” and “father attended college,” from a comparison involving a group defined by a deceased father. In general, if one wants to present parallel analyses with and without adjustment for a particular covariate, say “single-parent household,” then one should not match on that covariate, but should control for it in one of the parallel analyses using analytical techniques; e.g., Sect. 22.2 or [16, 32, 37, 42] and [36, §3.6].
2. In detail, the covariates used in the match are: (1) faminc: family income in units of $10,000; (2) incmiss: income missing (incmiss=1 if family income is missing, incmiss=0 otherwise); (3) black (black=1 if black, black=0 otherwise); (4) hispanic (hispanic=1 if Hispanic, hispanic=0 otherwise); (5) afqtpct: Armed Forces Qualification Test (AFQT); (6) edmissm: mother’s education missing (edmissm=1 if missing, edmissm=0 otherwise); (7) edm: mother’s education (edm=1 for less than high school, edm=2 for high school, edm=3 for some college, edm=4 for BA degree or more); (8) female (female=1 for female, female=0 for male).
3. Because of this restriction, the counts of seniors in various groups differ slightly from those in [15].
4. Technically, the fitted probabilities in logit regression are invariant under affine transformations of the predictors.
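
The invariance stated in Note 4 can be checked with a short calculation. The sketch below uses notation that is not in the source and is introduced only for illustration: $x_i$ is the predictor vector of unit $i$, $(\alpha, \beta)$ are the logit coefficients, and the affine transformation is $\tilde{x}_i = A x_i + b$ with $A$ assumed invertible.

```latex
\begin{align*}
\operatorname{logit}(p_i) &= \alpha + \beta^{\top} x_i,
  \qquad \tilde{x}_i = A x_i + b, \\
\alpha + \beta^{\top} x_i
  &= \alpha + \beta^{\top} A^{-1} (\tilde{x}_i - b) \\
  &= \underbrace{\bigl(\alpha - \beta^{\top} A^{-1} b\bigr)}_{\tilde{\alpha}}
   + \underbrace{\bigl(A^{-\top} \beta\bigr)^{\top}}_{\tilde{\beta}^{\top}} \tilde{x}_i .
\end{align*}
```

Because the map $(\alpha, \beta) \mapsto (\tilde{\alpha}, \tilde{\beta})$ is one-to-one, both sets of predictors generate the same family of linear predictors, so when the maximum likelihood fit exists it yields the same fitted probabilities $\hat{p}_i$ under either set, even though the coefficients themselves change.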

Bibliography

1. Aitkin, M., Francis, B., Hinde, J., Darnell, R.: Statistical Modelling in R. Oxford University Press, New York (2009)
2. Angrist, J.D., Lavy, V.: Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Q. J. Econ. 114, 533–575 (1999)
3. Baiocchi, M., Small, D.S., Lorch, S., Rosenbaum, P.R.: Building a stronger instrument in an observational study of perinatal care for premature infants. J. Am. Stat. Assoc. 105, 1285–1296 (2010)
4. Baiocchi, M., Small, D.S., Yang, L., Polsky, D., Groeneveld, P.W.: Near/far matching: a study design approach to instrumental variables. Health Serv. Outcomes Res. Method. 12, 237–253 (2012)
5. Bertsekas, D.P.: A new algorithm for the assignment problem. Math. Program. 21, 152–171 (1981)
6. Bertsekas, D.P.: The auction algorithm for assignment and other network flow problems: a tutorial. Interfaces 20, 133–149 (1990)
7. Bertsekas, D.P.: Linear Network Optimization. MIT Press, Cambridge (1991)
8. Bertsekas, D.P.: Network Optimization: Continuous and Discrete Models. Athena Scientific, Belmont (1998)
9. Bertsekas, D.P., Tseng, P.: The relax codes for linear minimum cost network flow problems. Ann. Oper. Res. 13, 125–190 (1988)
10. Campbell, D.T.: Factors relevant to the validity of experiments in social settings. Psychol. Bull. 54, 297–312 (1957)
11. Card, D., Krueger, A.: Minimum wages and employment: a case study of the fast-food industry in New Jersey and Pennsylvania. Am. Econ. Rev. 84, 772–793 (1994). http://www.irs.princeton.edu/
12. Chambers, J.: Software for Data Analysis: Programming with R. Springer, New York (2008)
13. Dalgaard, P.: Introductory Statistics with R. Springer, New York (2002)
14. Derigs, U.: Solving nonbipartite matching problems by shortest path techniques. Ann. Oper. Res. 13, 225–261 (1988)
15. Dynarski, S.M.: Does aid matter? Measuring the effect of student aid on college attendance and completion. Am. Econ. Rev. 93, 279–288 (2003)
16. Fleiss, J.L., Levin, B., Paik, M.C.: Statistical Methods for Rates and Proportions. Wiley, New York (2001)
17. Hansen, B.B.: Optmatch: flexible, optimal matching for observational studies. R News 7, 18–24 (2007)
18. Hansen, B.B., Klopfer, S.O.: Optimal full matching and related designs via network flows. J. Comput. Graph. Stat. 15, 609–627 (2006)
19. Ho, D., Imai, K., King, G., Stuart, E.A.: Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15, 199–236 (2007)
20. Kelz, R.R., Sellers, M.M., Niknam, B.A., Sharpe, J.E., Rosenbaum, P.R., Hill, A.S., Zhou, H., Hochman, L.L., Bilimoria, K.Y., Itani, K., Romano, P.S., Silber, J.H.: A national comparison of operative outcomes of new and experienced surgeons. Ann. Surgery (2020). https://doi.org/10.1097/SLA.0000000000003388
21. Kilcioglu, C., Zubizarreta, J.R.: Maximizing the information content of a balanced matched sample in a study of the economic performance of green buildings. Ann. Appl. Stat. 10, 1997–2020 (2016)
22. LaLonde, R.J.: Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. 76, 604–620 (1986)
23. Lu, B., Greevy, R., Xu, X., Beck, C.: Optimal nonbipartite matching and its statistical applications. Am. Stat. 65, 21–30 (2011)
24. Maindonald, J., Braun, J.: Data Analysis and Graphics Using R. Cambridge University Press, New York (2001)
25. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman and Hall/CRC, New York (1989)
26. Ming, K., Rosenbaum, P.R.: A note on optimal matching with variable controls using the assignment algorithm. J. Comput. Graph. Stat. 10, 455–463 (2001)
27. Pimentel, S.D.: Large, sparse optimal matching with R package rcbalance. Obs. Stud. 2, 4–23 (2016)
28. Pimentel, S.D., Kelz, R.R., Silber, J.H., Rosenbaum, P.R.: Large, sparse optimal matching with refined covariate balance in an observational study of the health outcomes produced by new surgeons. J. Am. Stat. Assoc. 110, 515–527 (2015)
29. R Development Core Team: R: a Language and Environment for Statistical Computing. R Foundation, Vienna (2019). http://www.R-project.org
30. Rigdon, J., Baiocchi, M., Basu, S.: Near-far matching in R: the nearfar package. J. Stat. Soft. 86, 5. https://doi.org/10.18637/jss.v086.c05
31. Rosenbaum, P.R.: From association to causation in observational studies. J. Am. Stat. Assoc. 79, 41–48 (1984)
32. Rosenbaum, P.R.: Permutation tests for matched pairs with adjustments for covariates. Appl. Stat. 37, 401–411 (1988) (Correction: [36, §3])
33. Rosenbaum, P.R.: Optimal matching in observational studies. J. Am. Stat. Assoc. 84, 1024–1032 (1989)
34. Rosenbaum, P.R.: A characterization of optimal designs for observational studies. J. R. Stat. Soc. B 53, 597–610 (1991)
35. Rosenbaum, P.R.: Stability in the absence of treatment. J. Am. Stat. Assoc. 96, 210–219 (2001)
36. Rosenbaum, P.R.: Observational Studies (2nd ed.). Springer, New York (2002)
37. Rosenbaum, P.R.: Covariance adjustment in randomized experiments and observational studies (with Discussion). Stat. Sci. 17, 286–327 (2002)
38. Rosenbaum, P.R.: Modern algorithms for matching in observational studies. Annu. Rev. Stat. Appl. 7, 143–176 (2020). https://doi.org/10.1146/annurev-statistics-031219-041058
39. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55 (1983)
40. Rosenbaum, P.R., Rubin, D.B.: Reducing bias in observational studies using subclassification on the propensity score. J. Am. Stat. Assoc. 79, 516–524 (1984)
41. Rosenbaum, P.R., Rubin, D.B.: Constructing a control group by multivariate matched sampling methods that incorporate the propensity score. Am. Stat. 39, 33–38 (1985)
42. Rubin, D.B.: Using multivariate matched sampling and regression adjustment to control bias in observational studies. J. Am. Stat. Assoc. 74, 318–328 (1979)
43. Shadish, W.R., Cook, T.D., Campbell, D.T.: Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston (2002)
44. Silber, J.H., Rosenbaum, P.R., McHugh, M.D., Ludwig, J.M., Smith, H.L., Niknam, B.A., Even-Shoshan, O., Fleisher, L.A., Kelz, R.R., Aiken, L.H.: Comparison of the value of nursing work environments in hospitals across different levels of patient risk. JAMA Surg. 151, 527–536 (2016)
45. Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge (2002)
46. Yu, R., Rosenbaum, P.R.: Directional penalties for optimal matching in observational studies. Biometrics 75(4), 1380–1390 (2019). https://doi.org/10.1111/biom.13098
47. Yu, R., Silber, J.H., Rosenbaum, P.R.: Matching methods for observational studies derived from large administrative databases. Stat. Sci. (2019)
48. Zubizarreta, J.R.: Using mixed integer programming for matching in an observational study of kidney failure after surgery. J. Am. Stat. Assoc. 107, 1360–1371 (2012)

Author information

Authors and Affiliations

Statistics Department, Wharton School, University of Pennsylvania, Philadelphia, PA, USA
Paul R. Rosenbaum

About this chapter

Cite this chapter

Rosenbaum, P.R. (2020). Matching in R. In: Design of Observational Studies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-46405-9_14

DOI: https://doi.org/10.1007/978-3-030-46405-9_14
Published: 14 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46404-2
Online ISBN: 978-3-030-46405-9
eBook Packages: Mathematics and Statistics (R0)
