Abstract
Before addressing issues related to describing and interpreting the model and its coefficients, note that one can never apply too much caution when attempting to interpret results in a causal manner. Regression models are excellent tools for estimating and inferring associations between X and Y, given that the "right" variables are in the model. Any ability of a model to provide causal inference rests entirely on the faith of the analyst in the experimental design, the completeness of the set of variables thought to measure confounding (used for adjustment when the experiment is not randomized), the lack of important measurement error, and lastly the goodness of fit of the model.
Notes
1. The s.d. of a binary variable is, aside from a multiplier of \(\sqrt{\frac{n}{n-1}}\), equal to \(\sqrt{a(1 - a)}\), where \(a\) is the proportion of ones.
2. There are decompositions of the Brier score into discrimination and calibration components.
3. For example, in the binary logistic model there is a generalization of \(R^2\) available, but no adjusted version. For logistic models we often validate other indexes such as the ROC area or the rank correlation between predicted probabilities and observed outcomes. We also validate the calibration accuracy of \(\hat{Y}\) in predicting Y.
4. Using the rms package described in Chapter 6, such estimates and their confidence limits can easily be obtained, using for example contrast(fit, list(age=50, disease='hypertension', race=levels(race)), type='average', weights=table(race)).
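Note 1's identity can be checked numerically. The book's code is in R; the following is a small, language-neutral Python sketch on made-up 0/1 data (not from the text), verifying that the sample s.d. equals \(\sqrt{n/(n-1)}\,\sqrt{a(1-a)}\):

```python
import statistics

# Made-up binary data for illustration
x = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
n = len(x)
a = sum(x) / n                        # proportion of ones

sample_sd = statistics.stdev(x)       # uses the n-1 denominator
formula_sd = (n / (n - 1)) ** 0.5 * (a * (1 - a)) ** 0.5

assert abs(sample_sd - formula_sd) < 1e-12
```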
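One such decomposition of the Brier score (Note 2) is the Murphy decomposition into reliability (calibration), resolution (discrimination), and uncertainty terms. This is an illustrative Python sketch on made-up data, not code from the text; the identity Brier = reliability − resolution + uncertainty is exact when observations are grouped by distinct predicted values:

```python
from collections import defaultdict

def brier_decomposition(p, y):
    # Murphy decomposition: Brier = reliability - resolution + uncertainty,
    # exact when grouping observations by distinct predicted probabilities
    n = len(p)
    groups = defaultdict(list)
    for pi, yi in zip(p, y):
        groups[pi].append(yi)
    ybar = sum(y) / n                                   # overall event rate
    rel = sum(len(g) * (pk - sum(g) / len(g)) ** 2      # calibration term
              for pk, g in groups.items()) / n
    res = sum(len(g) * (sum(g) / len(g) - ybar) ** 2    # discrimination term
              for g in groups.values()) / n
    unc = ybar * (1 - ybar)                             # outcome variance
    return rel, res, unc

# Made-up predicted probabilities and 0/1 outcomes
p = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]
y = [0, 0, 1, 1, 1, 0]
brier = sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)
rel, res, unc = brier_decomposition(p, y)
assert abs(brier - (rel - res + unc)) < 1e-12
```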
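The ROC area mentioned in Note 3 is the concordance probability c: the fraction of (event, non-event) pairs in which the event received the higher predicted probability, with ties counted as one half; Somers' rank correlation is then \(D_{xy} = 2c - 1\). A minimal Python sketch with a hypothetical helper and made-up data (the book itself works in R):

```python
def c_statistic(p, y):
    # Concordance probability (ROC area) by direct pair counting;
    # ties in predicted probability count as 1/2
    events = [pi for pi, yi in zip(p, y) if yi == 1]
    nonevents = [pi for pi, yi in zip(p, y) if yi == 0]
    conc = sum(1.0 if e > ne else 0.5 if e == ne else 0.0
               for e in events for ne in nonevents)
    return conc / (len(events) * len(nonevents))

# Made-up predicted probabilities and 0/1 outcomes
p = [0.1, 0.4, 0.35, 0.8]
y = [0, 0, 1, 1]
c = c_statistic(p, y)    # 3 of 4 pairs concordant -> c = 0.75
dxy = 2 * c - 1          # Somers' rank correlation = 0.5
```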
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Harrell, F.E. (2015). Describing, Resampling, Validating, and Simplifying the Model. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-19425-7_5
DOI: https://doi.org/10.1007/978-3-319-19425-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19424-0
Online ISBN: 978-3-319-19425-7
eBook Packages: Mathematics and Statistics (R0)