Abstract
Chapter 2 dealt with aspects of modeling such as transformations of predictors, relaxing linearity assumptions, modeling interactions, and examining lack of fit. Chapter 3 dealt with missing data, focusing on utilization of incomplete predictor information. All of these areas are important in the overall scheme of model development, and they cannot be separated from what is to follow. In this chapter we concern ourselves with issues related to the whole model, with emphasis on deciding on the amount of complexity to allow in the model and on dealing with large numbers of predictors. The chapter concludes with three default modeling strategies depending on whether the goal is prediction, estimation, or hypothesis testing.
Notes
- 1.
Even then, the two blood pressures may need to be transformed to meet distributional assumptions.
- 2.
Shrinkage (penalized estimation) is a general solution (see Section 4.5). One can always use complex models that are “penalized towards simplicity,” with the amount of penalization being greater for smaller sample sizes.
- 3.
One can also perform a joint test of all parameters associated with nonlinear effects. This can be useful in demonstrating to the reader that some complexity was actually needed.
- 4.
Lockhart et al. [425] provide an example with n = 100 and 10 orthogonal predictors where all true βs are zero. The test statistic for the first variable to enter has type I error of 0.39 when the nominal α is set to 0.05, in line with what one would expect from multiple testing: \(1 - 0.95^{10} = 0.40\).
- 5.
AIC works successfully when the models being entertained lie on a progression defined by a single parameter, e.g., a common shrinkage coefficient or a single number of knots used by all continuous predictors. AIC can also work when the model that is best by AIC is much better than the runner-up, so that if the process were bootstrapped the same model would almost always be found. When used for one-variable-at-a-time variable selection, AIC is just a restatement of the P-value and, as such, does not solve the severe problems with stepwise variable selection other than forcing us to use slightly more sensible α values. Burnham and Anderson [84] recommend selection based on AIC for a limited number of theoretically well-founded models. Some statisticians try to deal with multiplicity problems caused by stepwise variable selection by making α smaller than 0.05. This increases bias by giving variables whose effects are estimated with error a greater relative chance of being selected. Variable selection does not compete well with shrinkage methods that simultaneously model all potential predictors.
- 6.
This is akin to doing a t-test to compare the two treatments (out of 10, say) that are apparently most different from each other.
- 7.
These are situations where the true \(R^{2}\) is low, unlike tightly controlled experiments and mechanistic models where signal:noise ratios can be quite high. In those situations, many parameters can be estimated from small samples, and the \(\frac{m}{15}\) rule of thumb can be significantly relaxed.
- 8.
See [487]. If one compares the power of a two-sample binomial test with that of a Wilcoxon test when the response could be made continuous and the proportional odds assumption holds, the effective sample size for a binary response is \(3n_{1}n_{2}/n \approx 3\min(n_{1}, n_{2})\) if \(n_{1}/n\) is near 0 or 1 [664, Eqs. 10, 15]. Here \(n_{1}\) and \(n_{2}\) are the marginal frequencies of the two response levels.
- 9.
Based on the power of a proportional odds model two-sample test when the marginal cell sizes for the response are \(n_{1}, \ldots, n_{k}\), compared with all cell sizes equal to unity (response is continuous) [664, Eq. 3]. If all cell sizes are equal, the relative efficiency of having \(k\) response categories compared with a continuous response is \(1 - 1/k^{2}\) [664, Eq. 14]; for example, a five-level response is almost as efficient as a continuous one if proportional odds holds across category cutoffs.
- 10.
This is approximate, as the effective sample size may sometimes be boosted somewhat by censored observations, especially for non-proportional-hazards methods such as Wilcoxon-type tests [49].
- 11.
An even more stringent assessment is obtained by stratifying calibration curves by predictor settings.
- 12.
It is interesting that researchers are quite comfortable with adjusting P-values for post hoc selection of comparisons using, for example, the Bonferroni inequality, but they do not realize that post hoc selection of comparisons also biases point estimates.
- 13.
There is an option to force continuous variables to be linear when they are being predicted.
- 14.
If one were to estimate transformations without removing observations that had these constants inserted for the current Y-variable, the resulting transformations would likely have a spike at Y = the imputation constant.
- 15.
Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments
- 16.
Whether this statistic should be used to change the model is problematic in view of model uncertainty.
- 17.
The R function score.binary in the Hmisc package (see Section 6.2) assists in computing a summary variable from the series of binary conditions.
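The multiplicity claim in Note 4 can be checked by simulation. The sketch below (in Python rather than the R used in this book; sample size, replication count, and critical value are illustrative choices) draws 10 orthogonal null predictors, tests each one's univariate association with the response, and counts how often the most significant variable — the first to enter a stepwise algorithm — would be declared significant at nominal α = 0.05.

```python
# Monte Carlo check of Note 4: with 10 orthogonal null predictors and
# nominal alpha = 0.05, the "first variable to enter" test rejects with
# probability about 1 - 0.95^10 = 0.40, not 0.05.
import numpy as np

def first_entry_type1(n=100, p=10, reps=4000, seed=1):
    rng = np.random.default_rng(seed)
    tcrit = 1.984  # approximate two-sided 0.05 critical t with 98 df
    rejections = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # all true betas are zero
        # univariate t statistic for each predictor via its correlation with y
        xc = X - X.mean(axis=0)
        yc = y - y.mean()
        r = xc.T @ yc / np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
        t = r * np.sqrt((n - 2) / (1 - r ** 2))
        if np.abs(t).max() > tcrit:  # most significant variable enters first
            rejections += 1
    return rejections / reps

print(round(1 - 0.95 ** 10, 3))  # multiplicity bound: 0.401
print(first_entry_type1())       # empirical type I error, roughly 0.4
```

With nearly independent predictors the empirical rate tracks the \(1 - 0.95^{10}\) bound closely, matching the 0.39 reported by Lockhart et al.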
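Note 5's point that AIC is a restatement of the P-value can be made concrete for a single added parameter: AIC prefers the larger model exactly when the 1 d.f. chi-square statistic exceeds 2, which corresponds to an implied α of about 0.157 rather than 0.05. A stdlib-only sketch:

```python
# For one added parameter, AIC = -2*loglik + 2*p, so the larger model
# wins whenever the likelihood ratio chi-square exceeds 2.  Since a
# chi^2_1 variable is the square of a standard normal,
#   P(chi^2_1 > 2) = P(|Z| > sqrt(2)) = 2*(1 - Phi(sqrt(2))) = erfc(1).
import math

implied_alpha = math.erfc(1)
print(round(implied_alpha, 3))  # 0.157
```

So one-at-a-time AIC selection is stepwise selection with α ≈ 0.157 — slightly more sensible than 0.05, but stepwise nonetheless.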
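The effective sample size formulas in Notes 8 and 9 are simple to apply. The helper names below are illustrative, not from the book's software:

```python
# Notes 8 and 9: effective sample size for a binary response, and the
# relative efficiency of a k-level ordinal response under proportional odds.

def binary_effective_n(n1, n2):
    """3 * n1 * n2 / n, where n1, n2 are the two response-level frequencies."""
    return 3 * n1 * n2 / (n1 + n2)

def ordinal_relative_efficiency(k):
    """Efficiency of k equal-size response categories vs. a continuous response."""
    return 1 - 1 / k ** 2

# 1000 subjects with 100 events carry the information of only ~270 subjects
print(binary_effective_n(100, 900))    # 270.0
# a five-level ordinal response retains 96% of the continuous-response efficiency
print(ordinal_relative_efficiency(5))  # 0.96
```

This is why dichotomizing a response is so costly, while even a modest number of well-populated ordinal categories loses little.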
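The idea behind Note 17's summary variable can be sketched as follows. This is a hypothetical Python analogue, not a port of Hmisc's `score.binary` or its API: given binary conditions ordered from least to most severe, it records the rank of the worst condition present.

```python
# Hypothetical analogue of summarizing a series of binary conditions
# (Note 17): conditions are listed in increasing order of severity, and
# the summary is the rank of the most severe condition present (0 if none).

def score_binary(conditions):
    """conditions: sequence of 0/1 indicators in increasing order of severity."""
    score = 0
    for rank, present in enumerate(conditions, start=1):
        if present:
            score = rank  # keep the most severe condition seen so far
    return score

print(score_binary([1, 0, 1, 0]))  # 3: third condition is the worst present
print(score_binary([0, 0, 0, 0]))  # 0: no conditions present
```

Collapsing a series of correlated indicators into one ordinal score in this way spends one degree of freedom instead of one per condition.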
References
D. G. Altman and P. K. Andersen. Bootstrap investigation of the stability of a Cox regression model. Stat Med, 8:771–783, 1989.
A. C. Atkinson. A note on the generalized information criterion for choice of a model. Biometrika, 67:413–418, 1980.
P. C. Austin. Bootstrap model selection had similar performance for selecting authentic and noise variables compared to backward variable elimination: a simulation study. J Clin Epi, 61:1009–1017, 2008.
D. A. Belsley. Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley, New York, 1991.
D. A. Belsley, E. Kuh, and R. E. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York, 1980.
J. K. Benedetti, P. Liu, H. N. Sather, J. Seinfeld, and M. A. Epton. Effective sample size for tests of censored survival data. Biometrika, 69:343–349, 1982.
L. Breiman. The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error. J Am Stat Assoc, 87:738–754, 1992.
L. Breiman and J. H. Friedman. Estimating optimal transformations for multiple regression and correlation (with discussion). J Am Stat Assoc, 80:580–619, 1985.
K. P. Burnham and D. R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer, 2nd edition, Dec. 2003.
C. Chatfield. Avoiding statistical pitfalls (with discussion). Statistical Sci, 6:240–268, 1991.
C. Chatfield. Model uncertainty, data mining and statistical inference (with discussion). J Roy Stat Soc A, 158:419–466, 1995.
S. Chatterjee and A. S. Hadi. Regression Analysis by Example. Wiley, New York, fifth edition, 2012.
F. Chiaromonte, R. D. Cook, and B. Li. Sufficient dimension reduction in regressions with categorical predictors. Appl Stat, 30:475–497, 2002.
A. Ciampi, J. Thiffault, J. P. Nakache, and B. Asselain. Stratification by stepwise regression, correspondence analysis and recursive partition. Comp Stat Data Analysis, 1986:185–204, 1986.
N. R. Cook. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation, 115:928–935, 2007.
R. D. Cook. Fisher Lecture: Dimension reduction in regression. Statistical Sci, 22:1–26, 2007.
R. D. Cook and L. Forzani. Principal fitted components for dimension reduction in regression. Statistical Sci, 23(4):485–501, 2008.
J. B. Copas. Regression, prediction and shrinkage (with discussion). J Roy Stat Soc B, 45:311–354, 1983.
J. B. Copas and T. Long. Estimating the residual variance in orthogonal regression with variable selection. The Statistician, 40:51–59, 1991.
N. J. Crichton and J. P. Hinde. Correspondence analysis as a screening method for indicants for clinical diagnosis. Stat Med, 8:1351–1362, 1989.
E. E. Cureton and R. B. D’Agostino. Factor Analysis, An Applied Approach. Erlbaum, Hillsdale, NJ, 1983.
R. B. D’Agostino, A. J. Belanger, E. W. Markson, M. Kelly-Hayes, and P. A. Wolf. Development of health risk appraisal functions in the presence of multiple indicators: The Framingham Study nursing home institutionalization model. Stat Med, 14:1757–1770, 1995.
C. E. Davis, J. E. Hyde, S. I. Bangdiwala, and J. J. Nelson. An example of dependencies among variables in a conditional logistic regression. In S. H. Moolgavkar and R. L. Prentice, editors, Modern Statistical Methods in Chronic Disease Epi, pages 140–147. Wiley, New York, 1986.
A. C. Davison and D. V. Hinkley. Bootstrap Methods and Their Application. Cambridge University Press, Cambridge, 1997.
J. de Leeuw and P. Mair. Gifi methods for optimal scaling in R: The package homals. J Stat Software, 31(4):1–21, Aug. 2009.
S. Derksen and H. J. Keselman. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British J Math Stat Psych, 45:265–282, 1992.
B. Efron. Estimating the error rate of a prediction rule: Improvement on cross-validation. J Am Stat Assoc, 78:316–331, 1983.
B. Efron. How biased is the apparent error rate of a prediction rule? J Am Stat Assoc, 81:461–470, 1986.
B. Efron and C. Morris. Stein’s paradox in statistics. Sci Am, 236(5):119–127, 1977.
B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Sci, 1:54–77, 1986.
B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, New York, 1993.
J. J. Faraway. The cost of data analysis. J Comp Graph Stat, 1:213–229, 1992.
L. Ferré. Determining the dimension in sliced inverse regression and related methods. J Am Stat Assoc, 93:132–149, 1998.
J. H. Friedman. A variable span smoother. Technical Report 5, Laboratory for Computational Statistics, Department of Statistics, Stanford University, 1984.
L. Friedman and M. Wall. Graphical views of suppression and multicollinearity in multiple linear regression. Am Statistician, 59:127–136, 2005.
J. H. Giudice, J. R. Fieberg, and M. S. Lenarz. Spending degrees of freedom in a poor economy: A case study of building a sightability model for moose in northeastern Minnesota. J Wildlife Manage, 2011.
S. A. Glantz and B. K. Slinker. Primer of Applied Regression and Analysis of Variance. McGraw-Hill, New York, 1990.
H. H. H. Göring, J. D. Terwilliger, and J. Blangero. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Gen, 69:1357–1369, 2001.
P. M. Grambsch and P. C. O’Brien. The effects of transformations and preliminary tests for non-linearity in regression. Stat Med, 10:697–709, 1991.
R. J. Gray. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Stat Assoc, 87:942–951, 1992.
M. J. Greenacre. Correspondence analysis of multivariate categorical data by weighted least-squares. Biometrika, 75:457–467, 1988.
S. Greenland. When should epidemiologic regressions use random coefficients? Biometrics, 56:915–921, 2000.
J. Guo, G. James, E. Levina, G. Michailidis, and J. Zhu. Principal component analysis with sparse fused loadings. J Comp Graph Stat, 19(4):930–946, 2011.
P. Hall and H. Miller. Using generalized correlation to effect variable selection in very high dimensional problems. J Comp Graph Stat, 18(3):533–550, 2009.
F. E. Harrell. The LOGIST Procedure. In SUGI Supplemental Library Users Guide, pages 269–293. SAS Institute, Inc., Cary, NC, Version 5 edition, 1986.
F. E. Harrell, K. L. Lee, R. M. Califf, D. B. Pryor, and R. A. Rosati. Regression modeling strategies for improved prognostic prediction. Stat Med, 3:143–152, 1984.
F. E. Harrell, K. L. Lee, and D. B. Mark. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med, 15:361–387, 1996.
F. E. Harrell, K. L. Lee, D. B. Matchar, and T. A. Reichert. Regression models for prognostic prediction: Advantages, problems, and suggested solutions. Ca Trt Rep, 69:1071–1077, 1985.
F. E. Harrell, P. A. Margolis, S. Gove, K. E. Mason, E. K. Mulholland, D. Lehmann, L. Muhe, S. Gatchalian, and H. F. Eichenwald. Development of a clinical prediction model for an ordinal outcome: The World Health Organization ARI Multicentre Study of clinical signs and etiologic agents of pneumonia, sepsis, and meningitis in young infants. Stat Med, 17:909–944, 1998.
T. J. Hastie and R. J. Tibshirani. Generalized Additive Models. Chapman & Hall/CRC, Boca Raton, FL, 1990. ISBN 9780412343902.
X. He and L. Shen. Linear regression after spline transformation. Biometrika, 84:474–481, 1997.
J. Hilden and T. A. Gerds. A note on the evaluation of novel biomarkers: do not rely on integrated discrimination improvement and net reclassification index. Statist. Med., 33(19):3405–3414, Aug. 2014.
W. Hoeffding. A non-parametric test of independence. Ann Math Stat, 19:546–557, 1948.
C. M. Hurvich and C. L. Tsai. The impact of model selection on inference in linear regression. Am Statistician, 44:214–217, 1990.
J. E. Jackson. A User’s Guide to Principal Components. Wiley, New York, 1991.
I. T. Jolliffe. Discarding variables in a principal component analysis. I. Artificial data. Appl Stat, 21:160–173, 1972.
I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, second edition, 2010.
R. E. Kass and A. E. Raftery. Bayes factors. J Am Stat Assoc, 90:773–795, 1995.
H. J. Keselman, J. Algina, R. K. Kowalchuk, and R. D. Wolfinger. A comparison of two approaches for selecting covariance structures in the analysis of repeated measurements. Comm Stat - Sim Comp, 27:591–604, 1998.
W. A. Knaus, F. E. Harrell, J. Lynn, L. Goldman, R. S. Phillips, A. F. Connors, N. V. Dawson, W. J. Fulkerson, R. M. Califf, N. Desbiens, P. Layde, R. K. Oye, P. E. Bellamy, R. B. Hakim, and D. P. Wagner. The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Ann Int Med, 122:191–203, 1995.
W. F. Kuhfeld. The PRINQUAL procedure. In SAS/STAT 9.2 User’s Guide. SAS Publishing, Cary, NC, second edition, 2009.
J. F. Lawless and K. Singhal. Efficient screening of nonnormal regression models. Biometrics, 34:318–327, 1978.
S. le Cessie and J. C. van Houwelingen. Ridge estimators in logistic regression. Appl Stat, 41:191–201, 1992.
M. LeBlanc and R. Tibshirani. Adaptive principal surfaces. J Am Stat Assoc, 89:53–64, 1994.
A. Leclerc, D. Luce, F. Lert, J. F. Chastang, and P. Logeay. Correspondence analysis and logistic modelling: Complementary use in the analysis of a health survey among nurses. Stat Med, 7:983–995, 1988.
S. Lee, J. Z. Huang, and J. Hu. Sparse logistic principal components analysis for binary data. Ann Appl Stat, 4(3):1579–1601, 2010.
K. Li, J. Wang, and C. Chen. Dimension reduction for censored regression data. Ann Stat, 27:1–23, 1999.
K. C. Li. Sliced inverse regression for dimension reduction. J Am Stat Assoc, 86:316–327, 1991.
R. Lockhart, J. Taylor, R. J. Tibshirani, and R. Tibshirani. A significance test for the lasso. Technical report, arXiv, 2013.
X. Luo, L. A. Stefanski, and D. D. Boos. Tuning variable selection procedures by adding noise. Technometrics, 48:165–175, 2006.
N. Mantel. Why stepdown procedures in variable selection. Technometrics, 12:621–625, 1970.
G. Marshall, F. L. Grover, W. G. Henderson, and K. E. Hammermeister. Assessment of predictive models for binary outcomes: An empirical approach using operative death from cardiac surgery. Stat Med, 13:1501–1511, 1994.
J. M. Massaro. Battery Reduction. 2005.
G. P. McCabe. Principal variables. Technometrics, 26:137–144, 1984.
N. Meinshausen. Hierarchical testing of variable importance. Biometrika, 95(2):265–278, 2008.
G. Michailidis and J. de Leeuw. The Gifi system of descriptive multivariate analysis. Statistical Sci, 13:307–336, 1998.
R. H. Myers. Classical and Modern Regression with Applications. PWS-Kent, Boston, 1990.
T. G. Nick and J. M. Hardin. Regression modeling strategies: An illustrative case study from medical rehabilitation outcomes research. Am J Occ Ther, 53:459–470, 1999.
P. Peduzzi, J. Concato, A. R. Feinstein, and T. R. Holford. Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epi, 48:1503–1510, 1995.
P. Peduzzi, J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epi, 49:1373–1379, 1996.
N. Peek, D. G. T. Arts, R. J. Bosman, P. H. J. van der Voort, and N. F. de Keizer. External validation of prognostic models for critically ill patients required substantial sample sizes. J Clin Epi, 60:491–501, 2007.
M. J. Pencina, R. B. D’Agostino, and O. V. Demler. Novel metrics for evaluating improvement in discrimination: net reclassification and integrated discrimination improvement for normal variables and nested models. Stat Med, 31(2):101–113, 2012.
M. J. Pencina, R. B. D’Agostino, and E. W. Steyerberg. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med, 30:11–21, 2011.
M. J. Pencina, R. B. D’Agostino Sr, R. B. D’Agostino Jr, and R. S. Vasan. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Stat Med, 27:157–172, 2008.
A. N. Phillips, S. G. Thompson, and S. J. Pocock. Prognostic scores for detecting a high risk group: Estimating the sensitivity when applied to new data. Stat Med, 9:1189–1198, 1990.
E. B. Roecker. Prediction error and its estimation for subset-selected models. Technometrics, 33:459–468, 1991.
W. Sarle. The VARCLUS procedure. In SAS/STAT User’s Guide, volume 2, chapter 43, pages 1641–1659. SAS Institute, Inc., Cary, NC, fourth edition, 1990.
W. Sauerbrei and M. Schumacher. A bootstrap resampling procedure for model building: Application to the Cox regression model. Stat Med, 11:2093–2109, 1992.
J. Shao. Linear model selection by cross-validation. J Am Stat Assoc, 88:486–494, 1993.
X. Shen, H. Huang, and J. Ye. Inference after model selection. J Am Stat Assoc, 99:751–762, 2004.
L. R. Smith, F. E. Harrell, and L. H. Muhlbaier. Problems and potentials in modeling survival. In M. L. Grady and H. A. Schwartz, editors, Medical Effectiveness Research Data Methods (Summary Report), AHCPR Pub. No. 92-0056, pages 151–159. US Dept. of Health and Human Services, Agency for Health Care Policy and Research, Rockville, MD, 1992.
I. Spence and R. F. Garrison. A remarkable scatterplot. Am Statistician, 47:12–19, 1993.
D. J. Spiegelhalter. Probabilistic prediction in patient management and clinical trials. Stat Med, 5:421–433, 1986.
E. W. Steyerberg, M. J. C. Eijkemans, F. E. Harrell, and J. D. F. Habbema. Prognostic modelling with logistic regression analysis: A comparison of selection and estimation methods in small data sets. Stat Med, 19:1059–1079, 2000.
E. W. Steyerberg, M. J. C. Eijkemans, F. E. Harrell, and J. D. F. Habbema. Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets. Med Decis Mak, 21:45–56, 2001.
E. W. Steyerberg, A. J. Vickers, N. R. Cook, T. Gerds, M. Gonen, N. Obuchowski, M. J. Pencina, and M. W. Kattan. Assessing the performance of prediction models: a framework for traditional and novel measures. Epi (Cambridge, Mass.), 21(1):128–138, Jan. 2010.
G. Sun, T. L. Shook, and G. L. Kay. Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. J Clin Epi, 49:907–916, 1996.
J. M. G. Taylor, A. L. Siqueira, and R. E. Weiss. The cost of adding parameters to a model. J Roy Stat Soc B, 58:593–607, 1996.
R. Tibshirani. Regression shrinkage and selection via the lasso. J Roy Stat Soc B, 58:267–288, 1996.
R. Tibshirani. The lasso method for variable selection in the Cox model. Stat Med, 16:385–395, 1997.
T. van der Ploeg, P. C. Austin, and E. W. Steyerberg. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology, 14(1):137, Dec. 2014.
H. C. van Houwelingen and J. Thorogood. Construction, validation and updating of a prognostic model for kidney graft survival. Stat Med, 14:1999–2008, 1995.
J. C. van Houwelingen and S. le Cessie. Predictive value of statistical models. Stat Med, 9:1303–1325, 1990.
W. N. Venables and B. D. Ripley. Modern Applied Statistics with S-Plus. Springer-Verlag, New York, third edition, 1999.
P. Verweij and H. C. van Houwelingen. Penalized likelihood in Cox regression. Stat Med, 13:2427–2436, 1994.
P. J. M. Verweij and H. C. van Houwelingen. Cross-validation in survival analysis. Stat Med, 12:2305–2314, 1993.
S. K. Vines. Simple principal components. Appl Stat, 49:441–451, 2000.
E. Vittinghoff and C. E. McCulloch. Relaxing the rule of ten events per variable in logistic and Cox regression. Am J Epi, 165:710–718, 2006.
A. Wang and E. A. Gehan. Gene selection for microarray data analysis using principal component analysis. Stat Med, 24:2069–2087, 2005.
Y. Wax. Collinearity diagnosis for a relative risk regression analysis: An application to assessment of diet-cancer relationship in epidemiological studies. Stat Med, 11:1273–1287, 1992.
R. E. Weiss. The influence of variable selection: A Bayesian diagnostic perspective. J Am Stat Assoc, 90:619–625, 1995.
J. Whitehead. Sample size calculations for ordered categorical data. Stat Med, 12:2257–2271, 1993. See letter to the editor in Stat Med 15:1065–1066 for the binary case; see errata in Stat Med 13:871, 1994.
R. E. Wiegand. Performance of using multiple stepwise algorithms for variable selection. Stat Med, 29:1647–1659, 2010.
S. N. Wood. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC, Boca Raton, FL, 2006. ISBN 9781584884743.
F. W. Young, Y. Takane, and J. de Leeuw. The principal components of mixed measurement level multivariate data: An alternating least squares method with optimal scaling features. Psychometrika, 43:279–281, 1978.
H. Zhou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. J Comp Graph Stat, 15:265–286, 2006.
© 2015 Springer International Publishing Switzerland

Harrell, F.E. (2015). Multivariable Modeling Strategies. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-19425-7_4