Abstract
In ordinary least squares multiple regression, the objective in fitting a model is to find the values of the unknown parameters that minimize the sum of squared errors of prediction. When the response variable is non-normal, polytomous, or not observed completely, one needs a more general objective function to optimize.
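As a rough illustration of what a more general objective function can look like, the sketch below fits a binary logistic model by maximizing the Bernoulli log-likelihood with Newton-Raphson steps, in place of minimizing a sum of squared errors. The toy data and the function names (log_likelihood, score, fit_logistic) are assumptions made for this example and do not come from the chapter.

```python
import math

def log_likelihood(beta, x, y):
    """Bernoulli log-likelihood for logit P(Y=1 | x) = b0 + b1 * x."""
    b0, b1 = beta
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        total += yi * eta - math.log1p(math.exp(eta))
    return total

def score(beta, x, y):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    b0, b1 = beta
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        g0 += yi - p
        g1 += (yi - p) * xi
    return g0, g1

def fit_logistic(x, y, iterations=25):
    """Maximize the log-likelihood by Newton-Raphson, starting from zero."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = score((b0, b1), x, y)
        # Observed information matrix [[i00, i01], [i01, i11]]
        i00 = i01 = i11 = 0.0
        for xi in x:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 ** 2
        # Newton step: beta <- beta + inverse(information) * score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

# Hypothetical toy data: a binary response that tends to rise with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat = fit_logistic(x, y)
```

At the maximum the score is essentially zero, and the inverse of the information matrix from the final iteration would serve as the usual information-based covariance estimate.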
Notes
1. In linear regression, a t distribution is used to penalize for the fact that the variance of Y | X is estimated. In models such as the logistic model, there is no separate variance parameter to estimate. Gould has done simulations showing that the normal distribution provides more accurate P-values than the t distribution for binary logistic regression.
2. For example, in a 3-treatment comparison one could examine contrasts between treatments A and B, A and C, and B and C by obtaining predicted values for those treatments, even though only two differences are required.
3. The rms command could be contrast(fit, list(sex='male', age=30), list(sex='female', age=40)), where all other predictors are set to medians or modes.
4. This is the basis for confidence limits computed by the R rms package's Predict, summary, and contrast functions. When the robcov function has been used to replace the information-matrix-based covariance matrix with a Huber robust covariance estimate (with an optional cluster sampling correction), these functions use a "robust" Wald statistic basis. When the bootcov function has been used to replace the model fit's covariance matrix with a bootstrap unconditional covariance matrix estimate, these functions compute confidence limits based on a normal distribution but using a more nonparametric covariance estimate.
5. As indicated below, this standard deviation can also be obtained by using the summary function on the object returned by bootcov, as bootcov returns a fit object like one from lrm except with the bootstrap covariance matrix substituted for the information-based one.
6. Limited simulations using the conditional bootstrap and Firth's penalized likelihood [281] did not show significant improvement in confidence interval coverage.
7. Several examples from simulated datasets have shown that using BIC to choose a penalty results in far too much shrinkage.
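The contrasts and Wald-based confidence limits described in notes 2 to 4 can be sketched in a few lines. The block below is an illustration under stated assumptions, not the rms implementation: the coefficient vector, the covariance matrix, and the function name wald_contrast are all hypothetical. A contrast is the difference between two linear predictors, its standard error comes from the quadratic form d'Vd, and the limits use normal quantiles.

```python
import math
from statistics import NormalDist

def wald_contrast(beta, cov, x_a, x_b, conf_level=0.95):
    """Contrast x_a'beta - x_b'beta with normal-theory confidence limits.

    beta : coefficient estimates
    cov  : covariance matrix of beta (information-based, robust, or bootstrap)
    x_a, x_b : design vectors for the two settings being compared
    """
    d = [a - b for a, b in zip(x_a, x_b)]
    estimate = sum(di * bi for di, bi in zip(d, beta))
    n = len(d)
    # Variance of the contrast is the quadratic form d' V d
    variance = sum(d[i] * cov[i][j] * d[j] for i in range(n) for j in range(n))
    se = math.sqrt(variance)
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    return estimate, se, (estimate - z * se, estimate + z * se)

# Hypothetical fit: intercept, indicator for male, age in years
beta = [-1.0, 0.5, 0.03]
cov = [
    [0.25, -0.02, -0.004],
    [-0.02, 0.04, 0.0],
    [-0.004, 0.0, 0.0001],
]
male_30 = [1, 1, 30]
female_40 = [1, 0, 40]
estimate, se, limits = wald_contrast(beta, cov, male_30, female_40)
```

The intercept cancels in the difference of design vectors, so only the sex and age terms contribute. Substituting a robust or bootstrap covariance matrix changes only the cov argument, which mirrors how the limits in note 4 change after robcov or bootcov.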
References
O. O. Al-Radi, F. E. Harrell, C. A. Caldarone, B. W. McCrindle, J. P. Jacobs, M. G. Williams, G. S. Van Arsdell, and W. G. Williams. Case complexity scores in congenital heart surgery: A comparative study of the Aristotle Basic Complexity score and the Risk Adjustment in Congenital Heart Surgery (RACHS-1) system. J Thorac Cardiovasc Surg, 133:865–874, 2007.
J. M. Alho. On the computation of likelihood ratio and score test based confidence intervals in generalized linear models. Stat Med, 11:923–930, 1992.
A. C. Atkinson. A note on the generalized information criterion for choice of a model. Biometrika, 67:413–418, 1980.
D. A. Binder. Fitting Cox’s proportional hazards models from survey data. Biometrika, 79:139–147, 1992.
D. D. Boos. On generalized score tests. Am Statistician, 46:327–333, 1992.
A. R. Brazzale and A. C. Davison. Accurate parametric inference for small samples. Statistical Sci, 23(4):465–484, 2008.
L. Breiman. The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error. J Am Stat Assoc, 87:738–754, 1992.
S. T. Buckland, K. P. Burnham, and N. H. Augustin. Model selection: An integral part of inference. Biometrics, 53:603–618, 1997.
R. M. Califf, H. R. Phillips, and others. Prognostic value of a coronary artery jeopardy score. J Am College Cardiol, 5:1055–1063, 1985.
J. Carpenter and J. Bithell. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med, 19:1141–1164, 2000.
L. E. Chambless and K. E. Boyle. Maximum likelihood methods for complex sample data: Logistic regression and discrete proportional hazards models. Comm Stat A, 14:1377–1392, 1985.
C. Chatfield. Model uncertainty, data mining and statistical inference (with discussion). J Roy Stat Soc A, 158:419–466, 1995.
D. Collett. Modelling Binary Data. Chapman and Hall, London, second edition, 2002.
D. R. Cox. Further results on tests of separate families of hypotheses. J Roy Stat Soc B, 24:406–424, 1962.
D. R. Cox. Regression models and life-tables (with discussion). J Roy Stat Soc B, 34:187–220, 1972.
D. R. Cox and E. J. Snell. The Analysis of Binary Data. Chapman and Hall, London, second edition, 1989.
D. R. Cox and N. Wermuth. A comment on the coefficient of determination for binary responses. Am Statistician, 46:1–4, 1992.
J. G. Cragg and R. Uhler. The demand for automobiles. Canadian Journal of Economics, 3:386–406, 1970.
T. DiCiccio and B. Efron. More accurate confidence intervals in exponential families. Biometrika, 79:231–245, 1992.
N. Doganaksoy and J. Schmee. Comparisons of approximate confidence intervals for distributions used in life-data analysis. Technometrics, 35:175–184, 1993.
M. Drum and P. McCullagh. Comment on regression models for discrete longitudinal responses by G. M. Fitzmaurice, N. M. Laird, and A. G. Rotnitzky. Stat Sci, 8:300–301, 1993.
B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Sci, 1:54–77, 1986.
B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, New York, 1993.
Z. Feng, D. McLerran, and J. Grizzle. A comparison of statistical methods for clustered data analysis with Gaussian error. Stat Med, 15:1793–1806, 1996.
G. M. Fitzmaurice. A caveat concerning independence estimating equations with multivariate binary data. Biometrics, 51:309–317, 1995.
J. Fox. Bootstrapping Regression Models: An Appendix to An R and S-PLUS Companion to Applied Regression, 2002.
D. A. Freedman. On the so-called “Huber sandwich estimator” and “robust standard errors”. Am Statistician, 60:299–302, 2006.
J. H. Friedman. A variable span smoother. Technical Report 5, Laboratory for Computational Statistics, Department of Statistics, Stanford University, 1984.
R. Goldstein. The comparison of models in discrimination cases. Jurimetrics J, 34:215–234, 1994.
W. Gould. Confidence intervals in logit and probit models. Stata Tech Bull, STB-14:26–28, July 1993. http://www.stata.com/products/stb/journals/stb14.pdf.
B. I. Graubard and E. L. Korn. Regression analysis with clustered data. Stat Med, 13:509–522, 1994.
R. J. Gray. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Stat Assoc, 87:942–951, 1992.
S. Greenland. When should epidemiologic regressions use random coefficients? Biometrics, 56:915–921, 2000.
F. E. Harrell and K. L. Lee. A comparison of the discrimination of discriminant analysis and logistic regression under multivariate normality. In P. K. Sen, editor, Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences. The Bernard G. Greenberg Volume, pages 333–343. North-Holland, Amsterdam, 1985.
W. W. Hauck and A. Donner. Wald’s test as applied to hypotheses in logit analysis. J Am Stat Assoc, 72:851–863, 1977.
G. Heinze and M. Schemper. A solution to the problem of separation in logistic regression. Stat Med, 21(16):2409–2419, 2002.
T. Hothorn, F. Bretz, and P. Westfall. Simultaneous inference in general parametric models. Biometrical J, 50(3):346–363, 2008.
J. Huang and D. Harrington. Penalized partial likelihood regression for right-censored data with bootstrap selection of the penalty parameter. Biometrics, 58:781–791, 2002.
P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1: Statistics, pages 221–233. University of California Press, Berkeley, CA, 1967.
C. M. Hurvich and C. Tsai. Regression and time series model selection in small samples. Biometrika, 76:297–307, 1989.
C. M. Hurvich and C. Tsai. Model selection for extended quasi-likelihood models in small samples. Biometrics, 51:1077–1084, 1995.
R. E. Kass and A. E. Raftery. Bayes factors. J Am Stat Assoc, 90:773–795, 1995.
S. Konishi and G. Kitagawa. Information Criteria and Statistical Modeling. Springer, New York, 2008. ISBN 978-0-387-71886-6.
E. L. Korn and B. I. Graubard. Analysis of large health surveys: Accounting for the sampling design. J Roy Stat Soc A, 158:263–295, 1995.
E. L. Korn and B. I. Graubard. Examples of differing weighted and unweighted estimates from a sample survey. Am Statistician, 49:291–295, 1995.
E. L. Korn and R. Simon. Measures of explained variation for survival data. Stat Med, 9:487–503, 1990.
E. L. Korn and R. Simon. Explained residual variation, explained risk, and goodness of fit. Am Statistician, 45:201–206, 1991.
T. P. Lane and W. H. DuMouchel. Simultaneous confidence intervals in multiple regression. Am Statistician, 48:315–321, 1994.
P. W. Laud and J. G. Ibrahim. Predictive model selection. J Roy Stat Soc B, 57:247–262, 1995.
S. le Cessie and J. C. van Houwelingen. Ridge estimators in logistic regression. Appl Stat, 41:191–201, 1992.
E. W. Lee, L. J. Wei, and D. A. Amato. Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In J. P. Klein and P. K. Goel, editors, Survival Analysis: State of the Art, NATO ASI, pages 237–247. Kluwer Academic, Boston, 1992.
K. L. Lee, D. B. Pryor, F. E. Harrell, R. M. Califf, V. S. Behar, W. L. Floyd, J. J. Morris, R. A. Waugh, R. E. Whalen, and R. A. Rosati. Predicting outcome in coronary disease: Statistical models versus expert clinicians. Am J Med, 80:553–560, 1986.
D. Y. Lin. Cox regression analysis of multivariate failure time data: The marginal approach. Stat Med, 13:2233–2247, 1994.
D. Y. Lin. On fitting Cox’s proportional hazards models to survey data. Biometrika, 87:37–47, 2000.
D. Y. Lin and L. J. Wei. The robust inference for the Cox proportional hazards model. J Am Stat Assoc, 84:1074–1078, 1989.
K. Liu and A. R. Dyer. A rank statistic for assessing the amount of variation explained by risk factors in epidemiologic studies. Am J Epi, 109:597–606, 1979.
J. S. Long and L. H. Ervin. Using heteroscedasticity consistent standard errors in the linear regression model. Am Statistician, 54:217–224, 2000.
G. S. Maddala. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, Cambridge, UK, 1983.
L. Magee. R² measures based on Wald and likelihood ratio joint significance tests. Am Statistician, 44:250–253, 1990.
E. Marubini and M. G. Valsecchi. Analyzing Survival Data from Clinical Trials and Observational Studies. Wiley, Chichester, 1995.
W. Q. Meeker and L. A. Escobar. Teaching about approximate confidence regions based on maximum likelihood estimation. Am Statistician, 49:48–53, 1995.
S. Menard. Coefficients of determination for multiple logistic regression analysis. Am Statistician, 54:17–24, 2000.
S. Minkin. Profile-likelihood-based confidence intervals. Appl Stat, 39:125–126, 1990.
M. Mittlböck and M. Schemper. Explained variation for logistic regression. Stat Med, 15:1987–1997, 1996.
K. G. M. Moons, A. R. T. Donders, E. W. Steyerberg, and F. E. Harrell. Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clin Epi, 57:1262–1270, 2004.
B. J. T. Morgan, K. J. Palmer, and M. S. Ridout. Negative score test statistic (with discussion). Am Statistician, 61(4):285–295, 2007.
N. J. D. Nagelkerke. A note on a general definition of the coefficient of determination. Biometrika, 78:691–692, 1991.
M. Y. Park and T. Hastie. Penalized logistic regression for detecting gene interactions. Biostat, 9(1):30–50, 2008.
L. W. Pickle. Maximum likelihood estimation in the new computing environment. Stat Comp Graphics News ASA, 2(2):6–15, Nov. 1991.
W. H. Rogers. Regression standard errors in clustered samples. Stata Tech Bull, STB-13:19–23, May 1993. http://www.stata.com/products/stb/journals/stb13.pdf.
P. Royston and S. G. Thompson. Comparing non-nested regression models. Biometrics, 51:114–127, 1995.
S. Sardy. On the practice of rescaling covariates. Int Stat Rev, 76:285–297, 2008.
M. Schemper. The relative importance of prognostic factors in studies of survival. Stat Med, 12:2377–2382, 1993.
M. Schemper and J. Stare. Explained variation in survival analysis. Stat Med, 15:1999–2012, 1996.
G. Schwarz. Estimating the dimension of a model. Ann Stat, 6:461–464, 1978.
A. F. M. Smith and D. J. Spiegelhalter. Bayes factors and choice criteria for linear models. J Roy Stat Soc B, 42:213–220, 1980.
T. M. Therneau, P. M. Grambsch, and T. R. Fleming. Martingale-based residuals for survival models. Biometrika, 77:216–218, 1990.
R. Tibshirani. Regression shrinkage and selection via the lasso. J Roy Stat Soc B, 58:267–288, 1996.
R. Tibshirani and K. Knight. Model search and inference by bootstrap “bumping”. Technical report, Department of Statistics, University of Toronto, 1997. http://www-stat.stanford.edu/tibs. Presented at the Joint Statistical Meetings, Chicago, August 1996.
H. C. van Houwelingen and J. Thorogood. Construction, validation and updating of a prognostic model for kidney graft survival. Stat Med, 14:1999–2008, 1995.
J. C. van Houwelingen and S. le Cessie. Predictive value of statistical models. Stat Med, 9:1303–1325, 1990.
D. J. Venzon and S. H. Moolgavkar. A method for computing profile-likelihood-based confidence intervals. Appl Stat, 37:87–94, 1988.
P. Verweij and H. C. van Houwelingen. Penalized likelihood in Cox regression. Stat Med, 13:2427–2436, 1994.
P. J. M. Verweij and H. C. van Houwelingen. Cross-validation in survival analysis. Stat Med, 12:2305–2314, 1993.
P. J. M. Verweij and H. C. van Houwelingen. Time-dependent effects of fixed covariates in Cox regression. Biometrics, 51:1550–1556, 1995.
Y. Wang and J. M. G. Taylor. Inference for smooth curves in longitudinal data with application to an AIDS clinical trial. Stat Med, 14:1205–1218, 1995.
H. White. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48:817–838, 1980.
J. Whittaker. Model interpretation from the additive elements of the likelihood function. Appl Stat, 33:52–64, 1984.
A. R. Willan, W. Ross, and T. A. MacKenzie. Comparing in-patient classification systems: A problem of non-nested regression models. Stat Med, 11:1321–1331, 1992.
Y. Xiao and M. Abrahamowicz. Bootstrap-based methods for estimating standard errors in Cox’s regression analyses of clustered event times. Stat Med, 29:915–923, 2010.
B. Zheng and A. Agresti. Summarizing the predictive power of a generalized linear model. Stat Med, 19:1771–1781, 2000.
X. Zheng and W. Loh. Consistent variable selection in linear models. J Am Stat Assoc, 90:151–156, 1995.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Harrell, F.E. (2015). Overview of Maximum Likelihood Estimation. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-19425-7_9
DOI: https://doi.org/10.1007/978-3-319-19425-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19424-0
Online ISBN: 978-3-319-19425-7
eBook Packages: Mathematics and Statistics (R0)