Abstract

Variable selection and model selection methods are statistical tools that are indispensable for almost any statistical modeling question. This review first considers the use of information criteria for model selection. Such criteria provide an ordering of the candidate models, from which the best model is selected; different modeling goals might require different criteria. Second, the effect of including a penalty in the estimation process is discussed. Third, nonparametric estimation is reviewed; it involves several aspects of model choice, such as the choice of estimator and the selection of tuning parameters. Fourth, model averaging approaches are reviewed, in which estimators from different models are weighted to produce one final estimator. There are several ways to choose the weights, and most of them result in data-driven, hence random, weights. Finally, challenges for inference after model selection and for model-averaged estimators are discussed.
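
A minimal sketch, not taken from the article, illustrating two of the ideas summarized above: ranking candidate models with an information criterion (here AIC for Gaussian linear models) and forming a weighted, model-averaged estimator. The simulated data, variable names, and the smoothed-AIC weight choice are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)  # x3 is irrelevant by construction

# Candidate linear models: nested subsets of the available covariates.
candidates = {
    "x1":       np.column_stack([np.ones(n), x1]),
    "x1+x2":    np.column_stack([np.ones(n), x1, x2]),
    "x1+x2+x3": np.column_stack([np.ones(n), x1, x2, x3]),
}

def fit_and_aic(y, X):
    """Least-squares fit of a Gaussian linear model and its AIC."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    sigma2 = resid @ resid / len(y)              # ML estimate of the error variance
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1                           # regression coefficients + variance
    return -2 * loglik + 2 * k, fitted

aic, fitted = {}, {}
for name, X in candidates.items():
    aic[name], fitted[name] = fit_and_aic(y, X)

# Model selection: the criterion orders the candidate models; keep the smallest AIC.
selected = min(aic, key=aic.get)

# Model averaging: smoothed-AIC weights, proportional to exp(-AIC / 2).
a = np.array([aic[name] for name in candidates])
weights = np.exp(-0.5 * (a - a.min()))
weights /= weights.sum()
averaged_fit = sum(w * fitted[name] for w, name in zip(weights, candidates))

print("AIC:", {name: round(v, 1) for name, v in aic.items()})
print("selected model:", selected)
print("smoothed-AIC weights:", dict(zip(candidates, np.round(weights, 3))))
print("first averaged fitted values:", np.round(averaged_fit[:3], 2))
```

The exponential weights above are only one common data-driven choice; because they depend on the same data used for estimation, they are random, which is what complicates inference for the averaged estimator.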

