Missing data methods in longitudinal studies: a review

Ibrahim, Joseph G.; Molenberghs, Geert

doi:10.1007/s11749-009-0138-x

Invited Paper
Published: 27 February 2009

TEST, Volume 18, pages 1–43 (2009)

Joseph G. Ibrahim¹ &
Geert Molenberghs²

Abstract

Incomplete data are quite common in biomedical and other types of research, especially in longitudinal studies. During the last three decades, a vast amount of work has been done in the area. This has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of taxonomy include: missing data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity analysis frameworks. These are described in detail. A variety of concrete modeling devices is presented. To make matters concrete, two case studies are considered. The first one concerns quality of life among breast cancer patients, while the second one examines data from the Muscatine children’s obesity study.

Author information

Authors and Affiliations

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
Joseph G. Ibrahim
Center for Statistics & International Institute for Biostatistics and Statistical Bioinformatics, Hasselt University and Catholic University Leuven, Agoralaan 1, 3590, Diepenbeek, Belgium
Geert Molenberghs

Corresponding author

Correspondence to Joseph G. Ibrahim.

Additional information

This invited paper is discussed in the comments available at: http://dx.doi.org/10.1007/s11749-009-0139-9, http://dx.doi.org/10.1007/s11749-009-0140-3, http://dx.doi.org/10.1007/s11749-009-0141-2, http://dx.doi.org/10.1007/s11749-009-0142-1, http://dx.doi.org/10.1007/s11749-009-0143-0.

About this article

Cite this article

Ibrahim, J.G., Molenberghs, G. Missing data methods in longitudinal studies: a review. TEST 18, 1–43 (2009). https://doi.org/10.1007/s11749-009-0138-x

Issue Date: May 2009
DOI: https://doi.org/10.1007/s11749-009-0138-x
