Missing data methods in longitudinal studies: a review

  • Invited Paper
  • Journal: TEST

Abstract

Incomplete data are common in biomedical and other research, especially in longitudinal studies. Over the last three decades, a large body of work in this area has led, on the one hand, to a rich taxonomy of missing-data concepts, issues, and methods and, on the other hand, to a variety of data-analytic tools. Elements of the taxonomy include missing-data patterns, mechanisms, and modeling frameworks; inferential paradigms; and sensitivity-analysis frameworks. These are described in detail, and a variety of concrete modeling devices is presented. Two case studies illustrate the methods: the first concerns quality of life among breast cancer patients, and the second examines data from the Muscatine children's obesity study.
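
The pattern and mechanism elements of this taxonomy can be illustrated with a small simulation. The Python sketch below is not taken from the paper; it assumes a hypothetical two-occasion study in which (Y1, Y2) is bivariate normal with mean 0, unit variances, and correlation rho, Y1 is always observed, and Y2 may be lost to dropout. Dropout is generated under the standard MCAR, MAR, and MNAR mechanisms, and two estimates of E[Y2] are compared: the complete-case mean, and a regression-based estimate that, in this particular bivariate-normal monotone design, coincides with the maximum likelihood estimate under ignorable (MAR) missingness. The helper classify_pattern labels a response vector as complete, monotone (dropout), or non-monotone (intermittent). All function names and parameter values (n, rho, the logistic dropout slopes) are illustrative assumptions.

```python
import math
import random
from statistics import mean


def classify_pattern(row):
    """Label a response vector (None = missing) as complete, monotone, or non-monotone."""
    observed = [y is not None for y in row]
    if all(observed):
        return "complete"
    first_missing = observed.index(False)
    # Monotone (dropout): once a value is missing, every later value is missing too.
    if not any(observed[first_missing:]):
        return "monotone"
    return "non-monotone"


def simulate(mechanism, n=20000, rho=0.7, seed=1):
    """Hypothetical design: (Y1, Y2) bivariate normal, Y1 always observed, Y2 may drop out."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rho * y1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p_drop = 0.4  # unrelated to any outcome
        elif mechanism == "MAR":
            p_drop = 1 / (1 + math.exp(-2 * y1))  # depends only on the observed Y1
        elif mechanism == "MNAR":
            p_drop = 1 / (1 + math.exp(-2 * y2))  # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((y1, None if rng.random() < p_drop else y2))
    return rows


def complete_case_mean(rows):
    """Mean of Y2 among subjects who did not drop out."""
    return mean(y2 for _, y2 in rows if y2 is not None)


def regression_mean(rows):
    """Regress Y2 on Y1 among completers, then average the fitted line over all subjects.

    In this bivariate-normal design with Y1 fully observed, this equals the maximum
    likelihood estimate of E[Y2] when dropout is MAR.
    """
    cc = [(y1, y2) for y1, y2 in rows if y2 is not None]
    x_bar = mean(x for x, _ in cc)
    y_bar = mean(y for _, y in cc)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in cc)
    sxx = sum((x - x_bar) ** 2 for x, _ in cc)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * mean(y1 for y1, _ in rows)


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        data = simulate(mech)
        print(mech, round(complete_case_mean(data), 3), round(regression_mean(data), 3))
```

The true value of E[Y2] is 0 throughout. Under MCAR both estimates should lie close to it. Under MAR the complete-case mean is biased downward (subjects with high Y1, and hence typically high Y2, drop out more often) while the regression estimate is not, because dropout depends only on what was observed. Under MNAR both estimates are biased, and the observed data alone generally cannot rule this situation out, which is the motivation for the sensitivity-analysis frameworks mentioned above.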

Author information

Correspondence to Joseph G. Ibrahim.

Additional information

This invited paper is discussed in the comments available at: http://dx.doi.org/10.1007/s11749-009-0139-9, http://dx.doi.org/10.1007/s11749-009-0140-3, http://dx.doi.org/10.1007/s11749-009-0141-2, http://dx.doi.org/10.1007/s11749-009-0142-1, http://dx.doi.org/10.1007/s11749-009-0143-0.

About this article

Cite this article

Ibrahim, J.G., Molenberghs, G. Missing data methods in longitudinal studies: a review. TEST 18, 1–43 (2009). https://doi.org/10.1007/s11749-009-0138-x

  • DOI: https://doi.org/10.1007/s11749-009-0138-x
