Abstract
Background
Missing data are potentially an extensive problem in cost-effectiveness analyses conducted alongside randomised clinical trials, where prospective collection of both resource use and health outcome information is required. There are several possible reasons for incomplete records, and the validity of an analysis of data with missing values depends on the mechanism that generated the missing data. In the past, the most commonly used methods for analysing datasets with incomplete observations were relatively ad hoc (e.g. case deletion, mean imputation) and suffered from potential limitations. More recently, several alternative and more sophisticated approaches (e.g. multiple imputation) have been proposed that attempt to correct the flaws of these simple methods.
Objectives
The objectives are to provide a concise and accessible description of the quantitative methods most commonly used in trial-based cost-effectiveness analysis for handling missing data, and also to demonstrate the potential impact of these alternative approaches on the cost-effectiveness results reported in two case studies.
Methods
Data from two recently conducted, trial-based economic evaluations are used to explore the sensitivity of the study results to the technique used to deal with incomplete observations. A statistical framework for representing the uncertainty in the alternative methods is outlined using an approach based on net benefits and cost-effectiveness acceptability curves.
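The net-benefit framework and the cost-effectiveness acceptability curve (CEAC) can be illustrated with a minimal sketch. It assumes per-patient (cost, effect) pairs with no missing values and a willingness-to-pay threshold per unit of effect (`wtp`). Net monetary benefit is `wtp * effect - cost`, and the CEAC gives, for each `wtp`, the proportion of bootstrap replicates in which the incremental net benefit of the new treatment is positive. The non-parametric bootstrap, the function names and the toy data are illustrative assumptions, not the authors' code.

```python
import random
import statistics


def net_benefit(cost, effect, wtp):
    """Net monetary benefit: wtp * effect - cost (wtp = value of one unit of effect)."""
    return wtp * effect - cost


def arm_means(arm):
    """Mean cost and mean effect for a list of (cost, effect) pairs."""
    return (statistics.fmean(c for c, _ in arm),
            statistics.fmean(e for _, e in arm))


def ceac(control, treatment, wtps, n_boot=1000, seed=1):
    """Probability that `treatment` is cost-effective versus `control` at each wtp.

    Patients are resampled with replacement within each arm (non-parametric
    bootstrap); each bootstrap replicate is evaluated at every wtp value.
    """
    rng = random.Random(seed)
    positive = [0] * len(wtps)
    for _ in range(n_boot):
        cost_c, eff_c = arm_means(rng.choices(control, k=len(control)))
        cost_t, eff_t = arm_means(rng.choices(treatment, k=len(treatment)))
        for i, wtp in enumerate(wtps):
            # Incremental net benefit of treatment over control at this wtp.
            inb = net_benefit(cost_t, eff_t, wtp) - net_benefit(cost_c, eff_c, wtp)
            if inb > 0:
                positive[i] += 1
    return [count / n_boot for count in positive]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical trial data: (cost, effect in QALYs) per patient.
    control = [(rng.gauss(1000, 300), rng.gauss(0.60, 0.10)) for _ in range(100)]
    treatment = [(rng.gauss(1400, 300), rng.gauss(0.65, 0.10)) for _ in range(100)]
    wtps = [0, 5000, 10000, 20000]
    for wtp, p in zip(wtps, ceac(control, treatment, wtps)):
        print(f"wtp={wtp:>6}: P(cost-effective) = {p:.2f}")
```

Plotting the returned probabilities against `wtp` gives the CEAC; in a missing-data setting the same calculation would be repeated on each analysed version of the dataset.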
Results
The case studies demonstrate the potential importance of the approach used to handle missing data. Although the analytical strategy did not appear to alter the results of one of the studies, the other case study showed that the results of the cost-effectiveness analysis were sensitive both to the decision to impute and to the imputation strategy adopted.
Conclusions
Analysts should be more explicit in reporting the analytical strategies applied in the presence of missing data. The use of a multiple imputation approach is recommended in the majority of cases, so as to adequately reflect the uncertainty in the study results due to the presence of missing data.
Notes
1. One scenario in which the complete-case analysis (CCA) estimator is clearly biased is when costs and effects are censored. Lin and colleagues[21] discussed cost estimation when survival times are censored for some patients in the study. They suggested that the uncensored-cases estimator (i.e. CCA) is biased towards the costs of patients with shorter survival times, because patients with longer survival times are more likely to be censored.
2. For an extensive list of papers and reports on both theoretical developments and applications of MI, and for a list of available software to generate multiple imputations, see the website http://www.multiple-imputation.com.
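The direction of the bias described in Note 1 can be checked with a small simulation. This is a sketch under assumed conditions: exponential survival, costs accruing at a constant rate while the patient is alive, and administrative censoring from staggered entry (each patient's potential follow-up is drawn uniformly between 0 and `max_follow_up` years). It is not the estimator proposed by Lin and colleagues; it only compares the true mean cost with the mean over uncensored patients, and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_censoring_bias(n=20000, cost_per_year=1000.0, mean_survival=3.0,
                            max_follow_up=4.0, seed=42):
    """Return (true mean cost, mean cost among uncensored patients)."""
    rng = random.Random(seed)
    true_costs, uncensored_costs = [], []
    for _ in range(n):
        survival = rng.expovariate(1.0 / mean_survival)   # years alive
        follow_up = rng.uniform(0.0, max_follow_up)       # staggered entry
        total_cost = cost_per_year * survival             # cost accrues while alive
        true_costs.append(total_cost)
        if survival <= follow_up:                          # death observed: uncensored
            uncensored_costs.append(total_cost)
    return statistics.fmean(true_costs), statistics.fmean(uncensored_costs)


if __name__ == "__main__":
    true_mean, cc_mean = simulate_censoring_bias()
    print(f"true mean cost: {true_mean:.0f}")
    print(f"uncensored-cases mean cost: {cc_mean:.0f}")
```

Because long survivors are the patients most likely to be censored, the uncensored-cases mean falls well below the true mean in this set-up.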
References
1. Thompson SG, Barber JA. How should cost data in pragmatic randomised trials be analysed? BMJ 2000; 320(7243): 1197–200
2. Briggs AH, Clark T, Wolstenholme J, et al. Missing… presumed at random: cost-analysis of incomplete data. Health Econ 2003; 12: 377–92
3. Oostenbrink JB, Al MJ, Rutten-van Molken MPMH. Methods to analyse cost data of patients who withdraw in a clinical trial setting. Pharmacoeconomics 2003; 21(15): 1103–12
4. Crawford SL, Tennstedt SL, McKinlay JB. A comparison of analytic methods for non-random missingness of outcome data. J Clin Epidemiol 1995; 48: 209–19
5. Engels JM, Diehr P. Imputation of missing longitudinal data: a comparison of methods. J Clin Epidemiol 2003; 56: 968–76
6. Liu G, Gould AL. Comparison of alternative strategies for analysis of longitudinal trials with dropouts. J Biopharm Stat 2002; 12(2): 207–26
7. Musil CM, Warner CB, Yobas PK, et al. A comparison of imputation techniques for handling missing data. West J Nurs Res 2002; 24(7): 815–29
8. Myers WR. Handling missing data in clinical trials: an overview. Drug Inf J 2000; 34: 525–33
9. Stinnett A, Mullahy J. Net health benefits: a new framework for the analysis of uncertainty in cost-effectiveness analysis. Med Decis Making 1998; 18: S68–80
10. Barnard J, Meng XL. Applications of multiple imputation in medical studies: from AIDS to NHANES. Stat Methods Med Res 1999; 8(1): 17–36
11. Bernhard J, Cella DF, Coates AS, et al. Missing quality of life data in clinical trials: serious problems and challenges. Stat Med 1998; 17: 517–32
12. Little RJA, Rubin DB. Statistical analysis with missing data. 1st ed. New York: John Wiley and Sons, 1987
13. Rubin DB. Multiple imputation for nonresponse in surveys. New York: John Wiley and Sons, 1987
14. Little RJA. Pattern-mixture models for multivariate incomplete data. J Am Stat Assoc 1993; 88: 125–34
15. Michiels B, Molenberghs G, Lipsitz SR. Selection models and pattern-mixture models for incomplete data with covariates. Biometrics 1999; 55: 978–83
16. Curran D, Molenberghs G, Aaronson NK, et al. Analysing longitudinal continuous quality of life data with dropout. Stat Methods Med Res 2002; 11(1): 5–23
17. Schafer JL, Rubin DB. Multiple imputation for missing-data problems. Short course presented at the Joint Statistical Meeting. Co-sponsored by the Survey Research Methods Section and the Biometrics Section, American Statistical Association; 1998 Aug 12; Dallas (TX)
18. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 1977; 39: 1–38
19. Schafer JL. Analysis of incomplete multivariate data. London: Chapman and Hall, 1997
20. Gilks WR, Richardson S, Spiegelhalter DJ. Markov chain Monte Carlo in practice. London: Chapman and Hall, 1996
21. Lin DY, Feuer EJ, Etzioni R, et al. Estimating medical costs from incomplete follow-up data. Biometrics 1997; 53: 419–34
22. Little RJA, Rubin DB. The analysis of social science data with missing values. In: Fox J, editor. Modern methods of data analysis. Newbury Park (CA): Sage Publications Inc., 1990
23. Schafer JL. Multiple imputation: a primer. Stat Methods Med Res 1999; 8: 3–15
24. Rubin DB. Multiple imputation after 18+ years. J Am Stat Assoc 1996; 91: 473–89
25. Rubin DB, Schenker N. Multiple imputation in health care databases: an overview and some applications. Stat Med 1991; 10: 585–98
26. Statistical Solutions. SOLAS™ for missing data analysis 2.1 [computer program]. Cork, Ireland: Statistical Solutions, 1999
27. Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation (with discussion). J Am Stat Assoc 1987; 82: 528–50
28. van Hout BA, Al MJ, Gordon GS, et al. Costs, effects and c/e-ratios alongside a clinical trial. Health Econ 1994; 3: 309–19
29. Fenwick E, Claxton K, Sculpher MJ. Representing uncertainty: the role of cost-effectiveness acceptability curves. Health Econ 2001; 10: 779–89
30. Scott J, Palmer S, Paykel ES, et al. Use of cognitive therapy for relapse prevention in chronic depression: cost-effectiveness study. Br J Psychiatry 2003; 182: 221–7
31. Paykel ES, Scott J, Teasdale J, et al. Prevention of relapse in residual depression by cognitive therapy: a controlled trial. Arch Gen Psychiatry 1999; 56: 829–35
32. Efron B, Tibshirani R. An introduction to the bootstrap. New York: Chapman and Hall, 1993
33. Manca A, Sculpher MJ, Ward K, et al. A cost-utility analysis of tension-free vaginal tape versus colposuspension for primary urodynamic stress incontinence. BJOG 2003; 110(3): 255–62
34. Lambert PC, Billingham LJ, Cooper NJ, et al. Estimating the cost-effectiveness of an intervention in a clinical trial when partial cost information is available: a Bayesian approach. Leicester: Department of Health Sciences, University of Leicester, 2003. Technical Report no.: 03/03
35. Rubin DB, Stern HS, Vehovar V. Handling “don’t know” survey responses: the case of the Slovenian plebiscite. J Am Stat Assoc 1995; 90: 822–8
36. Little RJA. A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 1988; 83(404): 1198–202
37. Curran D, Bacchi M, Hsu Schmitz SF, et al. Identifying the types of missingness in quality of life data from clinical trials. Stat Med 1998; 17: 547–55
38. Willan AR, Briggs AH, Hoch JS. Regression methods for covariate adjustment and subgroup analysis for non-censored cost-effectiveness data. Health Econ 2004; 13(5): 461–75
39. Landrum MB, Becker MP. A multiple imputation strategy for incomplete longitudinal data. Stat Med 2001; 20(17-18): 2741–60
40. Little RJA, Yao L. Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics 1996; 52: 1324–33
41. Best NG, Spiegelhalter DJ, Thomas A, et al. Bayesian analysis of realistically complex models. J R Stat Soc Ser A 1996; 159: 323–42
Acknowledgements
The present research was funded by a grant awarded to Dr Manca by the University of York. An earlier version of this manuscript was presented at the UK Health Economists’ Study Group Conference, September 12–14, 2001, City University, London, UK.

We are grateful to Mark Sculpher and Susan Griffin for their comments during the preparation of this manuscript. Thanks to the anonymous referee for useful comments and suggestions for clarification. Any mistakes and omissions remain the authors’.

The authors have no conflicts of interest that are directly relevant to the content of this article.
Appendix
Using the notation provided in Schafer,[23] suppose the analyst is interested in analysing the variable \(Y\) (e.g. patient-level net benefit), part of which is observed (\(Y_{obs}\)) and part of which is missing (\(Y_{mis}\)).
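Let \(\hat Q\) be the estimate of the statistic of interest (e.g. mean costs, mean effects or mean net benefits) and \(\hat U\) its estimated variance, and suppose the researcher has created \(m\) completed datasets, each combining the observed data with one set of imputed values (equation 1):
\[ Y^{(l)} = \left(Y_{obs},\, Y_{mis}^{(l)}\right), \qquad l = 1, \ldots, m \]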
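One simple way to generate the \(m\) completed datasets for a single variable is the approximate Bayesian bootstrap (ABB), sketched below purely as an illustration; it is not necessarily the imputation model used in the case studies. The sketch assumes missing values are coded as `None` and are missing completely at random within the group being imputed (for example, within one trial arm); in practice an imputation model would usually condition on covariates. The function name is hypothetical.

```python
import random


def approximate_bayesian_bootstrap(y, m=5, seed=2005):
    """Return m completed copies of `y`; missing entries are coded as None.

    For each imputation l = 1..m:
      1. draw a bootstrap sample (with replacement) of the observed values;
      2. replace every missing entry with a random draw from that sample.
    Step 1 reflects uncertainty about the distribution of the observed data,
    so the imputed values differ between the m datasets.
    """
    rng = random.Random(seed)
    observed = [v for v in y if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    completed = []
    for _ in range(m):
        donors = rng.choices(observed, k=len(observed))
        # Observed values are kept; each missing value gets its own donor draw.
        completed.append([v if v is not None else rng.choice(donors) for v in y])
    return completed


if __name__ == "__main__":
    # Hypothetical patient-level costs with two missing values.
    costs = [820.0, None, 1310.0, 640.0, None, 990.0]
    for l, dataset in enumerate(approximate_bayesian_bootstrap(costs, m=3), start=1):
        print(l, dataset)
```

Each completed dataset is then analysed exactly as if it had no missing values.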
S/he can now calculate \(m\) plausible estimates of the statistic of interest, \(\hat Q^{(1)}, \ldots, \hat Q^{(m)}\), one from each completed dataset, together with their estimated variances, \(\hat U^{(1)}, \ldots, \hat U^{(m)}\).
In the univariate case, these quantities can be combined to obtain an MI estimate of the statistic of interest, together with its variance. The point estimate is the average of the \(\hat Q^{(l)}\) across the \(m\) datasets (equation 2):
\[ \overline Q = \frac{1}{m}\sum_{l=1}^{m} \hat Q^{(l)} \]
The total variance of this estimate can be obtained as (equation 3):
\[ T = \overline U + \left(1 + \frac{1}{m}\right) B \]
This combines the ‘within-imputation variance’ (\(\overline U\)) with the ‘between-imputation variance’ (\(B\)), where \(\overline U=\frac{1}{m}\sum_{l=1}^{m}\hat U^{(l)}\) is the average of the estimated variances across the \(m\) datasets, and \(B=\frac{1}{m-1}\sum_{l=1}^{m}(\hat Q^{(l)}-\overline Q)^2\) is the sample variance among the \(m\) estimates.
The analyst can then construct a confidence interval around \(\overline Q\) using Student’s t-distribution, with degrees of freedom that are a function of \(m\), \(B\) and \(\overline U\).
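The combining rules for \(\overline Q\), \(\overline U\), \(B\) and \(T\) can be sketched as below. The sketch assumes Rubin’s usual degrees-of-freedom formula, \(\nu = (m-1)\left[1 + \overline U / \{(1 + m^{-1})B\}\right]^2\); the function names are hypothetical. The t critical value is passed in rather than computed, because Python’s standard library has no t quantile function (it could be obtained, for example, from `scipy.stats.t.ppf(0.975, df)`).

```python
import math
import statistics


def rubin_combine(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: Q-hat^(l), l = 1..m;  variances: U-hat^(l), l = 1..m.
    Returns (q_bar, total_variance, df).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates, each with a variance")
    q_bar = statistics.fmean(estimates)     # equation 2
    u_bar = statistics.fmean(variances)     # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b         # equation 3
    if b == 0:
        df = math.inf                       # imputations agree: normal reference
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


def confidence_interval(q_bar, total_variance, t_crit):
    """Interval q_bar +/- t_crit * sqrt(T); t_crit comes from a t distribution on df."""
    half_width = t_crit * math.sqrt(total_variance)
    return q_bar - half_width, q_bar + half_width
```

For example, `rubin_combine([10, 12, 14], [4, 4, 4])` gives \(\overline Q = 12\), \(T = 4 + (4/3) \times 4 \approx 9.33\) and \(\nu = 6.125\). The between-imputation term is what makes MI intervals wider than those from a single imputation.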
Cite this article
Manca, A., Palmer, S. Handling missing data in patient-level cost-effectiveness analysis alongside randomised clinical trials. Appl Health Econ Health Policy 4, 65–75 (2005). https://doi.org/10.2165/00148365-200504020-00001
DOI: https://doi.org/10.2165/00148365-200504020-00001