Abstract
Many longitudinal studies in field settings present challenges due to selection bias and incomplete data. A motivating example is provided by an intervention study aimed at preventing HIV transmission among runaway youths housed at shelters in New York City. Two shelters with 167 youths received the intervention, and two shelters with 144 youths received the control treatment. The number of unprotected sexual acts in the prior three months was assessed for each youth at a baseline interview and, to the extent possible, at five follow-up time points. Among the observed items, there is strong evidence of imbalance in baseline characteristics between the intervention and control groups. Meanwhile, beyond occasional missing items among participants interviewed at baseline, three items about baseline characteristics were added after the study began, so those items are missing for a large percentage of the study sample. Here, we outline two strategies for handling the complexities of this data set, both of which use propensity scores to address the imbalances across treatment groups. One approach relies on available cases and ad hoc choices to simplify the steps leading up to a linear mixed model analysis in SAS PROC MIXED; the other uses multiple imputation to reflect uncertainty due to missing values in an analogous linear mixed model analysis. Ultimately, we did not find substantial qualitative differences between the available-case and imputed-data approaches in this setting. In both cases, however, we find that the considerable imbalance on covariates between treatment arms constrains the ability to draw inferences about the intervention effect, which suggests the importance of evaluating propensity-score distributions in quasi-experimental intervention research.
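To make the propensity-score and multiple-imputation steps more concrete, here is a minimal Python sketch. It is not the authors' SAS code, and every function name and data value in it is hypothetical. It assumes a single baseline covariate, a logistic propensity model fitted by plain gradient ascent, five equal-size subclasses of the pooled propensity score (a common convention, not necessarily the paper's choice), and Rubin's rules for pooling per-imputation estimates. A subclass that contains only one treatment arm signals the kind of limited overlap that constrains inference about the intervention effect.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_propensity(X: Sequence[Sequence[float]], z: Sequence[int],
                   lr: float = 0.1, iters: int = 5000) -> List[float]:
    """Logistic regression of treatment z (1 = intervention) on covariates X,
    fitted by batch gradient ascent; returns [intercept, slope_1, ...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            resid = zi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    # Estimated probability of receiving the intervention for each unit.
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]


def subclass_counts(scores: Sequence[float], z: Sequence[int],
                    k: int = 5) -> List[Tuple[int, int]]:
    """Split units into k equal-size subclasses of the pooled propensity score
    and return (n_treated, n_control) per subclass, lowest scores first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    counts = []
    for s in range(k):
        idx = order[s * n // k:(s + 1) * n // k]
        treated = sum(z[i] for i in idx)
        counts.append((treated, len(idx) - treated))
    return counts


def rubin_combine(estimates: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float]:
    """Rubin's rules: pool m completed-data estimates and their variances.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled point estimate
    ubar = sum(variances) / m                              # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    return qbar, ubar + (1.0 + 1.0 / m) * b


# Hypothetical toy data: one baseline covariate; intervention units skew high.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5]]
z = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
beta = fit_propensity(X, z)
counts = subclass_counts(propensity_scores(X, beta), z)
# Subclasses lacking one arm indicate poor overlap of the score distributions.
sparse = [s for s, (t, c) in enumerate(counts) if t == 0 or c == 0]
print(counts, sparse)

# Hypothetical intervention-effect estimates and squared SEs from m = 3 imputations.
print(rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

On the toy data, the lowest and highest subclasses each contain only one arm, which is the pattern that limits comparisons between treatment groups. In a real analysis, the per-imputation estimates passed to `rubin_combine` would come from the mixed model fitted to each completed data set.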
Cite this article
Song, J., Belin, T.R., Lee, M.B. et al. Handling Baseline Differences and Missing Items in a Longitudinal Study of HIV Risk Among Runaway Youths. Health Services & Outcomes Research Methodology 2, 317–329 (2001). https://doi.org/10.1023/A:1020327530029