Abstract
A nonparametric bootstrap was used to obtain an interval estimate of Pearson’s r and to test the null hypothesis of no association between fifth-grade students’ positive substance use expectancies and their intentions not to use substances. The students were participating in a substance use prevention program in which the unit of randomization was a public middle school. The bootstrap estimate indicated that expectancies explained 21% of the variability in students’ intentions (r = 0.46, 95% CI = [0.40, 0.50]). This case study illustrates the use of a nonparametric bootstrap with cluster randomized data and the danger posed when outliers are not identified and addressed.
Editors’ Strategic Implications: Prevention researchers will benefit from the authors’ detailed description of this nonparametric bootstrap approach for cluster randomized data and their thoughtful discussion of the potential impact of cluster sizes and outliers.
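The abstract does not spell out the resampling scheme, so the following is only a minimal sketch of one common way to bootstrap Pearson's r with clustered data: resample whole clusters (here, schools) with replacement, recompute r on each resample, and take percentile limits. The function name `cluster_bootstrap_pearson_r`, the simulated data, the number of replicates, and the choice of a percentile interval are all assumptions made for illustration; they are not the authors' procedure or data.

```python
import numpy as np


def cluster_bootstrap_pearson_r(x, y, cluster_ids, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Pearson's r, resampling whole clusters.

    Hypothetical helper for illustration; not the procedure used in the article.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    rng = np.random.default_rng(seed)

    # Row indices of the observations that belong to each cluster.
    clusters = np.unique(cluster_ids)
    members = [np.flatnonzero(cluster_ids == c) for c in clusters]

    # Point estimate from the full sample.
    r_hat = np.corrcoef(x, y)[0, 1]

    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Draw as many clusters as were observed, with replacement,
        # and keep every observation from each drawn cluster.
        drawn = rng.integers(0, len(clusters), size=len(clusters))
        idx = np.concatenate([members[k] for k in drawn])
        boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]

    # Percentile interval from the bootstrap distribution.
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return r_hat, lo, hi


# Simulated data (assumed, not the study data): 20 schools, 30 students each,
# with a shared school-level effect that induces within-school dependence.
sim = np.random.default_rng(42)
n_schools, n_students = 20, 30
school = np.repeat(np.arange(n_schools), n_students)
school_effect = sim.normal(size=n_schools)[school]
x = school_effect + sim.normal(size=school.size)
y = 0.5 * x + 0.5 * school_effect + sim.normal(size=school.size)

r_hat, lo, hi = cluster_bootstrap_pearson_r(x, y, school)
print(f"r = {r_hat:.2f}, 95% percentile CI = [{lo:.2f}, {hi:.2f}]")
```

Resampling schools rather than individual students keeps the within-school dependence intact in every replicate, which is the reason a naive student-level bootstrap would tend to give intervals that are too narrow. With a small number of clusters, or with a few very large or outlying clusters, the percentile interval can be unstable, so other interval types may be worth comparing.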
Acknowledgments
The project described was supported by Grant Number DA005629 awarded by the National Institute on Drug Abuse to The Pennsylvania State University (Grant Recipient), Michael Hecht, Principal Investigator, with Arizona State University as the collaborating subcontractor. The data used in the present study would not have been available without the dedication of the Drug Resistance Strategies Project team members in Phoenix, Arizona. These researchers are led by Drs. Flavio Marsiglia, Stephen Kulis, and Patricia Dustman. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Drug Abuse or the National Institutes of Health. Finally, we would like to thank Drs. Eric Loken and Michael Rovine for helpful comments and suggestions on the preparation of this article.
Cite this article
Wagstaff, D.A., Elek, E., Kulis, S. et al. Using a Nonparametric Bootstrap to Obtain a Confidence Interval for Pearson’s r with Cluster Randomized Data: A Case Study. J Primary Prevent 30, 497–512 (2009). https://doi.org/10.1007/s10935-009-0191-y
DOI: https://doi.org/10.1007/s10935-009-0191-y