Analysis of Large-Scale Secondary Data in Higher Education Research: Potential Perils Associated with Complex Sampling Designs

Research in Higher Education

Abstract

Most large-scale secondary data sets used in higher education research (e.g., NPSAS or BPS) are constructed using complex survey sample designs in which the population of interest is stratified on a number of dimensions and certain strata are oversampled. These designs also often cluster lower level units (e.g., students) within higher level units (e.g., colleges) to achieve efficiencies in the sampling process. Ignoring oversampling (unequal probability of selection) presents problems for inference: in their raw form, data from these designs are not representative of the population to which they are designed to generalize. Ignoring the clustering of observations presents a second set of problems for estimating variability in the population and testing hypotheses, and it usually increases the likelihood of committing Type I errors (declaring an effect when in fact there is none). This article presents an extended example using complex sample survey data to demonstrate how researchers can address the problems associated with oversampling and clustering of observations in these designs.
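The two problems named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure and is not drawn from the article's extended example. The data, the weights, the cluster size, and the intraclass correlation below are hypothetical, and the function names are invented for illustration. The first function corrects for unequal selection probabilities with inverse-probability weights. The second applies Kish's approximate design effect, 1 + (m - 1) * rho, to inflate a standard error computed as if the sample were a simple random sample.

```python
import math


def weighted_mean(values, weights):
    """Estimate a population mean using sampling weights.

    Each weight is assumed to be the inverse of the unit's
    probability of selection.
    """
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def design_effect(avg_cluster_size, icc):
    """Kish's approximate design effect for clustered samples.

    avg_cluster_size: average number of sampled units per cluster (m)
    icc: intraclass correlation of the outcome within clusters (rho)
    """
    return 1.0 + (avg_cluster_size - 1.0) * icc


def adjusted_standard_error(srs_se, deff):
    """Inflate a simple-random-sample standard error by sqrt(deff)."""
    return srs_se * math.sqrt(deff)


# Hypothetical sample: stratum A was oversampled (1 in 10 selected),
# stratum B was sampled at 1 in 100.
values = [3.0, 3.0, 3.0, 3.0, 2.0, 2.0]
weights = [10.0, 10.0, 10.0, 10.0, 100.0, 100.0]

unweighted = sum(values) / len(values)      # about 2.67, dominated by stratum A
weighted = weighted_mean(values, weights)   # about 2.17, reflects the population mix

# Hypothetical clustering: 20 students per college, intraclass correlation 0.05.
deff = design_effect(avg_cluster_size=20, icc=0.05)   # about 1.95
naive_se = 0.10
corrected_se = adjusted_standard_error(naive_se, deff)

print(f"unweighted mean: {unweighted:.3f}")
print(f"weighted mean:   {weighted:.3f}")
print(f"design effect:   {deff:.2f}")
print(f"naive SE: {naive_se:.3f}, design-adjusted SE: {corrected_se:.3f}")
```

In this toy case the unweighted mean overstates the population mean because the oversampled stratum has higher values. The naive standard error is also too small by a factor of roughly 1.4, which is how ignoring clustering raises the Type I error rate. Dedicated survey software would normally estimate variances directly from the design (for example by linearization or replication) rather than rely on a single approximate design effect.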



Cite this article

Thomas, S.L., Heck, R.H. Analysis of Large-Scale Secondary Data in Higher Education Research: Potential Perils Associated with Complex Sampling Designs. Research in Higher Education 42, 517–540 (2001). https://doi.org/10.1023/A:1011098109834
