Research and practice methods
Comparison of small-area analysis techniques for estimating county-level outcomes

https://doi.org/10.1016/j.amepre.2004.02.004

Abstract

Background

Since many health data are unavailable at the county level, policymakers sometimes rely on state-level datasets to understand the health needs of their communities. State-level data can serve this purpose through small-area estimation techniques. However, it is unknown which small-area technique produces the most valid and precise results.

Methods

The reliability and accuracy of three small-area analysis methods (the synthetic method, spatial smoothing, and regression) were examined. To do this, severe work disability measures were first validated by comparing 2000 Behavioral Risk Factor Surveillance System (BRFSS) measures with Census 2000 measures (the latter used as the gold standard). The three small-area analysis methods were then applied to 2000 BRFSS data to examine how well each technique predicted county-level disability prevalence.

Results

The regression method produces the most valid and precise estimates of county-level disability prevalence over a large number of counties when a single year of data is used.

Conclusions

Local health departments and policymakers who need to track trends in behavioral risk factors and health status within their counties should use the regression method unless their county is large enough for direct estimation of the outcome of interest.

Introduction

State and local health departments, community-based organizations, and policymakers need valid and precise local data for program planning, program evaluation, and resource allocation.1, 2 However, while there are rich and varied sources of national health data, few local health data are available for most counties, and local health departments and community-based organizations often lack the resources to collect data on their own. The Behavioral Risk Factor Surveillance System (BRFSS), which was designed to help improve local decision making by providing state-level data, provides an array of annual health behavior and health status information.3, 4 However, in the BRFSS, the sample size at the county level is often too small to estimate local outcomes.4, 5, 6

Local health agencies commonly rely on temporal (i.e., combining several years of data for one county) or spatial (i.e., combining several counties) data aggregation to increase sample size and therefore precision. However, estimates based on temporally aggregated data cannot show time-trend differences for smaller counties, and any area differences observed in spatially aggregated data require further spatial delineation to explore. Many statistical procedures, which fall under the rubric of “small-area analyses,” have been developed to help fill this void.7, 8 These include the synthetic method, spatial data smoothing, and regression analysis. The synthetic method applies statistics for the nation as a whole to local areas based on each area's demographic characteristics.9 Spatial data smoothing calculates a weighted moving average of the values for the area of interest and its neighboring communities.10 Finally, local area outcomes can be estimated through multivariate regression with area-specific data as predictors.8, 11 To our knowledge, the validity and precision of the BRFSS for county-level analyses have not been examined across these methods.
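The first two techniques can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic estimate is taken as the sum of larger-area (e.g., national) group-specific rates weighted by the county's own demographic shares, and the smoothing weights are assumed to be respondent counts, since the text does not specify a weighting scheme. The function names and all numbers in the usage lines are hypothetical.

```python
from typing import Dict, List


def synthetic_estimate(group_rates: Dict[str, float],
                       county_shares: Dict[str, float]) -> float:
    """Apply larger-area rates for each demographic group to the county's
    own population mix (shares are normalised in case they do not sum to 1)."""
    total_share = sum(county_shares.values())
    weighted = sum(group_rates[g] * share for g, share in county_shares.items())
    return weighted / total_share


def spatially_smoothed(county: str,
                       rates: Dict[str, float],
                       sample_sizes: Dict[str, int],
                       neighbors: Dict[str, List[str]]) -> float:
    """Weighted moving average over the county and its neighbours.
    Assumption: each area is weighted by its number of respondents."""
    areas = [county] + neighbors[county]
    total_n = sum(sample_sizes[a] for a in areas)
    return sum(rates[a] * sample_sizes[a] for a in areas) / total_n


# Hypothetical inputs: group rates from a larger area, one county's age mix.
national_rates = {"18-34": 0.02, "35-54": 0.04, "55-64": 0.10}
county_mix = {"18-34": 0.30, "35-54": 0.45, "55-64": 0.25}
print(f"{synthetic_estimate(national_rates, county_mix):.4f}")  # prints 0.0490

# Hypothetical direct estimates, respondent counts, and adjacency.
rates = {"A": 0.06, "B": 0.04, "C": 0.05}
sizes = {"A": 50, "B": 150, "C": 100}
adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
print(f"{spatially_smoothed('A', rates, sizes, adjacency):.4f}")  # prints 0.0467
```

The contrast is visible in the inputs: the synthetic estimate uses no local outcome data at all, only the county's demographic mix, whereas the smoothed value borrows outcome data from neighboring counties.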

One useful statistic for local planners is the prevalence of severe work disability. Severe work disability is defined as inability to work due to physical or mental conditions.12, 13 Since 1993, the BRFSS has included a number of questions that measure respondents' perceived health status and activity limitation.14 A measure of severe work disability based on these survey instruments has been developed and has shown good to excellent validity.15 The Centers for Disease Control and Prevention (CDC) subsequently promoted the BRFSS as a potential source of annual disability data.16 Likewise, the decennial census contains a measure of severe work disability.17 While slightly different from the measure contained in the BRFSS, the census measure can be used to validate county-level BRFSS disability data, since it was designed with a sampling frame large enough to provide precise and reliable estimates for geographic regions much smaller than a county.15, 17

The main objective of this study was to investigate methods for estimating the annual prevalence of county-level severe work disability using the BRFSS (referred to henceforth as “county-level disability prevalence”). Disability also serves as a proxy outcome: the validity and precision of the three small-area analysis techniques for disability are used to indicate how they may perform for other measures. The challenge has been to obtain stable estimates of patterns of disease or of risk factors for disease, and to allow relative comparisons with the county of interest.


Materials and methods

County-level disability prevalence was estimated from 2000 BRFSS data, and the estimates were validated against Census 2000 disability data. The performance of three small-area estimation methods (the synthetic method, spatial data smoothing, and regression analysis) was then compared. A working regression procedure was also provided that may help local health officials and other planners understand the needs of the communities they serve, track their progress from year to year, and make comparisons between their

Validations

County estimates were validated by comparing them with county-level disability prevalence data from Census 2000. The census was chosen as the “gold standard” for severe work disability because its data were designed to provide precise estimates of severe work disability rates at the county level.

Discrepancies between the census ($r_i$) and the BRFSS ($p_i$) rates were examined by:

  1. Scatter plots of the BRFSS estimates versus the census rates
  2. Pearson (ρ) and Spearman (s) correlation
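The two correlation measures can be computed directly from paired county rates. The sketch below is illustrative only: it assumes plain lists of BRFSS estimates ($p_i$) and census rates ($r_i$) in matching county order, computes Spearman's coefficient as the Pearson correlation of average ranks (the usual treatment of ties), and uses made-up values in the usage lines. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (rho) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical county rates in matching order: BRFSS (p_i) and census (r_i).
brfss = [0.041, 0.055, 0.038, 0.072, 0.049]
census = [0.044, 0.051, 0.040, 0.068, 0.047]
print("Pearson:", round(pearson(brfss, census), 3))
print("Spearman:", round(spearman(brfss, census), 3))
```

Reporting both is useful because Pearson's ρ measures linear agreement in the rates themselves, while Spearman's coefficient only asks whether the two sources order the counties the same way.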

Nationwide and state-level estimates of disability

In 2000, the BRFSS sampled a total of 146,018 adults aged 18 to 64 years. The overall self-reported severe work disability prevalence was 4.57% (95% confidence interval [CI]=4.35%–4.79%), which was very close to the estimated prevalence of disability obtained from census data (4.43%). Individuals aged 55 to 64 years were approximately six times more likely to report being disabled than those aged 18 to 24 (12.37% vs 2.07%, p <0.001). The increase in risk with age only occurred for persons aged

Discussion

The regression method was found to be superior to the synthetic method and data smoothing with respect to correlation with census data and other discrepancy statistics. We also demonstrated that it can be easily deployed.
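The regression approach and a discrepancy check can be sketched as follows. This is a generic area-level regression, not the model used in the study (its specification is not shown in this excerpt): county BRFSS rates are regressed on hypothetical census-derived covariates by weighted least squares, with respondent counts assumed as weights, and the predictions are compared with census rates using mean squared error and mean absolute difference, two common discrepancy statistics assumed here for illustration. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square linear system by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def fit_wls(X: Sequence[Tuple[float, ...]], y: Sequence[float],
            w: Sequence[float]) -> List[float]:
    """Weighted least squares with an intercept via the normal equations:
    minimise sum_i w_i * (y_i - [1, x_i] . beta)^2."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtwx = [[sum(wi * r[p] * r[q] for r, wi in zip(rows, w)) for q in range(k)]
            for p in range(k)]
    xtwy = [sum(wi * r[p] * yi for r, yi, wi in zip(rows, y, w)) for p in range(k)]
    return solve(xtwx, xtwy)


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Predicted county rate: intercept plus covariate effects."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def mean_squared_error(est: Sequence[float], ref: Sequence[float]) -> float:
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(est)


def mean_absolute_difference(est: Sequence[float], ref: Sequence[float]) -> float:
    return sum(abs(e - r) for e, r in zip(est, ref)) / len(est)


# Hypothetical data: six counties with two covariates each
# (share of adults aged 55-64, poverty rate). All numbers are made up.
covariates = [(0.12, 0.10), (0.15, 0.18), (0.10, 0.08),
              (0.18, 0.22), (0.14, 0.12), (0.16, 0.15)]
brfss_rates = [0.040, 0.058, 0.035, 0.070, 0.047, 0.055]
respondents = [120, 45, 300, 30, 80, 60]
census_rates = [0.042, 0.055, 0.037, 0.066, 0.046, 0.053]

beta = fit_wls(covariates, brfss_rates, respondents)
predicted = [predict(beta, x) for x in covariates]
print("MSE:", mean_squared_error(predicted, census_rates))
print("Mean absolute difference:", mean_absolute_difference(predicted, census_rates))
```

Solving the normal equations by hand keeps the sketch dependency-free; with NumPy available, `numpy.linalg.lstsq` on weight-scaled rows would be the more numerically robust choice.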

To date, the synthetic method has been the most frequently employed technique in the field of public health, perhaps due to its ease of use.9, 25 However, this method is biased and lacks specificity for tracking changes over time or examining local patterns of disease because

Acknowledgements

We are grateful to Jiming Jiang, PhD, Joseph Sedransk, PhD, David Moriarty, and Matthew Zack, MD, for their contribution to this research. The study was supported in part by an Association of Teachers of Preventive Medicine/Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry Cooperative Agreement (TS 221-12/12), with funding provided by the National Center for Environmental Health, Office on Disability and Health.

References (28)

  • L.W. Pickle et al. Within-state geographic patterns of health insurance coverage and health risk factors in the United States. Am J Prev Med (2002).
  • M.J. Douglas et al. Achieving better health through health impact assessment. Health Bull (2001).
  • P. Paul-Shaheen et al. Small area analysis: a review and analysis of the North American literature. J Health Policy Law (1987).
  • P.L. Remington et al. Design, characteristics, and usefulness of state-based Behavioral Risk Factor Surveillance: 1981–87. Public Health Rep (1988).
  • Frazier EL, Franks AL, Sanderson LM. Using Behavioral Risk Factor Surveillance data. In: Using chronic disease data: a...
  • Kim I, Keppel KG. Priority data needs: sources of national, state and local-level data and data collection systems. In:...
  • Centers for Disease Control and Prevention. Surveillance summaries. MMWR Surveill Summ...
  • N.J. Purcell et al. Estimation for small domains. Biometrics (1979).
  • M. Ghosh et al. Small area estimation: an appraisal (with comments). Stat Sci (1994).
  • National Center for Health Statistics. Synthetic estimation of state health characteristics based on the Health...
  • E.P. Ericksen. A regression method for estimating population changes of local areas. J Am Stat Assoc (1974).
  • Domzal C. Disability demographics and definitions. Washington DC: National Institute on Disability and Rehabilitation...
  • LaPlante MP. State estimates of disability in America. Washington DC: National Institute on Disability and...
  • C.H. Hennessy et al. Measuring health-related quality of life for public health surveillance. Public Health Rep (1994).