Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities

Health Services and Outcomes Research Methodology

An Erratum to this article was published on 30 October 2009

Abstract

Commercial health plans need member racial/ethnic information to address disparities, but often lack it. We incorporate the U.S. Census Bureau’s latest surname list into a previous Bayesian method that integrates surname and geocoded information to better impute self-reported race/ethnicity. We validate this approach with data from 1,921,133 enrollees of a national health plan. Overall, estimates from the new approach correlated highly with self-reported race/ethnicity (0.76), making it 19% more efficient than its predecessor (and 41% and 108% more efficient than single-source surname and address methods, respectively; P < 0.05 for all). The new approach has an overall concordance statistic (area under the receiver operating characteristic, or ROC, curve) of 0.93. The largest improvements were in areas where prior performance was weakest (for Blacks and Asians). The new Census surname list accounts for about three-fourths of the variance explained in the new estimates. Imputing Native American and multiracial identities from surname and residence remains challenging.
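To illustrate the kind of Bayesian combination the abstract describes, here is a minimal sketch in Python. It assumes the surname list supplies a prior probability for each group and that the geocoded data supply, for each group, the relative likelihood of living in the member's area. The function name, the four groups shown, and all numbers are hypothetical and invented for illustration; this is not the authors' implementation.

```python
def bayes_update(prior, likelihood):
    """Combine a prior over groups with a per-group likelihood.

    prior:      dict mapping group -> P(group | surname)      (assumed input)
    likelihood: dict mapping group -> P(area | group)         (assumed input)
    Returns a dict mapping group -> P(group | surname, area).
    """
    # Multiply prior by likelihood for each group, then renormalize.
    unnormalized = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("prior and likelihood have no common support")
    return {g: v / total for g, v in unnormalized.items()}


# Hypothetical surname-based prior (sums to 1).
surname_prior = {"Hispanic": 0.70, "White": 0.20, "Black": 0.05, "Asian": 0.05}

# Hypothetical share of each group's population living in the member's area.
area_likelihood = {"Hispanic": 0.0002, "White": 0.0001,
                   "Black": 0.0004, "Asian": 0.0001}

posterior = bayes_update(surname_prior, area_likelihood)
for group, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {p:.3f}")
```

The posterior is a full probability vector rather than a single assigned category, so downstream analyses can weight each member by these probabilities instead of committing to one classification.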

Notes

  1. The specific counts that were suppressed were also known.

  2. Exploratory analyses (not shown) demonstrated better overall predictive performance with this approach than with several alternatives we considered.

  3. Because the 2000 Census SF1 file includes an “other race” category not used in the Census surname list, we reassigned responses at the level of the block group using Iterative Proportional Fitting (Jirousek and Preucil 1995), an approach similar to that used by Word et al. (2008).

  4. We present the results treating surname information as the prior that is updated by the geocoded information; however, we would obtain the same results if we treated the geocoded information as the prior and updated with the surname data.

  5. Because the racial/ethnic categories are mutually exclusive, estimates for the groups are negatively correlated.

  6. A squared correlation of 0.49 between estimated and self-reported race/ethnicity implies approximately 49% efficiency relative to known race/ethnicity for estimating a disparity between two racial/ethnic groups under the assumptions in that paper.
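The Iterative Proportional Fitting procedure mentioned in note 3 can be sketched generically as follows. This is a textbook two-way version that alternately rescales rows and columns of a seed table until both sets of margins match their targets; it is not the authors' block-group reassignment procedure, and the function name, seed table, and target margins are hypothetical values chosen only for illustration.

```python
def ipf(table, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a nonnegative 2-D table so its margins match the targets.

    The row targets and column targets are assumed to have the same total.
    """
    cells = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            if row_sum > 0:
                factor = target / row_sum
                cells[i] = [v * factor for v in cells[i]]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            if col_sum > 0:
                factor = target / col_sum
                for row in cells:
                    row[j] *= factor
        # Columns are now exact; stop once rows also match within tolerance.
        row_error = max(abs(sum(cells[i]) - t)
                        for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    return cells


# Hypothetical seed counts and target margins (both margins total 100).
seed = [[40.0, 10.0],
        [20.0, 30.0]]
fitted = ipf(seed, row_targets=[60.0, 40.0], col_targets=[50.0, 50.0])
for row in fitted:
    print([round(v, 3) for v in row])
```

A useful property of this procedure is that it preserves the association structure (the cross-product ratios) of the seed table while changing only the margins.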

References

  • Abrahamse, A.F., Morrison, P.A., Bolton, N.M.: Surname analysis for estimating local concentration of Hispanics and Asians. Popul. Res. Policy Rev. 13(4), 383–398 (1994). doi:10.1007/BF01084115

  • Boston Public Health Commission: Data Collection Regulation. Boston, MA (2006)

  • California State Senate: Senate Bill Analysis of SB 853. Sacramento, CA (2007)

  • Elliott, M.N., Finch, B.K., Klein, D.J., Ma, S., Do, P., Beckett, M.K., Orr, N., Lurie, N.: Sample designs for measuring the health of small racial ethnic subgroups. Stat. Med. 27(20), 4016–4029 (2008a). doi:10.1002/sim.3244

  • Elliott, M.N., Fremont, A.M., Morrison, P.A., Pantoja, P., Lurie, N.: A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Serv. Res. 43(5p1), 1722–1736 (2008b)

  • Elliott, M.N., Haviland, A.: Use of a web-based convenience sample to supplement and improve the accuracy of a probability sample. Surv. Methodol. 33(2), 211–215 (2007)

  • Falkenstein, M.R.: The Asian and Pacific Islander surname list: as developed from Census 2000. In: Joint Statistical Meetings, New York, NY (2002)

  • Fiscella, K., Fremont, A.M.: Use of geocoding and surname analysis to estimate race and ethnicity. Health Serv. Res. 41(4 Pt 1), 1482–1500 (2006)

  • Fremont, A.M., Bierman, A.S., Wickstrom, S.L., Bird, C.E., Shah, M.M., Escarce, J.J., Rector, T.S.: Use of indirect measures of race/ethnicity and socioeconomic status in managed care settings to identify disparities in cardiovascular and diabetes care quality. Health Aff. 24(2), 516–526 (2005). doi:10.1377/hlthaff.24.2.516

  • Fremont, A.M., Lurie, N.: The Role of Race and Ethnic Data Collection in Eliminating Health Disparities. National Academies Press, Washington, DC (2004)

  • Ghosh-Dastidar, B., Elliott, M.N., Haviland, A., Karoly, L.: Composite estimates from incomplete and complete frames for minimum-MSE estimation in a rare population: an application for families with young children. Public Opin. Q. (in press)

  • Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification. Mach. Learn. 45(2), 171–186 (2001). doi:10.1023/A:1010920819831

  • Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1), 29–36 (1982)

  • Institute of Medicine: Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care. National Academies Press, Washington, DC (2002)

  • Jirousek, R., Preucil, S.: On the effective implementation of the iterative proportional fitting procedure. Comput. Stat. Data Anal. 19(2), 177–189 (1995). doi:10.1016/0167-9473(93)E0055-9

  • Kestenbaum, B.B., Ferguson, R., Elo, I., Turra, C.: Hispanic identification. In: Southern Demographic Association Meetings, New Orleans, LA (2000)

  • Lauderdale, D., Kestenbaum, B.B.: Asian American ethnic identification by surname. Popul. Res. Policy Rev. 19(3), 283–300 (2000)

  • Logan, J.: Ethnic Diversity Grows, Neighborhood Integration Lags Behind. Lewis Mumford Center, University at Albany, Albany, NY (2001)

  • Massey, D.S., Denton, N.A.: Hypersegregation in U.S. metropolitan areas: Black and Hispanic segregation along five dimensions. Demography 26(3), 373–391 (1989). doi:10.2307/2061599

  • McCaffrey, D., Elliott, M.N.: Power of tests for a dichotomous independent variable measured with error. Health Serv. Res. 43(3), 1085–1101 (2008). doi:10.1111/j.1475-6773.2007.00810.x

  • Morrison, P.A., Word, D.L., Coleman, C.D.: Using first names to estimate racial proportions in populations. In: Population Association of America Annual Meeting, Washington, DC (2001)

  • National Health Plan Collaborative: Phase 1 summary report: reducing racial and ethnic disparities and improving quality of health care. Hamilton, NJ (2006)

  • National Research Council: Eliminating Health Disparities: Measurement and Data Needs. National Academies Press, Washington, DC (2004)

  • Perkins, R.C.: Evaluating the Passel-Word Spanish Surname List: 1990 Decennial Census Post Enumeration Survey Results. U.S. Census Bureau, Population Division (1993)

  • Schenker, N., Parker, J.D.: From single-race reporting to multiple-race reporting: using imputation methods to bridge the transition. Stat. Med. 22(9), 1571–1587 (2003). doi:10.1002/sim.1512

  • U.S. Office of Management and Budget: Revisions to the standards for the classification of federal data on race and ethnicity. Notice. Federal Register, Washington, DC (1997)

  • Word, D.L., Coleman, C.D., Nunziata, R., Kominski, R.: Demographic aspects of surnames from Census 2000. Available at: http://www.census.gov/genealogy/www/surnames.pdf (2008). Accessed 30 July 2008

Acknowledgments

This study was supported, in part, by contract 282-00-0005, Task Order 13 from DHHS: Agency for Healthcare Research and Quality. Additional funding and support were provided by RWJF and the Brookings Institution. Marc Elliott is supported in part by the Centers for Disease Control and Prevention (CDC U48/DP000056). The authors thank Bryan GeoDemographics for their work in modifying SF1 Census files for these purposes and Jacquelyn Chou for assistance with manuscript preparation. We thank plans participating in the National Health Plan Collaborative, particularly Aetna, for sharing selected data to help improve efforts to address disparities in care and improve overall quality.

Disclaimer

The contents of the publication are solely the responsibility of the authors and do not necessarily reflect the official views of the DHHS.

Author information

Corresponding author

Correspondence to Marc N. Elliott.

Additional information

An erratum to this article can be found at http://dx.doi.org/10.1007/s10742-009-0055-1

About this article

Cite this article

Elliott, M.N., Morrison, P.A., Fremont, A. et al. Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities. Health Serv Outcomes Res Method 9, 69–83 (2009). https://doi.org/10.1007/s10742-009-0047-1

  • DOI: https://doi.org/10.1007/s10742-009-0047-1
