  • Research Article
On ecological fallacy, assessment errors stemming from misguided variable selection, and the effect of aggregation on the outcome of epidemiological study

Abstract

In social and environmental sciences, ecological fallacy is an incorrect assumption about an individual based on aggregate data for a group. In the present study, the validity of this assumption was tested using both individual estimates of exposure to air pollution and aggregate data for 1,492 schoolchildren living in the vicinity of a major coal-fired power station in the Hadera region of Israel. In 1996 and 1999, the children underwent successive pulmonary function tests (PFT), and their parents completed a detailed questionnaire on their health status and housing conditions. The association between children's PFT results and their exposure to air pollution was investigated in two phases. During the first phase, PFT averages were compared with average levels of air pollution detected in the townships and small census areas in which the children reside. During the second phase, individual pollution estimates were compared with individual PFT results, and pattern detection techniques (Getis-Ord statistic) were used to investigate the spatial data structure. While different levels of areal data aggregation changed the results only marginally, the choice of indices measuring the children's PFT performance had a significant influence on the outcome of the analysis. It is argued that differences between individual-level and group-level effects of exposure (i.e., ecological or cross-level bias) are not necessary outcomes of data aggregation, and that seemingly unexpected results may often stem from a misguided selection of variables chosen to measure health effects. The implications of the results of the analysis for epidemiological studies are discussed, and recommendations for public health policy are formulated.

References

  • American Thoracic Society. Standardization of spirometry, 1994 update. Am J Respir Crit Care Med 1995: 152 (3): 1107–1136.

  • Anselin L. Spatial Econometrics. Bruton Center, School of Social Sciences, University of Texas: Dallas, TX, 1999.

  • Association of Towns for Environmental Protection. Annual Report for 2004. Hadera, Israel [in Hebrew] (available at: http://www.igudhadera.co.il/) 2005.

  • Bell M.L. The use of ambient air quality modeling to estimate individual and population exposure for human health research: a case study of ozone in the Northern Georgia Region of the United States. Env Int 2006: 32 (5): 586–593.

  • Brauer M., Hoek G., van Vliet P., Meliefste K., Fischer P., and Gehring U., et al. Estimating long-term average particulate air pollution concentrations: application of traffic indicators and geographic information systems. Epidemiology 2003: 14 (2): 228–239.

  • Burrough P.A., and McDonnell R.A. Principles of Geographical Information Systems. Oxford University Press: New York, 1998.

  • Cliff A.D., and Ord J.K. Spatial processes — Models and Applications. Pion: London, 1981.

  • Cockings S., Dunn C.E., Bhopal R.S., and Walker D.R. Users’ perspectives on epidemiological, GIS and point pattern approaches to analyzing environment and health data. Health Place 2004: 10 (2): 169–182.

  • Dubnov J., Barchana M., Rishpon S., Leventhal A., Segal I., Carel R., and Portnov B.A. Estimating the effect of air pollution from a coal-fired power station on the development of children’s pulmonary function. Environ Res (in press).

  • Elliott P., Cuzick J., English D., and Stern R., (Eds.). Geographical and Environmental Epidemiology. Methods for Small Area Studies. Oxford: Oxford University Press, 1992 (1996 reprint) 404 pp.

  • Elliott P., and Wartenberg D. Spatial epidemiology: current approaches and future challenges. Environ Health Perspect 2004: 112 (9): 998–1006.

  • Enright P.L., Linn W.S., Avol E.L., Margolis H.G., Gong Jr H., and Peters J.M. Quality of Spirometry Test Performance in Children and Adolescents Experience in a Large Field Study. Chest 2000: 118: 665–671.

  • Felsenstein D., and Portnov B.A., (Eds.). Regional Disparities in Small Countries. Springer Verlag: Heidelberg, 2005.

  • Feris B.G. Recommended respiratory disease questionnaire for use with adults and children in epidemiological research. Epidemiology standardization project. Am Rev Respir Dis 1978: 118: 1–120.

  • Gauderman W.J., Avol E.L., Gilliland F., Vora H., Thomas D., and Berhane K., et al. The effect of air pollution on lung development from 10 to 18 years of age. N Engl J Med 2004: 351: 1057–1067.

  • Gauderman W.J., McConnell R., Gilliland F., London S., Thomas D., and Avol E.L., et al. Association between air pollution and lung function growth in Southern California children. Am J Respir Crit Care Med 2000: 162: 1383–1390.

  • Gehlke C., and Biehl K. Certain effects of grouping upon the size of the correlation coefficient in census tract material. J Am Statist Assoc 1934: 29: 169–170.

  • Getis A., and Ord J.K. The analysis of spatial association by use of distance statistics. Geogr Anal 1992: 24 (3): 189–206.

  • Glaeser E.L., Kallal H.D., Scheinkman J.A., and Shleifer A. Growth in cities. J Political Econ 1992: 100 (6): 1126–1152.

  • Goodchild M.F., Sun G., and Yang S. Development and test of an error model for categorical data. Int J Geogr Inf Syst 1992: 6 (2): 87–104.

  • Goren A.I., Goldsmith J.R., Hellmann S., and Brenner S. Follow-up of schoolchildren in the vicinity of a coal-fired power plant in Israel. Environ Health Perspect 1991: 94: 101–105.

  • Goren A.I., and Hellmann S. Respiratory conditions among schoolchildren and their relationship to environmental tobacco smoke and other combustion products. Arch Environ Health 1995: 50 (2): 112–118.

  • Goren A.I., Hellmann S., and Glaser E.D. Use of outpatient clinics as a health indicator for communities around a coal-fired power plant. Environ Health Perspect 1995: 103 (12): 1110–1115.

  • Gotway C., and Young L.J. Combining incompatible spatial data. J Am Statist Assoc 2002: 97 (48): 632–647.

  • Greenland S. Ecologic versus individual-level sources of bias in ecologic estimates of contextual health effects. Int J Epidemiol 2001: 30 (6): 1343–1350.

  • Greenland S., and Morgenstern H. Ecological bias, confounding, and effect modification. Int J Epidemiol 1989: 18 (1): 269–274.

  • Greenland S., and Robins J. Invited commentary: ecological studies — biases, misconceptions, and counterexamples. Am J Epidemiol 1994: 139 (8): 747–760.

  • Hankinson J., Odencrantz J., and Fedan K. Spirometric reference values from a sample of the general US population. Am J Respir Crit Care Med 1999: 159: 179–187.

  • Heuvelink G.B.M., and Burrough P.A. Propagation of errors in spatial modeling with GIS. Int. J Geogr Inf Syst 1989: 3 (4): 303–322.

  • Jedrychowski W., Maugeri U., and Jedrychowska-Bianchi I. Body growth rate in preadolescent children and outdoor air quality. Environ Res 2002: 90 (1): 12–20.

  • Lasserre V., Guihenneuc-Jouyaux C., and Richardson S. Biases in ecological studies: utility of including within-area distribution of confounders. Stat Med 2000: 19 (1): 45–59.

  • McCoy J., and Johnston K. Using ArcGIS Spatial Analyst. ESRI: Redlands, CA, 2001.

  • Minami M., and Environmental Systems Research Institute Using ArcMap: GIS by ESRI. ESRI: Redlands, California, 2000 pp. 365–392.

  • Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods. Annu Rev Public Health 1995: 16: 61–81. (Review).

  • Morgenstern H., and Thomas D. Principles of study design in environmental epidemiology. Environ Health Perspect 1993: 101 (Suppl 4): 23–38.

  • Nuckols J.R., Ward M.H., and Jarup L. Using geographic information systems for exposure assessment in environmental epidemiology studies. Environ Health Perspect 2004: 112 (9): 1007–1015.

  • Openshaw S. The modifiable areal unit problem. In: Concepts and Techniques in Modern Geography, Monograph Series. Geo Books: London, 1984 38, 41 pp.

  • Pekkanen J., and Pearce N. Environmental epidemiology: challenges and opportunities. Environ Health Perspect 2001: 109 (1): 1–5.

  • Peled R., Bibi H., Pope III C.A., Nir P., Shiachi R., and Scharff S. Differences in lung function among school children in communities in Israel. Arch Environ Health 2001: 56 (1): 89–95.

  • Peters J.M., Avol E.L., Gauderman W.J., Linn W.S., Navidi W., and London S.J., et al. A study of twelve southern California communities with differing levels and types of air pollution: II. Effects on pulmonary function. Am J Respir Crit Care Med 1999a: 159: 768–775.

  • Peters J.M., Avol E.L., Navidi W., London S.J., Gauderman W.J., and Lurmann F., et al. A study of twelve Southern California communities with differing levels and types of air pollution: I. Prevalence of respiratory morbidity. Am J Respir Crit Care Med 1999b: 159: 760–767.

  • Pikhart H., Bobak M., Kriz B., Danova J., Celko M.A., and Prikazsky V., et al. Outdoor air concentrations of nitrogen dioxide and sulfur dioxide and prevalence of wheezing in school children. Epidemiology 2000: 11 (2): 153–160.

  • Pope III C.A., Burnett R.T., Thun M.J., Calle E.E., Krewski D., and Kazuhiko I., et al. Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. JAMA 2002: 287: 1132–1141.

  • Pope III C.A., Burnett R.T., Thurston G.D., Thun M.J., Calle E.E., and Krewski D., et al. Cardiovascular mortality and long-term exposure to particulate air pollution: epidemiological evidence of general pathophysiological pathways of diseases. Circulation 2004: 109: 71–77.

  • Robinson W.S. Ecological correlations and the behavior of individuals. Am Sociol Rev 1950: 15: 351–357.

  • Rothman K.J. Modern Epidemiology. Little, Brown and Co: Boston, 1986.

  • Rothman K.J. Methodological Frontiers in Environmental Epidemiology. Environ Health Perspect 1993: 101 (Suppl 4): 19–21.

  • Salway R., and Wakefield J. Sources of bias in ecological studies of non-rare events. Env Ecol Statist 2005: 12: 321–347.

  • Samet J.M., Dominici F., Curriero F.C., Coursac I., and Zeger S. Fine particulate air pollution and mortality in 20 US cities, 1987–1994. N Engl J Med 2000: 343: 1742–1749.

  • Schwartz J. Air pollution and children’s health. Pediatrics 2004: 113 (4 Suppl): 1037–1043. (Review).

  • Scoggins A., Kjellstrom T., Fisher G., Connor J., and Gimson N. Spatial analysis of annual air pollution exposure and mortality. Sci Total Environ 2004: 321 (1–3): 71–85.

  • Selvin H.C. Durkheim’s suicide and problems of empirical research. Am J Sociol 1958: 63 (6): 607–619.

  • Unwin D.J. GIS, spatial analysis and spatial statistics. Prog Human Geogr 1996: 20 (4): 540–551.

  • Veregin H. Developing and testing of an error propagation model for GIS overlay operations. Int J Geogr Inf Syst 1995: 9 (6): 595–619.

  • Wakefield J., and Shaddick G. Health-exposure modeling and the ecological fallacy. Biostatistics 2005: 7 (3): 438–455.

Author information

Corresponding author

Correspondence to Boris A Portnov.

Appendices

Appendix A

Description of Data Sources

Pulmonary Function Data

Spirometry was performed by means of a Minato® AS 500 spirometer and in compliance with the American Thoracic Society (ATS) criteria. Each child performed three consecutive pulmonary function tests (PFT), and the maneuver with the largest sum of Forced Vital Capacity (FVC) and Forced Expiratory Volume during the first second (FEV1) was recorded as a representative test (American Thoracic Society, 1995; Enright et al., 2000). Predicted PFT values were calculated using a polynomial model, separately for each gender (Hankinson et al., 1999).

The calculations were performed separately for differences in forced vital capacity (FVC) and forced expiratory volume during the first second (FEV1).

Then, the relative changes in pulmonary function tests (ΔPFT) from 1996 to 1999 were calculated as follows:

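$$
\Delta\mathrm{FVC}_i = \mathrm{FVC\_99}_{pi} - \mathrm{FVC\_96}_{pi} = \frac{\mathrm{FVC\_99}_{io} \times 100}{\mathrm{FVC\_99}_{ie}} - \frac{\mathrm{FVC\_96}_{io} \times 100}{\mathrm{FVC\_96}_{ie}},
$$

$$
\Delta\mathrm{FEV1}_i = \mathrm{FEV1\_99}_{pi} - \mathrm{FEV1\_96}_{pi} = \frac{\mathrm{FEV1\_99}_{io} \times 100}{\mathrm{FEV1\_99}_{ie}} - \frac{\mathrm{FEV1\_96}_{io} \times 100}{\mathrm{FEV1\_96}_{ie}},
$$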

where:

  • FVC_99io, FEV1_99io, FVC_96io, and FEV1_96io are the observed forced expiratory volumes (FVC or FEV1) of child i in 1999 and 1996, respectively;

  • FVC_99ie, FEV1_99ie, FVC_96ie, and FEV1_96ie are the calculated (expected) volumes for child i in the same years;

  • FVC_99pi, FVC_96pi, FEV1_99pi, and FEV1_96pi are, respectively, the FVC and FEV1 performances of child i (observed vs. expected) in 1999 and 1996, expressed as percentages.
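The calculation defined above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample volumes are hypothetical and are not taken from the study data.

```python
def percent_of_predicted(observed, expected):
    """Observed volume as a percentage of the predicted (expected) volume."""
    if expected <= 0:
        raise ValueError("expected volume must be positive")
    return observed * 100.0 / expected


def delta_pft(obs_96, exp_96, obs_99, exp_99):
    """Change in percent-of-predicted performance between 1996 and 1999."""
    return percent_of_predicted(obs_99, exp_99) - percent_of_predicted(obs_96, exp_96)


# Hypothetical FVC readings (litres) for one child; not data from the study.
child = {"fvc_96_obs": 1.90, "fvc_96_exp": 2.00,
         "fvc_99_obs": 2.64, "fvc_99_exp": 2.40}

delta_fvc = delta_pft(child["fvc_96_obs"], child["fvc_96_exp"],
                      child["fvc_99_obs"], child["fvc_99_exp"])
print(round(delta_fvc, 1))  # change in percentage points
```

With these made-up values the child moves from 95% to 110% of the predicted FVC, so ΔFVC is positive. A negative value would mean that the child's performance relative to the prediction declined over the period.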

To ensure the suitability of ΔPFT estimates for multivariate modeling, normality of distribution was tested by the Kolmogorov-Smirnov (KS) test, in which the distribution of ΔPFT values appeared fairly normal (KS Z<0.9; P>0.4).

Demographic and Health Data

The questionnaire used in the study was a validated translation of the questionnaire developed and used by the American Thoracic Society (ATS) and National Heart and Lung Institute (Feris, 1978). It includes questions about the presence or absence of pulmonary diseases diagnosed by a physician (e.g., asthma), household-related characteristics, such as gas or oil house heating, housing density, exposure to passive tobacco smoking, parents’ education, and duration of living in the study area. The children's parents completed the questionnaires, with an overall rate of return of 72.4%.

Air Pollution Data

There are 12 monitoring stations in the study area, which provide continuous (24 h a day) measurements of air pollution levels. For our analysis, we used only the measurements that simultaneously exceeded the half-hour reference levels for NOx and SO2 (0.125 and 0.070 ppm, respectively). These excess concentrations (or so-called “air pollution events”) help to distinguish air pollution “splashes” generated by the power station (which contributes 50% of emissions in the study area) from air pollution constantly present in the area and attributed to other sources, such as motor vehicles (Association of Towns for Environmental Protection, 2005; Goren et al., 1995).

For each “air pollution event”, we calculated the integrated concentration value (ICV) of NOx and SO2 by multiplying their average concentrations during the event [ppm] by the event's duration in half-hour units (one half-hour being one unit), and then summed the results over the entire study period (i.e., 1996 through 1999).
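The ICV calculation can be illustrated with a short Python sketch. The reference levels are those given above; everything else is an assumption made for illustration: the function name, the sample readings, and in particular the rule that consecutive half-hours in which both pollutants exceed their reference levels are grouped into a single event.

```python
NOX_REF_PPM = 0.125  # half-hour reference level for NOx, from the text
SO2_REF_PPM = 0.070  # half-hour reference level for SO2, from the text


def integrated_concentration_values(readings):
    """Sum ICVs over all events in a time-ordered series of half-hourly readings.

    readings: list of (nox_ppm, so2_ppm) tuples, one per half-hour.
    Returns (icv_nox, icv_so2, number_of_events).
    """
    icv_nox = icv_so2 = 0.0
    n_events = 0
    event = []
    # The trailing sentinel closes an event that runs to the end of the series.
    for nox, so2 in list(readings) + [(0.0, 0.0)]:
        if nox > NOX_REF_PPM and so2 > SO2_REF_PPM:
            event.append((nox, so2))  # both pollutants exceed their levels
        elif event:
            duration = len(event)  # event length in half-hour units
            mean_nox = sum(r[0] for r in event) / duration
            mean_so2 = sum(r[1] for r in event) / duration
            icv_nox += mean_nox * duration
            icv_so2 += mean_so2 * duration
            n_events += 1
            event = []
    return icv_nox, icv_so2, n_events


# Hypothetical half-hourly readings for one station; not data from the study.
series = [(0.05, 0.02), (0.15, 0.08), (0.17, 0.10),
          (0.20, 0.05), (0.13, 0.09), (0.03, 0.01)]
icv_nox, icv_so2, n_events = integrated_concentration_values(series)
print(n_events, round(icv_nox, 3), round(icv_so2, 3))
```

In this made-up series the fourth reading interrupts the first event because SO2 falls below its reference level, so the readings yield two separate events.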

These summary values for the 12 air-monitoring stations were then interpolated by kriging, which furnished contours of equal pollution levels for the entire study area. Using these air pollution contours, we estimated the individual exposure levels in the vicinity of the children's residences.

One comment is important. As noted in the section on the sources of ecological bias, the conversion of data available for several air monitoring stations into regular grids or local exposure estimates may result in an estimation bias known as “error propagation” (Heuvelink and Burrough, 1989; Goodchild et al., 1992; Veregin, 1995). This bias arises mainly from the use of interpolation models, which may “transfer” any error in the original data into all output data layers created by interpolation (Gotway and Young, 2002; Bell, 2006). Furthermore, the outcome of interpolation is sensitive to the interpolation method used, e.g., spline, inverse distance weighting, or kriging (McCoy and Johnston, 2001). However, this complex phenomenon is well beyond the scope of the present study, which focuses mainly on ecological bias attributed to spatial data aggregation and considers air pollution estimates as exogenous.
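To make the interpolation step concrete, the following is a minimal sketch of ordinary kriging in plain Python. It is not the procedure used in the study, which relied on GIS software. The linear semivariogram with zero nugget, the station coordinates, and the station values are all assumptions chosen for illustration.

```python
import math


def variogram(h, slope=1.0):
    """Assumed linear semivariogram with zero nugget: gamma(h) = slope * h."""
    return slope * h


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ordinary_kriging(stations, values, target):
    """Estimate the value at target from station values; returns (estimate, weights)."""
    n = len(stations)
    # Semivariances between stations, bordered by the unbiasedness constraint
    # that forces the weights to sum to one.
    a = [[variogram(math.dist(stations[i], stations[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(s, target)) for s in stations] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights


# Hypothetical station coordinates (km) and summed ICVs; not data from the study.
stations = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
icv = [10.0, 20.0, 30.0, 40.0]
estimate, weights = ordinary_kriging(stations, icv, (2.0, 1.5))
print(round(estimate, 3), [round(w, 3) for w in weights])
```

Because the weights depend on the assumed semivariogram, a different model would generally produce different exposure estimates at the same residence.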

The air pollution estimates used in the analysis did not include particulate matter of <2.5 μm or <10 μm in aerodynamic diameter (i.e., PM2.5 and PM10) because PM measurements were available for only three out of the 12 monitoring stations distributed sparsely across the study area. Due to this limitation we used NOx and SO2 air pollutants as proxies for air pollution patterns in the study area. This limitation and its implications are addressed in the discussion section.

Appendix B

Descriptive characteristics of selected research variables

Table A1

Appendix C

Getis-Ord measure of local spatial autocorrelation

The Getis-Ord (Gi*(d)) statistic, used in the present analysis for detecting the spatial clustering of abnormally high and low values of ΔPFT variables, is reported as standard normal z-values and is calculated as follows:

$$
G_i^*(d) = \frac{\sum_{j} w_{ij}(d)\, x_j - W_i^* \bar{x}}{s \sqrt{\dfrac{n S_{1i}^* - (W_i^*)^2}{n - 1}}},
$$

where n is the number of observations; d is the distance band within which locations j are considered as neighbors of the target location i; xi is the value observed in location i; $\bar{x} = \frac{1}{n}\sum_{j} x_j$ is the global mean; wij is a symmetric binary weight matrix, whose elements take value 1 if locations i and j are neighbors and 0 otherwise; and $W_i^* = \sum_{j} w_{ij}(d)$, $S_{1i}^* = \sum_{j} w_{ij}^2(d)$, and $s^2 = \frac{1}{n}\sum_{j} x_j^2 - \bar{x}^2$. (In the present study, the mean distance between the children's homes (i.e., 40 m) was used for defining individual observations as “neighbors”.)

The Gi*(d) statistic evaluates each point within a network of sites and helps to determine the relationship between the values observed around the target point and the global mean (Getis and Ord, 1992). The statistic is easy to interpret: a significant and positive Gi*(d) indicates that location i is surrounded by relatively large values with respect to the global mean (“peak-value clusters”), whereas a significant and negative Gi*(d) indicates that location i is surrounded by relatively small values (“dip-value clusters”) (ibid).
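As an illustration of how such z-values can be obtained, the sketch below computes Gi*(d) in plain Python. It assumes the commonly used standardized form of the statistic with binary distance-band weights, in which each location counts as its own neighbor. The function name, coordinates, and ΔPFT values are hypothetical; only the 40 m distance band comes from the text.

```python
import math


def getis_ord_gi_star(points, values, d):
    """Return the Gi*(d) z-value for every location.

    points: list of (x, y) coordinates in metres
    values: observed values (e.g., delta-PFT), one per location
    d: distance band within which locations count as neighbours
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean * mean
    s = math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding
    z_values = []
    for i in range(n):
        # Binary weights; location i is included in its own neighbourhood.
        w = [1.0 if math.dist(points[i], points[j]) <= d else 0.0 for j in range(n)]
        w_sum = sum(w)
        s1 = sum(wij * wij for wij in w)
        numerator = sum(wij * x for wij, x in zip(w, values)) - w_sum * mean
        denominator = s * math.sqrt((n * s1 - w_sum ** 2) / (n - 1))
        # Undefined when all values are equal or every location is a neighbour.
        z_values.append(numerator / denominator if denominator > 0 else float("nan"))
    return z_values


# Hypothetical homes spaced 40 m apart along a street; not data from the study.
homes = [(40.0 * k, 0.0) for k in range(9)]
delta_pft = [10.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
z = getis_ord_gi_star(homes, delta_pft, d=40.0)
print([round(v, 2) for v in z])
```

In this made-up example, the homes at the start of the street sit inside a group of high values and receive positive z-values, while homes surrounded by low values receive negative ones.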

About this article

Cite this article

Portnov, B., Dubnov, J. & Barchana, M. On ecological fallacy, assessment errors stemming from misguided variable selection, and the effect of aggregation on the outcome of epidemiological study. J Expo Sci Environ Epidemiol 17, 106–121 (2007). https://doi.org/10.1038/sj.jes.7500533

  • DOI: https://doi.org/10.1038/sj.jes.7500533
