DOI: 10.1145/3394231.3397923
research-article

How Biased is the Population of Facebook Users? Comparing the Demographics of Facebook Users with Census Data to Generate Correction Factors

Published: 06 July 2020

ABSTRACT

Censuses and representative sampling surveys around the world are key sources of data to guide government investments and public policies. However, these sources are very expensive to obtain and are collected relatively infrequently. Over the last decade, there has been growing interest in the use of data from social media to complement more traditional data sources. However, social media users are not representative of the general population. Thus, analyses based on social media data require statistical adjustments, like post-stratification, in order to remove the bias and make solid statistical claims. These adjustments are possible only when we have information about the frequency of demographic groups using social media. These data, when compared with official statistics, enable researchers to produce appropriate statistical correction factors. In this paper, we leverage the Facebook advertising platform to compile the equivalent of an aggregate-level census of Facebook users. Our compilation includes the population distribution for seven demographic attributes such as gender, political leaning, and educational attainment at different geographic levels for the U.S. (country, state, and city). By comparing the Facebook counts with official reports provided by the U.S. Census and Gallup, we found very high correlations, especially for political leaning and race. We also identified instances where official statistics may be underestimating population counts as in the case of immigration. We use the information collected to calculate bias correction factors for all computed attributes in order to evaluate the extent to which different demographic groups are more or less represented on Facebook, and to derive the actual distributions for specific audiences of interest. We provide the first comprehensive analysis for assessing biases in Facebook users across several dimensions. 
This information can be used to generate bias-adjusted population estimates and demographic counts in a timely way and at fine geographic granularity in between data releases of official statistics.
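
As a rough illustration of the kind of correction factor the abstract describes, the Python sketch below assumes one simple definition: the ratio of a group's share in the census to its share among platform users. This definition, the function names (`correction_factors`, `adjust`), and all counts are illustrative assumptions, not the paper's method or data.

```python
def correction_factors(census_counts, platform_counts):
    """Return (census share / platform share) for each demographic group.

    Assumed definition for illustration only; the paper may define
    its correction factors differently.
    """
    census_total = sum(census_counts.values())
    platform_total = sum(platform_counts.values())
    factors = {}
    for group, census_n in census_counts.items():
        census_share = census_n / census_total
        platform_share = platform_counts[group] / platform_total
        factors[group] = census_share / platform_share
    return factors


def adjust(audience_counts, factors):
    """Reweight raw platform counts, then rescale to keep the same total."""
    weighted = {g: n * factors[g] for g, n in audience_counts.items()}
    scale = sum(audience_counts.values()) / sum(weighted.values())
    return {g: w * scale for g, w in weighted.items()}


# Hypothetical counts by age group, not taken from the paper.
census = {"18-29": 200, "30-49": 300, "50+": 500}
platform = {"18-29": 300, "30-49": 300, "50+": 400}

factors = correction_factors(census, platform)
adjusted = adjust(platform, factors)

for group in census:
    print(group, round(factors[group], 3), round(adjusted[group], 1))
```

Under this assumed definition, a factor below 1 marks a group that is overrepresented on the platform relative to the census, and a factor above 1 marks an underrepresented group. Applying the factors to the same audience they were derived from recovers the census distribution, which is a quick sanity check on the arithmetic.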

Supplemental Material

3394231.3397923.mp4 (mp4, 13.3 MB)


  • Published in

    WebSci '20: Proceedings of the 12th ACM Conference on Web Science
    July 2020
    361 pages
    ISBN: 9781450379892
    DOI: 10.1145/3394231

    Copyright © 2020 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 218 of 875 submissions, 25%
