ABSTRACT
Censuses and representative sample surveys are key sources of data for guiding government investments and public policies around the world. However, these data are expensive to collect and are gathered relatively infrequently. Over the last decade, there has been growing interest in using social media data to complement more traditional data sources. Social media users, however, are not representative of the general population. Analyses based on social media data therefore require statistical adjustments, such as post-stratification, to remove the bias and support sound statistical claims. These adjustments are possible only when the frequency of each demographic group among social media users is known. Compared against official statistics, these frequencies enable researchers to produce appropriate statistical correction factors. In this paper, we leverage the Facebook advertising platform to compile the equivalent of an aggregate-level census of Facebook users. Our compilation includes the population distribution for seven demographic attributes, such as gender, political leaning, and educational attainment, at different geographic levels for the U.S. (country, state, and city). Comparing the Facebook counts with official reports provided by the U.S. Census and Gallup, we find very high correlations, especially for political leaning and race. We also identify instances where official statistics may be underestimating population counts, as in the case of immigration. We use the information collected to calculate bias correction factors for all attributes considered, in order to evaluate the extent to which different demographic groups are more or less represented on Facebook and to derive the actual distributions for specific audiences of interest. We provide the first comprehensive analysis for assessing biases in Facebook users across several dimensions. This information can be used to generate bias-adjusted population estimates and demographic counts in a timely way and at fine geographic granularity between data releases of official statistics.
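To make the idea of a correction factor concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes a correction factor is defined as a group's share in the official statistics divided by its share among platform users, and it applies a standard post-stratified mean. The function names, age groups, counts, and group-level rates are all hypothetical values chosen for illustration.

```python
def correction_factors(platform_counts, census_counts):
    """Census share divided by platform share, per demographic group.

    Assumed definition for illustration: a factor above 1 means the group
    is under-represented on the platform, below 1 over-represented.
    """
    platform_total = sum(platform_counts.values())
    census_total = sum(census_counts.values())
    factors = {}
    for group, census_n in census_counts.items():
        platform_share = platform_counts[group] / platform_total
        census_share = census_n / census_total
        factors[group] = census_share / platform_share
    return factors


def weighted_mean(group_means, group_counts):
    """Mean of a group-level quantity, weighted by each group's population share."""
    total = sum(group_counts.values())
    return sum(group_means[g] * group_counts[g] / total for g in group_counts)


# Hypothetical counts (in thousands) for three age groups in one region.
platform_counts = {"18-29": 400, "30-49": 400, "50+": 200}
census_counts = {"18-29": 200, "30-49": 350, "50+": 450}

# Hypothetical share of each group with some attribute, measured on the platform.
group_means = {"18-29": 0.6, "30-49": 0.5, "50+": 0.3}

factors = correction_factors(platform_counts, census_counts)
naive_estimate = weighted_mean(group_means, platform_counts)   # platform composition
adjusted_estimate = weighted_mean(group_means, census_counts)  # post-stratified

for group, factor in factors.items():
    print(f"{group}: correction factor {factor:.3f}")
print(f"naive estimate:    {naive_estimate:.3f}")
print(f"adjusted estimate: {adjusted_estimate:.3f}")
```

In this sketch the adjusted estimate reweights each group's platform-measured rate by its census share, so groups that are over-represented on the platform no longer dominate the overall figure. The adjustment only corrects for composition along the attributes used to form the groups; it assumes that platform users within a group resemble non-users in that group.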