1 Introduction and Background

Although the knowledge discovery in data community has paid increasing attention to social media and the enormous volume of data it generates, little research addressed autism-related data until [1] was published in 2014. That study, the first of its kind, used public data from Twitter to extract linguistic and semantic usage patterns and to enrich our understanding of public knowledge and awareness of autism. To the best of our knowledge, there are no published English-language works on the Chinese autism community. This study, the first of its kind to focus on the Chinese autism community, attempts to offer an unprecedented window onto collective behaviors and patterns in China, such as collective public sentiment on autism and the associated linguistic patterns.

1.1 Autism Spectrum Disorder and the Current Research and Development Landscape in China

Individuals with Autism Spectrum Disorder (ASD), a neurodevelopmental disorder, have marked impairments in social interaction and communication, are prone to restricted interests, demonstrate stereotyped repetitive behaviors, and exhibit difficulty interpreting others’ mental states [38]. It is known that early intervention might lead to improved social skills and to increased quality of life and independence as they grow into adolescence [38]. Compared with the substantial research effort devoted elsewhere to the assessment, diagnosis and intervention of ASD, similar work in Eastern Asia lags alarmingly far behind [39], especially in China. Autism was not officially recognized as a disorder in mainland China until the early 1980s, and it remains unclear how many people in China (including adults and children) have ASD [39, 40].

1.2 Study Motivation

Decades of prior work have consistently affirmed that public and personal mood states play a crucial role in human decision-making [2,3,4, 7, 29,30,31]. Among these, Taquet et al. [29] called on researchers in big data and collective behavior to recognize the importance of emotional factors in understanding collective human behavior in the big data era. It is well known that the emotional variable sometimes overshadows other variables in the human decision-making cycle [28].

On the other hand, a rapidly growing body of research has examined the role of big data in understanding collective human behavior in various real-life applications (refer to [12] for a complete review). Some attempts have been made to use such collective behavior, as manifested in public emotional states or mood, for predictive analysis [13, 14], which motivates our study. Specifically, given the relatively low public awareness of ASD (not to mention the resulting scarcity of published English works on its assessment, diagnosis and intervention), our study could, in the long run, promote understanding of the body of knowledge on ASD among both the medical community and the general public, and improve our understanding of current practices in ASD.

While our work provides more insight into the link between temporal, collective public sentiment on ASD and public awareness of ASD in China, its findings might also pave the way for further investigations that benefit both the raising of public awareness and the execution of public policies on autism education and medical care for the many ‘undocumented’ individuals living with ASD and their loved ones.

2 Related Work

2.1 Data Mining in Social Media for Healthcare Applications

Numerous prior works have examined the adoption of data mining in healthcare (a few recent examples among many are [1, 11, 15,16,17,18,19,20, 22,23,24]). Some have also pointed out the importance of understanding the demographics of online social network users in healthcare applications [11, 21]. One notable work, by Ginsberg et al. [27], revealed that geographically localized health-related search queries can yield highly effective estimates of influenza-like illness in the United States with a reporting lag of only about one day, much faster than the estimates from the CDC. Collectively, these findings paint a picture of the evolving benefits and practices surrounding big data analytics.

2.2 Data Mining in Social Media on ASD

Although the knowledge discovery in data community has paid increasing attention to social media and the enormous volume of data it generates, little research addressed autism-related data until [1] was published in 2014. That study, the first of its kind, used public data from Twitter to extract linguistic and semantic usage patterns, and enriched our understanding of public knowledge and awareness of autism. The authors later continued down this path to probe further into the autism community. To the best of our knowledge, there are no published English-language works on the Chinese autism community.

2.3 Sentiment Analysis and Social Media

Prior studies have assessed public sentiment indirectly from the results of soccer games [8] and from weather conditions [9]. However, because of the sensitivity of the chosen correlation indicators, the accuracy of these assessments remains unsatisfactory [5]. More recently, many studies have extracted public mood states directly from large volumes of rich public data such as blogs [6, 10] and Twitter (a few recent examples among many are [21, 25, 26]).

3 Our Study

3.1 Data Collection and the Corpus

The first set of data was collected from one of the best-known autism support sites in China (guduzheng.net). A free, open-source Java web crawler was used to traverse over 400,000 pages of the site; advertisement pages, blank pages and non-textual content were removed, leaving a total of 19,014 pages. The second set of data comes from Weibo, a Chinese counterpart of Twitter that has taken social media in China by storm. We first used keyword-based searching to filter out posts unrelated to ASD; a total of 19 commonly adopted keywords were applied, including “自闭症谱系障碍 (ASD)”, “来自星星的孩子 (star children)”, and “埃斯伯格综合症 (Asperger Syndrome)”. The automatic data collector was programmed using the open Weibo API. Data were collected over 20 consecutive days, from November 3, 2015, to November 22, 2015, and a total of 270,750 Weibo posts were gathered during this period.
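The keyword-based filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the collector used in the study: the function names `is_asd_related` and `filter_posts` and the sample posts are hypothetical, and only keywords quoted in this paper are listed (the full set of 19 is not reproduced here).

```python
# Keywords quoted in this paper; the full list of 19 is not reproduced here.
ASD_KEYWORDS = [
    "自闭症谱系障碍",  # ASD
    "来自星星的孩子",  # star children
    "埃斯伯格综合症",  # Asperger Syndrome
    "自闭症",          # autism
    "孤独症",          # autism (alternative term)
]


def is_asd_related(post, keywords=ASD_KEYWORDS):
    """Return True if the post text contains at least one keyword."""
    return any(keyword in post for keyword in keywords)


def filter_posts(posts):
    """Keep only the posts that mention at least one ASD keyword."""
    return [post for post in posts if is_asd_related(post)]


# Hypothetical sample posts, for illustration only.
sample_posts = [
    "今天参加了关注自闭症儿童的公益活动",
    "周末天气很好，去公园散步",
    "来自星星的孩子需要更多理解",
]
kept = filter_posts(sample_posts)
print(len(kept))
```

A plain substring match of this kind also keeps posts that use the term loosely, for example to describe a reclusive personality, so a further cleaning pass is still needed after filtering.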

3.2 Emotion Lexical-Affinity-Based Sentiment Analysis

Following the common practices suggested in [32,33,34, 37], we conducted the sentiment analysis with two emotion lexicons. One is the Chinese translation of the well-known NRC Emotion Lexicon [37]; the other is DUTIR (adopted in [32]). The former is a Chinese version translated from the original English NRC lexicon using Google Translate. Given the intrinsic complexity of Chinese words, especially their rich and subtly different semantic features (for example, Chinese words are known for their sentiment ambiguity [36], and the same word might carry multiple emotions [35]), we speculate that the Chinese version of the NRC Emotion Lexicon might not be reliable as a basis.
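A lexical-affinity count of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy lexicon entries and the function name `count_emotions` are hypothetical and are not taken from the NRC Emotion Lexicon or DUTIR, and the input is assumed to be text that has already been segmented into words.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> set of NRC-style labels.
# These entries are illustrative only, not real lexicon entries.
TOY_LEXICON = {
    "希望": {"anticipation", "joy", "positive", "trust"},
    "害怕": {"fear", "negative"},
    "难过": {"sadness", "negative"},
}


def count_emotions(tokens, lexicon):
    """Count how often each emotion label is triggered by the tokens."""
    counts = Counter()
    for token in tokens:
        # Words missing from the lexicon contribute nothing.
        for label in lexicon.get(token, ()):
            counts[label] += 1
    return counts


# Tokens are assumed to be already segmented by a word segmenter.
tokens = ["孩子", "害怕", "但是", "我们", "希望", "难过"]
counts = count_emotions(tokens, TOY_LEXICON)
print(counts["negative"], counts["positive"], counts["fear"])
```

Summing such counts over all pages or posts gives an aggregate frequency per emotion label.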

3.3 Study Results and Discussions

Study 1: Emotion Lexicon-Based Study on the Chinese ASD Support Site.

Our analysis was performed on the 19,014 pages from the autism support site. The widely adopted Chinese word segmentation tool NLPIR (Natural Language Processing Information Retrieval) was used to segment the words. Figure 1 shows the aggregate emotions found in the site. The frequencies of positive and negative content did not differ significantly; fear, surprise, sadness, and disgust (directed against prejudice toward people with ASD and their family members) combined might overshadow the other emotions, which is consistent with the relatively low awareness and public acceptance of ASD in the community [39]. The trust emotion could spark hope for greater public awareness of ASD. Overall, the results showed only an aggregate emotional landscape of ASD as reflected in the pages and failed to capture finer-grained emotional responses from the articles. We attribute this to the richness of Chinese words in expressing delicate emotions; we therefore next show the results obtained with a popular native Chinese emotion lexicon.

Fig. 1. The frequency of different emotion labels in pages of guduzheng.net based on NRC Emotion Lexicon (Chinese)

The NRC Emotion Lexicon contains two sentiment labels (negative and positive) as well as eight emotions (shown from left to right in Fig. 1), whereas DUTIR has richer emotion labels, and each word is associated with a major and a minor emotion label. To simplify data pre-processing without compromising the outcome, we adopted a weighted emotion calculation to compute the aggregate emotion labels. Figure 2 shows the corresponding output using this emotion lexicon.
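One way to picture a weighted calculation over major and minor labels is sketched below. The weights, the toy entries, and the function name `weighted_emotions` are all assumptions made for illustration; the paper does not state the weights or the exact formula it used.

```python
from collections import defaultdict

# Hypothetical weights: the values actually used are not stated in the paper.
MAJOR_WEIGHT = 1.0
MINOR_WEIGHT = 0.5

# Hypothetical toy entries: word -> (major label, minor label).
# These are illustrative only, not real DUTIR entries.
TOY_DUTIR = {
    "失望": ("disappointment", "sadness"),
    "想念": ("miss", "sadness"),
}


def weighted_emotions(tokens, lexicon):
    """Accumulate a weighted score per emotion label over the tokens."""
    totals = defaultdict(float)
    for token in tokens:
        entry = lexicon.get(token)
        if entry is None:
            continue  # word not covered by the lexicon
        major, minor = entry
        totals[major] += MAJOR_WEIGHT
        totals[minor] += MINOR_WEIGHT
    return dict(totals)


# Tokens are assumed to be already segmented.
tokens = ["想念", "孩子", "失望", "想念"]
totals = weighted_emotions(tokens, TOY_DUTIR)
print(totals)
```

Under these assumed weights, a label that appears only as a minor emotion still contributes to the aggregate, but less than a label that appears as a major emotion.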

Fig. 2. The frequency of different emotion labels in pages of guduzheng.net based on DUTIR

The result clearly includes more varied and finer-grained emotions as reflected in the site. Among them, the five most heavily represented moods are disappointment, shyness, surprise, belief, and missing (longing). The ‘sum’ of these emotions is generally in line with the NRC-based results in Fig. 1, and illustrates the emotions the public might currently hold toward this population.

Study 2: Emotion Lexicon-Based Study on Weibo posts.

A similar analysis was performed on this set of data. Due to space limitations, we report only the aggregate moods obtained with the NRC Emotion Lexicon. Figures 3 and 4 depict the aggregate emotion frequencies in the ASD-related Weibo posts.

Fig. 3. The frequency of different emotion labels in Weibo posts based on NRC Emotion Lexicon (Chinese)

Fig. 4. The frequency of positive and negative emotions in Weibo posts based on NRC Emotion Lexicon (Chinese)

Positive emotions such as trust, joy, and anticipation combined depict an encouraging public mood; given the enormous popularity of Weibo (dubbed the Chinese Twitter), these results are all the more interesting.

Out of the corpus of 270,750 Weibo posts, we found merely ten posts (about 0.0037%) that are closely related to autism; of these ten, nine (90%) are related to the keyword “自闭症” (autism) and the remaining one is related to the keyword “孤独症” (an alternative term for autism). As for geographic location, the posts come from different provinces across China, and no correlation between location and keyword was found. Interestingly, one post was published by a user in Japan. The unexpectedly low number of posts might, from another perspective, reflect the low public awareness of and interest in this population, which might not be surprising: in Chinese society, having a child with a mental disability such as ASD is regarded as a family shame and failure [41] that should be kept secret, which further discourages help-seeking.
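The share of closely related posts can be rechecked directly from the two counts reported above. The variable names in this short sketch are illustrative.

```python
TOTAL_POSTS = 270_750   # Weibo posts collected
RELATED_POSTS = 10      # posts closely related to autism

# Share of closely related posts, expressed as a percentage.
share_percent = RELATED_POSTS / TOTAL_POSTS * 100
print(f"{share_percent:.4f}%")
```

Recomputed this way, the share comes to roughly 0.0037% of the collected posts.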

Discussions. Comparing the results obtained from the two emotion lexicons (Figs. 1 and 2), the DUTIR emotion lexicon yields more fine-grained results, which offer richer information about public emotions toward the ASD population. The results are thus consistent with our speculation that the Chinese version of the NRC Emotion Lexicon might not be reliable as a basis, owing to the sophisticated nature of Chinese words [35, 36].

During data cleaning, we also found that a large number of posts had to be removed because they used the word for autism to describe a reclusive personality trait, a usage that differs from that of the word in English. The unexpectedly low number of posts related to autism might reflect the extremely low awareness of ASD in China, despite recent government efforts to address this increasingly serious issue [39].

4 Concluding Remarks

In Asia, the diagnosis, assessment and intervention of ASD lag significantly behind their Western counterparts: there is as yet no systematic prevalence study in China of how many people are affected by ASD. In this paper, we present our study, the first of its kind, which offers preliminary yet valuable insights into the practices, knowledge and public awareness of ASD through lexical-affinity-based emotion analysis of textual content extracted from a well-known Chinese ASD support site and from Weibo, an enormously popular social media site. Mixed results were obtained. The ‘sum’ of the emotions is potentially positive and encouraging; yet the data obtained from Weibo are in line with previous findings that public awareness of ASD is very low in China and the Asia Pacific region [40, 41]. Given the increasing Chinese government support and the growing research and development in this area, it is our ‘collective’ hope that more members of the HCI community will engage in such efforts in China.

While our work provides valuable, yet preliminary, insights into the link between temporal, collective public sentiment on ASD and public awareness of ASD in China, its findings might pave the way for further investigations that benefit both the raising of public awareness and the execution of public policies on autism education and medical care for the many ‘undocumented’ individuals living with ASD and their loved ones.