Open Access 27-06-2025 | Original Article
Media Equation of the Interaction of Children with Autism Spectrum Disorder: A Proof-of-Concept Approach Using an Equivalence Test in a Within-Subject Design
Published in: Journal of Autism and Developmental Disorders
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by long-term deficits in social interaction and communication, as well as repetitive and restricted behaviors, interests, or activities (Diagnostic and Statistical Manual of Mental Disorders – Fifth Edition; American Psychiatric Association, 2013). The global rate of ASD diagnosis has increased (Bougeard et al., 2021; Chiarotti & Venerosi, 2020), with approximately 1 in 100 children receiving this diagnosis worldwide (Zeidan et al., 2022). The average age at first diagnosis of ASD in Western countries is approximately 5 years (van’t Hof et al., 2021), with children with intellectual disability receiving an earlier diagnosis than those without (Höfer et al., 2019). In Germany, the average age at first diagnosis is as late as 6.5 years, when most children are already attending school and parents have expressed concerns from around the age of two onward (Höfer et al., 2019). One of the most reliable ways to discriminate between individuals with ASD and typically developing (TD) individuals seems to be administering tasks that aim to capture social communication and interaction (Mukherjee et al., 2024). For example, a systematic review and meta-analysis by Wood-Downie et al. (2021) revealed that individuals without ASD have better social interaction and communication abilities than individuals with ASD. Observing and evaluating interaction difficulties is a crucial part of diagnosing ASD and is therefore considered the gold standard (Lord et al., 2020). However, the process of evaluating social interaction and communication requires clinical expertise and is very time-consuming, which explains the long waiting lists.
The longer waiting times for diagnosis in Germany than in other countries (Höfer et al., 2019), which are associated with uncertainty for the entire family (Wiggins et al., 2006) and delay the initiation of family support and child therapy (Tariq et al., 2018), clearly demonstrate the need for improved forms of care for suspected cases of ASD. This problem is addressed by the publicly funded project “Identification of Autism Spectrum Disorder using speech and facial expression recognition” (IDEAS). The project aims to develop a tablet-based digital screening tool that captures and automatically rates symptoms of ASD on the basis of speech, facial expression, and interaction, and removes candidates who are not suspected of having ASD from the waiting list. The implementation of new technologies could accelerate the diagnostic process and, as a result, facilitate earlier treatment for children with ASD.
In recent years, research into the clinical use of new technologies for ASD patients has increased, accelerated by the COVID-19 pandemic (Kumm et al., 2022). In particular, the process of ASD detection can be significantly improved via digital technologies (Dahiya et al., 2021; Desideri et al., 2021; Kohli et al., 2022). For example, Colombo et al. (2022) examined a web-based online screening tool for the detection of ASD and provided evidence for the integration of such screening services into primary care. In addition, the results of the systematic review by Dahiya et al. (2021), which identified 16 studies on the use of new technologies to assess children with suspected ASD, strongly support live video assessment, video observation, and digital or telephone-based methods. Furthermore, Drimalla et al. (2019, 2020) reported that the diagnosis of ASD can be predicted with high accuracy on the basis of facial expressions, vocal characteristics, and gaze patterns in a simulated video-call dialog, using a digital tool that they developed to automatically quantify biomarkers of social interaction deficits. Overall, the use of digital tools with objective data collection and analysis methods offers promising potential for the early detection of ASD.
Using a variety of sensors, such as those available in tablets (cameras with depth sensors, sound recordings, and touch-sensitive screens), ASD-related symptoms such as social-emotional and language skills can be recorded even in natural environments (Mukherjee et al., 2024). For example, Coffman et al. (2023) examined whether objective and quantitative measures of ASD-related behaviors collected via an app on a smartphone or tablet correlate with standardized caregiver and clinician ratings. The results confirmed that the measured behaviors do in fact correlate with clinician ratings, indicating that the app can be useful in identifying ASD profiles (Coffman et al., 2023).
Automated digital screening using an avatar for interaction may be of particular interest to children with ASD (Georgescu et al., 2014) and may even be better for them; however, the findings are contradictory. On the one hand, in the study by Kellems et al. (2023), elementary-aged children with ASD (8–10 years) showed greater social engagement when interacting with an avatar than when interacting with a human, and four out of five children preferred talking to an avatar rather than a human (Kellems et al., 2023). The explanation for this preference is not yet clear. One possible reason could be the more colorful presentation of a cartoonish avatar and its exaggerated facial expressions (Kellems et al., 2023); another could be that interactions with an avatar are less socially demanding (Moore & Calvert, 2000). On the other hand, in contrast to Kellems et al. (2023), a study by Carter et al. (2014) suggested that humans are more effective than avatars in eliciting verbal and non-verbal communication in children aged 4–8 years (n = 12). In both studies, the avatar was not a human character but an animal reminiscent of the movie "Finding Nemo": a tropical fish in the study by Kellems et al. (2023) and a turtle in the study by Carter et al. (2014). In both cases, the avatar was controlled live by the experimenters via a computer. In contrast to Carter et al. (2014), Kellems et al. (2023) had the experimenter introduce the interaction by holding a short conversation with the avatar to demonstrate how to communicate with it, which may also have contributed to the preference for the avatar. In conclusion, further research into mediated communication is needed, as is validation of the diagnostic comparability of real and simulated interactions for the specific needs of individuals with ASD.
To achieve the goal of developing automated digital screening for ASD, the first step and aim of this research was therefore to test the media equation, i.e., whether children with ASD interact in a digital environment as they would in a face-to-face situation. The media equation implies that people perceive technology-mediated experiences in the same way as nonmediated experiences (Lee, 2008; Reeves & Nass, 1996), because interacting with media is essentially social and natural, similar to interacting in real life (Reeves & Nass, 1996). This is consistent with the rich-get-richer hypothesis (Valkenburg & Peter, 2007), which suggests that adolescents show continuity in social interaction across settings, including computer-mediated communication (CMC). Correspondingly, individuals who have communication difficulties in real life might also have difficulties with CMC (Paulus et al., 2020). These findings suggest that individuals with ASD could also have difficulties with social communication in the digital environment. On the other hand, individuals with ASD report that interaction via CMC does not replace face-to-face interaction because of reduced social information (Cummings et al., 2002; Ritzman & Subramanian, 2024). It can therefore also be assumed that individuals with ASD, for whom face-to-face interaction is challenging, may be attracted to and can benefit from CMC (MacMullin et al., 2016; Shane & Albert, 2008; van der Aa et al., 2016). Bagatell (2010) suggested that CMC can give individuals the opportunity to interact with others from the safety and security of a familiar place. This is supported by the study by Cardon and Azuma (2012), in which children with ASD preferred video presentations over live presentations, and their visual attention was greater during the video presentation than during the live presentation.
Thus, the results suggest that visual attention in children with ASD can be influenced by the type of presentation (Cardon & Azuma, 2012).
In summary, the theory of the media equation suggests that people behave in the same way in a digital environment as they do in a real environment, but the state of research also shows that there may be a preference for digital presentation, and it remains unclear how avatars affect the interaction of children with ASD. Nevertheless, the media equation is often simply assumed in the development of digital screening tools for children with suspected ASD. To the best of our knowledge, no study has investigated this underlying assumption prior to the development of a digital screening tool. For digital screening to be successful, however, the symptoms of ASD must be elicited and captured in a way that discriminates between children with and without ASD (i.e., with sufficient sensitivity and specificity). This means that children must behave in the same way in a digital situation as they would in a real situation. The purpose of this paper is therefore to address this research gap via a proof-of-concept approach, guided by the following question:
1)
Do children with ASD interact in a digital environment in the same way that they would interact in a face-to-face situation?
Materials and Methods
The materials and methods used in the proof-of-concept study, including details on the research design, are presented in the following sections. Prior to the study, a positive ethics vote was obtained from the ethics committee, Department of Rehabilitation Sciences, TU Dortmund University (GEKTUDO_2022_45).
Participants
Families with children diagnosed with ASD were recruited in 2023 and 2024 from autism therapy facilities in a German metropolitan area as part of a larger research project (IDEAS) on the media-based elicitation of ASD-associated symptoms. Written information about the study was mailed or given to the families by the staff of the autism therapy facilities, after which interested families contacted the study manager, had their questions answered, and provided written consent. The children were also informed verbally and asked for their consent at the testing appointment. Demographic information about the families and children (e.g., age, highest level of parental education, diagnoses, IQ, information about language skills) was collected through a parent questionnaire at the testing appointment.
Special considerations regarding gender informed the inclusion criteria. Research highlights that more is known about boys with ASD, who are diagnosed more often than girls, with a male-to-female ratio of approximately 3:1 (Elsabbagh et al., 2012; Jiménez-Muñoz et al., 2022; Loomes et al., 2017; Zeidan et al., 2022). Girls may be misdiagnosed or diagnosed later because their ASD symptoms present differently (Hodges et al., 2020; Hull et al., 2020); for example, they tend to have better social interaction and communication skills (Wood-Downie et al., 2021). However, to approximate this highly heterogeneous population in terms of symptoms, the current phase of the study included only boys aged 6–11 years (elementary school age) who were verbally fluent and had no cognitive impairment.
For an initial proof-of-concept within-subject design, 20 German-speaking boys diagnosed with ASD were recruited; all but one were monolingual. Most of the participants (n = 16) achieved a sufficient score (T ≥ 40) on the Test of Grammar Comprehension (German adaptation of the Test for Reception of Grammar; Fox-Boyer, 2023); the remaining four children scored lower (reasons included a lack of motivation and concentration, and bilingualism). This standardized test assesses comprehension of a variety of syntactic structures of increasing complexity. In total, the children achieved a mean T-score of 52.68 (SD = 17.99). Problems in answering the interview questions should therefore not be attributable to a lack of language comprehension.
The mean age of the boys was 112.75 months (SD = 20.82), i.e., approximately 9.4 years. Beyond the ASD diagnosis, parents did not report any additional physical diagnoses or cognitive impairments for these children. However, five children had a psychiatric diagnosis of impulse control disorder, two had attention deficit hyperactivity disorder, two had both, and one child had epilepsy. All participants had adequate vision and hearing (vision corrected with glasses in n = 5). All diagnostic information was provided by the parents and was based on medical examinations and parent questionnaires. Most families (n = 16) reported a high school diploma as the highest level of parental education; four families reported a lower level of education.
Research Design and Testing Materials
To investigate whether children with ASD interact in digital formats in the same way as they would face-to-face, a within-subject design was implemented with five conditions representing a hierarchy of successive digital mediations (see Fig. 1).
Fig. 1
The conditions in the current investigation varied in characteristics concerning the authenticity of communication and the child’s interlocutor (experimenter). The conditions are presented in the upper boxes, and the characteristics are visualized on the x-axis (figure based on Pliska et al., 2023)
The face-to-face condition was considered the baseline condition because it closely resembles everyday conversations as well as typical face-to-face assessments in gold-standard ASD diagnostics (described in Kamp-Becker et al., 2021). From there, the conditions varied in time (real-time vs. pre-recorded) and in the authenticity of the interlocutor (real person ‘live’ vs. pre-recorded; digital avatar ‘live’ vs. pre-recorded). The conditions can therefore be grouped into contrasting clusters: a real person (conditions 1, 2, and 4) vs. an avatar (conditions 3 and 5), or real-time (conditions 1, 2, and 3) vs. pre-recorded (conditions 4 and 5). The digital avatar, created via Apple’s Memoji software (Apple Inc., 2023), visually mimicked the real-person experimenter conducting the test.
Each child participated in two sessions and completed all five conditions. To control for systematic order effects, the conditions were fully counterbalanced: for example, one child completed the conditions in the order 1, 2, 3, 4, 5; the next in the order 2, 3, 4, 5, 1; another in the order 5, 4, 3, 2, 1; and so on, such that no two children had the same order. The conditions and their characteristics are shown in Fig. 1. In each condition, tasks were set to elicit behavior related to interaction, among other things. These tasks were designed to potentially elicit symptoms of ASD, such as deficits in social interaction and communication (American Psychiatric Association, 2013).
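As an illustration of such an order-assignment scheme, distinct per-child orders can be generated from cyclic rotations of the base order and its reverse (matching the examples given in the text) and topped up with further permutations. This is a sketch only; the study reports example orders but not its exact assignment rule, and the function name is our own:

```python
from itertools import permutations

def condition_orders(n_conditions=5, n_children=20):
    """Generate one distinct condition order per child.

    Starts with cyclic rotations of the base order (1..n) and their
    reversals, then fills up with further permutations. Illustrative
    sketch; the study's actual counterbalancing scheme is not published.
    """
    base = tuple(range(1, n_conditions + 1))
    orders = []
    for i in range(n_conditions):
        rot = base[i:] + base[:i]        # e.g. (1,2,3,4,5) -> (2,3,4,5,1)
        orders.append(rot)
        orders.append(rot[::-1])         # and the reversed order (5,4,3,2,1)
    for perm in permutations(base):      # top up with remaining permutations
        if len(orders) == n_children:
            break
        if perm not in orders:
            orders.append(perm)
    return orders[:n_children]
```

With five conditions, rotations and reversals alone yield 10 distinct orders, so the remaining 10 of 20 children receive other permutations; no two children share an order.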
The task to induce interactional behavior in an interview sequence (warm-up/conversational situation task) was always performed at the beginning of each condition (methodologically inspired by Drimalla et al., 2019, 2020). To ensure comparable content across conditions while avoiding repetitive behavior, the content used in each condition was parallelized: the interview followed a similar question sequence (e.g., open-ended question, decision question, open-ended question), but different age-appropriate topics of conversation were chosen (e.g., seasons, animals; see the example in Table 1). This approach allowed variation between conditions while maintaining consistency in content. In addition, the experimenter performed the task following a script so that the interaction was always neutral and consistent; this procedure was practiced by the experimenter in a pre-test.
Table 1
Example of test material (Personal Translation)
| Condition | Question 1 | Question 2 | Question 3 |
|---|---|---|---|
| 3: Avatar real-time | What is actually your favorite food? | What is your favorite drink? | What is your favorite sweet? |
| 5: Avatar pre-recorded | Which animals do you actually like? | What is your favorite animal? | What other animals do you like? |
Testing Procedure
The tests were conducted with dedicated technical equipment. To facilitate communication in the digitally mediated conditions, several iPads (including an iPad Pro 11”, 3rd gen., running iPadOS 16.3) and Apple’s FaceTime technology (iPadOS 16.3; Apple Inc., 2023) were used. Speech was recorded for later analysis via a TASCAM DR-40X recorder and a Sennheiser MEB table microphone.
Figure 2 provides a visual representation of the test setup. The child is seated with an unobstructed view of the external screen displaying the digitally mediated conditions and the experimenter in the face-to-face condition. An external camera is positioned diagonally in front of the child. This camera records the entire test. A technical assistant is positioned behind the child and monitors and controls the stimuli throughout the test.
Fig. 2
Test Setting in the Current Investigation
Each child was tested in a quiet room with a low level of stimuli. Children 1 to 16 were tested in a room at the autism therapy center where the child received therapy sessions; children 17 to 20 were tested in the testing room at the university.
Data Analysis
To compare the interaction across the different media formats for each individual child with ASD, the warm-up/conversational situation task was analyzed. For this purpose, the videos of the tests were coded by two independent raters on an ordinal scale: 1 (no response), 2 (non-verbal or other reaction not related to the conversation), 3 (verbal response not related to the conversation), 4 (non-verbal or other reaction related to the conversation), and 5 (verbal response related to the conversation). Zero served as a residual category for “no idea”, “don’t know”, or responses given only after further prompting. First, both raters were given videos of the test and evaluated them, clarifying any uncertainties with the test administrator. For example, in response to the question “Which animals do you actually like?”, simply waving at the screen would be coded as a 2, imitating a cat as a 4, and saying “We went to the zoo yesterday” as a 3; only naming a specific animal would be assigned a score of 5.
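For illustration, the rating rubric can be written as a simple lookup table (a sketch; the labels paraphrase the scale described above, and the names are our own):

```python
# Ordinal codes for the warm-up/conversational situation task.
# 0 is the residual category; 1-5 form the ordinal response scale.
RESPONSE_CODES = {
    0: "residual: 'no idea', 'don't know', or response only after further prompting",
    1: "no response",
    2: "non-verbal or other reaction, not related to the conversation",
    3: "verbal response, not related to the conversation",
    4: "non-verbal or other reaction, related to the conversation",
    5: "verbal response, related to the conversation",
}

# The worked example from the text, for "Which animals do you actually like?"
EXAMPLE_RATINGS = {
    "waves at the screen": 2,
    "imitates a cat": 4,
    "says 'We went to the zoo yesterday'": 3,
    "names a specific animal": 5,
}
```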
Interrater reliability was calculated for 15% of the codings via Krippendorff’s alpha with the “stringr” package (Wickham and Posit Software, 2023) in R (version 4.3.1; R Core Team, 2023). For all three variables of the warm-up/conversational situation task, Krippendorff’s alpha was 1, indicating perfect agreement (see Zapf et al., 2016) and highlighting the objectivity of the rating categories. Descriptive and statistical analyses testing whether there was no difference, i.e., equivalence, in the interaction between conditions were also performed in R (R Core Team, 2023, version 4.3.1). One statistical way to test for equivalence is the two one-sided tests procedure (TOST; Lakens et al., 2018). We based the TOST on the Wilcoxon signed-rank test, owing to the small sample size and non-normal distribution, with mu = 0.5 (i.e., equivalence is rejected for a difference of 0.5 in the coded scores) and Bonferroni correction. The Bonferroni correction divides the original significance level (α = 0.05) by the number of tests performed to reduce the likelihood of false positives. The Wilcoxon signed-rank test was chosen over other non-parametric tests because it is best suited to ordinal data with a non-normal distribution (Caldwell, 2025). Because the effect size reflects the magnitude of the difference, smaller effects indicate stronger evidence of equivalence. The dependent variable was always the score for the question (ordinal score from zero to five; see above), and the independent variable was the condition (condition 1 vs. 2 vs. 3 vs. 4 vs. 5, or real person vs. avatar, or real-time vs. pre-recorded).
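The analyses were run in R; purely as an illustration, the TOST logic (two one-sided Wilcoxon signed-rank tests against an equivalence bound of ±0.5, with a Bonferroni-adjusted α) could be sketched in Python as follows. The function name and the use of SciPy are our illustrative choices, not the authors’ code:

```python
import numpy as np
from scipy import stats

def tost_wilcoxon(x, y, bound=0.5, alpha=0.05, n_tests=1):
    """Equivalence test (TOST) for paired ordinal scores, based on
    two one-sided Wilcoxon signed-rank tests.

    `bound` is the equivalence margin on the coded scores (0.5 here),
    and `n_tests` applies a Bonferroni correction to alpha. Equivalence
    is concluded only if BOTH one-sided tests reject, i.e. the score
    difference lies credibly within (-bound, +bound).
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    alpha_adj = alpha / n_tests  # Bonferroni-adjusted significance level
    # Lower test: H0: median(d) <= -bound vs. H1: median(d) > -bound
    _, p_lower = stats.wilcoxon(d + bound, alternative="greater")
    # Upper test: H0: median(d) >= +bound vs. H1: median(d) < +bound
    _, p_upper = stats.wilcoxon(d - bound, alternative="less")
    equivalent = bool(max(p_lower, p_upper) < alpha_adj)
    return float(p_lower), float(p_upper), equivalent
```

Equivalence is concluded only when both one-sided p-values fall below the adjusted α, which is why small samples with variable responses often leave the TOST null hypothesis unrejected.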
Results
Figure 3 shows the codes for each question and condition per child. At the individual level, children 2, 9, 18, and 20 interacted differently across conditions for all three questions. For children 1, 5, 12, and 13, this was the case for only two of the three questions. Children 3, 4, 6, 11, 16, 17, and 19 interacted identically in every condition and for every question, namely verbally in relation to the conversation (score 5). The remaining children deviated in their interaction on only one question in one condition; otherwise, they interacted in the same way across all conditions.
Fig. 3
Single Case Presentation of Children 1 to 20
Table 2 displays the proportions (in %) of the types of interaction within each condition for each question. For question 1, 100% of the children gave a verbal response related to the conversation in condition 1, and 95% did so in conditions 2 and 4, the real person conditions. In conditions 3 and 5 (the avatar conditions), this applied to fewer children (condition 3: 80%; condition 5: 85%). For question 2, 95% of the participants gave a verbal response related to the conversation in condition 3, and 90% in conditions 1 and 2 (the real-time conditions), while 5% gave a verbal response not related to the conversation (conditions 1 and 3). In conditions 4 and 5 (the pre-recorded conditions), a verbal response related to the conversation was given by fewer children: 80% in condition 5 and 75% in condition 4; in both conditions, more than two different response types occurred. For question 3, 95% of the children responded verbally in relation to the conversation in conditions 1 and 3; in condition 5 it was 90%, and in conditions 2 and 4 it was 80%, with more than two different response types occurring in conditions 2 and 4. An overview is also shown in Fig. 4.
Table 2
Percentage distribution of the interaction
| Task – Question | Value/Code | 1. Face-to-Face | 2. Facetime | 3. Avatar real-time | 4. Video pre-recorded | 5. Avatar pre-recorded |
|---|---|---|---|---|---|---|
| Warm-up – Question 1 | 0 | / | / | 15% | 5% | 10% |
| | 1 | / | 5% | / | / | 5% |
| | 2 | / | / | / | / | / |
| | 3 | / | / | 5% | / | / |
| | 4 | / | / | / | / | / |
| | 5 | 100% | 95% | 80% | 95% | 85% |
| Warm-up – Question 2 | 0 | 5% | 10% | / | 20% | 10% |
| | 1 | / | / | / | / | / |
| | 2 | / | / | / | / | / |
| | 3 | 5% | / | 5% | 5% | 5% |
| | 4 | / | / | / | / | 5% |
| | 5 | 90% | 90% | 95% | 75% | 80% |
| Warm-up – Question 3 | 0 | 5% | 15% | / | 5% | 10% |
| | 1 | / | / | 5% | 10% | / |
| | 2 | / | / | / | / | / |
| | 3 | / | 5% | / | / | / |
| | 4 | / | / | / | 5% | / |
| | 5 | 95% | 80% | 95% | 80% | 90% |
Fig. 4
Answers by Condition and Question for Individuals with ASD (n = 20). Conditions: 1 = Face-to-Face, 2 = Facetime, 3 = Avatar real-time, 4 = Video pre-recorded, 5 = Avatar pre-recorded
Results of Equivalence Testing for Each Item
Overall, the median across all children and all conditions is 5 (see Table 3). However, the mean values of the raw conversation scores show some variation between conditions.
Table 3
Descriptive Data of analysis for individuals with ASD (n = 20)
| Condition | Q1 min | Q1 max | Q1 Md | Q1 M (SD) | Q2 min | Q2 max | Q2 Md | Q2 M (SD) | Q3 min | Q3 max | Q3 Md | Q3 M (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Face-to-Face | 5 | 5 | 5 | 5.00 (0.00) | 0 | 5 | 5 | 4.65 (1.18) | 0 | 5 | 5 | 4.75 (1.12) |
| 2. Facetime | 1 | 5 | 5 | 4.80 (0.89) | 0 | 5 | 5 | 4.50 (1.54) | 0 | 5 | 5 | 4.15 (1.84) |
| 3. Avatar real-time | 0 | 5 | 5 | 4.15 (1.84) | 3 | 5 | 5 | 4.90 (0.45) | 1 | 5 | 5 | 4.80 (0.89) |
| 4. Video pre-recorded | 0 | 5 | 5 | 4.65 (1.18) | 0 | 5 | 5 | 3.90 (2.05) | 0 | 5 | 5 | 4.30 (1.59) |
| 5. Avatar pre-recorded | 0 | 5 | 5 | 4.30 (1.72) | 0 | 5 | 5 | 4.35 (1.57) | 0 | 5 | 5 | 4.50 (1.54) |
| Pre-recorded | 2.5 | 5 | 5 | 4.48 (0.98) | 0 | 5 | 5 | 4.13 (1.44) | 2.5 | 5 | 5 | 4.00 (1.02) |
| Real-time | 2 | 5 | 5 | 4.65 (0.81) | 1.67 | 5 | 5 | 4.68 (0.82) | 3.3 | 5 | 5 | 4.57 (0.71) |
| Avatar | 0 | 5 | 5 | 4.23 (1.39) | 2.5 | 5 | 5 | 4.63 (0.79) | 2.5 | 5 | 5 | 4.65 (0.86) |
| Real Person | 3 | 5 | 5 | 4.82 (0.57) | 0 | 5 | 5 | 4.35 (1.27) | 3 | 5 | 5 | 4.40 (0.86) |

Note: Q1–Q3 = Warm-up Questions 1–3; Md = median. The lower rows give the contrasted clusters (pre-recorded vs. real-time; avatar vs. real person).
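Summaries of the kind reported in Table 3 can be reproduced with Python’s standard library (an illustrative sketch, not the authors’ analysis code; the function name is our own). For instance, condition 1 / Question 3, where 19 children were coded 5 and one child 0 (cf. Table 2), yields M = 4.75 and SD = 1.12:

```python
import statistics

def summarize(scores):
    """min / max / median / mean (SD) summary in the style of Table 3."""
    return {
        "min": min(scores),
        "max": max(scores),
        "Md": statistics.median(scores),
        "M": round(statistics.mean(scores), 2),   # mean, rounded to 2 decimals
        "SD": round(statistics.stdev(scores), 2), # sample standard deviation
    }

# Condition 1, Question 3: 19 children coded 5, one child coded 0
summary = summarize([5] * 19 + [0])  # M = 4.75, SD = 1.12, Md = 5
```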
In question 1, only condition 1 has a median of 5 and a mean of 5.00 (SD = 0.00); every boy gave a verbal response related to the conversation. Conditions 2 and 4 also have a median of 5 but means of 4.80 (SD = 0.89; condition 2) and 4.65 (SD = 1.18; condition 4). These small differences in the means arise because one child in each condition did not give a verbal conversational response. However, equivalence can be assumed given the frequencies of 100% verbal conversational responses in condition 1 and 95% in conditions 2 and 4 (see Table 2); these are the real person conditions. In the avatar conditions, the means are smaller (condition 3: M = 4.15, SD = 1.84; condition 5: M = 4.30, SD = 1.72), with only 80% (condition 3) and 85% (condition 5) of the children giving a verbal conversational response. Two children in condition 5 and three children in condition 3 did not answer verbally in a conversational manner but said “no idea” or “don’t know”, or answered only when prompted. In addition, one child in condition 3 gave a verbal, nonconversational response, and one child in condition 5 gave no response. Overall, the mean is 4.82 (SD = 0.57) in the real person conditions and 4.23 (SD = 1.39) in the avatar conditions. When the pre-recorded conditions (4 and 5) are compared with the real-time conditions (1, 2, and 3), the mean is 4.48 (SD = 0.98) for the pre-recorded conditions and 4.65 (SD = 0.81) for the real-time conditions. Question 1 shows significant equivalence only between conditions 1 (Face-to-Face) and 2 (Facetime) and between conditions 2 (Facetime) and 4 (Video pre-recorded; see Table 4), which are real person conditions. For question 1, the mean difference between the avatar and real person conditions is greater (0.59; M = 4.23 and M = 4.82) than for questions 2 (0.28; M = 4.63 and M = 4.35) and 3 (0.25; M = 4.65 and M = 4.40).
Table 4
Equivalence Test
No. | Question | Comparison (G1 vs. G2) | N | Median G1 | Median G2 | TOST lower p-value | TOST upper p-value | Test Decision | Effect Size
---|---|---|---|---|---|---|---|---|---
1 | WU_Question1 | Avatar pre-recorded vs. Avatar real-time | 20 | 5 | 5 | 0.0315 | 0.1951 | TOST H0 cannot be rejected | -0.086 |
2 | WU_Question1 | Avatar pre-recorded vs. Face-to-Face | 20 | 5 | 5 | 0.2776 | 0.0001 | TOST H0 cannot be rejected | 0.271 |
3 | WU_Question1 | Avatar pre-recorded vs. Facetime | 20 | 5 | 5 | 0.2448 | 0.0025 | TOST H0 cannot be rejected | 0.186 |
4 | WU_Question1 | Avatar pre-recorded vs. Video pre-recorded | 20 | 5 | 5 | 0.2358 | 0.0263 | TOST H0 cannot be rejected | 0.095 |
5 | WU_Question1 | Avatar real-time vs. Face-to-Face | 20 | 5 | 5 | 1.0000 | 0.0002 | TOST H0 cannot be rejected | 0.352 |
6 | WU_Question1 | Avatar real-time vs. Facetime | 20 | 5 | 5 | 0.6033 | 0.0002 | TOST H0 cannot be rejected | 0.352 |
7 | WU_Question1 | Avatar real-time vs. Video pre-recorded | 20 | 5 | 5 | 0.8756 | 0.0034 | TOST H0 cannot be rejected | 0.248 |
8 | WU_Question1 | Face-to-Face vs. Facetime | 20 | 5 | 5 | 0.0001 | 0.0021 | significant equivalence | -0.095 |
9 | WU_Question1 | Face-to-Face vs. Video pre-recorded | 20 | 5 | 5 | 0.0001 | 0.0366 | TOST H0 cannot be rejected | -0.186 |
10 | WU_Question1 | Facetime vs. Video pre-recorded | 20 | 5 | 5 | 0.0023 | 0.0027 | significant equivalence | -0.005 |
11 | WU_Question2 | Avatar pre-recorded vs. Avatar real-time | 20 | 5 | 5 | 0.5585 | 0.0027 | TOST H0 cannot be rejected | 0.262 |
12 | WU_Question2 | Avatar pre-recorded vs. Face-to-Face | 20 | 5 | 5 | 0.2250 | 0.0096 | TOST H0 cannot be rejected | 0.095 |
13 | WU_Question2 | Avatar pre-recorded vs. Facetime | 20 | 5 | 5 | 0.3685 | 0.0342 | TOST H0 cannot be rejected | 0.148 |
14 | WU_Question2 | Avatar pre-recorded vs. Video pre-recorded | 20 | 5 | 5 | 0.0254 | 0.8359 | TOST H0 cannot be rejected | -0.171 |
15 | WU_Question2 | Avatar real-time vs. Face-to-Face | 20 | 5 | 5 | 0.0024 | 0.0368 | TOST H0 cannot be rejected | -0.095 |
16 | WU_Question2 | Avatar real-time vs. Facetime | 20 | 5 | 5 | 0.0024 | 0.0414 | TOST H0 cannot be rejected | -0.100 |
17 | WU_Question2 | Avatar real-time vs. Video pre-recorded | 20 | 5 | 5 | 0.0025 | 1.0000 | TOST H0 cannot be rejected | -0.352 |
18 | WU_Question2 | Face-to-Face vs. Facetime | 20 | 5 | 5 | 0.0023 | 0.0027 | significant equivalence | -0.005 |
19 | WU_Question2 | Face-to-Face vs. Video pre-recorded | 20 | 5 | 5 | 0.0002 | 1.0000 | TOST H0 cannot be rejected | -0.352 |
20 | WU_Question2 | Facetime vs. Video pre-recorded | 20 | 5 | 5 | 0.0029 | 0.8750 | TOST H0 cannot be rejected | -0.252 |
21 | WU_Question3 | Avatar pre-recorded vs. Avatar real-time | 20 | 5 | 5 | 0.0366 | 0.0024 | TOST H0 cannot be rejected | 0.100 |
22 | WU_Question3 | Avatar pre-recorded vs. Face-to-Face | 20 | 5 | 5 | 0.0325 | 0.0024 | TOST H0 cannot be rejected | 0.090 |
23 | WU_Question3 | Avatar pre-recorded vs. Facetime | 20 | 5 | 5 | 0.0029 | 0.2231 | TOST H0 cannot be rejected | -0.171 |
24 | WU_Question3 | Avatar pre-recorded vs. Video pre-recorded | 20 | 5 | 5 | 0.0340 | 0.3685 | TOST H0 cannot be rejected | -0.138 |
25 | WU_Question3 | Avatar real-time vs. Face-to-Face | 20 | 5 | 5 | 0.0023 | 0.0025 | significant equivalence | -0.005 |
26 | WU_Question3 | Avatar real-time vs. Facetime | 20 | 5 | 5 | 0.0029 | 0.9712 | TOST H0 cannot be rejected | -0.267 |
27 | WU_Question3 | Avatar real-time vs. Video pre-recorded | 20 | 5 | 5 | 0.0002 | 0.2843 | TOST H0 cannot be rejected | -0.352 |
28 | WU_Question3 | Face-to-Face vs. Facetime | 20 | 5 | 5 | 0.0029 | 0.8750 | TOST H0 cannot be rejected | -0.252 |
29 | WU_Question3 | Face-to-Face vs. Video pre-recorded | 20 | 5 | 5 | 0.0034 | 0.4744 | TOST H0 cannot be rejected | -0.243 |
30 | WU_Question3 | Facetime vs. Video pre-recorded | 20 | 5 | 5 | 0.2233 | 0.2825 | TOST H0 cannot be rejected | -0.038 |
In question 2, condition 3 has the highest mean, 4.90 (SD = 0.45), with a median of 5; 95% of the children gave a verbal conversational response. The mean of condition 1 is 4.65 (SD = 1.18) and that of condition 2 is 4.50 (SD = 1.54); in both, 90% of the children gave a verbal conversational response. One child in each of conditions 1 and 3 gave a verbal response that was not conversational, and one child in condition 1 and two children in condition 2 answered ‘no idea’ or ‘don’t know’ or responded only when prompted. Conditions 1, 2, and 3 are the real-time conditions. The means of the pre-recorded conditions differ (condition 4: M = 3.90, SD = 2.05; condition 5: M = 4.35, SD = 1.57), and the interaction was more variable in conditions 4 and 5 (see Tables 2 and 3). Overall, the mean for the pre-recorded conditions is 4.13 (SD = 1.44), and for the real-time conditions it is 4.68 (SD = 0.82). For question 2, the mean is 4.63 (SD = 0.79) in the avatar conditions and 4.35 (SD = 1.27) in the real person conditions. Question 2 also shows significant equivalence between conditions 1 (Face-to-Face) and 2 (Facetime), i.e., real person and real-time conditions (see Table 4); the comparisons among the three real-time conditions (1, 2, and 3) just missed significance in the TOST. Overall, the mean difference between the real-time and pre-recorded conditions is greater for question 2 (0.55; M = 4.68 and M = 4.13) than for question 1 (0.17; M = 4.65 and M = 4.48) but very similar to question 3 (0.57; M = 4.57 and M = 4.00).
For question 3, the median of each condition is 5. The mean of condition 1 is 4.75 (SD = 1.12) and that of condition 3 is 4.80 (SD = 0.89); in both conditions, 95% of the children gave a verbal response related to the conversation. Condition 2 has a mean of 4.15 (SD = 1.84), condition 4 a mean of 4.30 (SD = 1.59), and condition 5 a mean of 4.50 (SD = 1.54). Here, 90% of the children in condition 5 and 80% of the children in conditions 2 and 4 gave a verbal response related to the conversation. It can therefore be assumed that conditions 1, 3, and 5 are almost equivalent. However, question 3 shows significant equivalence only between conditions 1 and 3, i.e., between a real person and an avatar (see Table 4); the comparisons with condition 5 just missed significance. For question 3, the mean is 4.00 (SD = 1.02) in the pre-recorded conditions, 4.57 (SD = 0.71) in the real-time conditions, 4.65 (SD = 0.86) in the avatar conditions, and 4.40 (SD = 0.86) in the real person conditions.
In summary, Table 4 displays the results of the equivalence test. Question 1 shows significant equivalence only between conditions 1 (Face-to-Face) and 2 (Facetime) and between conditions 2 (Facetime) and 4 (Video pre-recorded). These are real person conditions. Question 2 also shows a significant equivalence between conditions 1 (Face-to-Face) and 2 (Facetime), i.e., the real person and real-time conditions. Question 3 only shows a significant equivalence between conditions 1 (Face-to-Face) and 3 (Avatar real-time). For the remaining comparisons between conditions, the null hypothesis of TOST (H0: difference in the interaction between conditions) could not be rejected. However, some of the comparisons just missed the significance threshold: for question 1 between conditions 1 and 4 (real person), for question 2 between conditions 1 and 3 and between conditions 2 and 3 (real-time), and for question 3 between conditions 3 and 5 and between conditions 1 and 5 (real person– avatar).
Overall, the largest effect, albeit a small one, is found for question 1 between conditions 3 and 1 and between conditions 3 and 2 (effect sizes of approximately 0.3). The effects for the demonstrated equivalences are close to zero, which further supports equivalence; overall, effect sizes near zero indicate equivalence between the conditions across the questions.
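The decision rule underlying Table 4 — two one-sided tests (TOST) on paired within-subject data — can be illustrated with a minimal sketch. This is not the authors' analysis code; the ratings and the equivalence margin `delta` are hypothetical, chosen only to show the mechanics of a paired TOST on 5-point interaction scores.

```python
import numpy as np
from scipy import stats

def paired_tost(x, y, delta):
    """Two one-sided tests (TOST) for paired samples.

    H0: |mean(x - y)| >= delta (non-equivalence).
    Equivalence is claimed when both one-sided tests reject,
    i.e., when the larger of the two p-values is below alpha.
    """
    d = np.asarray(x, float) - np.asarray(y, float)
    n = len(d)
    se = d.std(ddof=1) / np.sqrt(n)
    df = n - 1
    t_lower = (d.mean() + delta) / se      # test against the lower bound -delta
    t_upper = (d.mean() - delta) / se      # test against the upper bound +delta
    p_lower = stats.t.sf(t_lower, df)      # H0: mean difference <= -delta
    p_upper = stats.t.cdf(t_upper, df)     # H0: mean difference >= +delta
    return max(p_lower, p_upper)           # TOST equivalence p-value

# hypothetical 5-point ratings for two conditions from n = 20 children
rng = np.random.default_rng(0)
face_to_face = rng.integers(4, 6, size=20)
facetime = rng.integers(4, 6, size=20)
p = paired_tost(face_to_face, facetime, delta=0.5)
print(f"TOST p = {p:.4f}")  # equivalence concluded if p < alpha
```

The ceiling effects reported above matter for such a test: when nearly all children score 5 in both conditions, the paired differences have almost no variance, which is why the statistical results are interpreted cautiously.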
Overall Results of the Task
Table 5 shows the descriptive analysis of the conditions in the warm-up task, with the three questions averaged. All conditions have a median of 5, and the means differ only slightly between conditions and after clustering, ranging from 4.28 to 4.80. The mean of condition 1 is 4.80 (SD = 0.75), of condition 2 is 4.48 (SD = 0.90), of condition 3 is 4.62 (SD = 0.84), of condition 4 is 4.28 (SD = 1.36), and of condition 5 is 4.38 (SD = 1.25). Comparing the pre-recorded and real-time conditions, the pre-recorded conditions have a mean of 4.33 (SD = 0.93) and the real-time conditions a mean of 4.63 (SD = 0.66), a mean difference of 0.30. In the avatar vs. real person comparison, the avatar conditions have a mean of 4.50 (SD = 0.83) and the real person conditions a mean of 4.52 (SD = 0.76), an even smaller mean difference of 0.02.
Table 5
Descriptive Analysis of the conditions in the Warm-Up task for individuals with ASD (n = 20)
Condition | min | max | Md | M (SD) |
---|---|---|---|---|
1. Face-to-Face | 1.67 | 5 | 5 | 4.80 (0.75) |
2. Facetime | 2 | 5 | 5 | 4.48 (0.90) |
3. Avatar real-time | 2 | 5 | 5 | 4.62 (0.84) |
4. Video pre-recorded | 0.33 | 5 | 5 | 4.28 (1.36) |
5. Avatar pre-recorded | 0 | 5 | 5 | 4.38 (1.25) |
Pre-recorded | 1.67 | 5 | 5 | 4.33 (0.93) |
Real-time | 3.78 | 5 | 5 | 4.63 (0.66) |
Avatar | 1.67 | 5 | 5 | 4.50 (0.83) |
Real Person | 2.11 | 5 | 5 | 4.52 (0.76) |
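The clustered rows of Table 5 (Pre-recorded, Real-time, Avatar, Real Person) follow from averaging each child's score over the conditions in the cluster and then summarizing across children. The sketch below reproduces that aggregation logic with hypothetical per-child scores (n = 5 for brevity); the condition names match Table 5, but the numbers are invented.

```python
import numpy as np

# hypothetical per-child warm-up scores (0-5) for the five conditions
scores = {
    "Face-to-Face": [5, 5, 4, 5, 5],
    "Facetime": [5, 4, 5, 5, 4],
    "Avatar real-time": [5, 5, 5, 4, 5],
    "Video pre-recorded": [4, 5, 3, 5, 5],
    "Avatar pre-recorded": [5, 4, 5, 5, 3],
}

# the four clusters reported in Table 5
clusters = {
    "Real-time": ["Face-to-Face", "Facetime", "Avatar real-time"],
    "Pre-recorded": ["Video pre-recorded", "Avatar pre-recorded"],
    "Avatar": ["Avatar real-time", "Avatar pre-recorded"],
    "Real Person": ["Face-to-Face", "Facetime", "Video pre-recorded"],
}

for name, conds in clusters.items():
    # average each child's score across the cluster's conditions, then summarize
    per_child = np.mean([scores[c] for c in conds], axis=0)
    print(f"{name}: M = {per_child.mean():.2f}, SD = {per_child.std(ddof=1):.2f}")
```

Averaging within each child before summarizing preserves the within-subject structure of the design, which is what allows paired equivalence tests on the clustered scores.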
Discussion
The present study investigated how children with ASD interact with differently mediated interlocutors. The results are interpreted in relation to the research question of whether children with ASD interact in the same way with each of these interlocutors.
Question item 1 shows no differences when interacting with a real person but slight differences when interacting with an avatar. This is confirmed by the analysis, which shows significant equivalence between conditions 1 and 2 and between conditions 2 and 4. Thus, our results support the finding of Carter et al. (2014) that humans are more effective at eliciting communication than avatars, and contradict those of Kellems et al. (2023), in which the majority of the children indicated a preference for communicating with an avatar over a human. This is partially supported by the results of question item 2, which revealed significant equivalence between conditions 1 and 2, the two real-time conditions with a real person. However, for the third question item, the analysis shows no difference between a real person in real time and an avatar in real time. Overall, equivalence was found in the real-time conditions. Our findings therefore do not support those of Cardon and Azuma (2012), who reported that children with ASD exhibited a preference for videos over live presentations. The results of the present study indicate no significant differences in interactions with real-time communication partners, whereas differences were observed in interactions with pre-recorded communication partners. To the best of our knowledge, the difference in interaction with a real-time versus a pre-recorded communication partner has not yet been investigated. The studies by Carter et al. (2014) and Kellems et al. (2023), for example, examined interaction only in real-time conditions, as in both studies the experimenter controlled the avatar live via a computer. Furthermore, the results of both studies must be viewed with caution, as their samples were very small (Carter et al., 2014: n = 12; Kellems et al., 2023: n = 5).
Notably, in both studies, the avatar was not an animated human but an animal, as in the movie “Finding Nemo” (Carter et al., 2014; Kellems et al., 2023). Kellems et al. (2023) conducted a pilot study to decide between an animated fish and an animated human character; the fish was chosen because animated human characters are often distracting and uncomfortable for children. It is therefore surprising that the children in our study interacted as well with the animated human avatar as with the real person. This may also explain the large differences in interaction between conditions for children 2, 9, 18, and 20 (see Fig. 3). These differences might have been avoided by having the experimenter speak to the human avatar on the screen in front of the child at the beginning, as was done by Kellems et al. (2023). Another possible reason for these differences is that all four children received additional prompts from the experimenter’s technical assistant, so their responses had to be coded as 0, as the additional prompts may have biased the results. Notably, all four children had comorbidities: Child 2 an impulse control disorder; Child 9 an impulse control disorder and attention deficit hyperactivity disorder; Child 18 attention deficit hyperactivity disorder; and Child 20 epilepsy. In addition, the differences for Child 2 may be explained by the fact that the child was only 6 years old and very insecure without the mother present. Child 20 has below-average language skills, which may have led to the differences despite sufficient language comprehension.
When looking at the interactions across the entire warm-up task, the results are close to equivalence between the conditions. Overall, the data clearly indicate that the children with ASD interacted similarly across conditions, despite the individual differences noted. It can therefore be assumed that a digital application with an avatar can elicit behavior in children similar to that in a real-life conversation with an experimenter. Our findings support the premise of the rich-get-richer hypothesis (Valkenburg & Peter, 2007) that young individuals show continuity in social interaction across different environments. Furthermore, the results of this study do not indicate that mediated communication is more beneficial than face-to-face communication for children with ASD.
Limitations
The limitations of the study are the small sample size, which leads to low statistical power, and the nature of the study itself. Our study design is unusual: first, because of the within-subject design, and second, because of the equivalence test, whereas a classical hypothesis test attempts to reject a null hypothesis of equality (Lakens et al., 2018). However, we were convinced that the research question of this paper could be answered only through a within-subject comparison and an equivalence test. Because of the small sample size and the expected equivalence, the data show little to no variance, so the statistical results should be interpreted with caution. The low variance is due to ceiling effects: Figs. 3 and 4 show that most of the children in each condition always responded verbally and conversationally, indicating a high level of social competence. As we analyzed the media equation of the interaction between conditions within each child and assumed equality, a lack of variance was to be expected. Notably, the residual category is included in the analysis, which can be seen as a potential limitation; as shown in Figs. 3 and 4 and in Table 2, this category is often coded, which may affect the results. However, this category is not equivalent to missing values and was therefore included in the analysis. Furthermore, the children’s pragmatic abilities and related deficits were not taken into account, although individuals with ASD show pragmatic abnormalities (Eberhardt-Juchem, 2023). It would therefore be desirable to investigate the influence of pragmatic skills on interaction. A comparison with a control group in terms of ASD symptoms would also be of clinical interest; however, this does not contribute to the research question of the current study and was therefore omitted.
Research Implications
The results of this study have several implications for further research. The results show significant equivalence between the conditions (media equation), so it can be assumed that the participating children behave in the same way in a digital environment as in a real one. However, with a sample size of only 20 children, this requires further validation, and the sample size in this design should be increased. Further analyses should also be carried out with other statistical methods, such as the bootstrap TOST, which is suitable for small samples, has minimal assumptions, and is best used when precise confidence intervals are important (Caldwell, 2025). In addition, further studies should replicate the design with a non-verbal interaction task and include children with ASD who communicate non-verbally, to determine whether the media equation can also be detected there. In conclusion, the results of this study provide a positive outlook on the possibility of digital screening for ASD, so the next step should be to test whether specific tablet-based tasks can discriminate between children with and without ASD. To this end, children with and without ASD should perform various tablet-based tasks that have been shown to discriminate in other research, evaluated in terms of sensitivity and specificity.
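The bootstrap variant of the equivalence test suggested above can be sketched as follows. This is a generic percentile-bootstrap analogue of the TOST decision rule, not the specific procedure of Caldwell (2025); the equivalence margin `delta`, the ratings, and the confidence level are hypothetical: equivalence is claimed when the (1 − 2α) bootstrap CI of the mean paired difference lies entirely within ±delta.

```python
import numpy as np

def bootstrap_tost(x, y, delta=0.5, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile-bootstrap equivalence check for paired data.

    Resamples the paired differences and claims equivalence when the
    (1 - 2*alpha) percentile CI of the mean difference lies strictly
    within [-delta, +delta], mirroring the TOST decision rule.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(x, float) - np.asarray(y, float)
    idx = rng.integers(0, len(d), size=(n_boot, len(d)))  # resample with replacement
    boot_means = d[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [100 * alpha, 100 * (1 - alpha)])
    return lo, hi, (-delta < lo) and (hi < delta)

# hypothetical 5-point ratings from n = 20 children in two conditions
x = [5, 5, 4, 5, 5, 4, 5, 5, 5, 3, 5, 5, 4, 5, 5, 5, 4, 5, 5, 5]
y = [5, 4, 5, 5, 5, 5, 5, 4, 5, 4, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5]
lo, hi, equivalent = bootstrap_tost(x, y, delta=0.5)
print(f"90% CI of mean difference: [{lo:.2f}, {hi:.2f}], equivalent: {equivalent}")
```

Because the bootstrap makes no normality assumption, it sidesteps one concern raised by the ceiling effects in the present data, though it cannot compensate for a lack of variance in the paired differences.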
Conclusion
Considering the present findings and despite the limitations of this research, we conclude that children with ASD interact similarly in digital conditions and in face-to-face situations. For the future vision of a digital screening tool, it is important that children with ASD do not behave differently in a digital environment, unlike in the study by Kellems et al. (2023), where children showed greater social engagement when interacting with an avatar than with a human. Only if the children show (almost) the same behavior in the digital environment can ASD be identified with a digital screening tool. On the basis of this study, it can be assumed that computer-mediated communication is very similar to face-to-face communication in children with ASD; in this proof-of-concept approach, the media equation is therefore present in children with ASD. This means that the participating children with ASD can be expected to exhibit ASD-related behaviors in the digital environment as they would in person. Further validation of the media equation theory for children with ASD is needed, and future research must determine whether ASD-related behaviors can be reliably captured in an application. To do this, different tasks need to be tested and evaluated during development. In conclusion, further research is still needed to determine whether an app can be used as a screening tool for ASD.
Acknowledgements
We would like to thank all the participants and their families. We would also like to thank our cooperation partner for support with recruitment, and Laura Mellinghaus, Ina Zawadka, Friederike Köller, Leoni Guska and Pia Verholen for support with data collection. We thank our IDEAS project partners from the Fraunhofer Institute for Digital Media Technology and SpeechCare GmbH for their support with technical equipment. Finally, we thank the statistical consulting service of TU Dortmund University, particularly Swetlana Herbrandt, for support with the equivalence analyses.
Declarations
Ethical Approval
The study involving human participants was reviewed and approved by the Ethics Committee, Department of Rehabilitation Sciences, TU Dortmund University (GEKTUDO_2022_45). Written informed consent to participate in this study was provided by the participant’s legal guardian/next of kin.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.