Bridging Languages, Broadening Access: Examining an Observation-Based Autism Assessment with a Latinx Sample
- Open Access
- 09-02-2026
- Original Article
Abstract
Autism spectrum disorder (ASD), hereinafter referred to as autism, is diagnosed based on the presence of social communication challenges and restricted or repetitive behaviors that typically manifest early in life (American Psychiatric Association, 2013). On average, individuals in the U.S. are diagnosed with autism at 47 months old, but age of diagnosis varies by geographical location (from 36 months in California to 69.5 months in Laredo, Texas; Shaw et al., 2025). There are benefits to being identified as having autism earlier in life, such as accessing intervention services that support the development of social communication skills (Wallis & Guthrie, 2024). The effects of early intervention have led to efforts by specialists and researchers to reliably screen and identify children as early as possible.
Various tools have been implemented to screen for autism in early childhood. Level 1 screeners (e.g., Modified Checklist for Autism in Toddlers, Revised; Ages & Stages Questionnaires; Parents’ Evaluation of Developmental Status) are designed to detect individuals who may be at risk for developmental delays in the general population (Sanchez-Garcia et al., 2019). These screeners may indicate that an individual needs close monitoring and further testing. While attempts to implement universal screening are warranted, the use of level 1 screeners often leads to high false positive rates because they err on the side of over-identification, so as not to miss children who need support (Wetherby et al., 2021). To address this issue, level 2 screeners were developed to differentiate between autism and other conditions (such as global developmental delays or language disorders) among those who have already been identified as being developmentally at risk. Level 2 screeners help triage individuals at higher risk for autism and reduce overall wait times by minimizing false positives before referral for a comprehensive evaluation (Khowaja et al., 2018). These screeners aim to provide greater specificity than broader caregiver-report level 1 screeners and require fewer resources than full, comprehensive assessments (Khowaja et al., 2018). These tools are available in various formats, including caregiver rating scales, brief caregiver interviews, and observation-based measures.
Observation-based level 2 screeners are particularly useful because direct observations can help offset the limitations of informant-only reports, demonstrating strong clinical utility for differentiating among neurodevelopmental disorders (Miller et al., 2017; Nordahl-Hansen et al., 2014; Pellegrini, 2001). In fact, medical autism diagnoses rely primarily on direct behavioral observation by trained clinicians, along with a developmental history, typically obtained through caregiver report or medical chart review. Observation-based tools have the potential to speed up access to services. Widely used and standardized tools like the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al., 2012) remain the gold standard of observational assessments, but they can be time intensive to administer and score, and families must travel to the clinic. For clinicians working in organizations with long waitlists, level 2 observational tools can offer valuable structure and guidance, helping standardize which behaviors indicative of autism to look for and score, rather than relying solely on subjective impressions. When combined with other assessment tools, such as medical chart review and parent interviews and questionnaires, the ability to confidently diagnose individuals and quickly assess their strengths and weaknesses may help those individuals access services sooner and allow clinicians to serve a greater number of people. Overall, the use of brief, low-cost standardized observational tools is one practical strategy that can address the ongoing waitlist crisis in the autism field (Kanne & Bishop, 2021) without compromising assessment quality.
The dual function of these brief standardized observation-based assessments became particularly evident during the COVID-19 pandemic, when the ADOS-2 could not be administered in a standardized manner due to masking requirements. This limitation underscored the need for clinicians to be able to observe core autism features (i.e., social communication differences and restricted, repetitive behaviors) within a relatively structured, yet ecologically valid context and to code these behaviors using a standardized framework modeled after the ADOS-2. As a result, the Brief Observation of Symptoms of Autism (BOSA; Dow et al., 2021) was made widely accessible during the pandemic, with providers receiving free training in its administration and coding to ensure continued access to standardized behavioral assessments when in-person visits were not feasible. The BOSA is a 12–14-min semi-structured video-recorded interaction between the individual being assessed and a familiar social partner (e.g., caregiver, sibling, or another familiar adult). It can be administered remotely or in person, and there are four versions designed for different ages and language levels (see the Method for more detailed information). During the pandemic, clinicians observed the individuals being assessed (through video recordings or from an observation room) as they interacted with a familiar individual, which gave clinicians the opportunity to observe and score behaviors systematically.
Not only did the use of the BOSA as part of the comprehensive assessment lead to a diagnostic conclusion, but having a standardized observation between the individual being assessed and a familiar adult allowed clinicians to better support goal setting and treatment recommendations. Beyond its use during the COVID-19 pandemic, the BOSA remains a valuable tool for clinicians, allowing them to observe interactions between the individual being assessed for autism and familiar individuals (e.g., caregivers, siblings) and incorporate these observations into comprehensive assessments.
Widely used observational assessments include the Screening Tool for Autism in Toddlers (STAT; Stone et al., 2004), Naturalistic Observation Diagnostic Assessment (NODA; Smith et al., 2017), TELE-ASD-PEDS (Wagner et al., 2021), the Autism Detection in Early Childhood (ADEC; Young & Nah, 2016), the Systematic Observation of Red Flags (SORF; Dow et al., 2019), and the Brief Observation of Symptoms of Autism (BOSA; Dow et al., 2021). All of these tools were created with the goal of expediting the diagnostic process, and most of them also aim to reduce the need for in-person visits. Because these level 2 screeners are also available remotely, they have the potential to promote greater inclusion for individuals with limited access to trained providers (Corona et al., 2021). A unique strength of the BOSA in particular is its utility across a broad age range. As the field of autism assessment has expanded to include and diagnose older individuals with autism, the availability of tools validated across a wide age range has become increasingly important (Wigham et al., 2019).
Despite overall advances in autism screening practices and the proliferation of observation-based tools, including level 2 screeners, families of color from non-English-speaking households remain more likely than primarily English-speaking households to experience longer wait times for assessment, later diagnoses, and delays in starting early intervention services (Chavez et al., 2021; Imanpour, 2024; Lim et al., 2020; Zuckerman et al., 2017). Implementation of observation-based assessments in real-world settings remains limited by cost, time, and a shortage of qualified multilingual providers. There is limited research on the psychometric properties of observation-based level 2 screeners with underserved populations across a broader age range. As such, additional research is needed to assess how observational tools such as the BOSA perform in community-based samples of multilingual individuals from low-income households, in order to address existing disparities and promote equitable, timely access to autism diagnostic services.
The Current Study
The present study evaluates the reliability of the BOSA among Latinx youth and adults with autism and related neurodevelopmental conditions. Currently, the BOSA is available to researchers and experienced ADOS-2 users for free upon request. Preliminary validation of the BOSA in English-speaking samples has shown strong sensitivity, specificity, and convergent validity with the ADOS-2 across assessment settings (Dow et al., 2021). More recent work in Latin America also supports its feasibility in Spanish-speaking contexts, though some modules exhibited reduced sensitivity or specificity, highlighting the need for continued investigation across varying linguistic and cultural settings (Granana et al., 2025).
The goal of our study was twofold. We aimed to (1) assess the utility of the BOSA in Spanish with a bilingual sample and (2) expand the literature on its use in English with a culturally diverse Latinx sample fluent in English. We calculated the psychometric properties of the BOSA, including sensitivity and specificity, in Spanish and English. Then, using the subset for whom we had both Spanish and English BOSAs, we examined whether its performance was comparable across languages. Furthermore, we examined the role of individual language proficiency on BOSA performance in each of the languages. By centering a standardized, time- and cost-efficient observational tool, this study aims to inform more equitable yet reliable approaches to autism diagnosis in under-resourced, multilingual communities.
Method
Participants
A total of 98 Latinx participants (ranging in age from 15 months to 42 years) who completed at least one BOSA in English were included in this study (see Table 1 for demographic information). All participants were living in the United States and were fluent in English. Of the 98 participants, a subset (N = 42) was recruited as part of a separate and subsequent study focused on English–Spanish bilingual individuals and families from Southern California (see Tafolla et al., 2025) and thus were administered BOSAs in both English and Spanish. The 42 bilingual families were predominantly from low-income (69% of participants reported a household income below $65,000), primarily Spanish-speaking households (74% of caregivers reported Spanish was their primary language). The bilingual group was a community-based sample recruited from schools, Regional Centers, community non-profit organizations, and local autism organizations.
Table 1
Demographics for Participants with English (N = 98) and Spanish (N = 42) Data by BOSA Version
Characteristic | MV-T English | MV-T Spanish | MV-1 English | MV-1 Spanish | PSYF English | F1 English | F1 Spanish | F2 English | F2 Spanish |
|---|---|---|---|---|---|---|---|---|---|
n | 19 | 9 | 19 | 8 | 14 | 34 | 13 | 12 | 12 |
Age in months [M (SD)] | 25.95 (5.14) | 23.2 (4.7) | 51.95 (17.7) | 53.6 (17.5) | 65.7 (26.3) | 96.7 (43.8) | 131.0 (28.2) | 298.8 (92.6) | 298.8 (92.6) |
Sex male | 63% | 67% | 89% | 75% | 78% | 68% | 54% | 50% | 50% |
Autism diagnosis | 84% | 63% | 84% | 67% | 79% | 88% | 77% | 75% | 75% |
Verbal IQ [M (SD)] | 54 (23) | 51 (28) | 47 (29) | 41 (28) | 68 (9) | 93 (15) | 85 (18) | 103 (25) | 103 (25) |
Nonverbal IQ [M (SD)] | 86 (19) | 87 (26) | 72 (29) | 71 (31) | 86 (18) | 102 (14) | 96 (15) | 99 (14) | 99 (14) |
Interactant | |||||||||
Caregiver | 47% | 89% | 21% | 88% | 14% | 35% | 92% | 75% | 75% |
Clinician | 53% | 11% | 79% | 12% | 79% | 65% | 8% | 17% | 17% |
Other | 0% | 0% | 0% | 0% | 7% | 0% | 0% | 8% | 8% |
Test location | |||||||||
Home | 16% | 33% | 21% | 50% | 21% | 65% | 46% | 33% | 33% |
Clinic | 84% | 67% | 63% | 12.5% | 79% | 29% | 38.5% | 50% | 50% |
Community | 0% | 0% | 16% | 37.5% | 0% | 6% | 15.5% | 17% | 17% |
Dominant language | |||||||||
Spanish | – | 78% | – | 88% | – | – | 38% | – | 25% |
Diagnostic Procedures
All 98 participants had diagnoses of autism or other neurodevelopmental (e.g., global developmental delays, ADHD) or mental health conditions (e.g., depression, anxiety). Most participants (84%) received a best estimate diagnosis of autism. Because participants were recruited as part of different research studies, diagnostic procedures varied across participants (Dow et al., 2021; Tafolla et al., 2025).
Comprehensive evaluations were conducted for the subset of 42 English–Spanish bilingual participants (Tafolla et al., 2025). Their battery of assessments included the ADOS-2 in Spanish and in English, a measure of cognitive functioning, and caregiver reports of autism-related symptoms, adaptive skills, and externalizing/internalizing behaviors gathered via questionnaires. For some participants, an additional semi-structured interview was also administered (when a more detailed history was necessary to determine the best estimate diagnosis).
For the remaining English-speaking participants, various strategies were used to determine the best estimate diagnoses (Dow et al., 2021). Some of these participants received comprehensive evaluations, which also included a battery of assessments such as the ADOS-2, measures of cognitive functioning, and caregiver reports of autism-related symptoms gathered via questionnaires or a semi-structured interview. Other participants presented with historical diagnoses given by community providers, which were confirmed by the clinical team using an ADOS-2 and/or a semi-structured interview (the Autism Diagnostic Interview, Revised [ADI-R; Rutter et al., 2003]) and by reviewing all available information. Assessments were conducted in clinic or community-based settings (e.g., homes and libraries).
Measures
A demographic form was collected to gather background information about participants and caregivers, including child characteristics (e.g., sex and age).
Bilingual Participants
For the 42 bilingual participants in the second study, we collected additional data including sociodemographic information (e.g., income and caregiver primary language), language dominance, and BOSAs in both English and Spanish. To quantify the 42 participants’ language dominance, we used a parent questionnaire called the Bilingual Input–Output Survey (BIOS; Peña et al., 2018). The BIOS allows for an estimate of the amount of each language individuals hear and speak during each waking hour of weekdays and weekends. We used the percentage of exposure to each language to determine language dominance for less verbally fluent participants and the percentage of output in each language to determine language dominance for verbally fluent participants. Less verbally fluent participants who were exposed to Spanish more than 50% of the time were considered Spanish dominant. Verbally fluent participants who spoke Spanish more than 50% of the time were considered Spanish dominant. See Tafolla et al. (2025) for more information on calculating language dominance. BOSAs were administered in English and Spanish approximately four weeks apart, with the order of administrations randomized. Parents were asked to participate as their child’s social partner in both the English-only and Spanish-only BOSAs if they were able; however, many parents in this sample had limited English proficiency. If a parent was unable to administer the BOSA in either language, a bilingual examiner conducted the administration instead. All adult participants provided written informed consent, and parents provided consent for minors. All participants were compensated following each in-person visit. The study was approved by and conducted in compliance with the Institutional Review Boards at all participating institutions.
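The dominance rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure (see Tafolla et al., 2025): the `BiosSummary` container, its field names, and the reading of the threshold as strictly more than 50% are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: the cutoff is read as "strictly more than 50%".
SPANISH_DOMINANCE_THRESHOLD = 50.0


@dataclass
class BiosSummary:
    """Hypothetical container for BIOS-derived percentages (0-100)."""
    spanish_input_pct: float   # share of waking hours spent hearing Spanish
    spanish_output_pct: float  # share of waking hours spent speaking Spanish
    verbally_fluent: bool


def dominant_language(summary: BiosSummary) -> str:
    """Classify dominance from output (fluent) or input (less fluent)."""
    if summary.verbally_fluent:
        pct = summary.spanish_output_pct
    else:
        pct = summary.spanish_input_pct
    return "Spanish" if pct > SPANISH_DOMINANCE_THRESHOLD else "English"
```

Under these assumptions, a verbally fluent participant who hears Spanish 70% of the time but speaks it only 30% of the time would be classified as English dominant, whereas a less verbally fluent participant with the same percentages would be classified as Spanish dominant.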
The Brief Observation of Symptoms of Autism (BOSA)
The BOSA was adapted from the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al., 2012), one of the most widely used standardized observation-based assessments used to inform diagnoses of autism. The ADOS-2 is a 40- to 60-min semi-structured assessment administered and scored by trained clinicians, consisting of five modules that are commercially available (Toddler, Module 1, Module 2, Module 3, Module 4) and two modules available for research (Adapted Modules 1 and 2 for adolescents and adults with limited language; Bal et al., 2020) tailored to different ages and language levels.
Similar to the ADOS-2, the BOSA is an observation-based assessment that provides a naturalistic social context for interaction between the individual and a familiar adult (e.g., caregiver, teacher, or clinician), referred to as the interactant or social partner, using standardized materials and activities (Dow et al., 2021). The measure entails a brief 12–14-min semi-structured interaction modeled on the ADOS-2 and adapted from the Brief Observation of Social Communication Change (BOSCC; Byrne & Lord, 2023, 2024; Grzadzinski et al., 2016). The BOSA can be administered remotely or in person, as all sessions are video recorded. Administrations may take place in a variety of settings, including the participant’s home, a clinic, or community-based locations (e.g., libraries, community centers). Interactants are given instructions on each of the tasks approximately five minutes before they begin the interaction, during which they can ask clarifying questions. The observing clinician helps guide the interaction by signaling when it is time to move on to the next activity, either in person or virtually.
The BOSA consists of four versions based on the individual’s age and language level. The BOSA-MV is designed for individuals of any age who are nonspeaking or who use only single words or rote phrases. It corresponds to the ADOS-2 Toddler Module, Module 1 or the Adapted Module 1. Each administration uses two sets of developmentally appropriate toys and bubbles, which can be drawn from existing ADOS-2 kits or purchased through the University of California, Los Angeles (Byrne et al., 2023, 2024). The BOSA-PSYF is for individuals of any age with phrase speech or for verbally fluent children under age eight and aligns with the ADOS-2 Module 2 or the Adapted Module 2. It includes developmentally appropriate toys or materials (some overlapping with the BOSA-MV), a dollhouse or mailbox, interactive items (e.g., a rocket launcher), and bubbles. The BOSA-F1, intended for verbally fluent children ages 6 to 10 who generally meet criteria for the ADOS-2 Module 3, incorporates turn-taking games and structured conversational prompts. The BOSA-F2 is designed for verbally fluent individuals ages 11 through adulthood and corresponds to the ADOS-2 Module 4. It includes more complex games (e.g., Jenga, Slap Jack) and social-emotional questions adapted from the ADOS-2.
Each BOSA version uses two sets of similar materials (one set per half-session) to support standardized elicitation and social presses by providing novel materials to the participants throughout. After administration, a clinician experienced in using the ADOS-2 codes the interaction using the corresponding ADOS-2 protocol. Research reliability on the ADOS-2 is not a requirement for community-based clinicians to use the BOSA. These codes are then transferred to a BOSA scoring sheet using a binary system (0/1). Select items, chosen based on empirically supported psychometric properties that prioritize sensitivity (Dow et al., 2021), comprise the BOSA algorithm. Each version has a distinct cutoff score and categorizes individuals into one of three autism concern ranges: little-to-no concern, mild-to-moderate concern, and moderate-to-severe concern. Cutoffs for concern are tied to the ADOS-2 module that would have been most appropriate. Thus, when the BOSA-MV is given to a child who would have received the Toddler Module on the ADOS-2 (BOSA-MV-T), it has a different cutoff score than when it is given to a child who would have received Module 1 (BOSA-MV-1). Meeting the BOSA cutoff indicates a level of concern equivalent to moderate-to-severe symptoms consistent with DSM-5 criteria for autism.
Coding Procedure
The BOSA uses the ADOS-2 scoring protocol, which requires experience using and coding the ADOS-2. While research reliability on the ADOS-2 is not required for community-based clinicians who use the BOSA, all coders in this study were ADOS-2 research reliable because the data were collected for research purposes. Coders were instructed to watch the 12–14-min semi-structured interaction, take notes using ADOS-2 protocols, and assign item scores based on their observations. The number of items coded by the clinicians ranged from 29 to 41, depending on the ADOS-2 module. These items reflect behaviors related to communication, reciprocal social interaction, and restricted and repetitive behaviors in alignment with DSM-5 and ICD-11 diagnostic criteria.
Each item was scored on a scale from 0 (no abnormality in the behavior) to 2 or 3 (abnormality of the behavior clearly present). Because the BOSA interaction is significantly shorter than a full ADOS-2 administration, coders sometimes lacked sufficient information to score certain items. In these cases, items were marked as ‘8’ (unable to code) and later converted to 0 to avoid over-penalizing the participant. Select ADOS-2 items that comprise the BOSA algorithm were then converted to binary scores: 0 (no abnormality of behavior observed) or 1 (abnormality of behavior clearly present). Algorithm items were summed to determine whether the individual met the established cutoff indicating moderate-to-severe concern for autism (Dow et al., 2021).
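The scoring steps above (recode ‘8’ to 0, convert algorithm items to binary, sum, compare with the cutoff) can be illustrated with a short Python sketch. It is not the published BOSA algorithm: the item names and cutoff used in any call are hypothetical, the rule that a total at or above the cutoff "meets" it is an assumption, and so is the `abnormal_from` parameter, because the text does not specify which ADOS-2 codes map to a binary 1.

```python
from typing import Mapping, Sequence

UNABLE_TO_CODE = 8  # ADOS-2 code used when there is too little information


def to_binary(code: int, abnormal_from: int = 1) -> int:
    """Convert one ADOS-2 item code to a 0/1 score.

    Assumption: codes at or above `abnormal_from` count as 1. The exact
    mapping used by the BOSA scoring sheet is not given in the text.
    """
    if code == UNABLE_TO_CODE:
        return 0  # per the text, '8' is converted to 0
    if code not in (0, 1, 2, 3):
        raise ValueError(f"unexpected item code: {code}")
    return 1 if code >= abnormal_from else 0


def bosa_total(item_codes: Mapping[str, int],
               algorithm_items: Sequence[str]) -> int:
    """Sum binary scores over the items that make up the algorithm."""
    return sum(to_binary(item_codes[name]) for name in algorithm_items)


def meets_cutoff(total: int, cutoff: int) -> bool:
    """Assumption: a total at or above the cutoff meets it."""
    return total >= cutoff
```

For example, with hypothetical codes `{"item_a": 2, "item_b": 8, "item_c": 0, "item_d": 1}` and all four items in the algorithm, the total would be 2 under these assumptions.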
Spanish-language BOSAs were randomized to four coders who fluently spoke and/or understood Spanish, while English BOSAs were randomized to more than nine coders (including those who scored videos in Spanish). We controlled the assignment of BOSAs so that the same coder did not code the same participant’s BOSA in both languages. All coders were blind to participants’ diagnostic status. In this project, all participants were asked to keep to the language they were assigned during the administration. However, coders noted that participants often codeswitched from one language to the other; for example, participants assigned to speak in Spanish sometimes used English. This did not affect the scoring, as the administrator of the BOSA would respond in the language that was assigned for the BOSA and it did not occur frequently. Furthermore, coders were asked not to count instances of codeswitching against the participant and to give credit for those responses.
Analytic Plan
Using best-estimate clinical diagnoses as the reference standard, sensitivity and specificity with exact (Clopper–Pearson) 95% confidence intervals were calculated for each BOSA version (i.e., MV-T, MV-1, F1, and F2) in Spanish (N = 42). The PSYF version was excluded from the Spanish analyses due to the limited number of available Spanish PSYF administrations. Sensitivity and specificity with exact (Clopper–Pearson) 95% confidence intervals were also calculated for all English BOSA versions (i.e., MV-T, MV-1, PSYF, F1, and F2; N = 98).
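As an illustration of this step, the following Python sketch computes sensitivity and specificity with exact (Clopper–Pearson) intervals using only the standard library. It is not the authors' R code; the function names are hypothetical, and the interval limits are found by bisection on the binomial tail probabilities rather than by a beta quantile function.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for an increasing function g with g(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) interval for x successes out of n."""
    # Lower limit solves P(X >= x | p) = alpha / 2; it is 0 when x == 0.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha / 2; it is 1 when x == n.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def diagnostic_accuracy(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity and specificity, each with its exact 95% interval."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return (sens, clopper_pearson(tp, tp + fn)), (spec, clopper_pearson(tn, tn + fp))
```

With illustrative counts of 15 true positives, 1 false negative, 3 true negatives, and 0 false positives (counts that are inferred for the example, not reported in the article), the intervals round to 0.70–1.00 for sensitivity and 0.29–1.00 for specificity, which shows how wide exact intervals become when a cell contains only a handful of participants.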
Multilevel logistic regression models were then fit separately by language to assess the association between a positive BOSA screen and an autism diagnosis across BOSA versions. Models allowing both random intercepts and random slopes for module were initially evaluated, but due to the small number of modules and limited sample sizes within BOSA versions, these models did not provide stable estimates of between-module heterogeneity. Therefore, results from hierarchical models including random intercepts only were retained.
To assess systematic differences in positive classifications between English and Spanish BOSA administrations, exact McNemar’s tests were conducted for the 42 participants who completed BOSAs in both languages. Subsequently, agreement between English and Spanish screener classifications within each BOSA version was evaluated using Cohen’s kappa. Finally, for participants with both Spanish and English BOSAs, we calculated the percentage of participants with discrepant cutoff classifications across languages and descriptively examined demographic characteristics, including language proficiency, to explore whether they may have influenced the discrepancies. All analyses were done using R (R Core Team, 2023).
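For readers unfamiliar with the exact McNemar's test, a minimal Python sketch follows. It is an illustration rather than the authors' R code: the function name is hypothetical, and the test is implemented as a two-sided binomial test on the discordant pairs only.

```python
from math import comb
from typing import Optional


def mcnemar_exact(b: int, c: int) -> Optional[float]:
    """Two-sided exact McNemar p-value.

    b: pairs positive in English only; c: pairs positive in Spanish only.
    Returns None when there are no discordant pairs (test not applicable).
    """
    n = b + c
    if n == 0:
        return None
    k = min(b, c)
    # Under the null, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)
```

Only discordant pairs enter the calculation, so with few discordant pairs the test has little power to detect an asymmetry. For instance, a hypothetical split of 8 versus 1 gives p ≈ .039, whereas a split of 5 versus 4 gives p = 1.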
Results
Sensitivity and Specificity
For the Spanish BOSA, the MV-T, MV-1, and F1 showed good sensitivity and adequate specificity, whereas F2 showed perfect sensitivity (1.00) but poor specificity (0.33). For the English sample, the cutoff scores resulted in good discrimination between autism and non-autism groups across most versions, except for F1, which had the lowest sensitivity (0.63; see Table 2). Formal statistical comparisons were not computed due to the small sample sizes. Sensitivity and specificity for the English BOSAs from the subset of bilingual participants were also computed and were comparable to those of the full English sample; thus, only the results for the full sample are reported.
Table 2
Sensitivity and Specificity of the BOSA in English and Spanish with 95% Confidence Intervals
BOSA version | English n (N = 98) | English sensitivity (95% CI) | English specificity (95% CI) | Spanish n (N = 42) | Spanish sensitivity (95% CI) | Spanish specificity (95% CI) |
|---|---|---|---|---|---|---|
MV-T | 19 | 0.94 (0.70–1.00) | 1.00 (0.29–1.00) | 9 | 0.83 (0.36–1.00) | 0.67 (0.09–1.00) |
MV-1 | 19 | 0.94 (0.70–1.00) | 1.00 (0.29–1.00) | 8 | 0.80 (0.28–1.00) | 0.67 (0.09–1.00) |
PSYF | 14 | 0.91 (0.59–1.00) | 0.67 (0.09–0.99) | N/A | 0.50 (0.01–0.99) | Not estimable |
F1 | 34 | 0.63 (0.44–0.80) | 0.75 (0.19–0.99) | 13 | 0.80 (0.44–0.98) | 1.00 (0.29–1.00) |
F2 | 12 | 0.89 (0.52–1.00) | 0.67 (0.09–0.99) | 12 | 1.00 (0.66–1.00) | 0.33 (0.01–0.91) |
Multilevel Logistic Regression Models
Results from the multilevel logistic regression models showed that across both English and Spanish administrations, screening positive on the BOSA was strongly associated with an autism diagnosis. After accounting for clustering by BOSA version, participants who screened positive had higher odds of receiving an autism diagnosis in both English (odds ratio ≈ 23) and Spanish (odds ratio ≈ 11). In both languages, the predictive value of a positive screen was consistent across BOSA versions. Results are presented in Table 3.
Table 3
Multilevel Logistic Regression Models
Language | Effect | Estimate (log-odds) | SE | Odds ratio (OR) | 95% CI (OR) |
|---|---|---|---|---|---|
English | Intercept | −0.003 | 0.50 | 1.00 | 0.37–2.66 |
English | Screen positive | 3.14 | 0.78 | 23.1 | 5.00–107.8 |
Spanish | Intercept | −0.47 | 0.57 | 0.63 | 0.20–1.91 |
Spanish | Screen positive | 2.38 | 0.78 | 10.8 | 2.3–50.04 |
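The odds ratios in Table 3 are the exponentiated log-odds estimates. The short Python sketch below shows the conversion together with a Wald-type 95% interval. The function name is hypothetical, and the Wald interval is an assumption about how the intervals were obtained: it reproduces the tabled values only approximately, which is expected if the published intervals came from unrounded estimates or another method.

```python
from math import exp


def odds_ratio_with_wald_ci(log_odds: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower limit, upper limit) on the odds-ratio scale."""
    return exp(log_odds), exp(log_odds - z * se), exp(log_odds + z * se)


# Estimates and standard errors as reported in Table 3.
english = odds_ratio_with_wald_ci(3.14, 0.78)
spanish = odds_ratio_with_wald_ci(2.38, 0.78)
```

Because the interval is symmetric on the log-odds scale, it is strongly asymmetric on the odds-ratio scale, which is why the upper limits are so much farther from the point estimates than the lower limits.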
Agreement and Positive Classifications Between English and Spanish
The BOSA demonstrated moderate agreement between the Spanish and English administrations across the different versions (κ = 0.43–0.53), indicating reasonable consistency between languages (see Table 4). The PSYF had a very small number of paired subjects and produced unstable estimates. Only F1 approached statistical significance (p = 0.053). Overall, these findings support moderate cross-language agreement of the screener, while highlighting variability due to limited data in some BOSA versions. McNemar's tests showed no evidence of asymmetric classification between English and Spanish versions across modules (all p > .05). Within the PSYF version, there were no discordant English/Spanish classifications and therefore the test was not applicable. Cutoff classifications between the Spanish and English BOSAs were consistent for 79% of bilingual participants. Among the nine participants with discrepant scores, five were within one point of the cutoff on one version, suggesting minimal differences in those cases. Whether participants met the cutoff in one language versus the other did not appear to be influenced by language dominance. The observed variability may instead be due to other unexplored factors, including differences in interactants or day-to-day fluctuations in behavior.
Table 4
Cohen’s Kappa across BOSA versions
BOSA version | N (paired subjects) | Cohen’s κ | Interpretation | z | p Value |
|---|---|---|---|---|---|
MV-T | 9 | 0.50 | Moderate | 1.5 | 0.134 |
MV-1 | 8 | 0.47 | Moderate | 1.32 | 0.187 |
PSYF | 2 | 0 | NA | NA | NA |
F1 | 13 | 0.53 | Moderate | 1.94 | 0.05 |
F2 | 12 | 0.43 | Moderate | 1.81 | 0.07 |
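To make the agreement statistic concrete, the following Python sketch computes Cohen's kappa from a 2 × 2 table of paired English and Spanish classifications. The function name and the example counts are hypothetical; the article does not report the cell counts behind Table 4.

```python
from typing import Optional


def cohens_kappa(both_pos: int, eng_only: int,
                 spa_only: int, both_neg: int) -> Optional[float]:
    """Cohen's kappa for two binary classifications of the same people.

    Returns None when chance agreement equals 1 (kappa is undefined),
    e.g. when every pair is classified the same way in both languages.
    """
    n = both_pos + eng_only + spa_only + both_neg
    observed = (both_pos + both_neg) / n
    p_eng_pos = (both_pos + eng_only) / n
    p_spa_pos = (both_pos + spa_only) / n
    # Agreement expected by chance from the marginal positive rates.
    expected = p_eng_pos * p_spa_pos + (1 - p_eng_pos) * (1 - p_spa_pos)
    if abs(1 - expected) < 1e-12:
        return None
    return (observed - expected) / (1 - expected)
```

With hypothetical counts of 7 pairs positive in both languages, 1 positive in English only, 2 positive in Spanish only, and 3 negative in both, observed agreement is 10/13 (77%) but kappa is about 0.49, because much of that agreement is expected by chance when most participants screen positive.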
Discussion
This study provided initial evidence regarding the psychometric performance of the Brief Observation of Symptoms of Autism (BOSA) with a culturally and linguistically diverse Latinx sample. Our sample was unique in that all participants were Latinx and fluent in English, and approximately half were English–Spanish bilingual and completed BOSAs in both languages. The BOSA and other similar remote assessments have been proposed as alternatives to gold-standard, in-person observational tools, or as screeners to triage individuals and determine whether a comprehensive evaluation is warranted. The BOSA is not intended as a replacement for comprehensive diagnostic evaluations, as supported by the psychometric evidence from the current study, but rather as a complementary tool to support and inform clinical decision-making. The BOSA may be particularly valuable in settings where traditional autism assessment tools are less feasible due to, for instance, language-related or geographical barriers, though additional research and potential refinements are still needed.
Overall, the BOSA demonstrated adequate psychometric properties with our Latinx sample for some age and language level groups, but not for all. Sensitivity and specificity findings were generally consistent with those reported in the original validation study, particularly for minimally verbal individuals and those speaking in simple phrases (Dow et al., 2021). Consistent with these findings, multilevel logistic regression analyses showed that screen-positive status was strongly associated with autism diagnosis across languages, even after accounting for clustering by BOSA version. However, sensitivity was notably lower for individuals with fluent speech, with similar patterns observed in the original study (Dow et al., 2021). This suggests that the BOSA may be less robust in detecting autism symptoms in specific sub-populations, and that particular care should be taken in screening older and more verbal individuals.
Furthermore, the English and Spanish versions performed similarly across the different modules of the BOSA, as evidenced by their moderate agreement (Cohen’s κ) and the absence of asymmetric positive classifications (McNemar’s test), with some notable nuances. Specifically, the Spanish version of the BOSA-F1 (younger individuals with fluent language) showed better psychometric properties than the English version, with both higher sensitivity and higher specificity (Table 2). One possible explanation for this finding is the difference in interactants across the English and Spanish BOSAs within this group. In the English version of the BOSA-F1, clinicians were more likely to serve as the social partner, whereas caregivers were more commonly the interactants in the Spanish administrations. Unlike the ADOS-2, where clinicians are held to standards of skill and protocol adherence, the BOSA can be done with parents or other familiar adults whose behaviors may be more varied (though there are attempts to keep instructions and ways of conveying expectations as clear as possible across all interactants). Future research should aim to hold the interactant constant across both language administrations. This was not feasible in our present sample, as many parents were monolingual Spanish speakers and therefore could not complete the BOSA in English; however, obtaining a parent–child interaction, when possible, even in just one language, was still considered valuable.
In the BOSA-F2 group (older individuals with fluent language), the Spanish BOSA showed high sensitivity but specificity below the expected threshold, meaning it produced a high number of false positives. One explanation may be that most individuals administered the BOSA-F2 were more fluent in English than in Spanish, which may have complicated social communication between the participant and the interactant and, in turn, the interpretation of Spanish-language administrations. While these findings should be interpreted cautiously due to the small sample size, they highlight the need for additional research on how bilingualism may affect brief observational assessments for adults. Bilingualism could introduce complexity in the interpretation of behavior, especially in brief assessments, where social behaviors related to the use of a second language, such as code switching or social hesitancy, may resemble features of autism (Fombonne, 2020). These findings suggest that this may be a particular issue for individuals with more fluent language, as opposed to those who primarily speak in single words or short phrases.
While our small sample size limits our ability to draw firm conclusions regarding BOSA performance with this population, our findings indicate that the BOSA shows promise as a tool for addressing several gaps in autism assessment, pending further validation. The BOSA may serve as an effective observational assessment within autism screening and diagnostic procedures, particularly for toddlers and less verbally fluent individuals, and especially when used in combination with other tools such as medical chart review and parent interviews and questionnaires. Its potential for administration by non-specialists in early intervention and school-based settings could help reduce diagnostic delays that disproportionately affect families of color and non-English-speaking households (Aylward et al., 2021). Future research should evaluate the BOSA’s sensitivity, specificity, and predictive value in this role, especially when used by clinicians with varying levels of ADOS-2 training.
Further, the BOSA has the potential to expand access to autism assessments and address the evaluation waitlist crisis if used with additional tools (Kanne & Bishop, 2021). The BOSA kits can be shipped anywhere, including to families’ homes, which avoids travel and minimizes barriers to access. Interactants can be coached through the interaction via telehealth, or they can send video recordings of the BOSA interactions to the clinician via a secure platform if preferred. Observational tools like the BOSA that can be administered in naturalistic or community settings are particularly valuable for reducing barriers to early detection in underserved populations (Dow et al., 2019; McCarty & Frye, 2020). Moreover, few brief observational tools have been validated for linguistically diverse populations across the lifespan, which positions the BOSA as uniquely valuable in addressing diagnostic disparities (Dow et al., 2021; Zander et al., 2015).
Finally, BOSA administrations (when video-recorded at multiple timepoints using the same materials) have been used to measure change over time in response to intervention. A coding scheme designed for this purpose, the Brief Observation of Social Communication Change (BOSCC), has been validated using adult–child interaction videos in the same format as the BOSA (Byrne et al., 2023, 2024; Grzadzinski et al., 2016; Reszka et al., 2024). This multipurpose kit enhances clinical utility and may help reduce overall costs for providers. The BOSCC has been administered in several languages, including German, French, Hindi, Korean, Dutch, and Spanish, but it has not yet been formally validated with multilingual or non-English-speaking populations, and systematic validation in these languages is still needed.
Limitations and Future Directions
Results presented should be carefully interpreted given the small sample size, especially for the Spanish group. The statistical power is insufficient to draw firm conclusions, and larger samples are needed to ensure stable parameter estimates and more reliable cross-language comparisons. Despite the sample size limitation, this study provides some preliminary evidence regarding both the utility of the BOSA, with different methodologies leading to the same results, and its limitations. While the BOSA offers several advantages, including a modular standardized structure, brief administration time, and use of relatively inexpensive materials, its current requirement that coders be familiar with the ADOS-2 may limit scalability in community settings and among clinicians without formal ADOS-2 training. Because the BOSA relies on clinicians experienced in ADOS-2 administration and scoring, an important next step will be to evaluate whether it can be used effectively by clinicians who do not use the ADOS-2 in their clinical practice. Findings from this study also suggest that further adaptation may be needed when using the BOSA as a screener for individuals with more fluent language abilities (i.e., versions F1 and F2 of the BOSA), given the lower sensitivity and specificity in this subgroup. Therefore, additional research is needed to optimize performance among older and more verbally fluent populations.
Looking ahead, a key goal is to continue refining the BOSA and to evaluate its role within diagnostic referral and evaluation pathways. In applying the BOSA to bilingual and multilingual populations, it will also be important to further examine factors that may influence the validity and interpretability of the BOSA assessment, such as the role of language dominance and the identity of the social partner (e.g., caregiver vs. clinician). Recruiting larger samples will be critical both to addressing these additional questions and to determining whether cutoff scores need to be adjusted to improve psychometric performance.
Conclusion
Equitable access to autism evaluations remains a critical global need (Brinster et al., 2023; Divan et al., 2021). Observational assessments play a central role in identifying autism-related behaviors, particularly for individuals whose symptoms may not be fully reflected in parent-report measures (Zander et al., 2015). Mischaracterization of autism, whether through under- or over-identification, can have lasting consequences for individuals and families, including inappropriate service provision, increased stigma, and persistent unmet support needs. The BOSA can help address this diagnostic equity gap by providing a standardized yet flexible observational tool that is more feasible to implement in under-resourced, multilingual communities and clinically informative when used with additional tools. Beyond these contexts, the BOSA also holds broader clinical value, as it enables the collection of more naturalistic behavioral samples (for example, of a child with a parent) to supplement in-person evaluations with a range of social partners not limited to the clinician, and helps reduce access barriers (such as transportation) by allowing for remote, video-based administration. Continued refinement and validation in real-world settings will be essential to ensuring the BOSA’s utility in promoting culturally and linguistically responsive autism diagnostic practices.
Acknowledgments
We would like to thank all of the families who participated in this study and all community-based organizations that made recruitment possible.
Declarations
Conflict of Interest
Catherine Lord receives royalties from Western Psychological Services (WPS) for the sales of the ADOS-2, SCQ, and ADI-R.
Ethical approval
The BOSA is copyrighted by WPS due to its overlap with the ADOS and BOSCC, but it is currently available without cost with permission from WPS for researchers. This study received ethical approval from the University of California, Los Angeles Institutional Review Board (#23–000535) in May 2023. All study procedures were conducted in compliance with the ethical standards of the institutional research ethics committee. Written informed consent was obtained from all adult participants, or from parents or legal guardians for minor participants and those unable to provide consent.
Author Contributions
Data collection was performed by all authors. M.T., J.L. and C.L. contributed to the study conception and design. Material preparation and analysis were performed by the first author. M.T. and J.L. wrote the original draft, and all authors reviewed, edited, and approved the final submission.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.