
Diagnostic tests for Autism Spectrum Disorders (ASD) in preschool children



Abstract

This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows:

This review aims to identify which of the four interview tools has the best diagnostic test accuracy for diagnosing ASD in preschool children against the gold standard of multidisciplinary team clinical judgement. As it is inefficient to use more than one parent or carer report tool, the aims of this review are to assess the following.

1. Which of the parent or carer interview tools (ADI‐R, GARS, DISCO, or 3di) has the best diagnostic test accuracy?

2. How does the diagnostic test accuracy of the best performing interview tool compare to the diagnostic test accuracy of the CARS?

3. How does the diagnostic test accuracy of the ADOS‐G compare to the CARS?

4. Is the diagnostic test accuracy of any one test sufficient for it to be suitable as a sole assessment tool for preschool children?

5. Is there any combination of tests which, if offered in sequence, would provide suitable diagnostic test accuracy and enhance test efficiency?

6. If data are available, does a combination of an interview tool with a structured observation test have better diagnostic test accuracy (fewer false positives and fewer false negatives) than either test alone?

This review will evaluate diagnostic tests, putting most weight on the specificity of the various index tests for the reasons outlined above.

Background

Diagnosis of an autism spectrum disorder (ASD) is complex. The currently recommended diagnostic evaluation includes an assessment of social behavior, language and nonverbal communication, adaptive behavior, motor skills, atypical behaviours and cognitive status by an experienced multidisciplinary team (Akshoomoff 2006). With regard to specific diagnostic information, it is recommended that the diagnostic process should include measures of parental report, child observation and interaction, and the use of clinical judgment (SIGN 2007; Zwaigenbaum 2009).

Target condition being diagnosed

ASD includes impairments in social interaction and communication, and restrictive repetitive or stereotypic patterns of behaviour. There are distinct diagnostic classifications within ASD, each with different requirements within the three clinical domains of communication, social interaction, and behavioural problems or differences. While ‘autism spectrum disorder’ is a commonly used term in clinical practice, it is not recognised by current mainstream disease classification systems such as the DSM‐IV (American Psychiatric Association 1994), DSM‐IV‐TR (American Psychiatric Association 2000), and ICD‐10 (World Health Organisation 2007). Diagnoses from these classification systems that are covered by ASD include 'Childhood autism' or 'Autistic Disorder'; 'Pervasive Developmental Disorder ‐ Not Otherwise Specified' (PDD‐NOS); 'Other pervasive developmental disorders'; 'Pervasive developmental disorder, unspecified'; 'Asperger syndrome' or 'Asperger Disorder', and 'Atypical autism'. Inconsistent use of diagnostic classification terms causes confusion in clinical care and service access, and complicates the conduct of research studies and the application of research findings. It is expected that the next version of DSM (DSM‐V) will group all these terms under 'Autism Spectrum Disorder' (http://www.dsm5.org/).

Estimates of the incidence of ASD vary (Williams 2006). The most recent estimates place the prevalence of any ASD between 22 and 116 per 10,000 (Baird 2006; Guillem 2006; Williams 2008; Fombonne 2009). Males are affected about four times more frequently than females (Fombonne 2009). Problems usually present in early childhood, continue throughout life, and place a considerable burden of care on families and on health and educational services. Follow‐up studies have found that only 3% to 27% of people with ASD are able to live independently as adults, with variation across diagnostic groups within the autism spectrum and the higher percentages for those with Asperger Disorder (Howlin 2004; Cederlund 2008).

As described above, the gold standard assessment for diagnosis involves multiple professionals and multiple assessment mechanisms, is time intensive, and still requires clinical judgement. We have not found any reported studies that compare gold standard assessments made by different multidisciplinary teams. Clinical experience suggests that there would not be complete agreement between teams and that agreement would be highest for Autistic Disorder or Childhood autism diagnoses and lowest for diagnoses of Atypical autism and PDD‐NOS. Nevertheless, multidisciplinary team assessment is accepted as best practice for diagnosis of all developmental disability and, as such, services have been created for some time to provide this level of expertise in the UK, US, Europe, and Australia (Filipek 2000; SIGN 2007; Ministry of Health and Education 2008). They have also been developing in other countries (Academy of Medicine Singapore 2010).

While evidence of the effectiveness of interventions is scant in the ASD literature, it is believed that the earlier intervention begins, the greater the chances of long‐term gains. This is particularly true of behavioural interventions (Roberts 2006; Wiggins 2006). Early diagnosis also improves educational planning, service provision, and family support (Filipek 2009).

Therefore, it is important that clear recommendations with regard to the accuracy of diagnostic tests exist for clinicians and healthcare decision makers.

Index test(s)

There are a variety of diagnostic tools used in both research and clinical settings for the diagnosis of an ASD. Some rely on parent or carer report while others use observation and interview (Table 1). Many of these tools are used to standardise aspects of the history‐taking and examination, while others are used to reduce the length of diagnostic interviews and to reduce costs, especially in research studies. Most include additive scales and subscales and rely on diagnostic cut‐offs, which have been based on the classification systems at the time. 

Table 1. Types of tools

  • Parent or carer interview: ADI‐R (face‐to‐face), DISCO (face‐to‐face), 3di (computerised), GARS (questionnaire).

  • Combination of interview and observations of unstructured activity: CARS.

  • Semi‐structured observational assessment: ADOS‐G.

The six diagnostic tests recommended in national guidelines that have been published since 1995 will be assessed in this review.

  • Autism Diagnosis Interview Revised (ADI‐R).

  • Diagnostic Interview for Social and Communication Disorders (DISCO).

  • Developmental, Dimensional and Diagnostic Interview (3di) (SIGN 2007).

  • Gilliam Autism Rating Scale (GARS).

  • Childhood Autism Rating Scale (CARS).

  • Autism Diagnostic Observation Schedule‐Generic (ADOS‐G).

Parent or carer interview tools

The Autism Diagnosis Interview Revised (ADI‐R) provides a diagnostic algorithm for ASD that is consistent with both the ICD‐10 and the DSM‐IV. It is a standardised, semi‐structured interview during which parents or carers report information about an individual suspected of having an ASD. It assesses behaviour across three domains: reciprocal social interaction; communication and language; and restricted and repetitive, stereotyped interests and behaviours. For an individual to receive a diagnosis of ASD, scores on all three domains must be elevated beyond cut‐off levels. It focuses on current behaviours and is appropriate for adults and children with a mental age of 18 months and above. It takes one to two hours to administer (Lord 1994; Rutter 2003; Mazefsky 2006).

The Diagnostic Interview for Social and Communication Disorders (DISCO) is a detailed, semi‐structured interview that should be used with someone who knows the person being diagnosed well, preferably from infancy. It uses a dimensional approach to facilitate understanding of patterns of behaviour which have developed over time. It takes three hours to administer (Wing 2002).

The Developmental, Dimensional and Diagnostic Interview (3di) is a computerised parental interview that measures intensity of symptoms and co‐morbidities across the autism spectrum. It takes two hours to administer (Skuse 2004).

The Gilliam Autism Rating Scale (GARS) is a parent or teacher questionnaire based on the DSM‐IV and focuses on four content areas: stereotyped behaviours, communication, social interaction, and developmental disturbances. It is an effective tool in discriminating participants with ASD from those with behavioural disorders (Gilliam 1995; Mazefsky 2006). The questionnaire consists of 56 items divided into four scales: Social Interaction, Communication, Stereotyped Behaviors, and Developmental Disturbances. It takes approximately five to 10 minutes to administer (Washington State Department 2009).

Combination of interview and observations of unstructured activity

The Childhood Autism Rating Scale (CARS) is an older tool (its use began in 1966) that rates children on a scale of one to four across 15 criteria, to yield a composite score that is used to assign a diagnosis of non‐autistic, mildly autistic, moderately autistic, or severely autistic (Schopler 1986). It is particularly useful for distinguishing between children with ASD and those with other developmental disabilities. It can be completed by a clinician, parent, or teacher and is often used in research studies. It takes about 20 to 30 minutes to administer (Schopler 1980; New York State Department of Health 2005).

Semi‐structured observational assessment

The Autism Diagnostic Observation Schedule‐Generic (ADOS‐G) is a semi‐structured assessment of communication, social interaction, and play. It can be used on children or adults with limited or no language as well as those who are verbally fluent and high functioning. It consists of standard activities that allow the examiner to observe behaviours consistent with a diagnosis of ASD or other pervasive developmental disorders. Cut‐off scores are provided for disorders across the autism spectrum, including classical autism. It consists of modules (Lord 2000) that are administered based on the capacity of the child or adult. Usually one module is administered per assessment but more may be administered if the child or adult displays unexpected abilities that require further assessment (Lord 1999).

Diagnosis for clinical care

In diagnostic practice, multidisciplinary teams often use more than one of these tests and combine the results of the tests with clinical judgement to develop an overall diagnosis based on current diagnostic classification systems, like ICD‐10 or DSM‐IV‐TR.

Alternative test(s)

Tools used to screen populations for ASD will not be evaluated nor will child health surveillance tests that are used to assess clinical populations but not to provide a diagnosis (SIGN 2007).

Because Asperger Disorder (or Asperger syndrome) is not a common diagnosis in this preschool age group, diagnostic tools that have been developed specifically to diagnose that disorder will not be included.

Rationale

An ASD is a lifelong disorder and early and accurate diagnosis is important so that interventions to improve outcomes and quality of life for affected individuals and their families can be started. Current methods of diagnosis require multidisciplinary teams and lengthy assessments. Standardised parent or carer interviews and observation instruments have been developed and these are used in both research and clinical practice. They are used either in isolation, in conjunction with other tools, or as part of a multidisciplinary team assessment depending on the geographic location and service availability. However, we are unsure which of these tests has the best diagnostic accuracy, if any of them have satisfactory diagnostic accuracy for use in isolation in clinical settings or as the only tool in a multidisciplinary team assessment. We also do not know whether a combined approach or a sequential approach to administration of the tools would improve diagnostic accuracy or provide satisfactory diagnostic test accuracy in a more efficient way.

For one tool to be used in isolation, it would need to perform well with regard to both sensitivity and specificity in order to make a diagnosis of ASD, because of the implications of a false positive result in terms of labelling, selection of correct interventions, and the resource implications of those interventions. Equally, false negatives lead to a missed opportunity for timely intervention and for family adjustment and planning; they have resource implications for services needed in future years that accurate diagnosis may have made unnecessary. False negatives are of more concern if the result of a test inhibits future access to services; and are less concerning if review and follow‐up are available if a child continues to have problems that are of concern to parents and carers or other education, health, and community based professionals.

As the instruments that are currently recommended as diagnostic tests for ASD use different assessment approaches (interview versus observation versus mixed methods), it is likely that when these assessments are combined or conducted in series they offer opportunities to enhance diagnostic test accuracy or improve efficiency. We will only be able to assess if combinations of tests improve diagnostic test accuracy if studies have been conducted that present data in that way. We will be able to consider if there are any potentially suitable sequences for offering testing if we find a test suitable to be the first test in a series, that is, a test which could have a lower specificity but would have high sensitivity, is quick to administer, and requires less training. Subsequent tests would need to have both high sensitivity and specificity but would only need to be administered to those who tested positive, and so families and services would be saved time and fewer costs would be incurred.
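The arithmetic behind the sequential strategy described above can be sketched numerically. The snippet below is a minimal illustration only, assuming the two tests are conditionally independent given true ASD status and using hypothetical accuracy figures (not estimates from any study): a child must be positive on both tests to receive a final positive, so sensitivities multiply, while a negative on either test rules out, improving combined specificity.

```python
# Sketch: combined accuracy of two tests applied in series, assuming
# conditional independence of the tests given true ASD status.
# Test 2 is only administered to children who test positive on test 1,
# and a final positive requires both tests to be positive.
# All numbers below are illustrative, not estimates from any study.

def serial_accuracy(se1, sp1, se2, sp2):
    """Sensitivity and specificity of a two-stage serial strategy."""
    sensitivity = se1 * se2              # must be positive on both tests
    specificity = sp1 + (1 - sp1) * sp2  # negative on either test rules out
    return sensitivity, specificity

# Hypothetical first-stage test: high sensitivity, modest specificity;
# hypothetical second-stage test: high sensitivity and specificity.
se, sp = serial_accuracy(se1=0.95, sp1=0.70, se2=0.90, sp2=0.90)
print(f"serial sensitivity = {se:.3f}, serial specificity = {sp:.3f}")
```

Under these illustrative figures, the serial strategy trades a modest loss of sensitivity (0.95 → 0.855) for a substantial gain in specificity (0.70 → 0.970) while sparing most unaffected children the longer second test.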

Diagnostic test accuracy requirements for tests that are to be used as part of a multidisciplinary team assessment will be lower than for those to be used in isolation, as the multidisciplinary team assessment activity will provide opportunities to improve sensitivity and specificity, even though, to our knowledge, there are no reports of this to date. However, the same general principles apply to tests chosen to be used as part of a multidisciplinary assessment as to tests that are to be used in isolation. This is in relation to requirements for use of one test or multiple tests in combination or in sequence.

A systematic review of the available diagnostic tests is required to determine which test, if any, is best suited to clinical diagnosis of an ASD.

Objectives

This review aims to identify which of the four interview tools has the best diagnostic test accuracy for diagnosing ASD in preschool children against the gold standard of multidisciplinary team clinical judgement. As it is inefficient to use more than one parent or carer report tool, the aims of this review are to assess the following.

1. Which of the parent or carer interview tools (ADI‐R, GARS, DISCO, or 3di) has the best diagnostic test accuracy?

2. How does the diagnostic test accuracy of the best performing interview tool compare to the diagnostic test accuracy of the CARS?

3. How does the diagnostic test accuracy of the ADOS‐G compare to the CARS?

4. Is the diagnostic test accuracy of any one test sufficient for it to be suitable as a sole assessment tool for preschool children?

5. Is there any combination of tests which, if offered in sequence, would provide suitable diagnostic test accuracy and enhance test efficiency?

6. If data are available, does a combination of an interview tool with a structured observation test have better diagnostic test accuracy (fewer false positives and fewer false negatives) than either test alone?

This review will evaluate diagnostic tests, putting most weight on the specificity of the various index tests for the reasons outlined above.

Secondary objectives

1. Does any diagnostic test have greater diagnostic test accuracy in age‐specific subgroups within the preschool age range?

2. Does any diagnostic test have greater diagnostic test accuracy for the different diagnostic subgroups, that is in differentiating Autistic Disorder/Childhood autism from other ASD?

Investigation of sources of heterogeneity

Potential sources of heterogeneity include age of study participants; severity and type of diagnosis (Autistic Disorder or Childhood Autism versus PDD‐NOS); presence or absence of language delay; presence or absence of intellectual disability or developmental delay; diagnostic mix of population included; prospective versus existing diagnosis; study type, and duration between diagnosis and diagnostic test accuracy studies being performed.

Methods

Criteria for considering studies for this review

Types of studies

We will include:

1. cohort studies or cross‐sectional studies addressing the diagnostic accuracy of the diagnostic tools specified for any ASD, and where participants are given one or more index tests and the reference standard;

2. randomised studies of test accuracy where participants are randomised to different index tests and all participants are verified by the same gold standard;

3. case‐control studies where participants have been selected on the outcome side, i.e. a sample of patients with ASD (for example, selected from an existing cohort) and a sample of non‐ASD children from a different source.

Participants

Participants should be those children who are suspected of having an ASD and are being seen prospectively because of concerns with social, communication or behavioural problems, or both, of the type seen in autism. Age will be restricted to the preschool years. There will be no restriction placed on setting.

Clinical subgroups are:

  • children with co‐existing developmental delay or intellectual disability;

  • children with co‐existing language delay;

  • children with co‐existing mental health problems including attention deficit hyperactivity disorder (ADHD), anxiety, and attachment disorders.

Index tests

The following index tests for ASD will be addressed.

Parent or carer interviews: Autism Diagnosis Interview Revised (ADI‐R), Diagnostic Interview for Social and Communication Disorders (DISCO), Gilliam Autism Rating Scale (GARS), and Developmental, Dimensional and Diagnostic Interview (3di).

Combination of interview and observations of unstructured activity: Childhood Autism Rating Scale (CARS).

Semi‐structured observational assessment: Autism Diagnostic Observation Schedule ‐ Generic (ADOS‐G).

Comparator tests

The diagnostic accuracy of the index tests will be compared against each other.

Target conditions

The target condition is ASD in preschool children. Diagnostic subgroups are autism (Childhood autism (ICD 10) or Autistic Disorder (DSM‐IV)), pervasive developmental disorder (Atypical autism (ICD 10) or Pervasive Developmental Disorder ‐ Not Otherwise Specified (PDD‐NOS) (DSM‐IV)), and Asperger syndrome or Asperger Disorder.

Reference standards

The reference standard will be a clinical diagnosis of autism or another ASD, as defined above, using a currently accepted classification system (DSM‐III, DSM‐III‐R, DSM‐IV, DSM‐IV‐TR, ICD‐9 or ICD‐10) as assigned by an experienced multidisciplinary team. The assessment by the multidisciplinary team will include an assessment of social behavior, language and nonverbal communication, adaptive behaviour, motor skills, atypical behaviour, and cognitive status or intellectual function. This assessment will be based on information from a clinical assessment and from health professionals involved in the child's care and from those caring for the child in community settings, such as preschool or child care settings.

Because it is known that diagnosis of specific ASD varies over time, the reference standard assessment and index test must have been performed within six months of each other.

Search methods for identification of studies

Electronic searches

Electronic searches of CENTRAL (The Cochrane Library), MEDLINE, EMBASE, ERIC, CINAHL, PsycINFO, ISI Science Citation Index, ISI Social Science Citation Index, ASSIA, Social Services Abstracts, DARE, and Autism Data will be conducted for all years held. No language or date restrictions will be applied and we will seek the translation of relevant data from non‐English studies. The MEDLINE strategy, which we will use and adapt for other databases, is in Appendix 1.

Searching other resources

Reference lists of included studies, guidelines, and reviews found in electronic searches will be searched for additional studies and web searches will be made for any index tests found.

Data collection and analysis

Selection of studies

Two authors (AS and KS‐L) will independently assess all studies for inclusion. Disagreements will be resolved by discussion. A first selection will be made by screening the titles and abstracts of the identified studies. Definitive inclusion will be done by reading the full paper.

Data extraction and management

Two authors (AS and KS‐L) will independently extract data using standardised data extraction forms. Disagreements will be resolved by discussion and consultation with a third review author (KW). If data from published reports are insufficient, the study investigators will be contacted for clarification.

All the data required to complete the 'Characteristics of studies' table and for subgroup analyses will be extracted including the following.

1. Characteristics of the participants

a. age

b. intellectual function

c. diagnoses for inclusion

d. setting for recruitment

2. Index tests

a. type of test

b. cut‐offs for diagnostic categories

3. Reference standards

a. type

b. diagnostic categories used

c. adequacy of assessment including disciplines represented by members of the multidisciplinary team, assessments completed and sources of material used to inform the diagnostic assessment

4. Study type

a. cross‐sectional study

b. cohort study

c. randomised test accuracy study

d. case‐control study

5. Results

a. number of true positives, false positives, false negatives, and true negatives

Assessment of methodological quality

Methodological quality will be assessed independently by two authors (AS and KS‐L) using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) instrument (Whiting 2003; Whiting 2004). QUADAS consists of 11 items that refer to internal validity (for example, blind assessment of the index and reference tests, or avoidance of verification bias). We have added one item regarding the absence of possible conflicts of interest on the part of the researchers. We will classify each item as 'yes' (adequately addressed), 'no' (inadequately addressed), or 'unclear' according to the criteria listed in Table 2. Disagreements will be resolved by discussion and, if necessary, by consulting a third review author.

Table 2. Operationalisation of QUADAS items

Items and guide to classification

1. Representative spectrum. Was the spectrum of patients representative of the patients who will receive the test in practice?

  • Classify as ‘yes’ if the sample consists of children that were consecutively referred for further diagnosis of ASD and the children represent a mixture of conditions (including absence of any condition) that usually are present (for example, autistic disorder, pervasive developmental disorder not otherwise specified, children with developmental disability that is not autism but has some characteristics in common, such as global developmental delay in association with language delay, language delay alone, attachment disorders, ADHD, anxiety disorders).

  • Classify as ‘no’ if only healthy controls are used, or if non‐response is high and selective, or there is clear evidence of selective sampling.

  • Classify as ‘unclear’ if insufficient information is given to make a judgment. 

2. Acceptable reference standard. Was the reference standard likely to classify the target condition correctly?

  • Classify as ‘yes’ if the reference standard consists of a clinical diagnosis of autism or another ASD using a current accepted classification system (DSM‐III, DSM‐III‐R, DSM‐IV, DSM‐IV‐TR, ICD‐9 or ICD‐10) as assigned by an experienced multidisciplinary team (including assessment of social behaviour, language and nonverbal communication, adaptive behaviour, motor skills, atypical behaviours, and cognitive status/intellectual function) and based on information from a clinical assessment and from health professionals involved in the child's care and from those caring for the children in community settings, such as preschool or child care settings.

  • Classify as ‘no’ if the above‐mentioned methods were not used.

  • Classify as ‘unclear’ if insufficient information is given on the reference standard.

3. Acceptable delay between tests. Was the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests?

  • Classify as ‘yes’ if the time period between index test and the reference standard is 6 months or shorter.

  • Classify as ‘no’ if the time period between index test and the reference standard is longer than 6 months.

  • Classify as ‘unclear’ if there is insufficient information on the time period between index test and reference standard.  

4. Partial verification avoided. Did the whole sample or a random selection of the sample receive verification using the intended reference standard?

  • Classify as ‘yes’ if it is clear that all patients or a random selection of those who received the index test went on to receive a reference standard, even if the reference standard is not the same for all patients.

  • Classify as ‘no’ if not all patients or a random selection of those who received the index test received verification by a reference standard.

  • Classify as ‘unclear’ if insufficient information is provided to assess this item.

5. Differential verification avoided. Did patients receive the same reference standard regardless of the index test result?

  • Classify as ‘yes’ if it is clear that all patients who received the index test are subjected to the same reference standard.

  • Classify as ‘no’ if different reference standards were used.

  • Classify as ‘unclear’ if insufficient information is provided to assess this item.

6. Incorporation avoided. Was the multidisciplinary team assessment which formed the reference diagnosis independent of the index test (i.e. the index test did not form part of the reference standard)?

  • Classify as ‘yes’ if the index test is not part of the reference standard.

  • Classify as ‘no’ if the index test is clearly part of the reference standard.

  • Classify as ‘unclear’ if insufficient information is provided to assess this item.

7. Reference standard results blinded. Were the index test results interpreted without knowledge of the results of the reference standard?

  • Classify as ‘yes’ if the results of the index test were interpreted blind to the results of the reference test.

  • Classify as ‘no’ if the assessor of the index test was aware of the results of the reference standard.

  • Classify as ‘unclear’ if insufficient information is given on independent or blind assessment of the index test.

8. Index test results blinded. Were the reference standard results interpreted without knowledge of the results of the index test?

  • Classify as ‘yes’ if the results of the reference standard were interpreted blind to the results of the index test.

  • Classify as ‘no’ if the assessor of the reference standard was aware of the results of the index test.

  • Classify as ‘unclear’ if insufficient information is given on independent or blind assessment of the reference standard.

9. Relevant clinical information. Were the same clinical data available when the index test results were interpreted as would be available when the test is used in practice?

  • Classify as ‘yes’ if the only clinical data available in the study were those that would normally be available when the test results are interpreted in practice (for example, speech and language therapy, occupational therapy, developmental or psychology reports addressing general assessments that are not specific to autism; or information from a doctor, nurse, teacher, or allied health professional listing why autism is of concern).

  • Classify as ‘no’ if this is not the case, for example, if other test results are available that cannot be regarded as part of routine care.

  • Classify as ‘unclear’ if the paper does not explain which clinical information was available at the time of assessment.

10. Uninterpretable results reported. Were uninterpretable/intermediate test results reported?

  • Classify as ‘yes’ if all test results are reported for all patients, including uninterpretable, indeterminate or intermediate results.  

  • Classify as ‘no’ if you think that such results occurred, but have not been reported.

  • Classify as ‘unclear’ if it is unclear whether all results have been reported.

11. Withdrawals explained. Were withdrawals from the study explained?

  • Classify as ‘yes’ if it is clear what happens to all patients who entered the study (all patients are accounted for, preferably in a flow chart) or if the authors explicitly reported the absence of any withdrawals.

  • Classify as ‘no’ if it is clear that not all patients who were entered completed the study (received both index test and reference standard), and not all patients were accounted for.

  • Classify as ‘unclear’ when the paper does not clearly describe whether or not all patients completed all tests, and are included in the analysis.

12. Conflicts of interest avoided. Were conflicts of interest avoided or absent?  

  • Classify as ‘yes’ if the authors/researchers were not involved in the development of the diagnostic instrument.

  • Classify as ‘no’ if the authors/researchers were involved in the development of the diagnostic instrument.

  • Classify as ‘unclear’ if insufficient information is given.

Statistical analysis and data synthesis

The index tests assessed in this systematic review have different diagnostic outcome categories. To allow the primary analyses, all diagnoses falling within the category ASD will be considered a positive diagnosis and will be compared with those diagnoses that are not an ASD.

The expected results of the index tests are as follows.

1. ADI‐R: diagnostic categories are Autistic Disorder and Asperger Syndrome, which will be combined as ASD (Lord 1994; Rutter 2003).

2. ADOS‐G: diagnostic categories are Autism and ASD, which will be combined as the category ASD (Lord 1999; Lord 2000).

3. CARS: provides a total score with cut‐offs: a score of < 30 is non‐autistic; 30 to 36 is mildly autistic; and 37 and above is moderately or severely autistic (Schopler 1980). A score of < 30 will be classified as 'not ASD' and scores of ≥ 30 as ASD (Schopler 1986).

4. DISCO: the diagnostic categories based on DISCO algorithms that are relevant to the ICD‐10 classification system are Childhood Autism, Atypical Autism, and Asperger syndrome (Wing 2002). In addition, there are diagnostic algorithms for 'early infantile autism' according to Kanner 1956, 'Asperger syndrome' based on the definition in Gillberg 1989, and 'criteria for autistic spectrum disorder' according to Wing 1979. Any of these diagnostic categories will be classified as ASD. Other diagnostic categories, including overactive disorder with mental retardation and stereotyped movements, or childhood disintegrative disorders, and children failing to fulfil ASD categories will be classified as not ASD.

5. 3di: responses are generally coded on a 3‐point scale. There are 266 questions that are directly or indirectly concerned with disorders on the autism spectrum and 291 questions that relate to current mental states as relevant to other diagnoses (Skuse 2004). For a diagnosis of ASD, cut‐off scores must be achieved for five categories: ≥ 10 for reciprocal social interaction skills; ≥ 1 for social expressiveness; ≥ 8 for use of language and other social communication skills; ≥ 7 for use of gesture and non‐verbal play; and ≥ 3 for repetitive and stereotyped behaviours.

6. GARS: an overall autism quotient (AQ) is established and is then broken down into seven ordinal categories, ranging from a 'Very Low' to a 'Very High' probability of autism. A diagnostic cut‐off score of ≥ 90 specifies that the child is 'probably autistic' (Gilliam 1995; South 2002).

The reference standard results will be either a diagnosis of autism (Childhood Autism, Autistic Disorder), a diagnosis of pervasive developmental disorder (Atypical Autism, PDD‐NOS), or a diagnosis of Asperger Disorder, all of which will be considered together as ASD. As such, diagnostic categories will be identified and children then assigned to either a diagnosis of 'ASD' or 'not ASD'. Test results will be treated as positive or negative at the cut‐off values of the index tests as described above.
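The dichotomisation rules above can be summarised in a short sketch (illustrative only, and no part of the protocol's planned SAS analyses; the function and variable names are our own):

```python
def cars_positive(score):
    """CARS: scores of >= 30 are classified as ASD (Schopler 1986)."""
    return score >= 30


def gars_positive(autism_quotient):
    """GARS: an autism quotient of >= 90 means 'probably autistic'."""
    return autism_quotient >= 90


# 3di: a positive result requires that ALL five category cut-offs are met.
THREE_DI_CUTOFFS = {
    "reciprocal_social_interaction": 10,
    "social_expressiveness": 1,
    "language_and_social_communication": 8,
    "gesture_and_nonverbal_play": 7,
    "repetitive_stereotyped_behaviours": 3,
}


def three_di_positive(scores):
    """3di: every category score must reach its cut-off for ASD."""
    return all(scores[k] >= cut for k, cut in THREE_DI_CUTOFFS.items())
```

The ADI‐R, ADOS‐G, and DISCO yield categorical diagnoses rather than scores, so for those instruments the mapping to 'ASD'/'not ASD' is a simple membership check against the categories listed above.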

Forest plots showing pairs of sensitivity and specificity, with 95% confidence intervals (CIs), will be constructed for each study. The sensitivity and specificity pairs will be visualised in receiver operating characteristic (ROC) space for each test. Meta‐analyses of pairs of sensitivity and specificity will be performed using bivariate random‐effects methods (Reitsma 2005). This approach enables the calculation of summary estimates while accounting for variation within and between studies and for any potential correlation between sensitivity and specificity. For these analyses we will use SAS software.
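The protocol's analyses will be run in SAS; purely to illustrate the quantities shown in such a forest plot, the sketch below computes a study's sensitivity and specificity with 95% CIs from its 2 × 2 table. Wilson score intervals are used here as one common choice; the protocol itself does not specify the interval method, and the counts are hypothetical.

```python
from math import sqrt


def sens_spec_ci(tp, fp, fn, tn, z=1.96):
    """Sensitivity and specificity with Wilson 95% score intervals.

    tp/fn are counts among children with ASD on the reference standard;
    tn/fp among children without ASD.
    Returns (point estimate, lower limit, upper limit) for each measure.
    """
    def wilson(successes, n):
        p = successes / n
        centre = (p + z * z / (2 * n)) / (1 + z * z / n)
        half = (z / (1 + z * z / n)) * sqrt(
            p * (1 - p) / n + z * z / (4 * n * n))
        return p, centre - half, centre + half

    return {"sensitivity": wilson(tp, tp + fn),
            "specificity": wilson(tn, tn + fp)}


# A hypothetical study: 80 true positives, 20 false negatives,
# 90 true negatives, 10 false positives.
result = sens_spec_ci(tp=80, fp=10, fn=20, tn=90)
```

Each study contributes one such (sensitivity, specificity) pair, which is then plotted in ROC space and pooled by the bivariate model.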

We will compare the summary sensitivity and specificity of all tests with each other, distinguishing between direct and indirect comparisons. Direct comparisons are those made in studies that assessed more than one index test in the same participants, or in randomised test accuracy studies in which participants were randomised to the various index tests and all participants were verified by the same reference standard. For indirect comparisons we will include covariates (indicator terms) for each test (minus one) in the model, and analyses will be undertaken both of all studies and of the group of studies that made direct comparisons between tests (see also 'Sensitivity analyses').

We expect that the same cut‐off value will have been used for each index test, according to the description in the manual that accompanies each instrument. However, if different cut‐off values have been applied, we will perform the above‐mentioned analyses for subgroups of tests with similar cut‐off points. To compare the accuracy of the various tests, we will construct summary ROC curves for each test with all cut‐offs included (one data point per study). A hierarchical summary receiver operating characteristic (HSROC) model will then be fitted to test for differences in the summary ROC curves between the index tests (Gatsonis 2006). The pairing of test results within a study will be taken into account. This analysis will be used to assess whether the tests differ in accuracy and whether any difference in accuracy depends on threshold (that is, whether the curves differ in shape). The studies will be combined in an HSROC model using the NLMIXED procedure in SAS. The model will be used to test whether a statistically significant difference exists between the diagnostic accuracies of the various instruments.
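As background to the HSROC analysis (which will be fitted in SAS, not with the code below), the Rutter‐Gatsonis model implies a summary ROC curve commonly written as logit(TPR) = Λe^(−β/2) + e^(−β)·logit(FPR), where Λ is an accuracy parameter and β a shape parameter (β = 0 gives a symmetric curve). A minimal sketch tracing such a curve, with hypothetical parameter values:

```python
from math import exp, log


def logit(p):
    return log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + exp(-x))


def hsroc_tpr(fpr, accuracy, shape):
    """True positive rate on the Rutter-Gatsonis summary ROC curve.

    `accuracy` (Lambda) governs how far the curve sits above the
    chance diagonal; `shape` (beta) governs asymmetry (0 = symmetric).
    """
    return inv_logit(accuracy * exp(-shape / 2)
                     + exp(-shape) * logit(fpr))


# Hypothetical parameters: trace the curve at a few FPR values.
curve = [(fpr, hsroc_tpr(fpr, accuracy=2.0, shape=0.0))
         for fpr in (0.05, 0.1, 0.2, 0.5)]
```

Testing whether two instruments differ then amounts to testing whether they need different Λ (accuracy) or β (shape) parameters.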

Investigations of heterogeneity

Where there are sufficient data, we will investigate heterogeneity by adding covariates to the bivariate model for the following subgroups:

  • age of study participants (age by year, from age two to age five);

  • severity and type of diagnosis (Autistic Disorder versus other ASD);

  • presence or absence of language delay;

  • inclusion of only high‐risk populations (i.e. those children thought to have intellectual impairment);

  • diagnosis made prospectively versus existing diagnosis;

  • cross‐sectional, cohort, or randomised test accuracy study versus case‐control study;

  • duration between diagnosis and diagnostic test accuracy studies being performed.

Sensitivity analyses

Where there are sufficient data, we will investigate the robustness of the results by comparing subsets of studies that fulfil QUADAS criteria 2, 4, 5, 6, and 12. We have chosen these criteria because, in ASD research, it is difficult to have a consistent, independent reference standard. To assess the impact of variation in the reference standard on diagnostic test accuracy, we will perform sensitivity analyses based on adequacy of the reference standard (QUADAS 2), partial verification (QUADAS 4), differential verification (QUADAS 5), and independence of the reference standard diagnosis from the index test (QUADAS 6). We will also conduct sensitivity analyses based on conflicts of interest (QUADAS 12), because conflicts of interest are common in research on ASD diagnostic tools. Finally, we will perform sensitivity analyses of direct comparisons of index tests versus both direct and indirect comparisons.

Table 1. Types of tools

Parent or carer interview
  • Face‐to‐face: ADI‐R
  • Face‐to‐face: DISCO
  • Computerised: 3di
  • Questionnaire: GARS

Combination of interview and observations of unstructured activity
  • CARS

Semi‐structured observational assessment
  • ADOS‐G

Table 2. Operationalisation of QUADAS items

Items and guide to classification

1. Representative spectrum. Was the spectrum of patients representative of the patients who will receive the test in practice?

  • Classify as ‘yes’ if the sample consists of children who were consecutively referred for further diagnosis of ASD and who represent the mixture of conditions (including absence of any condition) that is usually present (for example, autistic disorder; pervasive developmental disorder not otherwise specified; developmental disability that is not autism but shares some characteristics, such as global developmental delay in association with language delay; language delay alone; attachment disorders; ADHD; anxiety disorders).

  • Classify as ‘no’ if only healthy controls are used, or if non‐response is high and selective, or there is clear evidence of selective sampling.

  • Classify as ‘unclear’ if insufficient information is given to make a judgment. 

2. Acceptable reference standard. Was the reference standard likely to classify the target condition correctly?

  • Classify as ‘yes’ if the reference standard consists of a clinical diagnosis of autism or another ASD using a current accepted classification system (DSM‐III, DSM‐III‐R, DSM‐IV, DSM‐IV‐TR, ICD‐9 or ICD‐10) as assigned by an experienced multidisciplinary team (including assessment of social behaviour, language and nonverbal communication, adaptive behaviour, motor skills, atypical behaviours, and cognitive status/intellectual function) and based on information from a clinical assessment and from health professionals involved in the child's care and from those caring for the children in community settings, such as preschool or child care settings.

  • Classify as ‘no’ if the above‐mentioned methods were not used.

  • Classify as ‘unclear’ if insufficient information is given on the reference standard.

3. Acceptable delay between tests. Was the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change between the two tests?

  • Classify as ‘yes’ if the time period between index test and the reference standard is 6 months or shorter.

  • Classify as ‘no’ if the time period between index test and the reference standard is longer than 6 months.

  • Classify as ‘unclear’ if there is insufficient information on the time period between index test and reference standard.  

4. Partial verification avoided. Did the whole sample or a random selection of the sample receive verification using the intended reference standard?

  • Classify as ‘yes’ if it is clear that all patients or a random selection of those who received the index test went on to receive a reference standard, even if the reference standard is not the same for all patients.

  • Classify as ‘no’ if not all patients or a random selection of those who received the index test received verification by a reference standard.

  • Classify as ‘unclear’ if insufficient information is provided to assess this item.

5. Differential verification avoided. Did patients receive the same reference standard regardless of the index test result?

  • Classify as ‘yes’ if it is clear that all patients who received the index test are subjected to the same reference standard.

  • Classify as ‘no’ if different reference standards were used.

  • Classify as ‘unclear’ if insufficient information is provided to assess this item.

6. Incorporation avoided. Was the multidisciplinary team assessment which formed the reference diagnosis independent of the index test (i.e. the index test did not form part of the reference standard)?

  • Classify as ‘yes’ if the index test is not part of the reference standard.

  • Classify as ‘no’ if the index test is clearly part of the reference standard.

  • Classify as ‘unclear’ if insufficient information is provided to assess this item.

7. Reference standard results blinded. Were the index test results interpreted without knowledge of the results of the reference standard?

  • Classify as ‘yes’ if the results of the index test were interpreted blind to the results of the reference test.

  • Classify as ‘no’ if the assessor of the index test was aware of the results of the reference standard.

  • Classify as ‘unclear’ if insufficient information is given on independent or blind assessment of the index test.

8. Index test results blinded. Were the reference standard results interpreted without knowledge of the results of the index test?

  • Classify as ‘yes’ if the results of the reference standard were interpreted blind to the results of the index test.

  • Classify as ‘no’ if the assessor of the reference standard was aware of the results of the index test.

  • Classify as ‘unclear’ if insufficient information is given on independent or blind assessment of the reference standard.

9. Relevant clinical information. Were the same clinical data available when the index test results were interpreted as would be available when the test is used in practice?

  • Classify as ‘yes’ if only clinical data that would normally be available when the test results are interpreted in practice were available in the study (for example, speech and language therapy, occupational therapy, developmental or psychology reports addressing general assessments that are not specific to autism; information from a doctor, nurse, teacher or allied health professional listing why autism is of concern).

  • Classify as ‘no’ if this is not the case, for example, if other test results were available that cannot be regarded as part of routine care.

  • Classify as ‘unclear’ if the paper does not explain which clinical information was available at the time of assessment.

10. Uninterpretable results reported. Were uninterpretable/intermediate test results reported?

  • Classify as ‘yes’ if all test results are reported for all patients, including uninterpretable, indeterminate or intermediate results.  

  • Classify as ‘no’ if you think that such results occurred, but have not been reported.

  • Classify as ‘unclear’ if it is unclear whether all results have been reported.

11. Withdrawals explained. Were withdrawals from the study explained?

  • Classify as ‘yes’ if it is clear what happened to all patients who entered the study (all patients are accounted for, preferably in a flow chart) or if the authors explicitly reported the absence of any withdrawals.

  • Classify as ‘no’ if it is clear that not all patients who were entered completed the study (received both index test and reference standard), and not all patients were accounted for.

  • Classify as ‘unclear’ when the paper does not clearly describe whether or not all patients completed all tests, and are included in the analysis.

12. Conflicts of interest avoided. Were conflicts of interest avoided or absent?  

  • Classify as ‘yes’ if the authors/researchers were not involved in the development of the diagnostic instrument.

  • Classify as ‘no’ if the authors/researchers were involved in the development of the diagnostic instrument.

  • Classify as ‘unclear’ if insufficient information is given.
