The aim of this study is to standardize Autism Diagnostic Observation Schedule (ADOS) scores within a large sample to approximate an autism severity metric. From a dataset of 1,415 individuals aged 2–16 years with autism spectrum disorders (ASD) or nonspectrum diagnoses, a subset of 1,807 assessments from 1,118 individuals with ASD was divided into narrow age and language cells. Within each cell, severity scores were based on percentiles of raw totals corresponding to each ADOS diagnostic classification. Calibrated severity scores had more uniform distributions across developmental groups and were less influenced by participant demographics than were raw totals. This metric should be useful for comparing assessments across modules and over time, and for identifying trajectories of autism severity in clinical, genetic, and neurobiological research.
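
The within-cell standardization described above can be illustrated with a minimal sketch. It is not the calibration procedure used in the study: it only computes, for each assessment, the percentile rank of its raw total among assessments in the same age and language cell. The function name `percentile_ranks_by_cell`, the cell labels, and the raw totals are hypothetical, and the step that ties percentiles to ADOS diagnostic classifications is omitted.

```python
from bisect import bisect_right
from collections import defaultdict


def percentile_ranks_by_cell(records):
    """Return the within-cell percentile rank (0-100] of each raw total.

    `records` is a list of (cell, raw_total) pairs, where `cell` is any
    hashable label for an age-by-language group (hypothetical labels here).
    The rank is the share of assessments in the same cell whose raw total
    is less than or equal to the given one.
    """
    # Collect and sort the raw totals observed in each cell.
    totals_by_cell = defaultdict(list)
    for cell, raw_total in records:
        totals_by_cell[cell].append(raw_total)
    for totals in totals_by_cell.values():
        totals.sort()

    # Rank each assessment against its own cell only.
    ranks = []
    for cell, raw_total in records:
        totals = totals_by_cell[cell]
        ranks.append(100.0 * bisect_right(totals, raw_total) / len(totals))
    return ranks


# Hypothetical data: the same raw total of 10 appears in two different cells.
example = [
    ("age2-3_no_words", 10),
    ("age2-3_no_words", 15),
    ("age2-3_no_words", 20),
    ("age6-9_fluent", 5),
    ("age6-9_fluent", 10),
    ("age6-9_fluent", 12),
    ("age6-9_fluent", 20),
]
for (cell, raw_total), rank in zip(example, percentile_ranks_by_cell(example)):
    print(f"{cell:>16}  raw={raw_total:>2}  percentile={rank:.1f}")
```

In this toy data the same raw total falls at a different percentile in each cell, which is the property that allows a calibrated score to be compared across developmental groups where raw totals cannot.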