Goal attainment scaling (GAS) holds promise as an idiographic approach for measuring outcomes of psychosocial interventions in community settings. However, GAS has been criticized for untested assumptions about scaling level (i.e., interval or ordinal), inter-individual equivalence and comparability, and reliability of coding across different behavioral observation methods. We tested assumptions of equality between GAS descriptions used for outcome measurement in a randomized trial, specifically measurability, equidistance, level of difficulty, and comparability of behavior samples collected by teachers vs. researchers and live vs. videotaped. Results suggest that GAS descriptions can be evaluated for equivalency, that teacher-collected behavior samples are representative, and that varied sources of behavior samples can be reliably coded. GAS is a promising measurement approach. Recommendations are provided to ensure methodological quality.