We investigated attention to, and encoding and processing of, the social aspects of complex photographic scenes. Twenty-four high-functioning adolescents (aged 11–16) with autism spectrum disorder (ASD) and 24 matched, typically developing control participants viewed and then described a series of scenes, each containing a person. Analyses of eye movements and verbal descriptions provided converging evidence that both groups displayed general interest in the person in each scene, but that the salience of the person was reduced for the ASD participants. Nevertheless, the verbal descriptions revealed that participants with ASD frequently processed the observed person’s emotion or mental state without prompting. They also often mentioned eye-gaze direction, and both eye movements and verbal descriptions indicated that gaze was followed accurately. The combination of evidence from eye movements and verbal descriptions provides rich insight into the way stimuli are processed overall. The merits of using these methods within the same paradigm are discussed.