Introduction
Although children with autistic spectrum disorders (ASD) have received much attention globally, relatively few studies have been conducted in Chinese contexts. In a recent review on the prevalence of autism in mainland China, Hong Kong and Taiwan, Sun et al. (2013) noted that available studies in different Chinese contexts “have methodological weaknesses” and that “the results lack comparability with those from developed countries” (p. 1). Their meta-analytic findings also suggested a potential under-diagnosis and under-detection of ASD in Chinese communities, and the authors argued for the need to use more advanced methods in ASD research (Sun et al. 2013). With specific reference to Hong Kong, it was not until the early 1990s that public awareness of autism began to increase. More and more parents started to call for government attention and resources to help their autistic children. As a result, local services for children with ASD have gradually expanded and the cultural acceptance of ASD has improved in Hong Kong society (Wong and Hui 2008a).
One milestone in the development of services for children with ASD in Hong Kong was the introduction of the renowned Treatment and Education of Autistic and related Communication handicapped CHildren (TEACCH) program by the Heep Hong Society in 1993. The TEACCH program adopts structured teaching strategies to facilitate learning and skills-building in children with ASD and to reduce their disruptive behavior (Schopler 1997). To accurately assess the development of children with pervasive developmental disorders and design individualized training plans, the TEACCH division at the University of North Carolina (Schopler et al. 1990) developed a revised instrument called the Psychoeducational Profile-Revised (PEP-R). The PEP-R provides a useful framework for researchers and practitioners to formulate suitable education plans and conduct ongoing evaluation of autistic children. The Heep Hong Society also translated this instrument into Chinese and conducted a validation study to examine the psychometric properties of the Chinese version of the PEP-R (CPEP-R; Shek et al. 2005).
Based on a sample of 63 preschool children with symptoms of ASD in Hong Kong, Shek et al. (2005) found that the different domains of the CPEP-R had very good reliability in terms of internal consistency (Cronbach’s alpha ranged from 0.74 to 0.98), inter-rater reliability (intra-class correlation coefficients ranged from 0.84 to 0.87) and test–retest reliability (Pearson correlation coefficients ranged from 0.76 to 0.92). The CPEP-R scores were also significantly correlated with the Merrill-Palmer Scale of Mental Tests (Stutsman 1948) and the Hong Kong Based Adaptive Behavior Scale (Kwok et al. 1989). These observations supported the concurrent validity of the instrument. Over the past years, the Chinese version of the PEP-R has been widely used to assess the cognitive ability, social adaptive functioning, and developmental abilities of children with ASD in Hong Kong. In addition, practitioners have used it as an outcome measure when evaluating the effectiveness of educational programs for children with ASD.
In 2005, Schopler et al. further revised the PEP-R into a more comprehensive version, the Psycho-Educational Profile-3rd edition (PEP-3), for children with ASD whose developmental age is from 6 months to 7 years. Compared to the PEP-R, the PEP-3 has more concrete and interesting materials, limited verbal demands, and an untimed administration process. In addition, the language items were separated from the general items (Chen et al. 2011; Schopler et al. 2005). According to Schopler et al. (2005), the PEP-3 is a reliable and valid instrument which has the potential to assess and monitor the development of children with ASD in a more accurate and comprehensive way. Based on a sample of children with developmental disorders in the United States, Schopler et al. (2005) reported good internal consistency, test–retest reliability, and inter-rater reliability for the PEP-3. High correlations between the PEP-3 and other measures assessing similar developmental constructs were also reported, providing support for the validity of the instrument. However, apart from the validation study reported in the PEP-3 manual, there are few publications on the psychometric properties of the PEP-3.
Among the limited studies, Fulton and D’Entremont (2013) examined the ability of the PEP-3 to estimate the cognitive and language skills of 136 children with ASD (aged 20–75 months) in Canada. Positive correlations were found between the PEP-3 cognitive and language measures and similar measures including the Child Development Inventory (Ireton 1992), the Merrill-Palmer Revised Developmental Scale (Roid and Sampers 2004), and the Vineland Adaptive Behavior Scale-2 (Sparrow et al. 2005). Significant differences in performance on the PEP-3 cognitive and language measures were detected among three diagnostic groups of children with ASD, Asperger’s disorder, or pervasive developmental disorders. These findings provided support for the psychometric properties of the PEP-3 subtests as an assessment tool measuring cognitive and language skills in children. Nonetheless, the reliability and validity of the subtests focusing on maladaptive behaviors (e.g., social reciprocity, affective expression, characteristic motor behavior, and characteristic verbal behavior) were not investigated in Fulton and D’Entremont’s (2013) study.
In Taiwan, a group of researchers translated the PEP-3 into Mandarin Chinese and administered it to a sample of 63 children with ASD. While the reliability and validity of the Caregiver Report of the PEP-3 were supported (Fu et al. 2010, 2012), the psychometric properties of the major part of the PEP-3 (i.e., the Performance test) remain largely unknown, probably because of the small sample of the study. Chen et al. (2011) reported good sensitivity of the Performance test, i.e., the ability of the measure to detect change over time and in response to an intervention (Guyatt et al. 1992). This is the only available psychometric paper on the Performance test in Chinese children. As such, the psychometric properties of the PEP-3 for the assessment of Chinese children with ASD need to be further demonstrated.
Against this background, researchers in Hong Kong translated the PEP-3 into Cantonese Chinese and conducted a validation study based on a large sample of autistic children and a comparison group of normal children in Hong Kong. Shek and Yu (2013) reported that the Performance test of the Chinese version of the PEP-3 (CPEP-3) showed good psychometric properties in terms of internal consistency, test–retest reliability, inter-rater reliability, content validity, and concurrent validity. While these results lent support to the reliable and valid use of the CPEP-3 in the Chinese population, the construct validity of the instrument was not examined. The present study therefore investigated the construct validity of the CPEP-3.
Construct validity refers to the extent to which an instrument measures the construct it claims to be measuring or the degree to which the underlying traits of the test can be identified (Anastasi and Urbina 1997). If a test lacks construct validity, results obtained with it will not be interpretable. Therefore, construct validity should be considered at the heart of any study in which researchers use an instrument to measure a construct that is not directly observable (Cronbach and Meehl 1955). To accumulate sound evidence for the psychometric properties of a measure, construct validity must be established. According to Singleton and Straits (1999), “evidence of construct validity consists of any empirical data that support the claim that a given operational definition measures a certain concept” (p. 124). Four common types of evidence have been highlighted to establish construct validity: (a) correlations with related variables (i.e., convergent validity); (b) consistency across measures and methods of measurement (i.e., external validity); (c) correlations with unrelated variables (i.e., discriminant validity); and (d) differences between contrasted groups (i.e., contrasted groups validity). Some researchers also suggest factorial validity (i.e., the extent to which the data conform to the hypothesized dimensions of the measure) as a form of construct validity (Dooley 1990). The present study aimed to examine the construct validity of the CPEP-3 in terms of three aspects: (a) correlations with related variables and unrelated variables; (b) differences between contrasted groups; and (c) factorial validity.
Specifically, six hypotheses regarding four types of validity evidence were proposed and tested. The first hypotheses were posited to provide evidence for the correlations between CPEP-3 subtests and related variables as well as unrelated variables. First, different subtests were assumed to have different relationships with participants’ age. As seven subtests were designed to measure developmental skills (i.e., cognitive verbal/preverbal, expressive language, receptive language, fine motor, gross motor, visual-motor imitation, and personal self-care), it was hypothesized that their scores would be correlated with participants’ age. In other words, older children were assumed to have higher scores on these subtests than younger children (Hypothesis 1a). On the other hand, the six subtests measuring maladaptive behaviors, including affective expression, social reciprocity, characteristic motor behaviors, characteristic verbal behaviors, problem behavior, and adaptive behaviors, should be weakly correlated with age (Hypothesis 1b). Schumm et al. (1986) suggested that a correlation coefficient with a magnitude of at least 0.4 would be needed to establish convergent validity, whereas a correlation coefficient of 0.3 or less would provide evidence for the discriminant validity of the test. These criteria were adopted in the present study for testing Hypotheses 1a and 1b.
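The decision rule borrowed from Schumm et al. (1986) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the label "inconclusive" for coefficients falling between the two cut-offs, the use of the absolute value for both cut-offs, and the toy age and score values are ours and do not come from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def classify_correlation(r, convergent_min=0.4, discriminant_max=0.3):
    """Apply the 0.4 / 0.3 cut-offs to the magnitude of r.

    Assumption: values strictly between the cut-offs are labeled
    'inconclusive'; the source does not name this middle band.
    """
    size = abs(r)
    if size >= convergent_min:
        return "convergent"
    if size <= discriminant_max:
        return "discriminant"
    return "inconclusive"


# Hypothetical data: ages in months and scores on one developmental subtest.
ages = [30, 36, 42, 48, 54, 60]
scores = [12, 15, 14, 20, 22, 25]
r = pearson_r(ages, scores)
print(round(r, 2), classify_correlation(r))
```

Under this sketch, a coefficient such as 0.38 would fall in the middle band, which is why the label for that band is flagged as an assumption rather than part of the published criteria.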
In addition to age, it was hypothesized that gender would not be related to CPEP-3 subtest scores (Hypothesis 1c) in the sample of autistic children. Although some researchers have reported that girls appear to have more severe autism than boys, the findings are inconsistent and there is no strong evidence that autistic boys tend to be higher functioning than autistic girls. Moreover, this hypothesis follows the one described in the PEP-3 manual.
Second, to examine differences between contrasted groups, one hypothesis was proposed. Since the CPEP-3 was devised to assess the characteristics of children with autistic disorders, it was hypothesized that autistic children would score lower than typically developing children on the ten Performance subtests (Hypothesis 2).
Third, to test the factorial validity of the CPEP-3, two further hypotheses were examined. Because different subtests of the CPEP-3 measure different aspects of development and behavior, they were expected to be moderately correlated with each other (Hypothesis 3). In addition, it was theoretically suggested that the ten Performance subtests would contribute to three domains (communication, motor skills, and maladaptive behavior), which reflect autistic children’s overall development in communication functions, motor skills, and presence of maladaptive behaviors, respectively. In particular, cognitive verbal/preverbal, expressive language, and receptive language would load on the communication factor; fine motor, gross motor, and visual-motor imitation would load on the motor factor; and affective expression, social reciprocity, characteristic motor behaviors, and characteristic verbal behaviors would load on the maladaptive behavior factor. This structure of three domains and their respective subtests should be supported by confirmatory factor analysis (Hypothesis 4).
In addition, although internal consistency is typically employed as an index of reliability, some authors regard internal consistency, a measure of the inter-relatedness of the items within a test, as an indicator of whether a group of items measures the same construct or concept (Cortina 1993). Some researchers (Tavakol and Dennick 2011) proposed that internal consistency (e.g., Cronbach’s alpha) adds “validity and accuracy to the interpretation of their data” (p. 55). In the original test manual, internal consistency is regarded as additional evidence of construct validity. Therefore, we also examined internal consistency for each subtest to provide further evidence for the construct validity of the CPEP-3. It was expected that the construct validity of the CPEP-3 could be established with evidence obtained by testing the above hypotheses.
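For readers less familiar with the index, Cronbach’s alpha for a subtest with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal Python sketch of this standard formula follows; the function name and the small score matrix are hypothetical illustrations, not data from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a score matrix.

    rows: one list per child, each holding that child's item scores
    for a single subtest. Uses sample variances (n - 1 denominator).
    """
    k = len(rows[0])                      # number of items in the subtest
    columns = list(zip(*rows))            # item-wise view of the matrix
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical item scores for four children on a three-item subtest.
toy_scores = [
    [2, 1, 2],
    [1, 1, 0],
    [2, 2, 2],
    [0, 1, 0],
]
print(round(cronbach_alpha(toy_scores), 2))
```

The sketch assumes complete data and a total score with non-zero variance; a real analysis would also need a rule for missing item responses.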
Discussion
Children with ASD often display various types of symptoms, which makes it essential to develop psychometrically sound assessments that can both effectively capture autistic children’s characteristic behaviors and accurately identify their developmental strengths and weaknesses. While it is convenient to translate and adapt well-developed instruments on ASD for different populations, their cross-cultural applicability must be carefully examined. In fact, studies have shown that Chinese translated scales did not always show the dimensions embedded in the original English version (Shek 1998, 2001, 2002). Hence, there is a strong need to validate translated measures in different Chinese contexts.
The present study attempted to examine the construct validity of the Chinese PEP-3. Several lines of evidence support its construct validity. First, we predicted that children’s cognitive and motor functioning develops with age while maladaptive behaviors would be less related to age (Greenspan and Wieder 1997). Consistent with this prediction, significant correlations with age were detected in the seven subtests of the CPEP-3 that measure developmental skills (cognitive verbal/preverbal, expressive language, receptive language, fine motor, gross motor, visual-motor imitation, and personal self-care) in both the autistic sample and the normal sample, with older children scoring higher than younger children in these areas. These findings support Hypothesis 1a.
On the other hand, the correlation coefficients between age and the six subtests assessing behaviors (i.e., affective expression, social reciprocity, characteristic motor behaviors, characteristic verbal behaviors, problem behavior, and adaptive behavior) were relatively weak. Among them, social reciprocity (r = 0.38) and characteristic verbal behaviors (r = 0.36) had the highest correlations with age. Although a lack of give-and-take in social interaction and of appropriate verbal behaviors are typical features of ASD, it is possible that children’s ability to read social cues and the perspectives of others improves as they grow older and receive more home-based training from their caregivers (Sheinkopf and Siegel 1998). This may explain the age difference in social reciprocity and characteristic verbal behaviors. Generally, these findings support Hypothesis 1b.
Furthermore, consistent with the hypothesis, overall gender differences among the autistic children were non-significant after Bonferroni correction. Further analyses showed that gender differences were non-significant for all subtests except the cognitive verbal/preverbal (CVP) subtest. These findings basically support the construct validity of the CPEP-3 (i.e., Hypothesis 1c). Nevertheless, the gender difference found in CVP is an interesting finding which deserves further discussion. Autistic boys performed better than autistic girls in problem solving, verbal naming, sequencing and visual-motor integration, as assessed by CVP. Furthermore, despite the non-significant gender differences, boys tended to display a higher level of functioning than girls in other aspects. Does this mean that autistic boys generally had a higher developmental level than autistic girls? In fact, similar findings were reported by previous researchers. For example, Wing (1981) found that among people with high-functioning autism, the male to female ratio was about 15:1. On the other hand, among children with low-functioning ASD there were only twice as many boys as girls. This appears to suggest that although girls are less likely to develop ASD, they have more severe problems when they do. Some researchers speculated that boys are more noticeably different or disruptive than girls with the same underlying deficits, whereas girls with high-functioning ASD may be better at hiding their difficulties in order to fit in with their peers. As a result, girls are referred for diagnosis only when they display severe ASD-related problems, and thus in available statistics girls with ASD seem to be more severely impaired (Attwood 2000; Ehlers and Gillberg 1993; Wing 1981). These factors may partially explain the gender difference in CVP, although further studies are needed to confirm these hypothesized reasons.
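To make the role of the Bonferroni correction concrete, the following Python sketch compares each subtest-level p-value against the nominal alpha divided by the number of tests. It is a generic illustration of the correction, not a reconstruction of the analysis reported here: the function name, the nominal alpha of 0.05, and the p-values are all hypothetical.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Flag which tests remain significant after Bonferroni correction.

    p_values: mapping from a test label to its unadjusted p-value.
    Each p-value is compared against alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    return {label: p <= threshold for label, p in p_values.items()}


# Hypothetical unadjusted p-values for gender differences on three subtests.
toy_p_values = {
    "cognitive verbal/preverbal": 0.004,
    "expressive language": 0.030,
    "fine motor": 0.200,
}
print(bonferroni_flags(toy_p_values))
```

With three tests the per-test threshold becomes 0.05 / 3, so in this toy example a p-value of 0.030 would no longer count as significant even though it is below 0.05.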
Second, the present study examined the ability of the CPEP-3 to differentiate children with ASD from their normally developing peers. As predicted, children in the normal group performed better than those in the autistic group on all 10 subtests they completed. These findings support Hypothesis 2. They also echo the results reported by Schopler et al. (2005) on samples of children in the United States and provide evidence for the validity of the CPEP-3.
Third, as different subtests of the CPEP-3 were designed to measure different developmental and behavioral aspects in children with ASD, it was expected that moderate to large correlations would exist among the subtests. This hypothesis (i.e., Hypothesis 3) was supported by the present findings. Fourth, according to the PEP-3 developers (Schopler et al. 2005), the ten subtests of the Performance test are theoretically categorized into three composites: communication, motor and maladaptive behaviors. Whether such a three-factor model also applies to Chinese children needs to be tested. The results of confirmatory factor analysis in this study demonstrated a satisfactory model fit to the current data based on Chinese children with ASD, suggesting that dividing the Performance test into three dimensions is meaningful when it is used in the Chinese population. Hence, the findings supported Hypothesis 4.
Finally, the internal structure of each subtest was found to be homogeneous in the present study, as reflected in the high item-total correlation coefficients. This indicates that the items under each subtest are measuring the general quality that they were designed to measure. Although internal consistency is most widely used as a measure of reliability, it also helps researchers to understand the construct of a scale or subscale by examining the relationships between the item responses and the total score of the subtest and by showing whether the items included in a test or subtest are really complementary and related. In this sense, internal consistency supplements our understanding of validity (Tavakol and Dennick 2011). Altogether, the above findings can be regarded as sound evidence for the construct validity of the CPEP-3.
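An item-total correlation of the kind referred to above can be sketched as follows. The sketch computes, for each item, the Pearson correlation between the item and the subtest total; by default it uses the corrected variant, in which the item itself is removed from the total, which is a common choice but an assumption on our part since the variant used in the study is not stated here. The function names and the score matrix are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; undefined if either sequence has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(rows, corrected=True):
    """Correlate each item with the subtest total.

    rows: one list of item scores per child.
    corrected=True removes the item from the total before correlating,
    so an item is not correlated with itself.
    """
    n_items = len(rows[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in rows]
        total = [sum(row) - (row[i] if corrected else 0) for row in rows]
        results.append(pearson_r(item, total))
    return results


# Hypothetical item scores for five children on a three-item subtest.
toy_rows = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 2, 1],
    [2, 2, 1],
    [2, 2, 2],
]
print([round(r, 2) for r in item_total_correlations(toy_rows)])
```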
It is noteworthy that the present study is the first scientific study that investigated the construct validity of the Chinese version of PEP-3 (CPEP-3) on a large sample of children with and without ASD in Hong Kong. In conjunction with the previous validation findings on the reliability, content validity, and concurrent validity of CPEP-3 (Shek and Yu 2013), the present study supports the cross-cultural applicability of this instrument for children with ASD in Chinese contexts.
However, several limitations of this study should be acknowledged. First, while the overall sample size was reasonably large, the number of children aged 7 to 7.9 years was limited (six autistic children and two normal children). This may have led to biased or uninterpretable results for this age group. As the PEP-3 was developed for children with ASD with a developmental age between 6 months and 7 years, more low-functioning autistic children above the age of 7 years should be included in future studies. Second, the present study was conducted in Hong Kong, where the medium of instruction is usually Cantonese. To generalize the present findings, similar studies should be carried out in other Chinese contexts, including both Mandarin-speaking and Cantonese-speaking communities. Third, while the present study compared children with and without ASD on various CPEP-3 subtests, it would be meaningful to further investigate whether and to what extent the instrument can reflect the developmental differences between high-functioning and low-functioning autistic children. In future studies, the severity of ASD should be rated for each autistic participant to allow such comparisons. Finally, while the factorial validity of the CPEP-3 Performance test was supported by the CFA results, it would be ideal if factorial invariance of the instrument could be examined across different cultural groups, such as autistic children in Hong Kong and in the United States. In this way, knowledge about whether the scores of the instrument can be compared cross-culturally can be accumulated.
Cronbach and Meehl (1955) proposed that construct validity is important for every psychological test and should be evaluated by integrating evidence collected from different sources. Although it is impossible for researchers to examine all testable hypotheses related to construct validity, the more strategies are used to demonstrate the validity of a test with convincing evidence, the more confidence test users can have in its construct validity. Despite its limitations, the present study can be regarded as a useful contribution to research on and services for autistic children. With reference to four different aspects of construct validity, the present study provided good support for the construct validity of the Chinese version of the Psycho-Educational Profile 3rd edition (CPEP-3) through a set of validity arguments derived from the results.
The present study has both theoretical and practical implications. Theoretically, the findings provide support for the use of the CPEP-3 in assessing autistic children in the Chinese context and add to the limited literature on validated instruments for Chinese children with ASD. Practically, this study suggests that the CPEP-3 can serve as a credible and valid measure for professionals to better assess and monitor the development of children with ASD in Hong Kong and other Chinese communities. This would further help researchers and practitioners to plan and develop individualized educational programs according to children’s different developmental levels.