Introduction

Multi-method as well as multi-informant assessments of anxiety that include the child, mother, and preferably the father and/or teacher of the child, are considered best practice for both research and clinical contexts (Silverman and Ollendick 2008). However, substantial discrepancies between parent and child reports of children’s anxiety challenge clinical decision-making. The lack of agreement between children and parents has received considerable attention in the literature. In particular, the correspondence between ratings of different informants has been estimated as low-to-moderate for internalizing (r = .25) and externalizing (r = .30) problems (De Los Reyes et al. 2015). High discrepancies have been observed particularly for anxiety, regardless of whether assessments were collected through rating scales (Wren et al. 2004), behavioral observations (Ollendick and Hersen 1993), or diagnostic interviews (Cosi et al. 2010).

In a comprehensive review, De Los Reyes and Kazdin (2005) summarized findings from studies evaluating child characteristics (e.g., age, gender, ethnicity), parent characteristics (e.g., psychopathology, stress), and family characteristics (e.g., marital status) in relation to informant discrepancies and concluded that findings were largely inconsistent. According to De Los Reyes and Kazdin, limitations of these studies included inconsistent measurement of informant discrepancies and the lack of a theoretical framework. Difference score models have been most often used to examine discrepancies, but they have been criticized on statistical grounds (Edwards 2002). Although it has been demonstrated that difference score models provide distinct but equivalent information to regression models (Laird and Weems 2011), polynomial regression analyses with interaction terms have been recommended for future studies on informant discrepancies (Laird and De Los Reyes 2013). Furthermore, De Los Reyes and Kazdin advanced a theoretical framework, the Attribution Bias Context (ABC) Model, and proposed that discrepancies occur due to the unique perspective of each informant and the attributions they make about the problems of the child.

Despite this conceptual model that explains why discrepancies exist and the extensive research conducted on this topic, interpretation of informant discrepancies remains challenging. Clinicians who are confronted with inconsistent parent–child reports are inclined to base their diagnosis on the information provided by the parent (Dirks et al. 2012). A child is more likely to be considered an unreliable informant when their parents report a greater number of problems compared to the child (De Los Reyes et al. 2011). In contrast, when children report more problems than their parents, parents are not likely to be considered unreliable. Children are often considered less reliable due to their limited cognitive and social-emotional development (Zeman et al. 2007) and their tendency to respond in a socially desirable manner (Comer and Kendall 2004). However, no empirical evidence exists showing parents to be more accurate reporters.

Recently, a new perspective on informant discrepancies, the operations triad model (OTM), has been proposed to improve empirical research on informant discrepancies to arrive at more meaningful conclusions and potential guidelines (De Los Reyes et al. 2013). According to the model, Converging Operations are different ways of observing or examining the same behavior that point to the same conclusion. Diverging Operations, in contrast, suggest that discrepancies among informants are due to different people observing the same behavior in different ways and these differences reflect meaningful variation. While previous literature relied mostly on Converging Operations, researchers who anticipate diverging findings are encouraged to form meaningful hypotheses regarding these discrepancies. Most evidence for the OTM comes from literature on externalizing problems and further studies on internalizing problems have been called for (De Los Reyes et al. 2015). Extracting meaningful information from converging and diverging reports may be particularly essential for the study of, and clinical practice with, childhood anxiety. Informants are more likely to disagree about less overt problems, such as anxiety, than about more directly observable problems, such as overt aggression and hyperactivity (Achenbach et al. 1987). Thus, the primary aim of the current study was to obtain meaningful information about informant discrepancies regarding childhood anxiety that could potentially provide research and clinical practice with helpful guidelines.

Literature on parent–child discrepancies regarding childhood anxiety indicates that in general, children report higher levels of intensity, frequency, and severity of anxiety symptoms compared to parents (Cosi et al. 2010), although there are some exceptions found in clinical samples (Krain and Kendall 2000). Scholars have hypothesized that in clinical samples, parents tend to report more symptoms due to biases related to seeking treatment. In clinical samples, parents are particularly more likely to report symptoms of generalized anxiety disorder and social anxiety disorder whereas children are more likely to report symptoms of separation anxiety disorder (Choudhury et al. 2003; Reuterskiöld et al. 2008). Furthermore, parents and children are more likely to agree on diagnostic symptoms that are concrete and observable, such as behavioral avoidance compared to worry, and on symptoms that occur in the home environment compared to school settings (Comer and Kendall 2004). Thus, one might expect less discrepant reports on subtypes of anxiety disorders that are characterized by more observable components, such as separation anxiety and social anxiety, and more discrepant reports on subtypes of anxiety disorders that are characterized by less observable symptoms, such as generalized anxiety and panic disorder. However, empirical research regarding the symptoms of which subtype of anxiety disorder is associated with better or worse parent–child agreement is scarce and the findings are inconclusive.

Some studies have found the highest agreement for separation anxiety disorder (Becker et al. 2016; Brown-Jacobsen et al. 2011), whereas others have found the highest agreement for specific phobia (Pereira et al. 2015; Reuterskiöld et al. 2008) or for generalized anxiety disorder (Stevanovic et al. 2012). Moreover, some studies have found the lowest agreement for generalized anxiety disorder (Brown-Jacobsen et al. 2011; Weems et al. 2011), while others have found the lowest agreement for social anxiety disorder (Reuterskiöld et al. 2008; Stevanovic et al. 2012). Interpretation of these findings is complex due to a number of factors. The parent–child agreement for the various subtypes of anxiety disorders varied widely (i.e., Kappa coefficients ranged from poor to excellent), the various samples that have been studied are not comparable (i.e., ranging from clinically referred children with a specific phobia or substance abuse, to children diagnosed with epilepsy, and children recruited from the community), the age groups varied from middle childhood to adolescence, the assessment methods varied from diagnostic interviews to screening questionnaires, and the data analyses ranged from difference score models to regression models. As a result of these variations in research designs and methods, it is impossible to reach a reasonable conclusion about these anxiety measurement discrepancies.

The purpose of the current study was to extend the literature on parent–child discrepancies regarding childhood anxiety in two key ways. First, the present study examined parent–child discrepancies at anxiety subtype level in the context of children’s mental health agencies with clinically referred children diagnosed with an anxiety disorder. We were particularly keen on using a clinically referred sample, with measures of anxiety subtypes, in the context of real-world mental health agencies (as opposed to academic contexts) because we aimed to make the discrepancy results directly relevant to clinicians and their everyday practice. For example, when low parent–child agreement for a particular subtype of anxiety disorder is expected, clinicians can decide prior to the examination to include other informants, such as a teacher, or to include other measures, such as behavioral observation of the child.

Second, the current study examined parent–child discrepancies at an anxiety subtype level in relation to independent observer ratings of behavioral anxiety. Prior literature suggests that parent–child agreement at the level of subtypes of anxiety might be explained by the severity of symptoms and whether they are noticeable to the informant. Using behavioral observations of anxiety, we could test whether differences in agreement at the subtype level can be explained by the extent to which symptoms can be observed. That is, the present study examined whether observed anxiety was more strongly related to subtypes of anxiety problems that are characterized by more observable symptoms (i.e., separation anxiety) and less strongly related to subtypes of anxiety problems that are characterized by less observable symptoms (i.e., generalized anxiety).

Although the importance of studying parent–child disagreements in relation to behavioral observations of anxiety has been recommended numerous times in the past (e.g., Muris 2007; Weems et al. 2011), few such studies exist. Exceptions are studies that have examined the expectations of children and parents about children’s anxiety compared with behavioral observations. For example, prior work indicates that children are better at predicting their anxious response to a fearful situation than parents (Cobham and Rapee 1999; DiBartolo and Grills 2006). Furthermore, there are studies that have examined reports of children and parents in relation to physiological measurement of fear. For example, Weems et al. (2005) found that only child reports of anxiety were related to the children’s heart rate response to a scary stimulus. None of these studies took anxiety subtypes into consideration.

The current study examined subtypes of anxiety in middle childhood. Disagreements at the subtype level are particularly important to study in this age range because agreement between parents and children is lower in middle childhood than in adolescence, and clinicians are more inclined to prioritize the information of the parent over that of the child during middle childhood than during adolescence (Dirks et al. 2012; Grills and Ollendick 2003). In general, rating scales are administered to children from the age of seven, when children are able to read and are believed to be able to adequately reflect on their emotional states (Muris et al. 2007). Behavioral observations are most often used to assess anxiety in studies with primarily young, preschool children from whom self-reports are more difficult to obtain (e.g., Mian et al. 2015). Behavioral observations can nonetheless provide useful clinical information in middle childhood, such as how far anxious children dare to go in exposure tasks, but they are rarely used in this age range (Silverman and Ollendick 2005).

Design and Hypotheses

The current study aimed to (a) obtain meaningful clinical information from informant discrepancies regarding childhood anxiety by examining parent–child discrepancies at the level of anxiety subtypes; (b) examine parent–child discrepancies in relation to independent observer ratings of behavioral anxiety; and (c) do so in the context of “real world” clinical mental health agencies. We collected data in the context of several community mental health agencies with clinically referred children diagnosed with an anxiety disorder. Anxiety rating scales were administered to children and mothers and mother–child dyads were observed by independent raters during an anxiety-provoking situation.

Several hypotheses were put forward. First, we expected considerable discrepancy between reports from mothers and children about the child’s anxiety and we expected higher levels of anxiety reported by mothers than by children, given past research with clinical samples. Second, we expected that the level of agreement between mothers and children would vary between anxiety subtypes with higher levels of agreement for subtypes of anxiety problems that are characterized by more observable symptoms (i.e., separation anxiety and social anxiety) and lower levels of agreement on subtypes of anxiety problems that are characterized by less observable symptoms (i.e., generalized anxiety and panic disorder). Third, we hypothesized that observed (behavioral) anxiety would be more strongly related to subtypes of anxiety problems that are characterized by more observable symptoms and less strongly related to subtypes of anxiety problems that are characterized by less observable symptoms, given that independent observers would detect the more noticeable symptoms of anxiety. Last, we explored how agreement and disagreement between mother and child regarding the child’s anxiety symptoms was related to observed anxiety. Although these latter analyses were considered exploratory, we expected that the situation in which mothers reported high levels of anxiety symptoms (especially for anxiety problems with more observable symptoms) and children reported low levels of anxiety symptoms would be more strongly correlated with observed anxiety.

Materials and Methods

Participants

This study was part of a larger effectiveness trial examining CBT in anxious children (Jansen et al. 2012). Overall, 79 dyads were recruited from three mental health agencies for children in the Netherlands. At intake, mother and child were asked to complete the Screen for Child Anxiety Related Emotional Disorders (SCARED; Birmaher et al. 1999) to assess the child’s level of anxiety. If the total SCARED score or the score on one of four subscales (generalized anxiety, social anxiety, separation anxiety, or panic disorder), as reported by the child or the mother, fell in the ‘high’ or ‘at risk’ category, eligibility for participation was further examined by experienced agency clinicians. The inclusion criterion was a DSM-IV anxiety disorder. Exclusion criteria were a primary diagnosis of posttraumatic stress disorder, autism spectrum disorder, specific phobia, or obsessive–compulsive disorder, an IQ below 80, and the need for immediate intervention to prevent the child or the family from harm (e.g., suicidal intentions). Children meeting the exclusion criteria required a treatment approach that was not offered in the effectiveness trial. The children ranged in age from 7 to 13 years (M = 10.10, SD = 1.32), and 66% (n = 52) were girls. Most children (83%, n = 66) resided in intact families, 10% (n = 8) lived in single-parent (exclusively maternal) households, and 7% (n = 6) in blended families. Most children were of Dutch origin (98%, n = 77), and 2% (n = 2) had another nationality (e.g., Moroccan, Ethiopian). The mothers ranged in age from 35 to 54 years (M = 43.43, SD = 4.84).

Procedure

The study was approved by the Ethics Committee of Radboud University’s Faculty of Social Sciences. Families who met the inclusion criteria and agreed to participate signed informed consent. They were reassured that a refusal to participate would not affect their treatment. Prior to treatment, a research assistant visited mother and child at home. Previous studies assessing observed anxiety in middle childhood used anxiety-provoking tasks, such as reading aloud, conversing with a peer, talking in front of a camera, and looking at fearful images (Beidel et al. 2000; Kendall 1994; Kendall et al. 1997; Turner and Romanczyk 2012). These tasks are specifically designed to elicit fears related to social phobia or specific phobia, neglecting the heterogeneity of anxiety disorders in middle childhood. In the present study, we included children with different subtypes of childhood anxiety disorders (i.e., social anxiety, generalized anxiety, separation anxiety, and panic disorder). Therefore, we constructed a more general anxiety-provoking task suitable for children with various subtypes of anxiety. Child and mother separately completed a questionnaire describing the 18 most common anxiety-provoking situations (e.g., being home alone, giving a speech). Each item was rated on a 3-point rating scale assessing the degree to which the child would feel anxious about the event if it occurred in the next week. The research assistant chose the item that both mother and child rated the highest as the topic of a 5-min discussion. When mother and child rated different events as the highest, the research assistant chose the item that mother and child both agreed on. After instructing mother and child to discuss this topic in front of the camera, the research assistant left the room. The anxiety-provoking task was recorded on a digital video camera. Only the data from pre-treatment assessments were used in the current study.

Measures

Screen for Child Anxiety Related Emotional Disorders (SCARED)

Children’s anxiety symptoms were measured using the SCARED (Birmaher et al. 1999; Muris et al. 2007). The SCARED has a child self-report (C) and parent-report (P) version, which both consist of 69 identical items. Mother and child were asked to rate each item on a 3-point scale ranging from 0 (never or almost never) to 2 (often). The psychometric properties of the SCARED have been well established (Muris et al. 2007). The questionnaire generates a Total score and scores on panic disorder, generalized anxiety disorder, separation anxiety disorder, social phobia, obsessive–compulsive disorder, posttraumatic stress disorder, and specific phobia (animal, medical, situational) subscales. Norm scores are available only for child self-report, for boys and girls separately (see Muris et al. 2007). Each scale provides a low, normal, high, or at risk score. In the current study, the reliability of the SCARED was excellent for child self-report (Cronbach’s α = .91) and good for mother report (Cronbach’s α = .87).

Behavioral Observations

Children’s observed anxiety was measured using a modified version of Kendall and colleagues’ coding system (1994, 1997). In Kendall’s coding system, independent observers rated children’s anxiety with seven observational codes: gratuitous body movements, gratuitous verbalizations, avoiding task, absence of eye contact, fingers in mouth, anxious voice, and body rigidity. These observational codes were based on the Preschool Observation Scale of Anxiety (POSA; Glennon and Weisz 1978) and were modified for use with children in middle childhood. The current study included six of the seven observational codes from Kendall and colleagues. The observational code Gratuitous verbalizations (e.g., stating to want to leave, stating a dislike for the task, physical complaint) overlapped with the observational code Avoiding task. Therefore, both observational codes were combined into one code named Avoiding task. In addition, Fearful facial expression was included in our coding system as an observational code, whereas Kendall and colleagues included it as an additional rating scale. The physical cues of the specific affect (SPAFF) coding system were incorporated for Fearful facial expression (Coan and Gottman 2007; Gottman et al. 1995). With these observational codes we planned to observe general signs of anxiety, as well as signs of panic disorder symptoms. However, signs of social anxiety symptoms and separation anxiety symptoms were not well reflected in these seven observational codes. Therefore, we added two observational codes: Shame and Proximity to mother. Shame was considered distinctive for socially anxious children in middle childhood (DeKleyen and Greenberg 2008), whereas Proximity to mother was considered distinctive for children in middle childhood with separation anxiety (Muris et al. 2015).

Thus, the current coding system contained nine observational codes: Gratuitous body movements (e.g., shaking hands or legs, rocking body, fiddling); Avoiding task (e.g., not talking, changing subject, leaving the room); Absence of eye contact (e.g., not looking at mother during task); Fingers in mouth (e.g., touching lips, biting fingernails); Anxious voice (e.g., stuttering, whispering, giggling); Body rigidity (e.g., clenched fists, folded arms, unusual stiffness of body parts); Fearful facial expression (e.g., raising eyebrows, crying); Shame (e.g., stating to experience shame, blushing, hiding); and Proximity to mother (e.g., sitting in mother’s lap, holding hands). Each observational code was rated on a 5-point scale ranging from 1 (not at all) to 5 (very much) by research assistants. An experienced coding supervisor trained three research assistants with a bachelor’s degree in educational sciences over the course of 4 weeks until they reached an intraclass correlation coefficient (ICC) of .70. During the training, the coding manual and example files were reviewed, practice files were assigned, and calibration meetings were organized. Following the training, the research assistants coded the videotaped anxiety-provoking task and rated each observational code once at the end of the 5-min discussion. Weekly follow-up meetings were organized to minimize coder drift, and 25% of the videos were double coded.

Statistical Analyses

First, we computed the correlations among SCARED-C, SCARED-P, and behavioral observations. Fisher’s r to z transformation was used to examine differences in correlations between subscales. Then, we analyzed patterns of agreement and disagreement between SCARED-C and P. Following the recommendations of Edwards (2002) and Shanock et al. (2010), a score was considered discrepant when the standardized score of the SCARED-C was half a standard deviation above or below the standardized score of the SCARED-P. A paired t test was used to compare means between SCARED-C and P. Next, to analyze whether SCARED-C and P were related to behavioral observations, a regression model with the main effects of SCARED-C and P as predictors was tested. Last, polynomial regression with response surface modeling was used to assess whether agreement between SCARED-C and P, and the size and direction of their discrepancy, were related to behavioral observations. We followed the recommended procedures from previous papers (Edwards 2002; Laird and De Los Reyes 2013; Shanock et al. 2010). First, we analyzed the slope and curvature along the line of perfect agreement. The slope of this line captures the effect of agreement between SCARED-C and P on behavioral observations. The curvature of this line indicates whether the relationship is linear or nonlinear. Second, we assessed the line of incongruence, along which SCARED-C is not equal to SCARED-P. The slope of the line of incongruence indicates how the direction of the discrepancy relates to behavioral observations (i.e., whether behavioral observations differ when SCARED-C is higher or lower than SCARED-P), while the curvature of this line shows how the degree of discrepancy between SCARED-C and P relates to behavioral observations.

Since gender differences were found in previous studies, including ones that used the SCARED (Muris et al. 2007), the analyses were also carried out for boys and girls separately.
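As an illustration only, the response surface tests described in this section can be sketched in Python. The sketch assumes the standard second-order polynomial model from Edwards (2002) and Shanock et al. (2010), in which the outcome Z is regressed on X, Y, X², XY, and Y², with X standing for SCARED-C and Y for SCARED-P. The function names `solve_linear_system` and `fit_response_surface` are hypothetical, predictors are assumed to be centered beforehand, and the significance tests of the surface values are not reproduced. It is not the analysis code used in the study.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_response_surface(child, mother, outcome):
    """Fit Z = b0 + b1*X + b2*Y + b3*X^2 + b4*X*Y + b5*Y^2 by least squares.

    X = child report, Y = mother report (both assumed centered),
    Z = one observational code. Returns the four surface values.
    """
    rows = [[1.0, c, m, c * c, c * m, m * m] for c, m in zip(child, mother)]
    k = 6
    # Normal equations: (X'X) b = X'z
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtz = [sum(r[i] * z for r, z in zip(rows, outcome)) for i in range(k)]
    b0, b1, b2, b3, b4, b5 = solve_linear_system(xtx, xtz)
    return {
        "a1_agreement_slope": b1 + b2,             # slope along X = Y
        "a2_agreement_curvature": b3 + b4 + b5,    # curvature along X = Y
        "a3_incongruence_slope": b1 - b2,          # slope along X = -Y
        "a4_incongruence_curvature": b3 - b4 + b5, # curvature along X = -Y
    }
```

With X as the child report, a negative incongruence slope (a3) means the outcome is higher when the mother report exceeds the child report, which is the reading applied to the line of incongruence in the Results.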

Results

Descriptive Statistics

The means and percentages of mother and child report of the SCARED are presented in Fig. 1 and Table 1. The children who scored in the clinical range on the SCARED-C can be found in Table 1. Table 2 lists means and interrater reliability statistics for all observational codes. The means show that Gratuitous body movements, Proximity to mother, and Absence of eye contact were often observed among the children while Avoiding task, Fingers in mouth, and Anxious voice were observed occasionally. Fearful facial expression, Shame, and Body rigidity were rarely observed during the anxiety-provoking task. Almost all children (96%) had a score of 1 on Fearful facial expression, indicating that fearful facial expression was not observed during the anxiety-provoking task. Low variance was also found for Shame (96% had a score of 1) and Body rigidity (87% had a score of 1). Most codes showed moderate to good intraclass correlation coefficients (ICC). For Fearful facial expression, no ICC could be computed due to the absence of variance among the observers. For this code, we calculated the percentage of agreement, which was 95%.

Fig. 1 Mean anxiety subscale rating for mother and child report of the SCARED. SCARED, Screen for Child Anxiety Related Emotional Disorders

Table 1 Descriptive statistics of SCARED
Table 2 Descriptive statistics of all categories of observational codes

Since Fearful facial expression, Shame, and Body rigidity were almost non-existent in the sample, we generated a composite score calculated as the mean score of the remaining six codes. The reliability of this scale was very poor, with Cronbach’s α = .18. Exclusion of observational codes had no significant effect on the reliability of this scale, indicating that this composite score of observed anxious behavior was not fit for use. Additionally, correlations among the observational codes (Table 3) showed that only Proximity to mother and Absence of eye contact were significantly interrelated. None of the other observational codes were interrelated. Further analyses were therefore conducted with the six observational codes separately (i.e., Gratuitous body movements, Avoiding task, Fingers in mouth, Anxious voice, Proximity to mother, and Absence of eye contact).
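For readers less familiar with the reliability coefficient reported above, a minimal sketch of Cronbach’s α for a composite of k codes follows. It uses the textbook formula α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The function name `cronbach_alpha` and the input layout are hypothetical, and this is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a composite.

    items: list of k lists, one per observational code, each holding the
    scores of the same n children in the same order.
    """
    k = len(items)
    n = len(items[0])
    if k < 2 or any(len(scores) != n for scores in items):
        raise ValueError("need at least two codes with equal numbers of scores")
    # Summed composite per child (alpha is the same for sums and means)
    totals = [sum(scores[i] for scores in items) for i in range(n)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

A value near zero, such as the .18 reported here, indicates that the codes share almost no common variance, which is consistent with the near-absent intercorrelations among the codes.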

Mother–Child Discrepancy

The correlations between SCARED-C and P are presented in Table 3. Consistent with previous research, high levels of disagreement between SCARED-C and SCARED-P were found. The correlations for the Total scale and the subscales were all in the low to moderate range. The strongest correlations were found between child and mother reports of separation anxiety and social anxiety (r = .55 and .40, respectively), while the lowest correlation was observed for the total scale (r = .26). Fisher’s r to z transformations indicated that agreement on the separation anxiety subscale (r = .55) was higher than agreement on the total anxiety scale (r = .26; z = 2.17, p = .03). There were no significant differences between the other subscales. Contrary to our expectations for this clinical sample, children reported on average more symptoms compared to mothers for the total anxiety scale, t(78) = 4.37, p < .001, and the subscale panic disorder, t(78) = 5.51, p < .001. There was no significant difference for the subscales separation anxiety, social anxiety, and generalized anxiety. Furthermore, we explored the incidence of agreement and disagreement between SCARED-C and P (see Table 4).
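The Fisher r to z comparison reported above can be illustrated with a short sketch. The text does not state which variant of the test was used. The sketch assumes the independent-samples form with n = 79 for both correlations, because that form reproduces the reported z. Since both correlations come from the same dyads, a test for dependent correlations could give a somewhat different result. The function name `fisher_z_difference` is hypothetical.

```python
import math


def fisher_z_difference(r1, r2, n1, n2):
    """Compare two correlations via Fisher's r-to-z transformation.

    Assumes the two correlations are independent (an approximation here).
    Returns the z statistic and the two-tailed p value.
    """
    z1 = math.atanh(r1)  # Fisher transformation of r1
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal probability
    return z, p


# Separation anxiety (r = .55) versus total anxiety (r = .26), n = 79 dyads
z, p = fisher_z_difference(0.55, 0.26, 79, 79)
print(round(z, 2), round(p, 2))  # prints 2.17 0.03
```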

Table 3 Correlations of the observational codes and the SCARED
Table 4 Agreement and disagreement between SCARED-C and SCARED-P
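The agreement categories summarized in Table 4 follow the half-standard-deviation rule from the Statistical Analyses section. A minimal sketch of that rule is given below. The function names, the category labels, and the use of a strict inequality at exactly half a standard deviation are assumptions made for illustration. They are not taken from the study.

```python
from statistics import mean, stdev


def standardize(scores):
    """Convert raw scores to z scores using the sample mean and SD."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


def classify_dyads(child_scores, mother_scores, threshold=0.5):
    """Label each dyad by comparing standardized child and mother scores.

    A dyad is discrepant when the standardized scores differ by more than
    `threshold` standard deviations in either direction.
    """
    z_child = standardize(child_scores)
    z_mother = standardize(mother_scores)
    labels = []
    for c, m in zip(z_child, z_mother):
        if c - m > threshold:
            labels.append("child higher")
        elif m - c > threshold:
            labels.append("mother higher")
        else:
            labels.append("agreement")
    return labels
```

Counting the labels per subscale gives the incidence of agreement and of each direction of disagreement.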

Mother–Child Discrepancy and Behavioral Observations

The correlations of the observational codes with the SCARED-C and P are presented in Table 3. A significant positive correlation was found between Proximity to mother and the subscale separation anxiety of the SCARED-C, and a significant negative correlation was found between Anxious voice and the subscale panic disorder of the SCARED-C. No other significant correlations were found. Furthermore, regression analyses showed no significant relations between the observational codes and the discrepancies between SCARED-C and P (all F’s < 1.83, p’s > .05). Gender differences were present. For girls, no significant relations were found between behavioral observations and the discrepancy between SCARED-C and P (all F’s < 1.43, p’s > .05). For boys, a significant relation was found between Anxious voice and the discrepancy between social anxiety reported by boys and mothers (F(5, 21) = 3.09, p = .03; adjusted R² = .29) and between Avoiding task and the discrepancy between panic disorder reported by boys and mothers (F(5, 21) = 3.13, p = .03; adjusted R² = .29). None of the other behavioral observations were related to the discrepancy between boys’ report (SCARED-C) and mother report (SCARED-P) (all F’s < 1.97, p’s > .05). For the two significant relations, further analyses were conducted.

The slope of the line of perfect agreement for social anxiety and anxious voice, as reported by boys and mothers, was negative and significant (B = −.17, p < .01). Anxious voice was high when mothers and boys agreed that social anxiety was low. The curvature of the line of perfect agreement was positive and significant (B = .03, p = .05), meaning that the relation between anxious voice and social anxiety was non-linear. The relation between anxious voice and social anxiety was stronger when both boys and mothers scored low rather than high on social anxiety. Subsequently, the line of incongruence was examined, but both the slope (B = −.14) and the curvature (B = .00) were non-significant. The size and direction of the discrepancy between social anxiety, as rated by mother and boy, were unrelated to anxious voice. For the relation between avoidance and panic disorder as reported by boys and mothers, the slope (B = .62) and the curvature (B = .04) of the line of perfect agreement were both non-significant. The line of incongruence showed a significant negative slope (B = −.99, p = .03) and a non-significant curvature (B = .07). This shows that avoidance is higher when mothers’ reports of panic disorder are higher compared to those of boys. The size of the discrepancy between mothers and boys on panic disorder was not related to avoidance.

Discussion

We aimed to examine agreement and disagreement between mother and child reports of anxiety among a sample of clinically referred anxious children. In line with our hypotheses and previous studies (e.g., Cosi et al. 2010), a high level of mother–child disagreement was shown on the reports of anxiety with correlations ranging from low to moderate. Also as expected, the level of agreement between children and mothers varied across anxiety subtypes, with the strongest correlations observed for separation anxiety and social anxiety (r’s being .55 and .40, respectively) and the lowest correlation for total anxiety (r = .26). Moreover, mothers and children showed significantly greater agreement regarding the levels of separation anxiety compared to levels of total anxiety. One possible reason for the higher levels of disagreement for anxiety overall is that most anxiety symptoms are internal and not observable by others. Mothers may be unaware of children’s worrying or internal distress. This is in line with previous studies that have found higher rates of agreement for anxiety symptoms that are more observable, such as for specific phobia, compared to less observable anxiety symptoms, such as those of generalized anxiety that often involve worrying and other difficult to observe states (Pereira et al. 2015). Higher levels of agreement about levels of separation anxiety may also reflect the fact that these anxiety symptoms are primarily displayed in relation to the primary caregiver and in the home environment. Therefore, mothers may be especially likely to notice these symptoms, and perhaps to be distressed by them themselves. Further research could explore whether this higher agreement about separation anxiety is specifically found for mother–child agreement, or whether it is also shown for father–child and teacher–child agreement.

Contrary to our expectations for this clinical sample, we found that on average children reported more symptoms compared to mothers for total anxiety and panic disorder. The average scores on the other subscales (i.e., separation anxiety, social anxiety, and generalized anxiety) did not significantly differ between mother and child. It is possible that our sample might have had somewhat different characteristics compared with samples from other clinical studies. Children in the current sample were assessed for eligibility based on the anxiety symptoms reported by either the mother or the child and regardless of the problem they were referred to the agency for. In most non-clinical studies, children have been shown to report higher intensity, frequency, and severity of anxiety symptoms compared to their mothers, consistent with our sample (Krain and Kendall 2000). Because it is the parent who most typically seeks help for their child’s anxiety, this pattern may be reversed among other clinical samples. Hence, our method of including children according to reports from either parent or child may well have resulted in the observed pattern that children reported overall more symptoms than their mothers. Moreover, we explored the incidence of agreement and disagreement between mothers and children and found that the number of children who reported more symptoms compared to their mothers was roughly equal to the number of children who reported fewer symptoms compared to their mothers. It is recommended that further research not only explores differences in average scores, but also in the size and direction of the discrepancies and between separate subgroups of parent–child dyads.

Next, we examined how an observational measure of anxiety was related to the various subscales of mother and child report. We hypothesized that observed anxiety would be more strongly related to subtypes of anxiety problems that are characterized by more noticeable symptoms and less strongly related to subtypes that are characterized by more internally experienced symptoms. In line with expectations, we found that observed proximity to mother was positively correlated with separation anxiety. Interestingly, this relation was observed only in children’s reports. Children who indicated experiencing high levels of fear of separating from one of the parents were inclined to stay close to their mothers in the anxiety-provoking situation. Hence, children were accurate reporters of this need for parental proximity. There is evidence that children are better than their parents at predicting their own anxiety in stressful situations (DiBartolo and Grills 2006).

Furthermore, findings showed that children with low levels of observed anxious voice rated their own panic disorder symptoms as high. Although this relation seems contradictory, it might be that children who score high on symptoms of panic disorder have become skilled in masking their anxious behaviors. The SCARED-C items that reflect symptoms of panic disorder are mostly somatic symptoms (e.g., “When I get frightened, my heart beats fast”). It might be that children scoring high on these items do not necessarily have high levels of panic disorder, but rather experience anxiety accompanied by a high level of somatic complaints. These somatic symptoms are mostly internal and difficult to observe. Although both findings supported our hypothesis, no other relations between observed anxiety and children’s reports were found. Furthermore, and unexpectedly, there was a lack of relations between observed anxiety and maternal reports.

The lack of significant associations might be due to limited reliability and validity of our observational measure of anxiety. The coding system in the present study was based on prior work by Kendall and colleagues (1994, 1997), who adapted the POSA (Glennon and Weisz 1978) for use in middle childhood. We further adapted and extended Kendall’s coding system. Although all observational codes in the current study were reliable, together they did not form one reliable construct of observed anxiety. Previous studies in middle childhood experienced similar problems, as each study excluded different observational codes to assemble one reliable construct of observed anxiety (Kendall 1994; Kendall et al. 1997; Turner and Romanczyk 2012). Anxiety is an internalizing problem that involves mostly anxious thoughts and feelings, which are difficult to detect from an observer’s point of view. Anxiety becomes especially difficult to observe in middle childhood, when children are in the process of developing more complex emotions, self-awareness, and the ability to regulate and hide their emotions (Damon et al. 2006). These children are starting to become aware of the social undesirability of showing anxiety and are more capable of masking their own anxiety than younger children.

In addition, not all situations will elicit the same amount of fear and distress in all children. Consistent with our results, other researchers have pointed out that anxiety evokes different types of coping strategies and different capacities to mask anxious behaviors, so that no single task can capture anxious behaviors reliably (Thorne et al. 2013). However, the current study used a single anxiety-provoking task (a video-recorded discussion) to capture all subtypes of anxiety disorders instead of focusing on one specific anxiety disorder, such as social or specific phobia (Beidel et al. 2000; Kendall 1994; Kendall et al. 1997; Turner and Romanczyk 2012). In retrospect, the discussion between mother and child about a feared event did not elicit levels of distress high enough to manifest overtly in most of the mother–child dyads. Future studies should consider the use of a more specific and intensive anxiety-provoking task. Specific anxiety-provoking tasks or exposure tasks are necessary to elicit high enough levels of distress in children with a specific anxiety disorder.

Another reason for the limited agreement between observed anxiety and the rating scales might be the inherent discrepancy between what is measured with a behavioral measure and what is measured with a rating scale. Anxiety rating scales generally assess the child’s cognitions and feelings about their own anxiety and distress across a range of time. Thus, they measure trait anxiety, a stable, summary-level index of anxiety that is considered more persistent across situations and through development. On the other hand, when children are observed during an anxiety-provoking situation, real-time state anxiety (the experience of anxiety in the here and now) is being measured. Although these two indices of anxiety are correlated, they are also distinct. For example, behavioral observations of anxiety are more common among studies with preschool children, and these studies tend to find only small to moderate correlations with parental reports (e.g., Stifter et al. 2008). Furthermore, the level of agreement between behavioral observations of anxiety and parental reports with preschool children varies depending on the level of threat within the anxiety-provoking situation (Kiel and Hummel 2017) and on the positivity of the observational codes (Stifter et al. 2008). Thus, our own study as well as these previous ones suggest that the real-time expression of anxiety (under lab conditions) and questionnaires provide distinct information about anxiety, which is important to keep in mind when designing clinical research and in clinical practice. Lastly, the variance in observed anxiety might have been low due to the homogeneous sample of highly anxious children.

Although we had our concerns regarding the observational measure of anxiety, some interesting gender differences emerged. In general, gender of the child has been inconsistently associated with agreement between mother and child. Some studies found no differences between girls’ and boys’ reports and those of their mothers (Choudhury et al. 2003), whereas others have found that boys agreed more with their mothers than girls did (Grills and Ollendick 2003). In the present study, we found that mother–child discrepancies were similar for boys and girls. Some curious findings emerged, however, from our exploratory analyses of mother–child discrepancies and observed anxiety. Contrary to expectations, mother–child discrepancies overall were not related to the behavioral observations. For boys, however, we found that when they agreed with their mother that their social anxiety was low, they had high levels of anxiety in their voice. Additionally, when mothers reported higher rates of panic disorder than did their sons, boys showed more avoidance in the anxiety-provoking task. These findings are difficult to interpret: they may be chance findings, or they may need to be qualified by more precise tests of which types of behavioral observations (e.g., anxious voice) are clear indicators of anxiety in children. Nevertheless, future studies of agreement and disagreement between parents and children should take gender differences into account.

The current study has an additional important limitation that should be acknowledged. Recruiting children through the clinical agencies was more challenging than we anticipated. Although we extended our study by one full year, we were unable to include the full recommended sample of 120 children (Jansen et al. 2012). Therefore, the analyses were underpowered. Unfortunately, this is a common issue in clinical research, and our sample size was comparable to or even larger than those of other similar clinical studies (e.g., Choudhury et al. 2003; Esbjørn et al. 2013). Moreover, we believe that it is important to also publish studies that have failed in their initial aim to observe and code anxiety, an aim pursued in an effort to avoid the problems that come with self- and parent-reports. We ourselves would have benefitted tremendously if we had found a prior study in the literature that highlighted the potential limitations of observational tasks and coding systems, and we hope that our observational treatment study with follow-up data can make this contribution to the field. The current data suggest that researchers in future studies should be cautious in choosing both the anxiety-provoking task and the coding system for assessing anxiety in middle childhood. Furthermore, clinical studies with bigger sample sizes are needed.
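
As a rough illustration of why sample size matters here, the following sketch approximates the number of dyads needed to detect a given correlation with a two-sided test, using the Fisher z approximation. The function name `n_for_correlation`, the significance level, the power target, and the example effect sizes (the cross-informant correlations of .25 and .30 cited in the Introduction) are assumptions chosen for illustration; the sketch does not reproduce how the recommended sample of 120 was derived.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r (two-sided, Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)            # quantile for target power
    # Standard Fisher z approximation: N = ((z_alpha + z_beta) / atanh(r))^2 + 3
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


# Illustrative effect sizes only (assumed, not the study's power analysis).
for r in (0.25, 0.30):
    print(f"r = {r:.2f}: approximately {n_for_correlation(r)} dyads needed")
```

Under these assumptions, smaller correlations require markedly larger samples, which is consistent with the concern that the present analyses were underpowered.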

The limitations discussed above (especially those relating to behavioral observations of anxiety) certainly temper the implications of this study. Nevertheless, the high levels of discrepancy between mother and child reports underscore the importance of applying a multi-informant assessment in clinical practice as well as in research. Although numerous studies over the years have highlighted the importance of multi-informant assessments, a long-held preference for prioritizing information from the parent over information from the child persists (Dirks et al. 2012). Moreover, our findings suggest that discrepancies may vary across subtypes of childhood anxiety. Researchers and clinicians may expect higher agreement between mothers and children on ratings of separation anxiety and lower agreement on ratings of total anxiety and somatic symptoms. For anxiety problems that are characterized by more internally manifested factors (e.g., arousal), it may be particularly important to incorporate both the child’s and the parent’s perspectives in the assessment of anxiety.