Your own children are special: clues to the sources of reporting bias in temperament assessments

https://doi.org/10.1016/j.infbeh.2003.12.005

Abstract

We investigated sources of discrepancy between parents’ and observers’ reports of infant temperament behavior. A laboratory protocol was employed that involved rating of 4 segments of the mothers’ own children's behavior on 4 temperament characteristics (yielding a profile of 16 ratings) and rating 6 segments of behavior of standard children on the 4 temperament characteristics (yielding a profile of 24 ratings). These ratings were completed by mothers and by trained observers. Correspondence between mothers and observers for standard children was high, approaching the criteria typically employed to certify researchers on a coding protocol. In contrast, correspondence between mothers and observers for own children was negligible. The relationship status of the observer and target is important when considering correspondence among observers.

Introduction

Parents don’t always provide the precise information we would like to have about their children. If parents were complete and accurate reporters, research in human development would be far more economical, and our current knowledge base would likely be much more comprehensive. Developmentalists accept the premise that young children have individual differences in behavioral style, but attempts to verify this assumption have been fraught with difficulty. In particular, we have a strong empirical base indicating that when parents are used as informants about children's behavior, there is a larger amount of discrepancy with other informants than would be considered appropriate for most instruments used in research settings (e.g., Achenbach, McConaughy, & Howell, 1987; Hoyt, 2000; Sameroff, Seifer, & Elias, 1982). Unfortunately, we know relatively little about the processes underlying accuracy in parent reports beyond some very general correlates of the reports. The study we report here examines some features of the process by which parents report about the temperament behavior of their children, including their agreement with trained observers when rating their own children and unfamiliar children.

What do we mean when we say an informant is an accurate reporter? We invoke here the standard notion of reliability; that is, different informants should provide the same report about the behavior of the child in question. It should not matter whether there are any peculiar characteristics of the informant with respect to the child, but rather that the behavior of the child is described in more or less the same way from informant to informant (where one can invoke standard limits for acceptable levels of agreement).

Unfortunately, there are many sources of bias in ratings of behavior. Two principal sources are rater variance, where consistent tendencies of a rater are in effect across target children, and dyadic variance, where interactions of the rater and the target children yield systematic variance in ratings. Additional interactions with raters, such as rater-trait or rater-occasion, can also affect accuracy of behavior ratings (Hoyt, 2000; Hoyt & Kerns, 1999). In the domain of infant temperament (as well as other types of reports of child behavior) the use of family members to provide child behavior ratings introduces another factor, which is the amount of information available to different classes of informants. Obviously, those family members living with the children being reported on will have a much larger information base from which to work. In effect, this is one potential mechanism by which dyadic variance may be introduced to the rating of child behavior.

Rater bias in temperament research is likely not limited to dyadic variance. In contrast to simple inaccuracy (which may occur when parents who have not established rater reliability are used as raters [Hoyt & Kerns, 1999]), bias is indicated when there is systematic association of the content of informants’ reports with characteristics of the informants themselves. For example, in the temperament literature, several studies have indicated that parental factors such as SES, race, anxiety, or depression are correlated with reports of child behavior, sometimes at a higher level than the agreement among different informants (Sameroff et al., 1982; Vaughn, Deinard, & Egeland, 1980). In a related vein, the relationship of informant and target may restrict the information available. For example, when reporting on children's internalizing symptoms, parents are typically less aware of inner emotional states of their children (such as withdrawal or sadness) than are the children themselves (Ivens & Rehn, 1988; Kashani et al., 1985). There is also some suggestion that parents’ biases about their children's behavior can in part influence development in the direction of that bias (Pauli-Pott, Meurlesacker, Bade, Haverkock, & Beckmann, 2003).

There is a substantial body of work converging on the conclusion that when rating the behavior of children, the correspondence among raters having different relationships with the target children averages about .30 (when indexed by correlation coefficients—some studies reporting higher correspondence, some lower), with most of this work comparing information from questionnaires (Achenbach et al., 1987). In the temperament literature, there has been debate about who (if anyone) is reporting accurately on the children's behavior (Carey, 1983; Rothbart & Bates, 1998; Seifer, Sameroff, Barrett, & Krafchuk, 1994; Vaughn et al., 1980). On the one hand, the inability of parents to agree with observers (who have exceeded reliability standards following training) implies that parents do not accurately see the behavior of their children. On the other hand, it may be that observers are lacking because they do not have the information base available to parents. The stalemate resulting from this debate has called into question one of the methods often employed to assess temperament in infants and children.

One strategy for allowing both family members and non-family-members to gain large (if not equivalent) amounts of information about children is to have non-family members make multiple observations of children across weeks or months. Seifer et al. (1994) employed this strategy with some success in producing reports with higher than usual correspondence between family and non-family informants, but they still did not approach usual standards of inter-rater reliability.

Temperament research presents some special issues when considering accuracy of informants. Much of the interest is in infants and young children, thus precluding the use of children themselves as informants. Temperament constructs by definition do not consist simply of describing behavior expressed in a single context at a single time. Like personality, temperament constructs are meant to organize typical behavior evident across time and setting. This focus highlights the problems inherent in measurement of trait-like phenomena (Epstein, 1983; Mischel & Peake, 1982). Despite the fact that temperament refers to behavior that is relatively stable across time and setting, there are many studies that assess temperament in relatively short one-time observations in the laboratory. Some examples include the LABTAB procedure (Goldsmith & Rothbart, 1990) and methods used to assess behavioral inhibition (Garcia-Coll, Kagan, & Reznick, 1984).

Finally, temperament is often studied in the context of other organized behavior systems in children, such as the presence of behavior problems. Since the other constructs are also frequently assessed by means other than direct observation or child self-report, the accuracy of raters may be especially salient if they are asked to report on different behavioral domains that will later be compared. To the extent that systematic inaccuracy is contained in the ratings, spurious relations among variables will result from informant factors rather than child factors.
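The mechanism described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis from the study: the sample size, the unit variances, and the additive form of the informant bias are all hypothetical choices made for illustration. Two child characteristics are generated independently, and the same informant-specific bias is then added to the reports of both.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_children = 2000       # hypothetical sample size

# Assumption: the two child characteristics are truly unrelated.
true_temperament = [rng.gauss(0, 1) for _ in range(n_children)]
true_problems = [rng.gauss(0, 1) for _ in range(n_children)]

# Assumption: each child has one informant whose bias enters both reports.
informant_bias = [rng.gauss(0, 1) for _ in range(n_children)]
reported_temperament = [t + b for t, b in zip(true_temperament, informant_bias)]
reported_problems = [p + b for p, b in zip(true_problems, informant_bias)]

r_true = pearson(true_temperament, true_problems)
r_reported = pearson(reported_temperament, reported_problems)
print(f"correlation of true scores:     {r_true:.2f}")
print(f"correlation of reported scores: {r_reported:.2f}")
```

Under these assumptions the true scores are essentially uncorrelated, while the reported scores are clearly correlated, and that correlation comes entirely from the shared informant component.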

Systematic exploration of parental reporting bias, or indeed whether such bias exists, will be enhanced with well-articulated models of how such bias might be manifest. As in other studies of behavior, such models provide for testable hypotheses that can be supported or undermined. In this section, we identify the model that guided the empirical work reported here.

Kenny (1994) outlines a set of basic principles regarding the general function of person perception in his Social Relations Model. A fundamental premise of this model is that relationship partners have perceptions of one another. Furthermore, these perceptions may be conceived as the sum of several components: target variance, perceiver variance, relationship variance, and a constant term. Target refers to the degree to which the particular target individual is perceived by others (in general) as high or low on a trait; perceiver refers to the degree to which a particular individual perceives targets (in general) as high or low on a trait; relationship refers to the unique view of a particular target by a particular perceiver, with the general target and perceiver variance controlled; constant refers to the average level at which perceivers (in general) view targets (in general) as high or low on the trait. A general finding from Kenny's review of person perception studies is that target variance accounts for about 15% of total judgment variance, perceiver variance accounts for about 20% of total judgment variance, and relationship variance accounts for about 20% of total judgment variance; the remaining 45% of judgment variance is attributed to error. Stated another way, the consensus among raters (which might be viewed as our best estimate of the actual trait level of the target) accounts for only a small portion of variance, less than either the perceiver (rater) characteristics or the relationship factors.
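The decomposition described in this paragraph can be written compactly. The symbols below are our own notation, introduced for illustration and not taken from Kenny (1994), and the additivity of the variances rests on the assumption that the components are mutually uncorrelated.

```latex
% X_{ij}: judgment of target j by perceiver i (notation assumed for illustration)
% \mu: constant, \alpha_i: perceiver effect, \beta_j: target effect,
% \gamma_{ij}: relationship effect, \varepsilon_{ij}: error
X_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij}

% Assuming uncorrelated components, the judgment variance partitions as
\sigma^2_X = \sigma^2_\alpha + \sigma^2_\beta + \sigma^2_\gamma + \sigma^2_\varepsilon

% Approximate proportions reported in the text
\frac{\sigma^2_\beta}{\sigma^2_X} \approx .15, \quad
\frac{\sigma^2_\alpha}{\sigma^2_X} \approx .20, \quad
\frac{\sigma^2_\gamma}{\sigma^2_X} \approx .20, \quad
\frac{\sigma^2_\varepsilon}{\sigma^2_X} \approx .45,
\qquad .15 + .20 + .20 + .45 = 1.00
```

Read this way, the perceiver and relationship components together (about .40) account for more than twice the variance of the target component (about .15).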

What is the meaning of relationship variance in person perception? One explanation is that different information is shared within a relationship than outside of a relationship. This, in fact, is a point often emphasized in support of parent-report methods. There is special knowledge held by parents that would be very hard to duplicate in trained observers who do not have an ongoing relationship with the target child because of limitations on time and place for interaction with the child. A second explanation is that different meaning may be attached to behavior because of the existing relationship, for which three distinct sub-components may be identified. First, parents may have developed an attributional bias about their child based on expectations and history. These attributions can be generally positive or negative, and may have other more specific characteristics. Second, the cultural milieu of the perceiver may also color the judgments made about others. For example, in some cultural groups a parent may value high activity level as an indicator that his or her child is vibrant and competent. Third, perceivers in a relationship may integrate observations into coherent narratives. A growing body of work points to the importance of individually constructed narratives in understanding how families behave with and understand one another (Fiese et al., 1999).

With different information, and applying different meaning to the behaviors of the target, relationships may thus promote idiosyncratic perceptions by the partners. This aspect of relationship variance may be particularly important in the case of parent reports of young children. This may be viewed as an aspect of perceiver variance, but with specific reference to the relationship with the target. Knowledge of, or affect about, the target gleaned from shared relationship history may influence the judgments made by the perceiver. When understanding how parent-report measures may provide error-laden information, it is important to keep in mind that judgments about others do not contain a high degree of consensus, while larger portions of variance may be attributable to the perceiver and the relationship. Thus, there is likely to be a large portion of variance in obtained measurements that is not about the actual behavior of the targets. One of the primary functions of training observers to rating reliability criteria is to minimize the perceiver component (while maximizing the consensus component) of their ratings, typically done in a context where raters do not have a relationship with the targets.

We summarize the key implications of our approach to understanding the behavior-rating process in Figs. 1 and 2. These figures depict how similarities and differences between parents and non-family observers would result in different levels of rater correspondence when observing child behavior under conditions of strong parent–child relationship (Fig. 1) and no parent–child relationship (Fig. 2). These formulations drive the basic hypotheses of this study.

We examined two hypotheses in this study:

  • Mothers’ reports of children's behavior will be concordant with observers’ reports when they have a low level of personal relationship involvement in the behavior they are rating.

  • Mothers’ reports of children's behavior will be discordant with observers’ reports when they are highly involved in the relationship and the behavior.

Participants

Families were recruited at childbirth classes or during the lying-in period at a local obstetrics hospital that accounts for 90% of births in the state. Mothers were briefly told about the nature of the study and were asked for permission to contact them when their child was around 4 months of age regarding participating in the study (positive response rate was about 25%). When the children were about 4 months old, families who agreed during the prenatal/neonatal period were contacted by

Mother and observer ratings of standard children

The first hypothesis was that mothers would be able to accurately rate the behavior of unfamiliar children (i.e., the mothers’ ratings would have high correspondence with the observers’ ratings). The q-correlations summarizing the mother–observer correspondence for standard children were used to evaluate this hypothesis. The average correlation between profiles of mother ratings and profiles of observer ratings was .84 (S.D. = .09) and the median correlation was .84. All but one of these
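As a rough illustration of the profile (q-) correlation used here, the sketch below correlates one mother's 24 ratings of the standard children with one observer's 24 ratings of the same segments. The rating values and the 1 to 5 scale are invented for illustration and are not data from the study, and the function name `profile_correlation` is our own. We assume the q-correlation is an ordinary Pearson correlation computed across the elements of the two profiles.

```python
from math import sqrt


def profile_correlation(profile_a, profile_b):
    """Pearson correlation across the elements of two rating profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    return cov / sqrt(var_a * var_b)


# Hypothetical profiles: 6 segments x 4 temperament characteristics = 24 ratings.
mother = [3, 4, 2, 5, 1, 3, 4, 4, 2, 5, 3, 1,
          2, 3, 5, 4, 1, 2, 3, 4, 5, 2, 3, 4]
observer = [3, 5, 2, 4, 1, 3, 4, 3, 2, 5, 3, 2,
            2, 3, 5, 4, 1, 1, 3, 4, 5, 2, 4, 4]

q = profile_correlation(mother, observer)
print(f"mother-observer profile correlation: {q:.2f}")
```

One such coefficient would be computed per mother-observer pair, and the mean and median reported in the text would then summarize those coefficients across pairs.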

Discussion

The main goal of this study was to examine the process of rater accuracy in reports of child temperament. We found two important results that speak to issues of rater bias. First, mothers and observers had high levels of agreement (with respect to rank order) when rating standard children. In fact, the level of correspondence was sufficient to support the proposition that most could be considered as reliable raters in a typical observation study of behavior. The second result was that mothers

Acknowledgment

This research was supported by a grant from the National Institute of Mental Health.

References (24)

  • S. Epstein et al. The person-situation debate in historical and current perspective. Psychological Bulletin (1985).
  • B. Fiese et al. The stories that families tell: Narrative coherence, narrative interaction, and relationship beliefs. Monographs of the Society for Research in Child Development (1999).