In this study, the reliability and validity of generic as well as disease-specific FHS and HRQoL questionnaires were assessed in the setting of an RCT concerning children with recurrent AOM. Most generic (RAND, FSQ Generic and FSQ Specific) and disease-specific (OM-6 and FFQ) questionnaires showed similar, good to excellent reliability and adequate construct and discriminant validity. Construct validity was poor for the numerical rating scales (NRS Child and NRS Caregiver), and discriminant validity was low to moderate for both NRS and the subscales of the TAIQOL considered to be otitis media-related (Tables 4). Generic as well as disease-specific questionnaires proved sensitive to change in the incidence of AOM (Table 8). Effect sizes ranged from small to moderate for both generic and disease-specific questionnaires (Table 8). The MCIDs of generic and disease-specific questionnaires were quite similar (Table 9 and Figure 1). However, most otitis media-related subscales of the TAIQOL, the only true HRQoL questionnaire, proved insensitive to change.
Reliability and validity
Results on internal consistency and test–retest reliability of the RAND, FSQ Generic, FSQ Specific, TAIQOL and OM-6 found in this study were comparable with those of previous studies using these questionnaires [14]. The consistency of results across different paediatric populations supports the reliability of these questionnaires. Similar to the poor discriminant validity of the otitis media-related TAIQOL subscales in this study, Fekkes et al. [51] found that the TAIQOL subscales ‘Problem behaviour’, ‘Positive mood’ and ‘Liveliness’ discriminated neither between healthy and preterm children nor between healthy and chronically ill children. The ability of the RAND, FSQ Generic and FSQ Specific to discriminate between children who differed in AOM frequency, on the other hand, supports their discriminant validity as previously demonstrated in children with asthma and healthy children [41]. However, the heterogeneity of the methods used limits the comparability of the validity results of this study with those from previous studies.
The FFQ and NRS Caregiver are newly developed questionnaires assessing the influence of recurrent AOM on the caregiver and family. The FFQ demonstrated excellent reliability and validity, meeting the minimally required reliability coefficient of 0.90 for individual assessment [65]. Its strong correlation with the OM-6 supports its complementary usefulness in FHS and HRQoL assessment in children with rAOM. Results of the NRS Caregiver, however, were as poor as those observed for the NRS Child, which warrants further exploration. Their global, single-item assessment of HRQoL may be too crude to reflect subtle differences in HRQoL [88]. On the other hand, comments of the caregivers indicated that some of them may have misunderstood the NRS test instructions. This is supported by the improvement of construct validity during the follow-up assessments, presumably due to learning effects after reading the instructions a second time.
So far, little attention has been given to the responsiveness of the questionnaires used in our study. Only Rosenfeld et al. [55] assessed effect sizes for the OM-6 (using a standardized response mean), which were much larger (1.1–1.7) than those found in this study. This may be explained by the use of different indicators of change. Rosenfeld et al. [55] used an intervention with expected clinical effectiveness, for which proxies were not blinded, as indicator of change. Since pneumococcal vaccination proved to be clinically ineffective [74], treatment could not be used as an external criterion for change in our study. Instead, a change of two or more AOM episodes per year was used as criterion to identify changed subjects. In addition, social desirability and expectancy bias may have influenced the outcome of the study of Rosenfeld et al. [55].
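The distinction between the two responsiveness indices matters here: a conventional effect size divides mean change by the standard deviation of the baseline scores, whereas the standardized response mean used by Rosenfeld et al. divides it by the standard deviation of the change scores, which can yield substantially larger values when change is consistent across subjects. A minimal sketch (the scores and 0–100 scale are illustrative, not data from either study):

```python
import statistics

def effect_size(baseline, follow_up):
    """Effect size: mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(baseline)

def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores themselves."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)

# Hypothetical questionnaire scores (0-100, higher = better functioning)
baseline  = [55, 60, 48, 70, 62, 58, 66, 52]
follow_up = [66, 58, 55, 72, 74, 59, 70, 63]

print(round(effect_size(baseline, follow_up), 2))                # prints 0.79
print(round(standardized_response_mean(baseline, follow_up), 2)) # prints 1.09
```

On the same hypothetical data the SRM exceeds the effect size, illustrating that the two indices are not interchangeable when comparing responsiveness across studies.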
Although clinical criteria such as change in the incidence of AOM episodes have been suggested as adequate alternative criteria to identify change [34], the choice of any external criterion for change remains somewhat arbitrary. It is a surrogate measure that often reflects only one aspect of the QoL construct. The poor responsiveness of the TAIQOL subscales ‘Behavioural problems’, ‘Positive mood’ and ‘Liveliness’, for example, may indicate that our clinical indicator is less suitable as an external criterion for change in emotional and behavioural functioning. However, given the overall poor responsiveness of all twelve TAIQOL subscales (results not shown), it seems more likely that these three subscales are simply poorly responsive themselves.
Several studies have supported the empirically found link between one SEM and the MCID for HRQoL questionnaires [75]. In this study, the MCIDs based on the value of one SEM largely corresponded with MCIDs estimated using 0.3 ES as a benchmark, which further supports the one SEM as an indicator of the MCID (Table 9). However, it should be realized that the SEM as well as the ES are only statistical indicators, which relate change to random (error) variance. Interestingly, the anchor-based methods yielded similar estimates for the MCIDs (Graphs 1a, b and 2), which is in agreement with recent observations that one SEM equals the anchor-based MCID in patients with moderately severe illness [90]. By applying and comparing multiple methods as well as two evaluation periods, we have been able not only to demonstrate consistency in responsiveness but also to give ranges for minimal clinically important changes instead of point estimates. As there is no ‘gold standard’ for the assessment of responsiveness in FHS and HRQoL measurement, a range of scores gives a more realistic reflection of responsiveness than a point estimate. Point estimates can be misapplied by users who are unaware of the limited precision of the data used for estimating the MCID, or of the intrinsic limitations of dichotomising what is actually a continuum.
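The correspondence between the one-SEM criterion and the 0.3 ES benchmark follows directly from the underlying formula: the SEM equals the baseline SD multiplied by the square root of one minus the reliability, so for an instrument with a reliability of about 0.91 one SEM coincides exactly with 0.3 SD. A minimal sketch with hypothetical values (the SD and ICC below are illustrative, not taken from this study):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical questionnaire: baseline SD of 15 points,
# test-retest reliability (ICC) of 0.91.
sd, icc = 15.0, 0.91
one_sem = sem(sd, icc)          # distribution-based MCID candidate
es_benchmark = 0.3 * sd         # 0.3-effect-size benchmark

print(round(one_sem, 2))        # prints 4.5
print(round(es_benchmark, 2))   # prints 4.5
```

This also shows why the two distribution-based estimates diverge for less reliable instruments: at a reliability of 0.75, one SEM corresponds to 0.5 SD rather than 0.3 SD.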
Generic versus disease-specific questionnaires
Although generic questionnaires are generally expected to be less sensitive to differences in FHS or HRQoL than disease-specific questionnaires [19], in this study most disease-specific questionnaires performed only marginally better than the generic questionnaires on the discriminant validity test. Likewise, the responsiveness of generic questionnaires and their usefulness as outcome measures in randomized trials have been questioned [21]. Although in some studies generic measures were indeed found to be less responsive to treatment effects than specific measures [93], other studies did find comparable responsiveness [97]. In this study, only the smaller effect sizes for the FSQ Generic and FSQ Specific may indicate that the responsiveness of generic questionnaires is somewhat poorer than that of disease-specific questionnaires. Possibly, the higher sensitivity of the disease-specific questionnaires at the start of the study reflects the higher incidence of symptoms and functional limitations specific to AOM at that time, whereas during the study AOM incidence decreased and AOM symptoms consequently became less prominent compared with other health problems. Overall, the generic questionnaires appeared to be as sensitive to clinical change as the disease-specific questionnaires, except for the TAIQOL.
For the FSQ Generic and FSQ Specific, but not for the RAND, which assesses general health perceptions, sensitivity to differences and change in FHS can be explained by their content, as they include many physical and emotional behaviour items that may be affected by rAOM. The more relevant a questionnaire is to a particular condition, the more sensitive it is likely to be. The sensitivity of the RAND, assessing general health and resistance to illness, may indicate that it captures the perception of caregivers of children with rAOM that their child’s overall health is worse than that of other children. It may also reflect the considerable co-morbidity, such as chronic airway problems and atopic symptoms, in the study population (Table 3).
The reasons for the poor performance of the TAIQOL with regard to both discriminant validity and sensitivity to change are not obvious. Possibly each subscale score represents an aspect of HRQoL that is too narrow to be sensitive to differences or change; combining the subscales into more comprehensive constructs may then improve sensitivity. In addition, each item of the TAIQOL consists of two questions: a question about FHS is followed by the request to rate the child’s well-being in relation to this health status. Response shift bias may have modified the caregivers’ expectations about how their child feels in line with the child’s changing health; that is, caregivers may rate their child’s well-being as better than it actually is as they adapt to the situation. Studies are needed on factors besides the type of questionnaire (generic versus disease-specific) that may influence sensitivity to change or responsiveness, such as questionnaire structure and content, disease severity, co-morbidity and other population characteristics.
Bias and generalisability
There are several issues that need to be considered when interpreting the current results. First, the frequency of AOM episodes at enrolment was based on proxy report, whereas during the trial only physician-diagnosed episodes were counted. The number of AOM episodes in the year prior to inclusion is likely to be overestimated by proxies [100], resulting in underestimation of the HRQoL change scores, because proxies may have evaluated the situation as worse than it objectively was in the first place. However, if such recall bias regarding AOM frequency was indeed present, it may also have influenced the caregivers’ reflection on subjective measures such as FHS and HRQoL, which would result in realistic or even overestimated change scores. Moreover, estimating responsiveness for the interval of 7–14 months, in which AOM frequency was not affected by recall bias since all episodes were physician-diagnosed, yielded similar results. This indicates that recall bias is unlikely to have influenced responsiveness substantially.
Secondly, in assessing test–retest reliability, two different modes of questionnaire administration were used: completion at the clinic versus completion at home. The possible intention to give more socially desirable answers at the clinic, as well as other effects such as being more distracted when filling in the questionnaires at home, may have caused differences in questionnaire scores between the first (test) and second (retest) assessment. Although this impact may be larger for single-item questionnaires such as the NRSs than for multiple-item questionnaires, which might explain their somewhat smaller ICCs, the overall impact on the ICCs appears to be small.
Thirdly, during the trial, 8 children (4.2%) in the pneumococcal vaccine group and 13 (6.7%) in the control vaccine group were lost to follow-up. One child switched from the control to the pneumococcal vaccine group. It is unlikely that these small numbers of dropouts and crossovers influenced the trial results.
Furthermore, indices of validity and reliability are not fixed characteristics of FHS and HRQoL questionnaires but are influenced by the study design, the intervention and, in particular, the study population. Our study population had relatively severe ear disease with frequent episodes and was older than the average child with AOM. Assessment of the reliability and validity of the questionnaires in populations with less severe disease may reveal more ceiling effects and a lack of discriminant validity. Therefore, the results of this study should only be generalized to paediatric populations with moderate to severe recurrent acute ear infections at an older age (approximately 14–54 months).
Finally, of all questionnaires in this study, only the FFQ demonstrated a reliability that meets the minimally required reliability coefficient for individual assessment of HRQoL. Although some authors suggest using FHS and HRQoL questionnaires for individual assessment in clinical practice as well [31], we do not support this approach. It has been suggested that routine use of these questionnaires would facilitate detection and discussion of psychological issues and help guide decisions regarding, for example, referral. However, considering the complexity and many pitfalls of reproducibility and responsiveness assessment, the use of HRQoL and FHS questionnaires in the follow-up of individual patients is neither reliable nor valid.
Recommendations for clinical use
In conclusion, generic (RAND, FSQ Generic and FSQ Specific) as well as disease-specific (OM-6, FFQ and, to a lesser extent, NRS Caregiver) questionnaires demonstrated similarly high reliability, adequate construct and discriminant validity, and sufficient responsiveness to justify their use in clinical studies of children with rAOM. The NRS as used in this study, however, may be less adequate for assessment of HRQoL in this population. The TAIQOL, the only true generic HRQoL questionnaire, unfortunately showed poor discriminant validity and sensitivity to change, and needs extensive revision before further use in clinical outcome studies in children with otitis media. Using both a generic questionnaire (RAND or FSQ) and the OM-6 in clinical studies regarding FHS in children with rAOM is recommended, as this combines the merits of generalisability and sensitivity in outcome assessment and facilitates head-to-head comparisons of their performance in various paediatric populations with OM.
More studies are needed that assess the responsiveness of paediatric QoL questionnaires by multiple, distribution-based as well as anchor-based, methods to increase our appreciation of minimal clinically important changes in various paediatric conditions. Further studies on factors besides the type of questionnaire (generic versus disease-specific) that may influence sensitivity to change or responsiveness, such as questionnaire structure and content, disease severity, co-morbidity and other population characteristics, may increase our understanding of the complex dynamics of HRQoL and FHS assessment.