Food allergy affects almost 4% of the general population in westernized countries [1], and it is the primary cause of anaphylaxis presenting to emergency departments [2]. The only proven therapy is careful avoidance of the causal food(s) and provision of medication for emergency treatment [3]. Consequently, patients often fear an allergic reaction and are continuously faced with dietary and social restrictions in their daily lives, which can have a negative impact on quality of life.
To measure Health-Related Quality of Life (HRQL), disease-specific questionnaires are significantly more sensitive than generic ones, and they are important for estimating the general burden of food allergy as well as measuring the response to interventions or future treatments. However, generic HRQL instruments allow comparison of the burden of disease between patient populations with different diseases [12]. Recently, as part of the EuroPrevall project, the first self-administered HRQL questionnaires specific for food allergy have been developed and validated: the Food Allergy Quality of Life Questionnaire-Child Form, -Teenager Form and -Adult Form (FAQLQ-CF, -TF, -AF). The FAQLQs showed good validity, internal consistency and discriminative abilities [16], but test-retest reliability was not extensively investigated.
Reliability measures are important to ensure that what the questionnaire is measuring is dependable and repeatable [12] and that it allows sample sizes to be determined for clinical trials [17]. The aim of this study was therefore to assess the test-retest reliability of the self-administered FAQLQ-CF, -TF and -AF.
This article describes the evaluation of the test-retest reliability of the recently developed self-administered FAQLQ-CF, -TF and -AF. Overall, reliability was considered to be excellent for the FAQLQs as measured with the ICC and CCC. Additionally, Bland–Altman plots showed that mean differences were all close to zero, supporting the high reliability of the FAQLQs.
In this study we used ICCs calculated by a one-way ANOVA, CCCs and Bland-Altman plots to assess test-retest reliability. However, different methods can be used to assess test-retest reliability, and there is much discussion in the literature on the best way to do this [20]. A disadvantage of the ICC is that if patient groups are very homogeneous, the ICC tends to be low, because the ICC compares variance among patients to total variance. If patient groups are very heterogeneous, the ICC tends to be high. Thus, the ICC would only generalise to similar populations. Additionally, the one-way ICC does not take into account the order in which observations were taken [29]. Therefore, the CCC is a useful additional measure. Like ICCs calculated by a one-way ANOVA, the CCC takes into account mean differences between the first and second measurement, but it also takes into account variance differences between the two measurements by reducing the magnitude of the resulting test-retest reliability estimate. In addition, the CCC is a better tool to distinguish between bias and imprecision [29]. There can be large differences in ICC and CCC scores, especially in studies with heterogeneous groups. The similar scores we found in our study reflect that both coefficients worked very well in this population and that results can be generalised to other groups. Bland-Altman plots are very illustrative in assessing test-retest agreement. They were useful to identify some extreme and outlying differences, to analyse the magnitude of the measurement error, which was small, and to visualise a possible relationship between the difference and the mean of both scores.
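As an illustration of the three measures discussed above, the following minimal Python sketch computes a one-way ICC, Lin's CCC and the Bland-Altman bias with 95% limits of agreement for paired test and retest scores. It is not the analysis code used in this study: the function names and the example scores are hypothetical, and the formulas are the standard textbook definitions, assumed here to match the coefficients reported.

```python
import numpy as np


def icc_oneway(test, retest):
    """One-way random-effects ICC for two measurements per subject."""
    scores = np.column_stack([test, retest]).astype(float)
    n, k = scores.shape  # n subjects, k = 2 measurements
    grand_mean = scores.mean()
    subject_means = scores.mean(axis=1)
    # Mean squares from a one-way ANOVA with subject as the only factor.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def lin_ccc(test, retest):
    """Lin's concordance correlation coefficient (population moments)."""
    x = np.asarray(test, dtype=float)
    y = np.asarray(retest, dtype=float)
    covariance = np.mean((x - x.mean()) * (y - y.mean()))
    # The denominator penalises both variance and mean differences.
    return 2 * covariance / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) of agreement."""
    diff = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical scores: the retest is the test shifted up by one point.
    first = [1, 2, 3, 4, 5]
    second = [2, 3, 4, 5, 6]
    print(round(icc_oneway(first, second), 3))  # 0.818
    print(round(lin_ccc(first, second), 3))     # 0.8
    print(bland_altman(first, second)[0])       # -1.0
```

In this hypothetical example the two measurements are perfectly correlated, yet the systematic shift of one point lowers both the ICC and the CCC below 1 and appears as a Bland-Altman bias of -1.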
This study may also have some limitations. Firstly, the sample sizes were relatively small. However, we found that the reliability of the questionnaires was very high, which indicates that the sample sizes were adequate and that a greater number of patients would probably not have influenced the outcomes. Another limitation may be that the majority of adults in this study were female. However, we did not find significant differences in the test-retest reliability outcomes between men and women (data not shown). Therefore, we think that the imbalance between men and women did not influence the generalisability of the results of the FAQLQ-AF. Finally, the significant correlation between the first and second measurement of the FAQLQ-AF (Fig. 1C) and between the mean of both scores and the differences of both scores of the FAQLQ-AF (Fig. 2C) was an unexpected finding. We think this correlation might be due to an outlier. This assumption was supported by a re-analysis excluding this outlier, which showed that the correlation was no longer significant.
In summary, the FAQLQs clearly showed excellent reliability and are thus promising measures in evaluative studies in patients with food allergy, but also in monitoring individual patients. The high test-retest reliability supports the value of the FAQLQs for clinical trials with relatively small sample sizes. We recommend the use of the FAQLQs in clinical trials of current management strategies of food allergy, and they may also be useful when new treatments become available. Currently, the longitudinal validity of the FAQLQs and the validity of several other European language versions of the FAQLQs are being investigated.
This work was funded by the EU through the EuroPrevall project (FOOD-CT-2005-514000).