Large-scale population-based studies focusing on infectious diseases are scarce. This may be explained by methodological obstacles specific to infectious diseases: many common infectious diseases are acute in nature, requiring an intensified collection of symptom data and/or collection of biosamples for pathogen identification. Apart from laboratory methods to detect past and/or current infections in the German National Cohort (GNC), a self-administered questionnaire will be used to collect additional data on selected infectious diseases. To this end, a panel of experts comprising infectious disease epidemiologists, medical doctors, microbiologists, public health specialists and others was invited to develop an infectious disease questionnaire. An initial version of this questionnaire was applied during Pretest 1 of the GNC and is evaluated in a separate publication in this issue (Sievers et al.). We further modified the questionnaire for application in Pretest 2 by including additional questions (e.g. questions about self-assessed vulnerability to infections). Since this new instrument on self-assessed vulnerability to infections had not been evaluated before, the aim of this study was to evaluate the respective items. Additionally, to further test the applicability of the questionnaire in the GNC, we examined its test–retest reliability in Pretest 2.

Methods

Validation study

This part of the study was conducted in two study centres (Hamburg and Hanover) during the Pretest 2 phase of the GNC. In both study centres, a self-administered paper-based questionnaire was used. In Hamburg, participants were asked to fill in the questionnaire on infectious diseases during their regular visit to the study centre. In Hanover, the first of the two questionnaires of the reliability study was used to evaluate the items on self-assessed vulnerability.

Reliability study

Test–retest reliability was examined in Hanover only, by administering the infectious disease questionnaire on two separate occasions. The questionnaire was mailed twice to the participants. The second copy was sent to the participants upon receipt of the first questionnaire at the study centre, but no earlier than one week after the first copy had been sent. The second questionnaire contained a question about disease occurrence since completion of the first questionnaire, in order to exclude from the analysis those participants who had experienced an intercurrent illness between the two questionnaires, which might influence their responses. Participants who reported a recent relevant disease episode were excluded from the calculation of agreement for the disease-based (IN1–IN7) and symptom-based (F1–F3) items as well as for the question on antibiotic intake (A1).

Infectious disease questionnaire

The complete questionnaire can be found online (see supplement).

Self-reported infections (IN1–7) and self-reported symptoms (F1–3)

The questionnaire contained seven questions about the frequency of infections of the upper and lower respiratory tract (URT/LRT), the gastrointestinal tract (GIT), the bladder or kidneys, and the skin in the last 12 months. There were six answer categories: no infection, once, twice, three-to-four times, more than four times and “I don’t know”. In addition, we asked about the frequencies of three syndrome-related outcomes (cough lasting longer than 4 weeks, fever and diarrhoea) in the last 12 months. Questions about outpatient treatment and hospitalisation (answer categories “yes”, “no”, “I don’t know”) were included for infections of the upper and lower respiratory tract (IN1a, IN1b, IN2a, IN2b) and of the gastrointestinal tract (IN3a, IN3b).

Self-assessed vulnerability to infections (IH1–5)

Self-assessed vulnerability to several selected infections (upper and lower respiratory tract infections, gastrointestinal tract infections, infections of the bladder or kidneys, and skin infections) was assessed with items of the form “Compared to individuals in my age group I have infections of, e.g., the upper respiratory tract …”, with the answer categories “far less frequently”, “less frequently”, “approximately equally frequently”, “more often” and “much more often”.

Other questions

In addition to these thematically connected item blocks we asked one question on antibiotic intake in the last 12 months (A1) with six answer categories (never, once, twice, three-to-four times, more than four times and “I don’t know”) and questions on influenza vaccination (see Schultze, Akmatov, Castell et al. in this issue). The reliability study used only the general item on this vaccination (V1).

Definitions

Migration status was assigned if both parents were not born in Germany, if one parent was not born in Germany and the interviewee had not lived in Germany since birth, or if German was not the native language [1]. Household net equivalent income per month was calculated from the original data using midpoint estimates of the income groups; the highest group (≥ 8000 €) was set to 10,000 €. To account for household size, weighting was done according to [2]. School education was grouped as recommended by [3].
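As a purely illustrative sketch in R (not the study’s code), the midpoint and equivalence-weighting steps could look as follows; the equivalence weights shown (1.0 for the first adult, 0.5 for each further household member aged 14 or over, 0.3 for each child under 14) are the OECD-modified scale assumed here for illustration, whereas the study applied the weighting given in [2].

# Midpoint of an income group; the open-ended top group (>= 8000 EUR) is set to 10,000 EUR
midpoint_income <- function(lower, upper) {
  ifelse(lower >= 8000, 10000, (lower + upper) / 2)
}

# Equivalised net household income per month (OECD-modified scale assumed for illustration)
equivalised_income <- function(income, n_adults, n_children) {
  weight <- 1 + 0.5 * (n_adults - 1) + 0.3 * n_children
  income / weight
}

# Example: income group 3000-4000 EUR, two adults, one child -> about 1944 EUR
equivalised_income(midpoint_income(3000, 4000), n_adults = 2, n_children = 1)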

Statistical analysis

Validation of self-assessed vulnerability

Internal consistency of the five-item instrument “self-assessed vulnerability to infections” (IH1–5) was examined using Cronbach’s alpha (α). An exploratory factor analysis with varimax rotation was applied to construct the scores of self-reported frequency of infections (IN1–7) and of self-assessed vulnerability to infections. The Kaiser–Meyer–Olkin measure was employed to check the sampling adequacy for both scores. Spearman’s correlation was used to examine the correlation between the two scores. Furthermore, the score of self-assessed infection vulnerability was divided into five groups of equal size (i.e. quintiles). We then combined the second, third and fourth quintiles into one group, resulting in three groups (lowest quintile classified as “less prone to infections”, second to fourth quintiles as “similarly prone to infections” and highest quintile as “more prone to infections” compared to peers).
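For illustration, a minimal R sketch of these steps is given below; it is not the analysis code of the study, and the data frame dat with numerically coded columns IH1–IH5 and IN1–IN7 is a hypothetical example.

library(psych)  # alpha(), KMO(), fa()

# Internal consistency of the five vulnerability items
a <- psych::alpha(dat[, paste0("IH", 1:5)])
a$total$raw_alpha  # overall Cronbach's alpha
a$alpha.drop       # alpha after removing each item in turn

# Sampling adequacy and one-factor solutions with varimax rotation
psych::KMO(dat[, paste0("IH", 1:5)])
psych::KMO(dat[, paste0("IN", 1:7)])
fa_vuln <- psych::fa(dat[, paste0("IH", 1:5)], nfactors = 1, rotate = "varimax")
fa_freq <- psych::fa(dat[, paste0("IN", 1:7)], nfactors = 1, rotate = "varimax")

# Spearman correlation between the two factor scores
cor.test(fa_vuln$scores[, 1], fa_freq$scores[, 1], method = "spearman")

# Quintiles of the vulnerability score, collapsed into three groups
quint <- cut(fa_vuln$scores[, 1],
             breaks = quantile(fa_vuln$scores[, 1], probs = seq(0, 1, 0.2), na.rm = TRUE),
             include.lowest = TRUE, labels = FALSE)
prone <- cut(quint, breaks = c(0, 1, 4, 5),
             labels = c("less prone", "similarly prone", "more prone"))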

Reliability study

Participants who filled in both questionnaires less than 5 days or more than 14 days apart were excluded from the reliability analysis so that, on the one hand, memory effects would be reduced and, on the other hand, the stability of the attributes was approximately ensured [4]. Reliability was quantified using Cohen’s kappa (κ) [4]. For ordinal scales (answer categories: never/once/twice/three-to-four times/more than four times), linear weighted κ was used to take the magnitude of disagreement into account. Linear weighting was chosen over quadratic weighting because it increases less with the number of categories [4]. The weighting matrix for weighted κ was calculated based on the formula \( w_{ij}=1-\frac{\left|i-j\right|}{k-1} \), where i and j are specific row and column categories and k is the overall number of categories [4]. If, for example, only 4 of 7 categories were used by the study population, the matrix was modified so that a given cell of the contingency table retained the same weight.

In addition to the weighted κ, unweighted κ was calculated in order to also account for missing values (in one questionnaire) and the category “I don’t know”; in this case, answers were treated as being on a nominal scale. Only unweighted κ was computed for items with answer categories on a nominal scale (e.g. IN1a). Observations with missing values for a given item in both questionnaires were excluded from the calculation of unweighted κ. Observed agreement (%) is also shown to account for the dependence of κ on the distribution of the data [5]. In the case of weighted κ, we adjusted the percentage agreement using the specific weighting matrix applied for κ itself. Confidence intervals (95 %) for κ were calculated according to Reichenheim [6], using bias-corrected bootstrap estimates. We computed Krippendorff’s α (nominal scale) using the R package “irr” (version 0.84) and Bennett’s S according to [7]. A global χ² test was used to compare observed and expected proportions. Statistical analyses were conducted with Microsoft Excel 2010 (Microsoft Corp.), Stata 12 IC (StataCorp LP), R 3.1.0 (The R Foundation for Statistical Computing) and IBM SPSS Statistics (version 20).
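The core reliability calculations can be sketched in R as follows. This is not the study’s analysis code: the vectors t1 and t2 (answers of the first and second questionnaire for one item, coded 1…k) are hypothetical, and the bootstrap interval shown uses the bias-corrected and accelerated (BCa) variant as an approximation of the procedure described in [6].

library(irr)   # kappa2(), kripp.alpha()
library(boot)  # bootstrap confidence intervals

k <- 5  # number of answer categories actually used

# Linear weight matrix: w_ij = 1 - |i - j| / (k - 1)
w <- 1 - abs(outer(1:k, 1:k, "-")) / (k - 1)

# Weighted kappa from the k x k contingency table
weighted_kappa <- function(x, y, w) {
  tab <- table(factor(x, levels = 1:nrow(w)), factor(y, levels = 1:nrow(w)))
  p   <- tab / sum(tab)
  po  <- sum(w * p)                              # weighted observed agreement
  pe  <- sum(w * outer(rowSums(p), colSums(p)))  # weighted chance agreement
  (po - pe) / (1 - pe)
}
weighted_kappa(t1, t2, w)
irr::kappa2(cbind(t1, t2), weight = "equal")  # linear weighted kappa via irr (same value if all k categories occur)

# Bias-corrected (BCa) bootstrap 95 % confidence interval for weighted kappa
b <- boot::boot(data.frame(t1, t2), R = 2000,
                statistic = function(d, i) weighted_kappa(d$t1[i], d$t2[i], w))
boot::boot.ci(b, type = "bca")

# Krippendorff's alpha (nominal scale) and Bennett's S = (k * p_o - 1) / (k - 1)
irr::kripp.alpha(rbind(t1, t2), method = "nominal")
(k * mean(t1 == t2) - 1) / (k - 1)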

Results

The characteristics of the study populations are listed in Table 1.

Table 1 Characteristics of the study population

Self-assessed vulnerability to infections

The evaluation of the internal consistency of the five-item instrument on self-assessed vulnerability to infections yielded a Cronbach’s α of 0.78. Removing each item one at a time decreased the measure (to between 0.72 and 0.76), indicating that each item contributed well to the construct. The factor analysis yielded evidence of one factor (based on eigenvalues > 1; see the distribution in Fig. 1a, where higher values of the score indicate higher vulnerability to infections). All participants who were classified as “less prone to infections” reported having the various infectious diseases far less frequently than their peers (Table 2, second column). The two other groups were more heterogeneous in terms of self-assessed vulnerability relative to peers (Table 2, third and fourth columns). However, about 20 % of the participants in the “more prone to infections” group stated, in agreement with their classification, that they have infections of the bladder or kidneys more often than their peers. Similarly, about one third of the participants in this group stated that they have infections of the upper respiratory tract more often than their peers. Participants in the “less prone to infections” group were less likely to report infections in the past 12 months [IN1–7; exemplified by upper respiratory tract (Fig. 2a) and gastrointestinal tract (Fig. 2b) infections] than participants in the “more prone to infections” group. The frequencies of infections reported by participants classified as “similarly prone to infections” covered all frequency categories (see Fig. 2). Spearman’s correlation of the two scores (self-reported frequencies and self-assessed vulnerability) yielded a value of 0.50 (p < 0.0001).

Table 2 Variables used to create the score of self-assessed vulnerability to infections compared to peersa
Fig. 1

Distribution of a the self-assessed infection vulnerability score and b the score of self-reported frequency of infections. The scores were derived by using factor analysis (see the Methods section)

Fig. 2

Relative self-reported frequency in the last 12 months of a upper respiratory tract and b gastrointestinal tract infections according to the three groups of self-assessed infection vulnerability

Reliability study

Of the 146 participants for whom both questionnaires were available, 88 could be included in the analysis (Fig. 3). The proportions of men and women and the age distribution (10-year groups) did not depart significantly from the proportions intended for the GNC, i.e. 50 % for each sex (p = 0.18) and 10 %/10 %/26.7 %/26.7 %/26.7 % across the age groups (see [8]; p = 0.25).
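For illustration, such a goodness-of-fit comparison can be computed in R as follows; the counts shown are hypothetical and do not reproduce the observed data.

# Age distribution of 88 hypothetical participants versus the intended GNC
# proportions of 10/10/26.7/26.7/26.7 % across five 10-year groups
age_counts <- c(9, 9, 23, 24, 23)
chisq.test(age_counts, p = c(0.10, 0.10, 0.267, 0.267, 0.267), rescale.p = TRUE)

# Sex ratio against the intended 50 % share (hypothetical counts)
chisq.test(c(40, 48), p = c(0.5, 0.5))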

Fig. 3

Flow chart of the reliability study

The median duration for completing the first questionnaire was 7 min [interquartile range (IQR) 5–10, range 2–59, n = 87]; for the second questionnaire the median was 5 min (IQR 4–7, range 2–40, n = 87). The median time interval between filling in the two questionnaires was 8 days (IQR 7–10.5, n = 88). The weighted κ between the answers of the first and the second questionnaire ranged between 0.65 and 0.87 for the items on infectious disease frequency in the last 12 months, between 0.77 and 0.90 for the items on symptom occurrence in the past 12 months, and between 0.68 and 0.76 for the items on vulnerability compared to peers. Antibiotic intake and influenza vaccination had κ values of 0.78 and 0.84, respectively. All percentage agreement, weighted/unweighted κ and Bennett’s S values are presented in Table 3. In addition, the calculation of Krippendorff’s α for data on a nominal scale yielded values identical to the unweighted κ (data not shown).

Table 3 Agreement between questionnaires one and two, with weighted κ values for answers on ordinal scales and unweighted κ where answers are on, or are treated as being on, a nominal scale (including “I don’t know” and missing values). Participants with a relevant disease episode between questionnaires one and two were excluded for the disease- and symptom-based items as well as for antibiotic intake

Discussion

Self-assessed vulnerability to infections

We evaluated questions on self-assessed vulnerability to infections and demonstrated that they were reliable; the instrument also showed high internal consistency. Furthermore, we observed a moderate correlation between self-reported frequency of infections and self-assessed vulnerability to infections. Research has shown that self-assessment of health status may be a valid measure of a respondent’s objective health status. For example, a single-item global self-rated health measure is a widely used instrument and has found application in many studies [9–11]. Advantages such as simplicity and ease of administration explain the broad usage of such instruments. The above-mentioned single-item measure of global self-rated health has also been shown to predict morbidity and mortality [12, 13]. To our knowledge, no studies have examined questions on self-assessed vulnerability with regard to common infections/infectious diseases. The five-item instrument on self-assessed vulnerability to infections therefore seems promising. However, further evaluation is required to validate this instrument. For example, the self-assessed vulnerability to infections presented here might reflect not only comparative susceptibility but also psychosocial conditions, and self-assessment of infection vulnerability may be influenced by subjective factors. Thus, an evaluation of infection vulnerability based on self-assessed questions alone should be treated with caution.

Reliability

We conducted a test–retest reliability study of the infectious disease questionnaire in Pretest 2 of the GNC at the Study Centre Hanover. Eighty-eight Pretest 2 participants (a random sample) could be included in our analysis. Cohen’s κ was calculated as the primary measure of agreement. To account for sampling variability, 95 % confidence intervals based on bootstrap estimation are shown; they indicate relevant uncertainty of the point estimates for a number of items (e.g. skin infections).

The short median durations of 7 and 5 min, respectively, for filling in the questionnaire support the feasibility of its application. A process of habituation and learning cannot be ruled out, since filling in the questionnaire the second time was generally faster. Unobserved behaviour, such as keeping a copy of the first questionnaire at home and using it as a reference, might also influence the duration and the reliability measures. In principle, however, the time interval between the two copies of the questionnaire in our study should suffice to avoid memory effects, ensuring independent ratings on the one hand and the stability of the evaluated items on the other.

Our test–retest reliability study indicates that, in general, the questionnaire on self-reported infectious diseases is reliable. To evaluate agreement based on κ, the following categories are often used [4]: ≤ 0 poor (two items), 0.01–0.20 slight (one item), 0.21–0.40 fair (no items), 0.41–0.60 moderate (two items), 0.61–0.80 substantial (12 items) and 0.81–1.00 almost perfect agreement (six items), choosing the higher κ when two values are available for a given item.
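For convenience, these bands can be applied programmatically; the short R helper below is our own sketch (the function name interpret_kappa does not stem from the study).

# Map kappa values to the interpretation bands of [4]
interpret_kappa <- function(kappa) {
  cut(kappa, breaks = c(-Inf, 0, 0.20, 0.40, 0.60, 0.80, 1),
      labels = c("poor", "slight", "fair", "moderate", "substantial", "almost perfect"))
}
interpret_kappa(c(-0.02, 0.16, 0.78, 0.84))  # poor, slight, substantial, almost perfect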

Since linear weighting leads to more conservative κ estimates than quadratic weighting [14], the presented κ values might underestimate the agreement between questionnaires one and two on ordinal scales. Excluding observations with missing values for a given item in both questionnaires when calculating the unweighted, i.e. nominal, κ might likewise underestimate the true agreement beyond chance.

κ depends not only on the subjects’ agreement per se but also on the frequency of categories and the distribution of agreement and disagreement [4]. These characteristics mean that interpreting κ is not straightforward [4, 15]. To contextualise κ, percentage agreement is reported as well. This makes it possible to account for situations in which κ is low despite high percentage agreement because of the distribution of classifications, a situation considered one of the intrinsic paradoxes of κ [4]. Problems due to the distribution of the marginals are generally part of our data, since most of the diseases, symptoms and conditions in question are not evenly distributed over the answer categories in the population. This might particularly apply to IN1b (hospitalisation for URT infection, κ = 0, percentage agreement 97.3 %), IN2b (hospitalisation for LRT infection, κ = 0.16, percentage agreement 89.0 %) and IN3b (hospitalisation for GIT infection, κ = −0.02, percentage agreement 94.5 %), explaining the discrepancy between high percentage agreement and poor κ, and indicating that κ might be spuriously low in these cases. For IN2b, κ increases from 0.16 to 1.0 and the crude agreement from 89 % to 100 % if observations with missing values for this item are excluded (remaining n = 65). This shows how slight changes in the analysis strategy can influence κ substantially.
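The paradox can be reproduced in R with a small, hypothetical and highly skewed 2 × 2 table (not the study data), for which the percentage agreement is high while κ is close to zero.

# 73 hypothetical ratings: 71 "no/no", one "no/yes", one "yes/no", none "yes/yes"
tab <- matrix(c(71, 1,
                 1, 0), nrow = 2, byrow = TRUE,
              dimnames = list(q1 = c("no", "yes"), q2 = c("no", "yes")))
p  <- tab / sum(tab)
po <- sum(diag(p))                  # observed agreement: about 0.97
pe <- sum(rowSums(p) * colSums(p))  # chance agreement:   about 0.97
(po - pe) / (1 - pe)                # kappa close to zero despite 97 % agreement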

An alternative reliability coefficient, Krippendorff’s α, a very flexible measure of disagreement [16], added no further information to the calculation of κ and resulted in identical point estimates. A further addition to Cohen’s κ, Bennett’s S, can be regarded as a generalisation of Byrt’s prevalence-adjusted bias-adjusted kappa (PABAK) [17]. Since S remodels the observed agreement [7], its results relate closely to the percentage agreement reported in Table 3 and add to the notion of a spuriously low κ in the aforementioned cases.
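For reference, with k answer categories and observed agreement \( p_{o} \), Bennett’s S can be written as \( S=\frac{k\,p_{o}-1}{k-1} \); for two categories this reduces to \( 2p_{o}-1 \), i.e. Byrt’s PABAK. S thus depends only on the observed agreement and the number of categories, not on the marginal distributions.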

In summary, despite a methodologically conservative approach, the overall reliability of the infectious disease questionnaire (using the answer categories on an ordinal scale or on a nominal scale, and including “I don’t know” and missing values as separate categories) can be interpreted as “moderate” to “very good” if κ and percentage agreement or Bennett’s S are both taken into account. Thus, measurement error and the uncertainty of the subjects’ own classification should be reasonably low.

Conclusion

The five-item instrument on self-assessed vulnerability to infections seems promising. However, further evaluation of the instrument regarding, e.g., psychosocial influences is needed, and an evaluation of infection vulnerability based on self-assessed questions alone should be treated with caution. The questionnaire on self-reported infectious diseases used in Pretest 2 of the GNC is a moderately reliable instrument and can thus be applied in future studies on infectious diseases.

Lessons learned for the main recruitment phase of the GNC:

A modified version of the infectious disease questionnaire will be used on Level 1 of the GNC.