Participants
Macular edema is an accumulation of fluid in the macula that impairs vision and can lead to severe visual impairment. It is a heterogeneous disease with several possible etiologies. This study focuses on two of them: diabetes and retinal vein occlusion, the most common retinal vascular diseases [23]. Diabetic macular edema is the most prevalent subtype, affecting an estimated 5.5% of the global diabetic population [24]. Whereas diabetic macular edema develops slowly and is mostly bilateral, retinal vein occlusion has a sudden onset and most often affects only one eye. Most patients with macular edema are treated with intravitreal injections, which entails monthly hospital visits. Macular edema due to diabetes or retinal vein occlusion can also be addressed through patients’ health behavior. Both types of macular edema share similar risk factors, which are vascular in nature.
This study enrolled patients with center-involving macular edema resulting from diabetes or retinal vein occlusion. Patients had to be aged 18 or older, speak German well enough to understand the questionnaires, and have sufficient hearing for verbal communication. Cognitive impairment was an exclusion criterion.
Data collection
After signing the informed consent form, each enrolled patient was asked to complete the questionnaires once. This took place between routine medical eye examinations. Because of these examinations, patients could not read the questionnaires themselves. The questionnaires were therefore read aloud by one of four trained interviewers (MH, VW, AK, JG) following a standard procedure to ensure the objectivity and comparability of the interviews. The training included practicing the interview procedure and becoming familiar with the outcome measures. Possible barriers and difficulties encountered during interviews were discussed, and appropriate behavior was rehearsed. For instance, interviewers practiced guiding patients to respond on the scale rather than with a narrative, without steering their answers towards any particular category. The possible answers were printed in a large font and placed in front of the patients. Participants were pseudonymized with the web-based pseudonymization tool ‘iPSN’ [25]. LimeSurvey [26] was used to collect and store participants’ answers under their pseudonym.
Outcome measures
The PAM® survey [13] assesses patients’ knowledge, skills, and confidence in managing their own health. The items are rated on a Likert-type scale with five response categories, ranging from “Disagree strongly” to “Agree strongly” plus “Not applicable”. Answers to the 13 items are summed and transformed to a scale from 0 to 100 [13]. The German version of the PAM® survey (PAM-13D) [5] has proven to be a reliable and valid questionnaire, showing a Cronbach’s α of 0.84, factorial structure, and a trait–trait correlation of r = 0.43 between the PAM-13D score and general self-efficacy [5].
Furthermore, the trait well-being inventory mood level scale [27] was used to measure general mood and general quality of life. Cronbach’s α is 0.83 for the general mood scale and 0.87 for the general quality of life scale [27]. Quality of life was conceptualized as an overall measure of life satisfaction, focusing on the cognitive dimension of subjective well-being, which encompasses beliefs about the present, past, and future, rather than on satisfaction with specific domains of life. Strong positive correlations with life satisfaction indicated construct validity [27]. Moreover, we used the general self-efficacy scale [28] to assess the subjective belief in being able to cope successfully with new, demanding situations through one’s own strength. In several German samples, a Cronbach’s α between 0.80 and 0.90 was found [28]. Validity is supported by correlations of self-efficacy with various other constructs, such as negative correlations with depression, anxiety, and burnout. For these questionnaires, mean scores across item responses were computed. Additionally, self-perceived health status was rated in five categories (“Very bad”, “Bad”, “Moderate”, “Good”, “Very good”) [29]. Demographic data such as sex, age, and education were also assessed. Net income was recorded in five categories corresponding to the income quintiles for the elderly in Austria (www.statistik.at).
Data analysis
To achieve the study objective of investigating the psychometric properties of the PAM, a minimum of 500 patients was required to obtain stable estimates [30]; further information is provided in the Supplementary Files.
Categorical data are presented as absolute and relative frequencies, and continuous data as means and standard deviations or medians and interquartile ranges, as appropriate. To obtain the PAM® score, the answer category “Not applicable” was recoded as a missing value, and the raw scores of the PAM-13D were summed and transformed to a scale from 0 to 100 according to the algorithm by Insignia Health, the company licensing the questionnaire (https://www.insigniahealth.com/products/pam).
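The scoring steps described above (recode “Not applicable” as missing, aggregate the remaining answers, rescale to 0–100) can be illustrated with a minimal sketch. The licensed Insignia Health conversion is not reproduced here; a plain linear rescaling of the mean answer stands in for it, and the item coding 1–4 as well as the function name `rescaled_score` are assumptions made only for illustration.

```python
from typing import Optional, Sequence


def rescaled_score(answers: Sequence[Optional[int]],
                   low: int = 1, high: int = 4) -> Optional[float]:
    """Linearly rescale the mean of the answered items to 0-100.

    Assumptions (not from the licensed algorithm): items are coded
    low..high, "Not applicable" is passed as None, and a linear
    rescaling replaces the proprietary conversion.
    """
    # "Not applicable" answers are treated as missing and dropped.
    answered = [a for a in answers if a is not None]
    if not answered:
        return None  # no valid answers, so no score can be formed
    mean_answer = sum(answered) / len(answered)
    return 100.0 * (mean_answer - low) / (high - low)


if __name__ == "__main__":
    # Hypothetical respondent: 13 items, two rated "Not applicable".
    example = [4, 3, 3, None, 4, 2, 3, 3, None, 4, 3, 2, 3]
    print(rescaled_score(example))
```

Using the mean of the answered items rather than the raw sum keeps respondents with missing answers on the same 0–100 range; whether the licensed algorithm handles missing values this way is not stated in the text.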
To analyze the PAM-13D, we used item response theory (IRT), a set of statistical models describing the relationship between questionnaire items and person ability. Ability is inferred from the answers given to the questionnaire items: the higher a person’s ability, the more likely higher categories (e.g., “Agree strongly”) are chosen. First, the assumptions of IRT analysis were examined. The generalized partial credit model (GPCM) was chosen from among different IRT models based on fit indices, LR tests, and Vuong tests (see Supplementary Table 1). This model estimates two kinds of parameters for each item: item difficulty and item discrimination. Item difficulty describes how easily persons agree with the item. Item discrimination describes how well an item distinguishes between persons with high and low ability.
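To make the roles of difficulty and discrimination concrete, the following sketch computes the category probabilities of a single item under the textbook form of the GPCM. It is only an illustration: the function name and the example parameter values are hypothetical, and estimation software may use a different (for example slope-intercept) parameterization.

```python
import math
from typing import List, Sequence


def gpcm_probabilities(theta: float, discrimination: float,
                       thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the GPCM.

    thresholds holds the step difficulties b_1..b_m, so the item has
    m + 1 ordered categories (0..m). All values used with this
    function here are illustrative, not estimates from the study.
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + discrimination * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical item with four categories and three step difficulties.
    for theta in (-2.0, 0.0, 2.0):
        probs = gpcm_probabilities(theta, 1.2, [-1.0, 0.0, 1.0])
        print(theta, [round(p, 3) for p in probs])
```

With increasing ability the probability mass shifts towards the highest category, and a larger discrimination value makes that shift steeper.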
To evaluate model fit, we used the root mean square error of approximation (RMSEA), the standardized root mean square residual (SRMSR), the Tucker–Lewis index (TLI), and the comparative fit index (CFI). Good fit was defined as an RMSEA < 0.05, an SRMSR ≤ 0.08, and a TLI and CFI > 0.9 [31]. The sample-adjusted Bayesian information criterion (SABIC) and the corrected Akaike information criterion (AICc) were examined as well; smaller values indicate better model fit. We used infit and outfit statistics to evaluate item-to-model fit. Mean-square values in the range of 0.5–1.5 are efficient for measurement, while standardized values between −1.9 and 1.9 indicate reasonable predictability. Values ≤ −2 indicate that the data are too predictable [32]. Moreover, the relationship between the choice of answer categories and patients’ ability is shown in a Wright map. We used a test information curve to show how much information the test provides across the ability range and to estimate the standard error. The GPCM was used to calculate the number of empirically distinguishable groups via the separation index [33]. To ensure that response behavior was not influenced by the interviewers, differential item functioning (DIF) was assessed. The likelihood-ratio χ² test was used to detect DIF. McFadden’s pseudo-R² and non-compensatory differential item functioning (NCDIF) were used as measures of DIF magnitude.
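As an illustration of the item fit statistics mentioned above, the sketch below computes infit and outfit in their common mean-square form from observed answers, model-expected answers, and model variances. In practice the expected values and variances come from the fitted model; here they are passed in directly, and the function names and example numbers are hypothetical.

```python
from typing import Sequence, Tuple


def infit_outfit(observed: Sequence[float], expected: Sequence[float],
                 variance: Sequence[float]) -> Tuple[float, float]:
    """Return (infit, outfit) mean-squares for one item.

    observed: answers of all persons to the item
    expected: model-expected answers for the same persons
    variance: model variance of each answer (must be > 0)
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Infit: information-weighted, less sensitive to outlying persons.
    infit = sum(squared_residuals) / sum(variance)
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variance)) / len(observed)
    return infit, outfit


def in_efficient_range(mean_square: float,
                       low: float = 0.5, high: float = 1.5) -> bool:
    """Check a mean-square value against the 0.5-1.5 range from the text."""
    return low <= mean_square <= high


if __name__ == "__main__":
    # Hypothetical values for five persons answering one item.
    obs = [3, 2, 1, 3, 0]
    exp = [2.6, 2.1, 1.4, 2.2, 0.8]
    var = [0.5, 0.7, 0.8, 0.7, 0.6]
    infit, outfit = infit_outfit(obs, exp, var)
    print(round(infit, 2), in_efficient_range(infit))
    print(round(outfit, 2), in_efficient_range(outfit))
```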
Furthermore, floor and ceiling effects were defined as > 15% of answers in the lowest or highest answer category, respectively [34]. Additionally, item difficulty was investigated according to classical test theory (CTT), represented as the item mean. To assess CTT reliability, we used Cronbach’s α (internal consistency). A value above 0.7 indicates acceptable reliability [35]. Evidence for construct validity was obtained through trait–trait correlations between the PAM® score and the other questionnaires. Patient activation was expected to be moderately negatively associated with self-rated health status [36–38], moderately positively associated with self-efficacy [5], quality of life [9, 39, 40], and general mood [9, 41, 42], and weakly correlated with perceived social support [42, 43]. Correlation coefficients were judged as small if > 0.10, medium if > 0.30, and large if > 0.50 [44]. Group differences were evaluated by analysis of variance (ANOVA). If the overall comparison was significant, Tukey's HSD test was used to examine specific group differences.
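The classical test theory checks described above can be sketched in a few lines. The example assumes complete response vectors (no missing values), and the function names and the label "negligible" for coefficients at or below 0.10 are assumptions added for illustration; the cutoffs themselves are those stated in the text.

```python
from statistics import variance
from typing import Sequence, Tuple


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from one complete response vector per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def floor_ceiling(answers: Sequence[int], lowest: int, highest: int,
                  cutoff: float = 0.15) -> Tuple[bool, bool]:
    """Flag floor/ceiling effects: more than 15% in an extreme category."""
    n = len(answers)
    floor_share = sum(a == lowest for a in answers) / n
    ceiling_share = sum(a == highest for a in answers) / n
    return floor_share > cutoff, ceiling_share > cutoff


def correlation_size(r: float) -> str:
    """Label a correlation using the 0.10 / 0.30 / 0.50 cutoffs."""
    size = abs(r)
    if size > 0.50:
        return "large"
    if size > 0.30:
        return "medium"
    if size > 0.10:
        return "small"
    return "negligible"  # label assumed here; not defined in the text


if __name__ == "__main__":
    # Hypothetical responses of four persons to three items.
    data = [[3, 4, 3], [2, 2, 3], [4, 4, 4], [1, 2, 2]]
    print(round(cronbach_alpha(data), 2))
    print(floor_ceiling([row[0] for row in data], lowest=1, highest=4))
    print(correlation_size(0.43))
```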
Statistical analysis was performed in RStudio with R version 4.1.1 [45], using the packages mirt [46] and lordif [47].