Participants
All patients receiving HD and CAPD were recruited to participate in the study. The inclusion criteria were age ≥ 18 years, a diagnosis of end-stage renal disease treated with HD or CAPD for at least three months, the ability to speak Indonesian, and agreement to participate in the study. The exclusion criteria were mental illness or cognitive impairment.
The minimum sample size was determined based on several guidelines. A sample of 100 subjects in each of the HD and CAPD groups was sufficient to detect a statistically significant difference between groups by independent t-tests with 80% power (p = 0.05, two-tailed) and a Cohen’s effect size of 0.4 [17]. To confirm the structural validity using confirmatory factor analysis, a minimum sample size of 315 participants (with missing data) or 265 participants (without missing data) was needed for a power of 0.80 [18]. In addition, based on the ratio of participants to items recommended for factor analysis, the minimum number of participants required to validate the 36-item KDQOL-36 was 360 (the number of items multiplied by 10) [19]. Therefore, the minimum number of participants in the study was 360, and each of the HD and CAPD groups had to include at least 100 participants.
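The arithmetic behind these thresholds can be sketched as follows. This is an illustration only: it uses the normal approximation to the two-sample t-test rather than whatever procedure the cited guideline used, and the function names `n_per_group` and `items_rule` are hypothetical. The approximation gives 99 per group, just under the 100 stated above, which is plausible because a t-distribution-based calculation requires slightly more participants.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-tailed, two-sample comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-tailed alpha
    z_power = NormalDist().inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def items_rule(n_items, ratio=10):
    """Participants needed under the 'items multiplied by 10' rule of thumb."""
    return n_items * ratio


approx_group_n = n_per_group(0.4)        # Cohen's effect size of 0.4, as in the text
factor_analysis_n = items_rule(36)       # KDQOL-36 has 36 items
cfa_power_n = 315                        # CFA requirement with missing data, from the text

# The overall minimum is the largest of the competing requirements.
overall_minimum = max(factor_analysis_n, cfa_power_n)
print(approx_group_n, factor_analysis_n, overall_minimum)
```

Taking the maximum of the criteria reproduces the overall minimum of 360 participants reported in the text.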
Study procedure
The standard translation process consists of translation, pilot-testing, and psychometric analysis to estimate the validity and reliability [21, 22]. To conform to this standard, recommendations suggest involving two to six experts in the pilot-testing process that precedes the psychometric analysis of KDQOL-36 Bahasa Indonesia [23]. In this study, five experts (two nephrologists, an academician experienced in the validation of instruments, and two dialysis nurses) assessed the clarity of each item of KDQOL-36 Bahasa Indonesia. Clarity means that items can be clearly described without confusion [24, 25]. Each item was rated as “clear” or “not clear”. If an expert rated an item as not clear, the expert was required to provide additional recommendations.
After this step, interviews were conducted with ten patients undergoing dialysis to assess the clarity and interpretation of each item. The patients had different education levels, and the sample was balanced for dialysis type (HD/CAPD) and age. The participants were asked whether they could understand each item and to explain the meaning of each item in their own words [26]. Based on pilot testing with experts and patients, three items were revised: item 18 (from “Sakit dada?” to “Nyeri dada?”), item 28b (from “Masalah dengan jalur/tempat masuknya kateter Anda?” to “Masalah di sekitar perut Anda tempat masuknya kateter?”), and item 35 (from “Kehidupan hubungan intim Anda?” to “Aktivitas seks Anda?”).
The psychometric properties were measured by distributing the instrument to at least 360 participants in three hospitals to assess the validity and reliability (internal consistency) of KDQOL-36 Bahasa Indonesia. Test–retest reliability was also assessed by repeating the measurement on the same subjects after 2 weeks in at least 30 patients [17, 27].
Statistical analysis
Descriptive statistics were used to describe the socio-demographic characteristics of patients on HD and CAPD. Differences in characteristics between groups were tested using the χ² test for categorical variables, independent t-tests for normally distributed continuous variables, and Mann–Whitney tests for non-normally distributed continuous variables.
The KDQOL™-36 scoring program (v.20) was used for scoring PCS, MCS, and kidney-specific domains (burden, symptoms, and effects of kidney disease). The program is an Excel spreadsheet consisting of five sheets (Raw, Convert, Score, Scale, and Stats). It was developed by RAND Health Care, with the copyright owned by the UCLA Division of General Internal Medicine and Health Services Research [12].
Validity was assessed in terms of structural, convergent, and known-group validity. Confirmatory factor analysis was used to confirm the structural validity, and model fit was determined based on the model’s Chi-squared statistic (χ²), the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the Tucker-Lewis index (TLI). A non-significant Chi-squared statistic, a lower RMSEA, and higher CFI and TLI indicate better goodness-of-fit. Fit was considered acceptable if the Chi-squared statistic was non-significant, RMSEA < 0.07 (for sample sizes of more than 250 participants), and CFI and TLI > 0.95 [28]. However, when the sample size is large, the Chi-squared statistic is likely to be significant, leading to the rejection of a model even when the residuals are very small and the model fits well.
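The cut-offs above can be expressed as a simple check. The sketch below is not the study's analysis code: it uses the common textbook formula for RMSEA (some software divides by N rather than N − 1, and robust or scaled variants differ), the function names are hypothetical, and the input values are invented for illustration.

```python
import math


def rmsea(chi_sq, df, n):
    """Textbook RMSEA from the model Chi-squared, degrees of freedom, and sample size."""
    return math.sqrt(max(chi_sq - df, 0.0) / (df * (n - 1)))


def acceptable_fit(rmsea_value, cfi, tli):
    """Apply the cut-offs stated in the text: RMSEA < 0.07, CFI and TLI > 0.95."""
    return rmsea_value < 0.07 and cfi > 0.95 and tli > 0.95


# Hypothetical fit statistics, not results from this study.
value = rmsea(chi_sq=120.0, df=53, n=370)
print(round(value, 3), acceptable_fit(value, cfi=0.97, tli=0.96))
```

The Chi-squared significance test is deliberately left out of `acceptable_fit`, reflecting the caveat that it tends to reject well-fitting models in large samples.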
The KDQOL-36 items have ordered categorical responses; therefore, the confirmatory factor analysis was estimated using the diagonally weighted least squares estimator. The analysis was conducted using the lavaan package in R [29]. The generic and kidney-specific domains were analyzed separately. Based on previous publications, the generic domains of KDQOL-36 fit well with two latent variables (PCS and MCS) [30, 31], while the kidney-specific domains have three latent variables (burden, symptoms, and effects of kidney disease) [17, 32]. The latent variables were allowed to correlate with one another. Variances of the latent variables were set to one, while factor loadings on other domains were fixed to zero (Supplements 1 and 2). The results were reported as standardized parameter estimates.
Exploratory factor analysis of the kidney-specific domains was also carried out. A factor loading of > 0.4 indicates a good relationship between an item and the underlying factor [19], while a factor loading in the range of 0.30–0.40 meets the minimal level for interpretation of structure [28]. Exploratory factor analysis was conducted using the psych package in R, with weighted least squares estimation and polychoric correlations [33]. The number of factors to extract was determined using parallel analysis (Supplement 3).
Convergent validity was assessed using Pearson’s correlation. Since the kidney-specific domains, generic domains, and EQ-5D measure different aspects of HRQOL, we hypothesized that the correlations would be positive and weak to moderate. The EQ-5D index score was calculated using the Indonesian value set [34]. The correlation was classified as very weak (< 0.20), weak (0.20–0.39), moderate (0.40–0.59), strong (0.60–0.79), or very strong (> 0.80) [35]. Known-group validity was assessed by comparing scores on generic and kidney-specific domains between subgroups defined by dialysis type (patients undergoing CAPD were hypothesized to have better HRQOL than those undergoing HD) and by diabetes status (patients with diabetes were hypothesized to have lower HRQOL than patients without diabetes) [36]. The effect sizes were calculated and classified according to Cohen as small (0.2), medium (0.5), or large (0.8) [37].
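The two classification schemes can be illustrated with a short sketch. It rests on several assumptions: a correlation of exactly 0.80, which the stated bands leave unassigned, is treated here as very strong; Cohen's benchmarks are turned into bins (values below 0.2 are labelled negligible), which is one common reading rather than something the text specifies; and the effect size is computed as Cohen's d with a pooled standard deviation. The scores are invented.

```python
import math
from statistics import mean, stdev


def classify_correlation(r):
    """Label a correlation by its absolute size, following the bands in the text."""
    size = abs(r)
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"  # assumption: exactly 0.80 is placed in this band


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2
                  + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def classify_effect(d):
    """Bin an effect size using Cohen's benchmarks (binning is an assumption)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical domain scores for two subgroups, not study data.
capd_scores = [60, 70, 80]
hd_scores = [50, 60, 70]
d = cohens_d(capd_scores, hd_scores)
print(classify_correlation(0.35), classify_effect(d))
```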
Reliability was assessed using test–retest reliability and internal consistency [17]. Test–retest reliability was assessed using intraclass correlation coefficients (ICC); an ICC should be reported together with the selected model, type, and definition [38]. Because the ICC in this study was based on the test–retest method, it was calculated using a two-way mixed-effects model, single rater, and absolute agreement. An ICC value between 0.5 and 0.75 is considered moderate and 0.75–0.9 good [38]. The difference between the baseline and the two-week retest was assessed using paired t-tests. A domain with a Cronbach’s alpha value ≥ 0.7 indicates acceptable internal consistency [19]. Cronbach’s alpha values were not calculated for PCS and MCS because of the nature of SF-12 scoring and because the items have different response options [32].
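For readers unfamiliar with these coefficients, the sketch below shows how they are commonly computed from raw data. It is an illustration, not the SPSS procedure used in the study: the ICC follows the standard mean-squares formula for a two-way model with absolute agreement and a single measurement, Cronbach's alpha uses sample variances, the function names are hypothetical, and the small data sets are invented.

```python
from statistics import mean, variance


def icc_absolute_single(rows):
    """ICC for a two-way model, absolute agreement, single measurement.

    rows: one list per subject, holding that subject's score at each occasion.
    """
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ms_rows = ss_rows / (n - 1)                                   # between subjects
    ms_cols = ss_cols / (k - 1)                                   # between occasions
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k / n * (ms_cols - ms_err)
    )


def cronbach_alpha(rows):
    """Cronbach's alpha; rows: one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical test and retest scores with a constant shift of one point.
shifted = [[1, 2], [2, 3], [3, 4]]
print(icc_absolute_single(shifted))

# Hypothetical item responses for three respondents on a three-item domain.
items = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(cronbach_alpha(items))
```

In the shifted example the subjects keep their rank order perfectly, yet the coefficient falls below 1 because absolute agreement penalizes a systematic difference between test and retest.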
In addition to Cronbach’s alpha, McDonald’s omega hierarchical (ωh) and omega total (ωt) were reported to estimate internal consistency. Omega was estimated using the psych package in R [39]. Although there is no generally accepted guideline for the minimum level of omega for clinical decision-making [40], the ωt value should meet the same criterion as Cronbach’s alpha (≥ 0.7). Similarly, the ωh value should be at least 0.50, although 0.8 would be preferred [40, 41]. The main benefit of using omega over Cronbach’s alpha is that omega is estimated within a factorial model and rests on more realistic assumptions [42].
Percentages of ceiling and floor effects were assessed. The ceiling effect was estimated as the percentage of respondents with a score of 100, and the floor effect as the percentage of respondents with a score of 0. Ceiling and floor effects should be less than 20% to ensure that the scale captures the full range of potential responses within the population and that changes over time can be detected [43].
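The definition above translates directly into a small calculation. The sketch below assumes domain scores on a 0 to 100 scale; the function name `floor_ceiling` and the scores are hypothetical.

```python
def floor_ceiling(scores, minimum=0, maximum=100, threshold=20.0):
    """Return floor and ceiling percentages and whether each stays below the threshold."""
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == maximum) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_ok": floor_pct < threshold,      # less than 20% is the stated criterion
        "ceiling_ok": ceiling_pct < threshold,
    }


# Hypothetical domain scores for ten respondents, not study data.
example_scores = [0, 25, 50, 75, 100, 100, 100, 60, 40, 80]
result = floor_ceiling(example_scores)
print(result)
```

In this invented example one respondent in ten scores 0 and three score 100, so the floor effect meets the criterion while the ceiling effect does not.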
All statistical analyses were performed in SPSS Version 26.0, except for factor analysis and omega estimation, which were performed in R. A p-value lower than 0.05 was considered statistically significant.