Psychometric evaluation of the Nocturia Impact (NI) Diary was conducted to support its use as a trial endpoint.
As part of a randomized, controlled Phase 2 clinical trial investigating a novel drug candidate for nocturnal polyuria, adult nocturia patients completed the NI Diary and a voiding diary for three nights preceding their clinic visit at Baseline and Weeks 1, 4, 8, and 12 (end of treatment). Exit interviews were conducted to obtain patient impressions of the NI Diary.
A total of N = 302 participants were included. Confirmatory factor analysis (CFA) indicated that the 11-item measure is unidimensional, with values of CFI, TLI, and RMSEA meeting relevant thresholds. Internal consistency (Cronbach’s α 0.941) and test–retest reliability (intra-class correlation coefficients 0.730–0.880) were good. Convergent validity with two reference measures was demonstrated by moderate-to-strong correlations of 0.573–0.730. Significant differences (P = 0.0018, standardized effect size = 0.372) between groups defined by the number of night-time voids supported known-groups validity. Exit interviews in 66 patients indicated that all participants experienced improvement in at least 1 NI Diary item and that a 1-point improvement on the item response scale and a 1-void reduction per night (associated with an average best cut point on ROC analysis of −11.6) constituted meaningful improvement. Anchor- and distribution-based analyses identified a meaningful change threshold of −15 to −18 points on the NI Diary.
The NI Diary is a reliable and valid patient-reported psychometric instrument which is fit-for-purpose to evaluate the impact of nocturia on patient quality of life in the clinical trial setting.
Trial registration number and registration date: NCT03201419; June 28, 2017.
The online version contains supplementary material available at https://doi.org/10.1007/s11136-021-03060-4.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Plain English summary
Waking up in the middle of the night to urinate is a condition called nocturia and can be very bothersome. To get an idea of how much of an impact this condition has on a person’s life, we tested the NI Diary, in which patients answer a series of 12 questions about how the condition affects them. During a 12-week clinical trial of a novel drug candidate, participants were asked on five occasions to answer these questions every evening for three consecutive nights before their visit to the clinic. Several measurement tests were conducted on the diary to ensure that it reliably assesses the severity of nocturia and its impact on quality of life (QoL) in the patient population. In exit interviews, participants expressed support for the usefulness of the diary in reflecting their views. The diary may become a valuable tool for use in clinical trials and real-world studies.
Nocturia, or waking to pass urine during the main sleep period, is a highly prevalent lower urinary tract syndrome affecting men and women of all ages, with higher rates in older populations [2, 3]. Although nocturia can have multiple causes, the most common is nocturnal polyuria (overproduction of urine at night). Lifestyle modifications are the first intervention for the management of nocturia, but as symptoms progress, such measures may be inadequate and pharmacotherapy may be warranted [3, 5‐10]. Nocturia has a pronounced negative impact on patient QoL [8, 11‐13] and is associated with reduced work productivity, more frequent physician visits, socioeconomic burden [5, 6, 10, 12, 14], sleep impairment [15, 16], a higher risk of falls and fractures, depression, and increased mortality [3, 17, 18]. There has, however, been an unmet need for a validated, reliable, and specific patient-reported instrument to assess the impact of nocturia on patient QoL. The most frequently used symptom-specific nocturia questionnaire, the Nocturia QoL (N-QoL), was validated only in males, and its content validity was reexamined subsequently. However, the measure did not meet the Food and Drug Administration (FDA) 2009 guidance for content validity, and its recall period (14 days or 1 month) was considered too long for a fluctuating disease. To provide a more acceptable patient-reported outcome (PRO) measure for use in clinical trials, the 12-item Nocturia Impact (NI) Diary was developed in dialogue with the FDA to measure the daily symptom impact of nocturia, to be used in conjunction with a nocturnal voiding diary. The NI Diary has 11 core items assessing impacts such as sleep disturbance, emotional disturbance, and fatigue, and a single overall QoL item. An earlier study with a small number of patients supported its psychometric properties.
The current study extends this work, investigating the reliability, validity, and interpretability of the NI Diary in a larger sample, using a range of evaluations (see Fig. 1).
A randomized, double-blind, placebo-controlled, multicenter Phase 2 clinical trial (NCT03201419; DAWN) of patients with nocturia was conducted to investigate the safety and efficacy of a novel drug for nocturnal polyuria (Fig. S1 of the Online Resource). The current study is an independent, treatment-agnostic psychometric evaluation of the NI Diary performed to support the interpretation of the NI Diary as an endpoint in this trial. Patients completed the NI Diary and the nocturnal voiding diary for three nights preceding each clinic visit at Baseline and Weeks 1, 4, 8, and 12 (end of treatment).
Participants for this analysis were from the intent-to-treat (ITT) population of the trial and had completed the NI Diary at Baseline. The sample size determination for the clinical trial was based on different dose–response scenarios indicating a required range of 60–75 patients per arm to achieve 80% to 90% power for the primary endpoint (reduction in nocturnal voids). The sample size of 302 patients exceeds the conservative minimum of 10 patients per item for factor analysis and also provides sufficient power (80%) to detect, at two-sided P < .05, typical psychometric endpoints [25‐29]. See the Online Resource for full details, including inclusion/exclusion criteria and ethics approval.
The NI Diary© is a 12-item questionnaire, with 11 core items and a single overall QoL impact question (Q12), that assesses the daily symptom impact of nocturia. The NI Diary was completed in the evenings with the recall periods “thinking over the day” (items 1–6), “thinking about last night” (items 7–9), and “overall” (items 10–12). Each item is rated on a 5-point response scale from 0 to 4 (“not at all”, “slightly”, “moderately”, “quite a bit”, “a great deal”). Q12, evaluating the overall impact of nocturia, is scored separately. The NI Diary total score, the sum of items 1 to 11, ranges from 0 (lowest severity) to 44 (greatest severity). Both the total score and Q12 were transformed to a 0–100 scale. The total score was computed only if all items were answered; otherwise, it was defined as missing. Missing values were not imputed. For this analysis, the total scores at each timepoint were averaged over the three nights, except when assessing quality of completion and in the confirmatory factor analysis (CFA).
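The scoring rules above can be sketched as follows. This is a minimal illustration, not the developers' scoring algorithm: the function names are hypothetical, responses are assumed to be coded 0–4 in a list whose first 11 entries are items 1–11 and whose 12th entry is Q12, the 0–100 transformation is assumed to be linear (raw × 100/44 for the total, raw × 100/4 for Q12), and the visit score is assumed to be the mean of whichever of the three nightly totals are non-missing.

```python
from statistics import mean
from typing import Optional, Sequence

CORE_ITEMS = 11
ITEM_MAX = 4                       # options coded 0 ("not at all") to 4 ("a great deal")
RAW_MAX = CORE_ITEMS * ITEM_MAX    # 44


def night_total(items: Sequence[Optional[int]]) -> Optional[float]:
    """Items 1-11 summed and rescaled to 0-100; None if any core item is missing (no imputation)."""
    core = list(items[:CORE_ITEMS])
    if len(core) < CORE_ITEMS or any(x is None for x in core):
        return None
    return sum(core) * 100 / RAW_MAX


def night_q12(items: Sequence[Optional[int]]) -> Optional[float]:
    """Q12 (12th entry) rescaled to 0-100 and kept separate from the total."""
    if len(items) < 12 or items[11] is None:
        return None
    return items[11] * 100 / ITEM_MAX


def visit_total(nights: Sequence[Sequence[Optional[int]]]) -> Optional[float]:
    """Mean of the non-missing nightly totals before a visit (assumed averaging rule)."""
    totals = [t for t in (night_total(n) for n in nights) if t is not None]
    return mean(totals) if totals else None
```

For example, a night on which every core item is answered "moderately" (2) gives a raw total of 22, which rescales to 22 × 100/44 = 50.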
Night-time voiding diary
The night-time voiding diary required participants to record the time of sleep, any awakenings for voiding, and the number of voids. The number of voids recorded over the three nights before the clinic visit was averaged for use in all reported analyses.
Patient Global Impression (PGI): Severity and Improvement
The PGI-Severity (PGI-S) is a patient rating of the current severity of nocturia, reported as “none (1)”, “mild (2)”, “moderate (3)”, or “severe (4)”. PGI-Improvement (PGI-I) provides a patient-rated summary of change in nocturia since starting study treatment, reported as “very much better”, “much better”, “a little better”, “no change”, “a little worse”, “much worse”, or “very much worse” and coded 1 to 7, with higher scores reflecting a poorer condition. Full question details are provided in the Online Resource.
Exit interviews in 66 patients were conducted by trained interviewers and consisted of 4 parts discussing: (1) the experience of living with nocturia and its impacts; (2) the NI Diary and what constitutes meaningful change (assessed in terms of interpreting the response scale by item); (3) change in nocturnal voids and what constitutes meaningful change; and (4) completion of the PGI-S and PGI-I questions. Additional information, including sample size determination, is provided in the Online Resource.
Quality of completion
The percentage of completion of the NI Diary items and the total score was described for the three nights preceding a clinic visit.
Item distribution and floor and ceiling effects
For each NI Diary item at each timepoint, the frequency and percentage of endorsements were presented for each response option. Floor effects (worst possible score on the scale) and ceiling effects (best possible score on the scale) were evaluated against a benchmark of 20% of responses.
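A minimal sketch of the per-item endorsement and floor/ceiling summary follows. It assumes items coded 0–4, missing responses excluded from the denominator, and an effect flagged when the percentage strictly exceeds the 20% benchmark (the text does not say whether exactly 20% counts). Following the definitions above, the floor is taken as the worst possible response (4, "a great deal") and the ceiling as the best possible response (0, "not at all"). The function name is illustrative.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

BEST, WORST = 0, 4  # "not at all" and "a great deal" on the 0-4 item scale


def item_endorsement(responses: Sequence[Optional[int]],
                     benchmark: float = 20.0) -> Dict[str, object]:
    """Percentage endorsing each option, plus floor/ceiling flags against the benchmark."""
    answered = [r for r in responses if r is not None]  # missing responses excluded
    counts = Counter(answered)
    pct = {opt: 100 * counts.get(opt, 0) / len(answered) for opt in range(BEST, WORST + 1)}
    return {
        "pct_by_option": pct,
        "floor_pct": pct[WORST],      # worst possible score (definition used in the text)
        "ceiling_pct": pct[BEST],     # best possible score
        "floor_effect": pct[WORST] > benchmark,
        "ceiling_effect": pct[BEST] > benchmark,
    }
```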
Inter-item correlations and item-total correlations
Inter-item Spearman’s ρ correlations and corrected item-total polyserial correlations were calculated for NI Diary items at Baseline; the thresholds for acceptable internal consistency were set at ≥ 0.40 and ≤ 0.90 for inter-item correlations and ≥ 0.40 for item-total correlations.
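The inter-item and corrected item-total checks can be sketched in plain Python. One substitution should be flagged: the study used polyserial item-total correlations, which require a latent-variable estimator, whereas this sketch uses Spearman's ρ for both checks purely for illustration. "Corrected" is taken to mean that each item is correlated with the sum of the remaining items. The data layout (one list of item responses per respondent) and the function names are assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined (ZeroDivisionError) for a constant item


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def inter_item(data: Sequence[Sequence[float]], lo: float = 0.40,
               hi: float = 0.90) -> Dict[Tuple[int, int], Tuple[float, bool]]:
    """Spearman rho for every item pair, flagged True if inside the 0.40-0.90 range."""
    cols = list(zip(*data))
    out = {}
    for i, j in combinations(range(len(cols)), 2):
        r = spearman(cols[i], cols[j])
        out[(i, j)] = (r, lo <= r <= hi)
    return out


def corrected_item_total(data: Sequence[Sequence[float]]) -> List[float]:
    """Correlation of each item with the total of the other items (Spearman stand-in)."""
    cols = list(zip(*data))
    totals = [sum(row) for row in data]
    return [spearman(col, [t - v for t, v in zip(totals, col)]) for col in cols]
```

A pair flagged False above 0.90 would suggest redundant items; below 0.40, items that may not share the same construct.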
Item discrimination indices and curves
Item discrimination indices and curves were produced for each NI Diary item. The item discrimination index (calculation described in the Online Resource) measures how well an item differentiates between levels of severity or, in the case of the NI Diary, levels of impact. The discrimination index ranges from −1 to +1, with values > 0.60 considered acceptable. The curves show, for each response option, the percentage of participants choosing that option (y-axis) plotted against NI Diary total score (x-axis).
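The calculation used in the study is given in the Online Resource and is not reproduced here. Purely to illustrate the general idea, the sketch below uses a common textbook upper–lower group index adapted to a 0–4 item: the difference in mean item score between respondents in the top and bottom 27% of total scores, divided by the maximum item score, which yields a value between −1 and +1. The 27% split, the division by the item maximum, and the function name are all assumptions, not the authors' method.

```python
from typing import Sequence


def upper_lower_discrimination(item: Sequence[int], total: Sequence[float],
                               fraction: float = 0.27, item_max: int = 4) -> float:
    """Mean item score in the highest-total group minus the lowest-total group, scaled to [-1, +1]."""
    n = len(item)
    g = max(1, round(fraction * n))                   # size of each extreme group
    order = sorted(range(n), key=lambda i: total[i])  # ties broken by input order
    low, high = order[:g], order[-g:]
    mean_low = sum(item[i] for i in low) / g
    mean_high = sum(item[i] for i in high) / g
    return (mean_high - mean_low) / item_max
```

A positive value means that respondents reporting greater overall impact tend to choose higher response categories on the item.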
Confirmatory factor analysis
CFA was conducted to test whether the data support unidimensionality of the 11-item measure (item Q12, assessing global QoL, is scored separately in accordance with the theoretical model). Baseline data collected on Night 1 were used. CFA models with weighted least squares mean- and variance-adjusted (WLSMV) estimators, designed to handle ordinal data, were fitted and evaluated against pre-defined thresholds for close model fit: root mean square error of approximation (RMSEA) “poor” ≥ 0.113, “mediocre” 0.094–0.113, “fair” 0.066–0.094, “close” 0.032–0.066, and “excellent” ≤ 0.032 (because lower RMSEA values indicate better fit, only the upper bound of the 90% CI needs to be considered); comparative fit index (CFI) ≥ 0.95; Tucker–Lewis Index (TLI) ≥ 0.95; and standardized root mean square residual (SRMR) ≤ 0.08. Additionally, modification indices (MIs), quantified as the expected decrease in the χ² value, indicated how model fit could be improved.
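For readers checking the fit assessment, the standard closed-form definitions of RMSEA, CFI, and TLI from model (M) and baseline (B) chi-square statistics, together with the RMSEA bands and cutoffs listed above, can be sketched as follows. Assumptions: with WLSMV estimation, SEM software computes these from mean- and variance-adjusted chi-square values and may use N rather than N − 1 in the RMSEA denominator, so hand calculations need not match program output; band edges listed twice in the text (0.032, 0.066, 0.094) are assigned to the better band, while 0.113 is treated as poor per "≥ 0.113". Function names are illustrative.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    ratio_b, ratio_m = chi2_b / df_b, chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


def rmsea_band(upper_90ci: float) -> str:
    """Classify fit by the upper bound of the RMSEA 90% CI."""
    if upper_90ci <= 0.032:
        return "excellent"
    if upper_90ci <= 0.066:
        return "close"
    if upper_90ci <= 0.094:
        return "fair"
    if upper_90ci < 0.113:
        return "mediocre"
    return "poor"


def meets_cutoffs(cfi_value: float, tli_value: float, srmr: float) -> bool:
    """CFI >= 0.95, TLI >= 0.95, and SRMR <= 0.08."""
    return cfi_value >= 0.95 and tli_value >= 0.95 and srmr <= 0.08
```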
Internal consistency reliability of the NI Diary (Cronbach’s α coefficient) was evaluated using Baseline data. Values > 0.70 are considered to be indicative of adequate internal consistency .
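Cronbach's α, together with the α-if-item-deleted values reported in the Results, can be computed from a respondents × items matrix as α = k/(k − 1) × (1 − Σ item variances / variance of the total). The sketch below assumes complete cases only, mirroring the rule that the total is missing if any item is missing; the function names are illustrative.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(data: Sequence[Sequence[float]]) -> float:
    """data: one row of item scores per respondent (complete cases only)."""
    k = len(data[0])
    item_var = sum(variance(col) for col in zip(*data))
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - item_var / total_var)


def alpha_if_item_deleted(data: Sequence[Sequence[float]]) -> List[float]:
    """Alpha recomputed with each item dropped in turn (needs at least 3 items)."""
    k = len(data[0])
    return [cronbach_alpha([list(row[:j]) + list(row[j + 1:]) for row in data])
            for j in range(k)]
```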
Test–retest reliability was assessed using the Shrout–Fleiss intra-class correlation coefficient (ICC2,1)  (see Online Resource). An ICC of ≥ 0.70 is considered to be indicative of acceptable test–retest reliability [30, 36, 37]. Test–retest reliability was computed for the three sub-samples of patients showing little or no change between Baseline and Week 1 (see Online Resource).
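The Shrout–Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement) can be computed from two-way ANOVA mean squares as ICC = (MSR − MSE) / (MSR + (k − 1)MSE + k(MSC − MSE)/n). The sketch below applies this published formula to an n × k matrix of scores, here assumed to be k = 2 occasions (Baseline and Week 1) for a stable sub-sample. It is a generic implementation, not the study's code, and the function name is hypothetical.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """Shrout-Fleiss ICC(2,1); scores[i][j] is subject i's score on occasion j."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because ICC(2,1) measures absolute agreement, a uniform shift between occasions lowers it: scores of [[1, 2], [2, 3], [3, 4]] give 2/3 rather than 1, which is why it is computed only in sub-samples expected to be stable.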
Convergent validity was assessed at Baseline using Spearman’s correlations between the NI Diary and two reference measures, the Insomnia Severity Index (ISI) and the bother of night-time urination frequency, with convergent validity considered low if the coefficient is < 0.4, moderate if ≥ 0.4 and < 0.7, and large if ≥ 0.7 [36, 37, 40]. Moderate-to-strong correlations between nocturia and sleep deficiency were hypothesized.
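The convergent validity check pairs a rank correlation with the verbal bands defined above. The minimal sketch below computes Spearman's ρ as the Pearson correlation of tie-averaged ranks and classifies |ρ| < 0.4 as low, 0.4 to < 0.7 as moderate, and ≥ 0.7 as large. Function names are illustrative; in practice a library routine such as scipy.stats.spearmanr would typically be used instead.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(x: Sequence[float]) -> List[float]:
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # tied values share the mean rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def convergent_band(rho: float) -> str:
    r = abs(rho)
    return "low" if r < 0.4 else "moderate" if r < 0.7 else "large"
```

With the coefficients reported in the Results, convergent_band(0.730) returns "large" (ISI) and convergent_band(0.587) returns "moderate" (bother of night-time urination frequency).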
Construct validity was evaluated using the known-groups method. NI Diary scores at Baseline were compared between groups of participants differing in the number of nocturnal voids per night (0 to < 3 voids versus ≥ 3 voids) using independent-samples t-tests. Known-groups validity was judged on the magnitude of the differences, using between-group effect size (ES) estimates, alongside the statistical significance of the difference in NI Diary mean scores (2-tailed P < .05).
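The known-groups comparison can be sketched as a pooled-variance (Student's) t statistic plus a standardized mean difference. Two assumptions should be flagged: the text does not state whether pooled or Welch variances were used, and it does not define the denominator of the between-group effect size, so this sketch divides the mean difference by the pooled SD. A P-value would then come from the t distribution with n1 + n2 − 2 degrees of freedom (for example via scipy.stats.ttest_ind); it is omitted here to keep the sketch dependency-free. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, Sequence


def known_groups(high_voids: Sequence[float], low_voids: Sequence[float]) -> Dict[str, float]:
    """Compare NI Diary scores for >= 3 voids/night versus 0 to < 3 voids/night."""
    n1, n2 = len(high_voids), len(low_voids)
    m1, m2 = mean(high_voids), mean(low_voids)
    pooled_sd = sqrt(((n1 - 1) * variance(high_voids) + (n2 - 1) * variance(low_voids))
                     / (n1 + n2 - 2))
    t = (m1 - m2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    return {"mean_diff": m1 - m2, "t": t, "df": n1 + n2 - 2,
            "effect_size": (m1 - m2) / pooled_sd}
```

The sign of the effect size depends only on which group is passed first; the Results report it as negative.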
Interpretation of scores: meaningful change threshold (MCT)
The MCT on a PRO is the within-patient change in scores associated with what a patient perceives as a meaningful treatment benefit [41, 42]. The MCT was estimated using the pooled, treatment-agnostic, blinded data. Both distribution- and anchor-based methods were used, with multiple anchor-based analytic methods applied across five selected anchors (see Online Resource). As is standard practice, results were triangulated across the various methods, including the findings from the exit interviews, to arrive at estimates of the MCT [43, 44].
The change in the 11-item NI Diary score was calculated from Baseline to Week 12. Potential anchors, also measured as the change to Week 12, were PGI-I, PGI-S, NI Diary Q12, the number of nocturnal voids, and PGI-I exit interview improvement. Only anchors whose correlation with the change in NI Diary score exceeded the 0.35 threshold were used in the analyses [44, 45]. A detailed description of change category derivation for each anchor is included in the Online Resource. Paired-sample t-tests were used to evaluate the within-subject differences in NI Diary scores between Baseline and Week 12 within each category [40, 43, 46], with the uncertainty in the estimate of mean change within each group captured by the 95% CI. The within-subject changes were expressed as standardized ES (SES) and interpreted based on Cohen’s recommendations: small change (SES = 0.20), moderate change (SES = 0.50), and large change (SES = 0.80) [45, 47].
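The anchor screening and per-category within-subject summary can be sketched as follows: anchors are kept if their correlation exceeds 0.35 in absolute value, and within each anchor category the paired change (Week 12 minus Baseline) is summarized by its mean, SD, and 95% CI. Assumptions: the default critical value is the normal quantile 1.96 from statistics.NormalDist, whereas a paired t-test uses the t quantile with n − 1 degrees of freedom, which can be passed in as `crit`; and the SES is taken here as the mean change divided by the Baseline SD, one common definition that the text does not confirm. Names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Dict, List, Optional, Sequence


def usable_anchors(correlations: Dict[str, float], threshold: float = 0.35) -> List[str]:
    """Keep anchors whose change correlates with NI Diary change above the threshold."""
    return [name for name, r in correlations.items() if abs(r) > threshold]


def within_category_change(baseline: Sequence[float], week12: Sequence[float],
                           crit: Optional[float] = None) -> Dict[str, object]:
    changes = [w - b for b, w in zip(baseline, week12)]   # negative = improvement
    n = len(changes)
    m, sd = mean(changes), stdev(changes)
    z = crit if crit is not None else NormalDist().inv_cdf(0.975)
    half_width = z * sd / sqrt(n)
    return {"n": n, "mean": m, "sd": sd,
            "ci95": (m - half_width, m + half_width),
            "ses": m / stdev(baseline)}                   # assumed SES definition
```

As a check against the Results, the nocturnal-voids "no change" group (n = 21, mean −8.0, SD 14.62) with crit = 2.086 (the t quantile for 20 degrees of freedom) gives a half-width of about 6.66, in line with the reported CI of −14.65 to −1.34.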
Cumulative distribution function (CDF) curves
CDF curves of the change in NI Diary scores from Baseline to Week 12 were plotted for each anchor category. Absolute change from Baseline in NI Diary total score was shown on the x-axis, and the percentage of patients with a change at least equal to that value on the y-axis. Adequate separation between the no change and “improved” categories was considered to indicate meaningfulness of the “improved” category.
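The CDF comparison can be sketched as follows, assuming the y-axis is the percentage of patients whose change is at most x (that is, an improvement at least as large as |x|, since negative change means improvement), and reading "separation" as the vertical distance between the improved and no-change curves over a grid of change scores. The grid, function names, and use of the maximum vertical distance are illustrative assumptions; the study assessed separation visually.

```python
from typing import Sequence, Tuple


def pct_at_or_below(changes: Sequence[float], x: float) -> float:
    """Percentage of patients with a change score <= x."""
    return 100 * sum(c <= x for c in changes) / len(changes)


def max_separation(improved: Sequence[float], no_change: Sequence[float],
                   grid: Sequence[float]) -> Tuple[float, float]:
    """First grid point where the improved curve lies furthest above the no-change curve."""
    best = max(grid, key=lambda x: pct_at_or_below(improved, x) - pct_at_or_below(no_change, x))
    return best, pct_at_or_below(improved, best) - pct_at_or_below(no_change, best)
```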
Receiver operating characteristic (ROC) curves
ROC curves were used as an additional anchor-based approach to determine the best cut point (BCP) in NI Diary change score (from Baseline to Week 12) for identifying participants who reported an average reduction in nocturnal voids of ≥ 0.5, ≥ 1, ≥ 1.5, and ≥ 2.5 during the 12-week period; the absolute BCP was expected to increase with the size of the void reduction. The main criterion used to identify the BCP was the distance to the (0, 1) point, d(0, 1), although an average of the cut points from three criteria (including sensitivity minus specificity and Youden’s Index) was also taken.
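The cut-point search can be sketched as follows. Each observed NI Diary change score is tried as a cut point c, with a participant classified as improved if change ≤ c (more negative means greater improvement), and true status taken from the void-reduction anchor. The three criteria are implemented as the minimum distance to the (0, 1) corner, the maximum Youden's J (sensitivity + specificity − 1), and the minimum |sensitivity − specificity|; the last is one interpretation of "sensitivity minus specificity" and is an assumption, as are the function names and first-occurrence tie-breaking.

```python
from math import hypot
from typing import Dict, List, Sequence, Tuple


def roc_points(change: Sequence[float],
               responder: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """(cut, sensitivity, specificity) for each observed change score.

    Requires at least one responder and one non-responder.
    """
    pos = sum(1 for r in responder if r)
    neg = len(responder) - pos
    points = []
    for c in sorted(set(change)):
        tp = sum(1 for x, r in zip(change, responder) if r and x <= c)
        tn = sum(1 for x, r in zip(change, responder) if not r and x > c)
        points.append((c, tp / pos, tn / neg))
    return points


def best_cut_points(change: Sequence[float], responder: Sequence[bool]) -> Dict[str, float]:
    pts = roc_points(change, responder)
    d01 = min(pts, key=lambda p: hypot(1 - p[1], 1 - p[2]))[0]
    youden = max(pts, key=lambda p: p[1] + p[2] - 1)[0]
    balance = min(pts, key=lambda p: abs(p[1] - p[2]))[0]
    return {"d01": d01, "youden": youden, "sens_spec_balance": balance,
            "average": (d01 + youden + balance) / 3}
```

In the study's setting, `responder` would encode, for example, an average reduction of at least 1 nocturnal void per night.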
A distribution-based approach for defining changes beyond measurement error was used to support the MCT estimated using the anchor-based approach. The estimated MCT must be greater than measurement error to rule out the possibility of participants being classified as responders by chance [21, 42]. Distribution-based estimates were calculated as half the standard deviation (SD) at Baseline and as the standard error of measurement (SEM), using Cronbach’s α as the reliability estimate, where SEM = SD × √(1 − reliability).
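Both distribution-based quantities are simple to compute. The sketch below uses a Baseline SD of 21.80, an assumption back-calculated from the reported 0.5 SD value of 10.90, together with the reported α of 0.941; the resulting SEM of about 5.30 agrees with the value given in the Results. Function names are illustrative.

```python
from math import sqrt
from statistics import stdev
from typing import Dict, Sequence


def distribution_based(sd_baseline: float, reliability: float) -> Dict[str, float]:
    return {"half_sd": 0.5 * sd_baseline,
            "sem": sd_baseline * sqrt(1 - reliability)}   # SEM = SD * sqrt(1 - reliability)


def distribution_based_from_scores(baseline_scores: Sequence[float],
                                   reliability: float) -> Dict[str, float]:
    """Same estimates, computing the sample SD from raw Baseline total scores."""
    return distribution_based(stdev(baseline_scores), reliability)


est = distribution_based(21.80, 0.941)
print(round(est["half_sd"], 2), round(est["sem"], 2))  # 10.9 5.3
```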
Participant baseline demographics
Participant demographics are shown in Table 1. The mean age of participants was 58.8 years, and 60% were women. Most participants were white (88%) and non-Hispanic (65%).
Table 1 Participant baseline demographics, n (%), for the study population (N = 302) and the exit interview population (N = 66): sex; race (including American Indian or Alaska Native, and Black or African American); ethnicity (Hispanic or Latino; not Hispanic or Latino)
Quality of completion
For all individual items, no more than 11.8% of item responses were missing. Completion of all three diary nights was good at Baseline and Week 12 (n = 253/302 [84%] and n = 248/300 [83%], respectively). Few participants did not complete the diary at all (5 at Baseline and 4 at Week 12).
Floor and ceiling effects
Floor and ceiling effects at Baseline, Week 1, and Week 12 are shown in Table 2.
Table 2 Floor and ceiling effects for NI Diary items at Baseline, Week 1, and Week 12
Item 1: Difficult to concentrate
Item 2: Low in energy and/or tired
Item 3: Unable to be productive or complete daily activities
Item 4: Avoid participating in activities
Item 5: Irritable or moody
Item 6: Limit your fluid intake
Item 7: Lay awake after using the bathroom at night
Item 8: Worried about tripping or falling
Item 9: Got too little sleep
Item 10: Worry that the nocturia will get worse
Item 11: Concerned with where the bathroom is
Item 12: Does nocturia presently impact your life?
Item-total correlations and inter-item correlations
Corrected polyserial item-total correlations for the NI Diary total score ranged from 0.607 to 0.841, indicating good internal consistency. Inter-item Spearman’s correlations ranged from 0.427 to 0.844 at Baseline, demonstrating that NI Diary items shared enough variance to be considered measures of the same latent concept (nocturia impact) while, lacking perfect correlation, assessing different aspects of this concept.
Item discrimination indices and curves
Discrimination indices for all items were close to or above the +0.6 threshold, ranging from 0.535 (Item 6) to 0.915 (Item 9) and indicating good discrimination overall. For most items, discrimination curves for all five response options differentiated well between different levels of severity (total scores). Figure 2 shows the Item 5 (irritable or moody) discrimination curve as an example; curves for other items are presented in Fig. S2 of the Online Resource.
Confirmatory factor analysis
The initial 11-item model showed modest fit (Table 3). MIs suggested adding residual correlations between items 4 (avoided participating in activities) and 3 (unable to complete work and personal daily activities), and between items 7 (lying awake after using the bathroom at night) and 9 (had too little sleep). After this adjustment (see Fig. 3), the 11-item model showed excellent CFI and TLI and fair RMSEA (with the upper CI bound bordering on mediocre fit). The good fit of this unidimensional model provided additional support for the theoretical assumption of scoring items 1–11 separately from item 12, which assesses global QoL.
Table 3 Fit indices for the CFA model of the NI Diary at Baseline (Night 1 data), including RMSEA (90% CI) and the range of standardized factor loadings, for the unadjusted 11-item model and the adjusted 11-item model (with 2 residual correlations)
Cronbach’s α for the 11-item NI Diary was 0.941, well above the 0.70 threshold. Cronbach’s α with any single item removed ranged from 0.932 to 0.942, close to the full-scale value, indicating that the high internal consistency did not depend on any single item.
The ICC (see Online Resource) for those who endorsed “No change” on the PGI-I at Week 1 (n = 33) was 0.880 [95% CI 0.777, 0.939]; for those with no more than ±1 point of change between Baseline and Week 1 on the PGI-S (n = 216), it was 0.730 [95% CI 0.661, 0.786]; and for those with no more than ±1 change in the average number of nocturnal voids between Baseline and Week 1 (n = 125), it was 0.806 [95% CI 0.735, 0.859], indicating relatively high test–retest reliability of the NI Diary.
The baseline NI Diary score was highly correlated with the baseline ISI (a measure assessing the severity of sleep-onset and sleep-maintenance difficulties; Spearman’s ρ = 0.730) and moderately correlated with the baseline bother rating of night-time urination frequency (Spearman’s ρ = 0.587). These moderate-to-high correlation coefficients were as expected, confirming the convergent validity of the NI Diary.
NI Diary mean scores were significantly higher in the group with ≥ 3 nocturnal voids than in the group with 0 to < 3 voids (49.6 vs. 41.5, respectively; P = .0018). The SES of −0.37 indicates a difference of small-to-moderate magnitude, with participants reporting more voids also reporting higher scores, i.e., greater impact, on the NI Diary.
Interpretation of scores: MCT
Before entering the trial, more than half of participants (n = 39–62) reported experiencing each NI Diary item except item 8, ‘Worried about tripping or falling’ (n = 29). All participants reported improvement in at least one NI Diary item over the trial period. Fifty-three participants (80.3%) reported improvement in nocturnal urinations during the trial, 13 (19.7%) reported no change, and none reported worsening. Those reporting higher levels of improvement on the PGI-I experienced a greater reduction in nocturnal voids. Overall, 81% of participants considered a 1-point improvement on each NI Diary item response scale to be meaningful; for the tiredness item, for instance, one participant described a 1-point difference as meaning “Um, just that I'm getting more sleep and I'm not as tired”. A reduction of 1 void per night was considered meaningful (n = 30; 45.5%; see Online Resource Table S7). Across global rating responses (i.e., PGI-S, PGI-I), patients described the response categories as follows: “A little better” (sleeping more, less tired), “Much better” (sleeping more, less tired, improved mood, better concentration, better work productivity), and “Very much better” (sleeping more, less tired, improved mood, less impact on daily activities, better concentration, less avoidance of activities, easier falling back asleep, improved work productivity).
Correlation between the endpoint and anchors
The polyserial or Spearman’s correlation coefficients between change scores from Baseline to Week 12 for the NI Diary total score and the anchors were: (1) nocturnal voids, 0.389; (2) PGI-S, 0.669; (3) PGI-I, 0.639; (4) PGI-I Exit Interviews, 0.540; and (5) NI Diary Q12, 0.858, each greater than the benchmark value of 0.35 and thus all were used in the anchor-based analyses.
For each anchor, monotonic improvements in the mean change in NI Diary total scores were generally observed for each level of categorical improvement on the anchor (see Tables S1, S2, S3, S4, and S5 of the Online Resource). The SES of change in the NI Diary total score for the “1-category” (or equivalent) change group was > 0.50 for each anchor, indicating at least a moderate degree of change in this group (Table 4).
Table 4 Within-subject change in NI Diary total score from Baseline to Week 12 in the “no change” and “1-category” improvement anchor groups (extracted from Tables S3, S4, S5, S6, S7): mean (SD) change; 95% CI of mean
Nocturnal voids
  > −0.5 to < 0.5: −8.0 (14.62); (−14.65, −1.34)
  > −1.5 to −0.5: −14.7 (21.22); (−19.68, −9.78)
  > −2.5 to −1.5: −22.0 (24.22); (−27.31, −16.74)
PGI-S
  No change (0): −6.0 (13.91); (−9.65, −2.40)
  1-point improvement (−1): −17.4 (18.61); (−21.13, −13.74)
PGI-I
  No change: −1.5 (10.39); (−5.15, 2.22)
  A little better: −8.0 (15.34); (−12.55, −3.44)
  Much better: −20.4 (20.92); (−25.23, −15.54)
PGI-I exit interview
  No change: 95% CI (−5.80, 7.31)
  A little better: −8.2 (14.02); (−18.21, 1.85)
  Much better: −21.9 (16.61); (−30.40, −13.32)
NI Diary Q12
  No change (0): −7.3 (11.32); (−9.55, −5.01)
  1-category improvement (−1): −18.7 (13.81); (−22.06, −15.33)
There was some degree of overlap in the 95% CIs for true mean change between the “1-category” and “no change” groups for the two anchors of change in nocturnal voids and PGI-I (the non-overlapping 95% CIs for the other anchors indicated that the groups were distinct). Consequently, both “1-category” and “2-category” change in these anchors were considered. These overlaps can, however, be explained by the “no change” nocturnal voids category including only 21 patients and the PGI-I anchors being limited by having no “moderately better” category.
The changes in NI Diary total scores for the “1-category” change groups are summarized for each anchor in Table S6; the mean change scores range from −8.0 (PGI-I) to −18.7 (NI Diary Q12), and the median change scores from −5.9 (PGI-I) to −18.9 (NI Diary Q12). Notably, the exit interview reports that a reduction of 1 void per night is meaningful are consistent with choosing the “1-category” nocturnal void reduction category (−0.5 to −1.5) to indicate meaningful change, with a mean (median) NI Diary total score change of −14.7 (−10.6). The much larger mean (median) changes in the “Much better” category, −20.4 (−16.7) and −21.9 (−21.2) for the PGI-I and PGI-I interview, respectively, compared with those in the “A little better” category, −8.0 (−5.9) and −8.2 (−5.9), indicate that the “Much better” values are likely to overestimate meaningful change. The mean NI Diary total score change across the four “A little better” and “Much better” PGI-I mean change values is −15.0. The average 95% CI for true mean change across anchors, within each “1-category” anchor change category, is −8.0 to −18.7.
Cumulative distribution function
A visual inspection of the CDF curves revealed adequate separation between the “1-category” improvement category and the no change category for each anchor (Fig. S3 of the Online Resource), suggesting that the “1-category” improvement category is appropriate for assessing meaningful change. Maximum separation between the curves occurred at NI Diary change scores of approximately −10 to −20, which generally corresponded to the median change within the “1-category” improvement group.
The findings from the ROC analyses were consistent with those from the other anchor-based methods, with the absolute BCP increasing with the average reduction in nocturnal voids. The BCP at d(0, 1) in the NI Diary change score for identifying participants who reported an average reduction in nocturnal voids of ≥ 0.5 was −6.82; for ≥ 1.0, −9.47; for ≥ 1.5, −17.4; and for ≥ 2.5, −24.2. Given that patients in the exit interviews reported that a reduction of 1 nocturnal void was meaningful, the ROC curve for identifying participants with an average reduction of at least 1.0 nocturnal void [BCP = −9.47 for d(0, 1) and −11.6 averaged across criteria] is presented in Fig. S4 of the Online Resource.
Using NI Diary total scores at Baseline, the 0.5 SD value was 10.90 and the SEM was 5.30; these provide lower-bound estimates for the MCT.
Triangulation of results across anchor- and distribution-based data and exit interviews
The findings from the exit interviews indicated that a 1-point improvement in each NI Diary item is considered meaningful to patients; across the 11-item scale, this would equate to an overall change of 11 points. This is consistent with the distribution-based estimates, as a value of 11 is larger than both 10.90 (0.5 SD) and 5.30 (SEM) and thus above measurement error. In the exit interviews, patients reported that an improvement of 1 void per night was meaningful; the ROC BCPs linked to this level of improvement were −9.47 and −11.6. The BCP from an ROC analysis would be expected to provide a lower bound for the MCT, as it is the value that best distinguishes those who improve from those who do not. These findings suggest that a minimum MCT in the range of 10–11 points would be most likely to identify patients who have experienced a meaningful improvement in their symptoms. The anchor-based within-category change data support these findings, with an average mean change across all anchors in the “1-category” improvement group of −14.0 points, ranging from −8.0 in the “A little better” PGI-I category to −18.7 for NI Diary Q12 (the average 95% CI also being −8.0 to −18.7). Taking into account the maximum separation observed in the CDF curves between −10 and −20 and the non-overlapping CIs for the “no change” and “1-category” improvement groups, a conservative reduction of 15 to 18 points was taken as the MCT (in line with the smallest median change score in the non-overlapping groups, −14.8). Taking a reduction of 15–18 points in the NI Diary total score as the MCT is thus consistent with all the results presented, both anchor- and distribution-based, as well as with the patient perspective provided in the exit interviews.
This study has provided additional psychometric evidence to support the validity and reliability of the NI Diary, together with an estimate of meaningful change, thus enhancing the interpretation of improvement on the NI Diary. The CFA supported the hypothesized unidimensionality of the 11-item NI Diary and the scoring algorithm. This was further evidenced by the high internal consistency reliability of the measure and by inter-item correlations in the range 0.40–0.90, indicating that items were generally not redundant or overlapping. Item discrimination curves indicated that response categories were adequately separated. A proposed MCT in the range of 15–18 points for the standardized NI Diary total score was determined by triangulating information from the within-category change for all five anchors with the findings from the ROC analysis and distribution-based methods, together with findings from the exit interviews, and provides a conservative estimate of meaningful change.
All analyses were conducted following the FDA guidance on the development and validation of patient-reported outcome measures. However, the analyses presented have a few limitations. Incorporating post hoc correlated residuals in the CFA model (justified by similar item wording) nearly always improves model fit, but possibly at the expense of the generalizability of the model and with implications for the equal weighting of items within a sum score [49, 50]. When models are modified based on MIs (which can often be unstable), cross-validation of the results in another sample is highly recommended to test the validity of the modified model [51, 52]; a limitation of this study is the lack of such cross-validation in a different sample. Within the exit interviews, what constitutes meaningful change was queried only for the NI Diary and nocturnal voids; thus, no claims about the meaningfulness of change from the patient’s perspective can be made for the PGI-S or PGI-I categories of change, although those scales were debriefed with patients in work preceding their inclusion in the clinical trial.
Determining what constitutes a meaningful change on an instrument requires linking meaningfulness from the patient’s perspective with statistical determination of response thresholds that may be interpreted as a treatment benefit. This is the first psychometric validation and examination of response thresholds for the NI Diary using a mixed methods approach with clinical trial data. While there are benefits to applying multiple anchors and multiple analytic methods, there are no clear and concise guidelines for how to interpret these results and determine a threshold, especially if threshold values vary between anchors. Moreover, the thresholds are sample dependent and thus require further validation using comparable datasets.
Despite these limitations, this research presents parameters for interpreting the scores in the nocturia patient population. Exit interviews demonstrated that patient impressions of the NI Diary were in alignment with quantitative psychometric data, thus providing support for the use of the NI Diary in both clinical trial and real-world studies. Overall, these findings provide substantive evidence that the NI Diary is fit-for-purpose for deriving patient-relevant endpoints in clinical research for nocturia.
We are grateful to Anders Malmberg, PhD (previously at Ferring Pharmaceuticals) for valuable input to the psychometric analysis plan; to Amlan RayChaudhury, PhD (Clinical Outcomes Solutions, Chicago, USA) for assistance with writing and preparing this manuscript under the authors’ guidance; and to Sam Rousell (Clinical Outcomes Solutions, Folkestone, UK) for programming assistance. This study was funded by Ferring Pharmaceuticals.
Conflict of interest
Clinical Outcomes Solutions received funding from Ferring Pharmaceuticals A/S to conduct this study. No other conflicts to report.
Ethics approval in connection with the DAWN trial was obtained by Institutional Review Boards, Independent Ethics Committees, or local Research Ethics Boards in each country/region where the trial was conducted.
Informed consent was obtained from all participants.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.