Introduction

Neck pain is a common musculoskeletal complaint in western societies [11]. In the majority of cases the pathological basis for neck pain is unclear and complaints are labeled as ‘non-specific’ or ‘mechanical’ [4]. Neck pain may result in disability, limitations in activities and restrictions in participation in daily living and work [32, 34]. Self-reported disability in patients with neck pain is often measured by means of region-specific and generic questionnaires [25]. Questionnaires should have good psychometric qualities, including validity [25, 27]. Three aspects of validity were tested in this study. Content validity is the extent to which the items of a questionnaire reflect all aspects of the construct to be measured [25, 27]. Internal consistency is the extent to which all items measure the same construct [25, 27]. Construct validity is the extent to which a questionnaire shows convergent and/or divergent correlations with other tests that are presumed to measure a similar or different construct [25, 27].

The most frequently used neck disability questionnaires are the Neck Pain and Disability Scale (NPAD) [34] and the Neck Disability Index (NDI) [32], which are validated in several languages [2, 3, 6, 8, 16, 17, 19, 21, 22, 29, 35]. The validity of the Dutch Language Versions (DLV) of the NPAD and NDI has not been studied. The aim of this study was to investigate the validity of the NPAD–DLV and the NDI–DLV in patients with non-specific chronic neck pain (CNP) in an outpatient tertiary rehabilitation setting. A priori hypotheses were defined (Text box 1) and outlined in “Materials and methods”.

Text box 1 Hypotheses for examining validity of the NPAD–DLV and NDI–DLV

Materials and Methods

Study sample

Patients with CNP were recruited from referrals by general practitioners or medical specialists for rehabilitation treatment in the Center for Rehabilitation at the University Medical Center Groningen, The Netherlands. Inclusion criteria for this study were non-specific chronic neck pain (>3 months duration), admitted for outpatient rehabilitation, age between 18 and 65 years, and sufficient knowledge of the Dutch language (to complete questionnaires). Exclusion criteria were status post surgery in the cervical region, cardiovascular or pulmonary diseases severely diminishing physical capacity, pregnancy, addiction to drugs, and extensive psychological or behavioral problems.

Procedures

Prior to the first visit patients filled out a baseline questionnaire assessing clinical characteristics, including visual analog scales (VAS) for pain (VASpain) and disability (VASdisability). During the first visit a review of the medical history and a physical examination were performed. A second visit was scheduled, depending on the length of the waiting list and patient availability, 2–9 weeks after the first visit, but prior to the start of the rehabilitation program. During the second visit the patients filled out the NPAD–DLV, the NDI–DLV and the Short-Form-36 Health Survey (SF-36). All patients signed informed consent for their data to be used for research purposes. Data were gathered between November 2006 and October 2009.

Measurements

The NPAD consists of 20 items divided into 4 dimensions: neck problems; pain intensity; emotion and cognition; and interference with life activities [34]. Each item has a VAS of 100 mm with numeric anchors at 0, 1, 2, 3, 4 and 5 (each 20 mm apart). Item scores range from 0 (no pain or activity limitation) to 5 (as much pain as possible or maximal limitation). The total NPAD score ranges from 0 to 100 points. Higher scores indicate greater disability [34]. The NPAD has been shown to be a valid and responsive measure of disability in other languages [3, 6, 8, 17, 19, 21, 22, 29, 34, 35]. The NPAD–DLV was used in this study; its reproducibility is acceptable [15].

The NDI consists of ten items: pain intensity, personal care, lifting, reading, headaches, concentration, work, driving, sleeping, and recreation [32]. Each item has six different assertions expressing progressive levels of pain or limitation in activities. Item scores range from 0 (no pain or limitation) to 5 (as much pain as possible or maximal limitation). The total NDI score ranges from 0 to 50 points. Higher scores indicate greater disability [32]. The NDI has been shown to be a valid and responsive measure of disability in different languages [2, 8, 17, 19, 20, 22, 26, 32, 33, 35]. The NDI–DLV [16] was used in this study; its reproducibility [15, 33] and responsiveness [26, 33] are acceptable.
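The scoring rules described for both questionnaires (20 NPAD items and 10 NDI items, each scored 0 to 5 and summed) can be illustrated with a minimal sketch. The function names are hypothetical, and rejecting incomplete responses is an assumption of this sketch; the text does not describe how missing items were handled when totals were computed.

```python
from typing import Sequence

# Item counts and the 0-5 item range are taken from the text.
N_ITEMS = {"NPAD": 20, "NDI": 10}
ITEM_MIN, ITEM_MAX = 0, 5


def max_total(questionnaire: str) -> int:
    """Highest possible total score: number of items times the item maximum."""
    return N_ITEMS[questionnaire] * ITEM_MAX


def total_score(questionnaire: str, item_scores: Sequence[float]) -> float:
    """Sum the item scores after checking their count and range.

    Assumption: only complete responses are accepted. The handling of
    missing items in the study is not described, so this sketch rejects them.
    """
    expected = N_ITEMS[questionnaire]
    if len(item_scores) != expected:
        raise ValueError(
            f"{questionnaire} needs {expected} items, got {len(item_scores)}"
        )
    for score in item_scores:
        if not ITEM_MIN <= score <= ITEM_MAX:
            raise ValueError(f"item score {score} outside {ITEM_MIN}-{ITEM_MAX}")
    return sum(item_scores)
```

With these definitions the maximum totals come out as 100 points for the NPAD and 50 points for the NDI, matching the ranges given above.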

The SF-36 is a questionnaire assessing general health over the past 4 weeks in 8 domains: physical functioning, physical role restriction, bodily pain, general health, vitality, social functioning, emotional role restriction, and mental health [12]. Scores for each domain range from 0 to 100, with higher scores indicating higher levels of functioning or well-being. The Dutch language version of the SF-36 has been shown to be reliable and valid [1].

The VASpain is a horizontal line, 100 mm in length, anchored by word descriptors at each end (0: no pain, 100: worst pain possible). Patients are asked to draw a vertical mark across the horizontal line that best represents the pain level. The VASpain is a commonly used assessment instrument with proven reliability and validity [9].

The VASdisability was evaluated by the question ‘how much does your neck pain restrict you in your daily activities?’ (ADL, housekeeping, work, hobby, recreation, sport and social activities). The scoring procedures are similar to the VASpain. The anchoring word descriptors are 0: no restriction and 100: worst possible restriction. The reliability and validity of the VASdisability were assessed in patients with chronic musculoskeletal pain [5].

Hypotheses

Hypotheses are listed in Text box 1 and for the most part based on previous studies as described below.

Content validity

The total scores of the NPAD–DLV and NDI–DLV were expected to be normally distributed (Hypothesis 1), item responses were expected to be largely complete (Hypothesis 2), and no floor or ceiling effects in item responses were expected (Hypothesis 3) [6, 7, 17, 19, 34, 35]. It was expected that scores on the NDI in a tertiary rehabilitation setting would be significantly higher than those in a Dutch primary care setting (Hypothesis 4) [6, 14, 19, 20, 26, 33, 34]. No Dutch data are available for comparison of the NPAD–DLV.

Internal consistency

It was expected that Cronbach’s alphas of the NPAD–DLV and NDI–DLV would be ≥0.70 (Hypothesis 5) and that item–total score correlations would be fair to moderate (Hypothesis 6) [6, 8, 17, 19–22, 29, 32, 34, 35].
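For readers less familiar with the statistic, Cronbach's alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The following is a minimal sketch of that standard formula, not the SPSS routine used in the study; the function name, the row-per-respondent data layout and the restriction to complete cases are assumptions.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete cases.

    rows[i][j] is the score of respondent i on item j.
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_respondents = len(rows)
    k = len(rows[0])
    if k < 2 or n_respondents < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Under Hypothesis 5 the value returned for each questionnaire would be compared against the 0.70 cut-off.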

Construct validity

A fair to moderate correlation with all eight SF-36 domains was expected (Hypothesis 7) [8, 20–22]. It was expected that the NPAD–DLV and NDI–DLV would have a fair to moderate correlation with VASpain [2, 3, 7, 13, 17, 20–22, 35] and a moderate correlation with VASdisability [17, 35] (Hypotheses 8 and 10). Because four questions of the NPAD are pain-oriented, a stronger correlation was expected between the NPAD–DLV and the VASpain than between the NDI–DLV and the VASpain (Hypothesis 9). No significant differences between sexes or age groups (below and above the mean age of the study population) were expected (Hypotheses 11 and 12) [20, 32]. Significantly higher NPAD–DLV and NDI–DLV scores were expected for patients who were in litigation or who were receiving workers’ compensation because of their neck problems than for patients who were not in litigation or who received no workers’ compensation (Hypotheses 13 and 14) [18, 28]. A moderate-to-good correlation between the total scores of the NPAD–DLV and NDI–DLV was expected (Hypothesis 15) [2, 10, 22, 35]. All hypotheses apply to both the NPAD–DLV and NDI–DLV, with the exception of hypotheses 4, 9 and 15; in total this results in 27 hypotheses.
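The total of 27 follows from counting each of the 15 hypotheses twice, once per questionnaire, except the three that are tested only once. A short worked check (variable names are illustrative only):

```python
# Hypotheses 1-15 as listed in Text box 1.
hypotheses = range(1, 16)

# Tested once rather than per questionnaire: hypothesis 4 (NDI only, as no
# Dutch NPAD data are available) and hypotheses 9 and 15 (each relates the
# two questionnaires to one another).
tested_once = {4, 9, 15}

total_hypotheses = sum(1 if h in tested_once else 2 for h in hypotheses)
print(total_hypotheses)  # 12 * 2 + 3
```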

Data analyses and criteria

Normality of the total scores was analyzed using the Kolmogorov–Smirnov test and P–P plots. Floor and ceiling effects were considered to be present if more than 15% of respondents achieved the lowest or highest possible score for an item [6]. When ≥75% of the items of a questionnaire did not have floor or ceiling effects, the questionnaire was considered to have no floor or ceiling effects. Internal consistency was assessed with Cronbach’s alpha; values ≥0.7 were considered adequate [24]. Standardized item–total score Spearman correlations of the NPAD–DLV and NDI–DLV were analyzed by calculating correlation coefficients between each item and the sum of all other items, excluding the item investigated. Independent t-tests were used to analyze differences in NPAD–DLV and NDI–DLV total scores between tertiary and primary care patients, patients younger or older than the mean age of the study population, men and women, patients with or without litigation, and patients with or without workers’ compensation. Pearson correlations were used to determine the strength of the relationship between the total scores of the NPAD–DLV and NDI–DLV and the SF-36 domain scores, VASpain and VASdisability, and also between the total scores of the NPAD–DLV and NDI–DLV. The construct validity was interpreted as good when at least 75% of the results corresponded with the hypotheses [30]. Correlations were interpreted as follows: 0.75 ≤ r ≤ 1.0 as good, 0.50 ≤ r < 0.75 as moderate, 0.25 ≤ r < 0.50 as fair, and 0.00 ≤ r < 0.25 as little or no correlation [27]. All statistical analyses were performed with SPSS software, version 16.0. The critical value for significance was set at p < 0.05.
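The decision rules in this paragraph can be summarized in a short sketch. The function and constant names are hypothetical. Applying the correlation labels to the absolute value of r is an assumption of this sketch, made because the stated ranges cover only non-negative values.

```python
from typing import Sequence, Tuple

FLOOR_CEILING_THRESHOLD = 0.15  # more than 15% of respondents at an extreme
CLEAN_ITEM_SHARE = 0.75         # at least 75% of items must be unaffected


def floor_ceiling(scores: Sequence[float], lowest: float = 0,
                  highest: float = 5) -> Tuple[bool, bool]:
    """Return (floor effect present, ceiling effect present) for one item."""
    n = len(scores)
    floor = sum(1 for s in scores if s == lowest) / n > FLOOR_CEILING_THRESHOLD
    ceiling = sum(1 for s in scores if s == highest) / n > FLOOR_CEILING_THRESHOLD
    return floor, ceiling


def questionnaire_is_clean(item_flags: Sequence[bool]) -> bool:
    """True when at least 75% of the items are not flagged for the effect."""
    clean = sum(1 for flagged in item_flags if not flagged)
    return clean / len(item_flags) >= CLEAN_ITEM_SHARE


def correlation_label(r: float) -> str:
    """Label the strength of a correlation using the cut-offs from the text.

    Assumption: the sign is ignored, so r = -0.6 is labeled like r = 0.6.
    """
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if size >= 0.75:
        return "good"
    if size >= 0.50:
        return "moderate"
    if size >= 0.25:
        return "fair"
    return "little or no"
```

For example, a questionnaire with 2 of 10 items flagged for floor effects still counts as free of floor effects under this rule, because 8 of 10 (80%) of its items are unaffected.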

Results

A total of 391 patients with CNP were referred to the Center for Rehabilitation between November 2006 and October 2009, of whom 129 were admitted for rehabilitation. A total of 125 patients fulfilled the inclusion criteria. During the waiting period 13 patients decided not to start the rehabilitation program because of lack of time, the length of the waiting period, problems with their insurance company, or further diagnostic procedures. Clinical characteristics of the remaining patients (n = 112) are presented in Table 1.

Table 1 Patient characteristics (n = 112)

Content validity

Total scores of the NPAD–DLV and NDI–DLV were normally distributed. Therefore, hypothesis 1 was not rejected. Mean scores for individual items ranged from 1.7 to 4.2 for the NPAD–DLV (Table 2) and from 0.7 to 2.8 for the NDI–DLV (Table 3). In total 22 (1%) of 2,240 NPAD–DLV items and 15 (1%) of 1,120 NDI–DLV items were missing; therefore hypothesis 2 was not rejected (Tables 2, 3). Floor effects were <10% and ceiling effects were <13% for all NPAD–DLV items; therefore hypothesis 3 was not rejected for the NPAD–DLV (Table 2). For the NDI–DLV the items ‘personal care’ and ‘sleeping’ had floor effects, with 44 and 19% of the patients, respectively, scoring the lowest possible value. A ceiling effect was present for ‘headaches’ (19% of patients scored the highest possible value). Because 8 out of 10 NDI–DLV items did not have floor effects and 9 out of 10 did not have ceiling effects, hypothesis 3 was not rejected for the NDI–DLV (Table 3). The mean total NDI–DLV score was 21.5 (Table 1). This score is significantly higher than the total scores in a Dutch primary care setting (t(293) = 8.2 (95% CI 5.3–8.7) [26] and t(297) = 8.3 (95% CI 5.3–8.7) [33]); therefore hypothesis 4 was not rejected.
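A comparison against published primary care data has to be computed from summary statistics (mean, standard deviation and sample size of each group) rather than from raw scores. The sketch below assumes a pooled-variance (Student) independent t-test; the text does not state which variant was used, and the function name is hypothetical.

```python
from math import sqrt
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Independent-samples t statistic and degrees of freedom from summaries.

    Assumption: equal variances in both groups (pooled-variance t-test).
    """
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Made-up illustrative numbers, not data from this study or from [26, 33].
t_value, degrees_of_freedom = pooled_t(10.0, 2.0, 5, 8.0, 2.0, 5)
```

The resulting t value would then be compared with the critical value of the t distribution for the given degrees of freedom at p < 0.05.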

Table 2 Descriptive data and distribution of responses for each item in the NPAD–DLV (n = 112) and Spearman correlation between item scores and total score
Table 3 Descriptive data and distribution of responses for each item in the NDI–DLV (n = 112) and Spearman correlations between item score and total score

Internal consistency

The Cronbach’s alphas of the NPAD–DLV and the NDI–DLV were 0.93 and 0.83, respectively; therefore hypothesis 5 was not rejected. The item–total correlations ranged from r = 0.45 to r = 0.73 for the NPAD–DLV and from r = 0.40 to r = 0.64 for the NDI–DLV (Tables 2, 3). Because all item–total correlations fell within the hypothesized ranges, hypothesis 6 was not rejected.
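The item–total correlations were defined in the methods as Spearman correlations between each item and the sum of all other items. A minimal sketch of that calculation follows, with Spearman's coefficient computed as the Pearson correlation of average ranks. The function names and the row-per-respondent layout are assumptions, and any additional standardization applied by the statistical software is not reproduced here.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def item_rest_correlations(rows: Sequence[Sequence[float]]) -> List[float]:
    """Spearman correlation of each item with the sum of the other items.

    rows[i][j] is the score of respondent i on item j (complete cases only).
    """
    n_items = len(rows[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(spearman(item, rest))
    return result
```

Excluding the item from the total it is correlated with avoids inflating the coefficient through the item's correlation with itself.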

Construct validity

Correlations between the total scores and SF-36, VASpain, and VASdisability are presented in Table 4. Differences between age groups, sexes, litigation status, and workers’ compensation are presented in Table 5. Hypotheses 7–13 were not rejected. Hypothesis 14 was rejected for the NPAD–DLV and not rejected for the NDI–DLV. The relationship between total scores of NPAD–DLV and NDI–DLV is presented in Fig. 1. The strength of the correlation between the NPAD–DLV and NDI–DLV was r = 0.77 (Table 4); therefore hypothesis 15 was not rejected.

Table 4 Construct validity of the NPAD–DLV and NDI–DLV (Pearson correlations)
Table 5 Results of independent t-tests for the comparison of age ≤39 versus age >39, male versus female, litigation versus no litigation, workers’ compensation (WC) versus no WC
Fig. 1 Scatterplot showing total scores of NDI–DLV and NPAD–DLV

Discussion

In this study the validity of the DLV of the NPAD and the NDI was tested with the use of pre-defined hypotheses. Because 26 of the 27 (96%) pre-defined hypotheses were not rejected, the validity of the NPAD–DLV and NDI–DLV was interpreted as good. The current study was conducted in a university setting and is therefore representative of patients with CNP in a tertiary referral center. The sample size in our study was similar to those of other validity studies [6, 17, 35]. In the current study more women (63%) than men (37%) were included; this is similar to other validity studies [6, 10, 14, 17, 20–22, 29, 32, 34, 35], where the female proportions ranged from 54 to 83%. The mean age in our study was relatively young (39 years) in comparison with other validity studies, where the mean age of patients ranged from 38 to 65 years [6, 10, 14, 17, 20–22, 29, 32, 34, 35].

The normality of the total scores and the completeness of item responses were similar to other studies [6, 19, 21, 29, 35]. Floor and ceiling effects were not found in two studies [22, 35], while in two other studies floor effects for the NPAD (6 items [19] and 14 items [6]) and NDI (3 items [19]) and ceiling effects for the NDI (1 item [19]) were found. The lower scores for most of the items for the NPAD [6, 19] and NDI [6] in those studies may explain the differences in floor effects with the present study. It is of interest that in the German study [6] (n = 108, of which n = 80 after atlantoaxial screw fixation and n = 28 with CNP) far fewer items (3 instead of 14) had floor effects in the subgroup of patients with CNP. The Korean study [19] (n = 180) consisted of patients treated in physiotherapy departments of private hospitals or clinics.

We calculated a single Cronbach’s alpha for each of the NPAD–DLV and NDI–DLV because their factor structure (1, 2, 3, or 4 factors for the NPAD and 1 or 2 factors for the NDI) is unclear and because in the original English versions single Cronbach’s alphas for the total scales were calculated [6, 8, 21–23, 29, 31, 32, 35]. In the present study Cronbach’s alpha for the NPAD–DLV was high (0.93). Other studies also found high values of Cronbach’s alpha (range: 0.93–0.97) [6, 17, 19, 21, 22, 29, 34], indicating redundancy of items. Cronbach’s alpha for the NDI–DLV in the present study (0.83) also falls within the range (0.74–0.92) reported by others [8, 17, 19, 20, 22, 32]. The variation in the item–total score correlations for the NPAD–DLV and the NDI–DLV observed in the present study is similar to the variation found in other language versions (0.45–0.91 for the NPAD [6, 29, 34] and 0.45–0.84 for the NDI [20, 32]).

There is no established gold standard for the assessment of neck pain disability. Therefore, criterion validity of the NPAD and NDI could not be analyzed [24]. To test the construct validity, comparisons were made with other constructs known to be associated with neck pain, neck pain related disability or generic health. The differences from previous studies in the strength of the relationship between the NPAD–DLV and NDI–DLV and all eight SF-36 domains may be explained by differences in patient setting, nature of neck condition, pain duration, and amount of neck pain related disability of the study samples [8, 20–22]. In the present study the correlation between the NPAD–DLV and VASpain was slightly higher than that between the NDI–DLV and VASpain, as hypothesized [17, 35]. The correlation of the NPAD–DLV and NDI–DLV with VASdisability in the present study was similar to that of other studies [17, 35]. The correlation between the NPAD–DLV and NDI–DLV (r = 0.77) was similar to that in other studies (0.66–0.86), suggesting that these questionnaires measure comparable constructs [2, 10, 22, 35].

A potential limitation of this study was that the sample consisted largely of patients with moderate neck pain and disability. Although this may be expected in this tertiary rehabilitation setting, the validity of the NPAD–DLV and NDI–DLV should also be tested in general practice populations. Furthermore, the period between the baseline assessment and the second assessment was variable, and the stability of VASpain and VASdisability between the first and second assessment was assumed but not formally assessed. All our patients with CNP started rehabilitation after completing the waiting period, indicating that their health status had not changed substantially [15]. Therefore, although we cannot be sure, the impact of this weakness is unlikely to be substantial [15]. Finally, the hypotheses and the cut-off points that were used in the current study were based on previous studies without a methodological and qualitative analysis of the validity of these studies.

A strength of this study is that, to the authors’ knowledge, it is the first validity study of both the NPAD and the NDI in relation to SF-36 domain scores, VASpain and VASdisability. Another strength is that the validity of the questionnaires was tested using explicit pre-defined hypotheses. The advantage of this method is its explicitness and transparency. Because the results are presented in detail, readers can develop and test their own hypotheses and perhaps interpret the same results differently. Further study with the NPAD–DLV is necessary to assess other measurement properties, such as responsiveness and minimally important change.

Conclusion

The NPAD–DLV and NDI–DLV are valid questionnaires to measure self-reported disability in patients with CNP within an outpatient tertiary rehabilitation setting.