Introduction

The SF-36 is a generic health status measurement tool, which has been widely used for research in spinal cord injury (SCI) as well as for many other disease groups. It comprises eight domains: physical functioning (PF), physical role limitation, emotional role limitation, bodily pain, general health, vitality, social functioning and mental health; physical and mental composite scale scores (PCS, MCS) can also be derived from these domains.1 When used in SCI, the SF-36 has been demonstrated to have sufficient discrimination to compare the health status of those with SCI with that of other health populations2 and to be able to detect disease-state change for urinary tract infection (UTI) within the SCI population.3

The SF-36 is not without problems when used in SCI and other severely disabled populations. Particular problems have been demonstrated with the PF domain, including significant floor effects due to the inability of many patients to perform some of the physical tasks described. This limits responsiveness and creates problems when correlating the PF domain with other SF-36 domains.3 Another major difficulty has been content validity and acceptability within the SCI population. PF questions that relate specifically to walking and stair climbing (SF-36 items 3d, 3e and 3g–i) may be considered insulting or irrelevant by some individuals with SCI.3, 4, 5, 6, 7 Tate et al.7 and Meyers and Andresen8 have suggested replacing the words ‘walk’ and ‘climb’ with ‘go’ and ‘go up’, respectively, to address this problem. Tate et al.7 state that construct validity remains adequate with these changes, but to date there has been no published validation study of this type of modification in SCI populations compared with the standard SF-36. In this paper, we aim to validate a modified SF-36 against the standard SF-36 within an SCI sample. Supplementary PF, PCS and MCS scores (PFww, PCSww and MCSww, respectively) were generated for this purpose. We determine whether this modification improves internal SCI discriminant validity, responsiveness to disease-state change and physical floor effects while retaining comparability with other population groups.

Methods

The SF-36 scores were collected as part of the spinal-injured neuropathic bladder antisepsis randomized controlled trial (RCT) in patients with SCI and neuropathic bladder. Between November 2000 and August 2002, 543 eligible patients (mostly community dwelling) were invited to participate in the study, of whom 305 (56%) agreed.9 Subjects completed the standard SF-36 plus three additional questions, replacing the word ‘walk’ with ‘wheel’ for PF questions 9–11 (items 3g–i) of the SF-36. The original questions were also asked, allowing coding to either SF-36 walk-wheel (SF-36ww) or the original SF-36 (Box 1).
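The dual coding described above can be sketched as follows. This is a minimal illustration, not the authors' scoring code: it assumes the usual SF-36 PF scoring (ten items 3a–3j, each coded 1 = limited a lot to 3 = not limited at all, raw sum rescaled to 0–100) and assumes that PFww is obtained by substituting the three wheel items for items 3g–i. The dictionary keys (for example `3gww`) are hypothetical names, and missing-data handling is omitted.

```python
# Sketch only: item keys such as "3gww" are illustrative names, not from the paper.
PF_ITEMS = ["3a", "3b", "3c", "3d", "3e", "3f", "3g", "3h", "3i", "3j"]
WALK_ITEMS = ("3g", "3h", "3i")  # the three walking items that have a wheel variant


def pf_score(responses, use_wheel=False):
    """Return a 0-100 PF score from item responses coded 1-3.

    With use_wheel=True the walking items are replaced by their wheel
    counterparts (assumed substitution rule for PFww).
    """
    raw = 0
    for item in PF_ITEMS:
        key = item + "ww" if use_wheel and item in WALK_ITEMS else item
        raw += responses[key]
    # Raw sum ranges from 10 to 30; rescale to 0-100.
    return (raw - 10) / 20 * 100


# A respondent limited a lot on every standard item, but able to wheel.
example = {item: 1 for item in PF_ITEMS}
example.update({"3gww": 2, "3hww": 3, "3iww": 3})

print(pf_score(example))                  # 0.0
print(pf_score(example, use_wheel=True))  # 25.0
```

Because the original items are retained, the same response set yields both the standard PF score and the PFww score.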

Measurements were made at baseline (on enrolment in the RCT) and then either on development of the first UTI or (if no UTI occurred) at 6-month follow-up. The characteristics and inclusion criteria of this SCI sample have been described previously.2, 9 The SF-36ww was collected with the assistance of a research officer, who physically assisted with completion of the questionnaire where necessary.

Content validity was assessed by a retrospective review of datasheets for unprompted comments or indications of problems recorded by the patient or assistant on the data collection form during the course of administering the SF-36. The number of participants who made a comment on either baseline or follow-up questionnaire (for the walking questions only) was recorded.

Discriminant validity was determined by comparing participants with paraplegia and tetraplegia. Effect sizes were assessed using the formula: effect size (ES) = (m1 − m2)/s1, where m1=reference group mean, m2=comparison group mean, s1=reference group standard deviation.10 Internal consistency was assessed using Cronbach's α.11, 12
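As an illustration of the two statistics named above, the following sketch computes the effect size exactly as defined in the text and Cronbach's α in its usual form, α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The use of sample (n − 1) standard deviations and variances is an assumption, as is the input layout; the text does not state which subgroup served as the reference group.

```python
from statistics import mean, stdev, variance


def effect_size(reference, comparison):
    """ES = (m1 - m2) / s1, with s1 the reference group's standard deviation."""
    return (mean(reference) - mean(comparison)) / stdev(reference)


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Variance of each item across respondents (columns of the data).
    item_vars = [variance(column) for column in zip(*item_scores)]
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data, not study data.
print(effect_size([2, 4, 6], [1, 2, 3]))  # 1.0
```

Both functions expect complete data; respondents with missing items would need to be handled before calling them.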

Responsiveness was assessed in the subset of patients (n=138) who developed a UTI during the trial. UTI onset is a suitable event for assessing responsiveness in this population because it is common in SCI and has clinically relevant consequences, and so provides a realistic test of sensitivity to health-state change. Responsiveness was analysed by calculating change scores and standardized response means13 (SRM=mean change/standard deviation of change). The number of patients necessary to detect the (paired sample) change in health status associated with developing a UTI was calculated using the derived formula:14, 15

where α=0.05 and β=0.20: Z1−α/2=1.96, Z1−β=0.84 and N is the number of patients who subsequently develop a UTI. The total number of patients required for a study is then obtained by dividing N by the proportion expected to get a UTI.
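The formula itself is not reproduced in this text. A common paired-sample form that uses the symbols defined above is N = (Z1−α/2 + Z1−β)² / SRM²; the sketch below assumes that form, which is a reconstruction and not confirmed by the source. Rounding N up to a whole patient and rounding the total enrolment to the nearest integer are also assumptions, and the function names are illustrative.

```python
import math
from statistics import mean, stdev

Z_ALPHA = 1.96  # Z(1 - alpha/2) for alpha = 0.05, as stated in the text
Z_BETA = 0.84   # Z(1 - beta) for beta = 0.20, as stated in the text


def standardized_response_mean(baseline, follow_up):
    """SRM = mean change / standard deviation of change (paired scores)."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def paired_sample_size(srm):
    """Patients with the event needed to detect a change of size `srm`.

    Assumed form: N = (Z_alpha + Z_beta)^2 / SRM^2, rounded up.
    """
    return math.ceil((Z_ALPHA + Z_BETA) ** 2 / srm ** 2)


def total_enrolment(n_events, event_rate):
    """Total patients to enrol when only a proportion develop the event."""
    return round(n_events / event_rate)


n = paired_sample_size(0.68)
print(n, total_enrolment(n, 0.45))  # 17 38
```

The sign of the SRM depends on the direction of change, but only its square enters the sample size. Results computed from SRMs rounded to two decimals can differ by a patient or so from those computed from unrounded values.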

Results

All participants had SCI and neuropathic bladder. The mean age was 44 years, and the mean time since SCI was 14 years. Participants were mostly male (83%), 55% were tetraplegic and 49% had complete spinal injury (by ASIA Impairment Scale definition19). There were no post-randomization losses.

Content validity

Retrospective review of the SF-36 data entry sheets found that 20 of 305 participants (7%) had marked the SF-36 physical activity questions (3g–i) as not applicable or problematic. The SF-36ww modification (3g ww–i ww) was problematic for just 9 of 305 participants (3%). The main reason for problems with the SF-36ww was enforced ‘bed rest’ (six subjects). One participant stated that the use of an electric wheelchair led to problems interpreting the SF-36ww questions, whereas the remaining two responses had no reasons recorded.

Table 1 shows a comparison of the baseline responses to the standard SF-36 physical activity questions (3g–i) and the equivalent SF-36ww items (3g ww–i ww). The floor effects of the standard SF-36 walking questions are clearly demonstrated, with 93–96% of this sample being maximally limited at healthy (non-UTI) baseline. In comparison, SF-36ww scores were more evenly distributed among the response categories, with 10–26% of participants being maximally limited, and 54–80% not being limited at all.

Table 1 Cross-sectional comparisons at baseline of the SF-36 physical activity question (question 3): walking (q3g–i) compared to the SF-36ww wheeling items (q3gww–q3iww)

SF-36ww summary statistics at baseline

The simple change of ‘walk’ to ‘wheel’ in the physical activity questions (3g–i) increased the overall mean baseline PF score from 18 (s.d.=18.8) to 39 (s.d.=22.4) and the PCS from 33 (s.d.=7.7) to 37 (s.d.=8.3), whereas the MCS was only slightly altered from 56 (s.d.=12.1) to 54 (s.d.=11.9). These differences were all statistically significant on paired t-tests (P<0.001).

Ceiling and floor effects

In the standard PF domain, the overall sample (N=305) had a large floor effect of 29% (that is, subjects recorded ‘1-limited a lot’ for every item in Box 1). With the modification (PFww), this improved to 8%. The tetraplegic subgroup (N=167) accounted for most of the floor effect, which was substantially reduced from 49 to 14% with the modification. The paraplegic subgroup (N=138) contributed little to the floor effect in the standard PF domain scores, but this effect was further reduced from 4.3 to 1% using the PFww.
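The floor effect as defined here, and the way the subgroup figures combine into the overall figure, can be sketched as follows. The data layout and function names are illustrative assumptions. The pooled check uses only the rounded subgroup percentages quoted above, so it is an approximate consistency check and not a reanalysis.

```python
def floor_effect(respondents, items):
    """Proportion of respondents answering 1 ('limited a lot') on every item."""
    at_floor = sum(1 for r in respondents if all(r[i] == 1 for i in items))
    return at_floor / len(respondents)


def pooled_proportion(groups):
    """Combine (group size, proportion) pairs into one overall proportion."""
    total_n = sum(n for n, _ in groups)
    return sum(n * p for n, p in groups) / total_n


# Subgroup sizes and rounded floor effects as quoted in the text.
standard = pooled_proportion([(167, 0.49), (138, 0.043)])
modified = pooled_proportion([(167, 0.14), (138, 0.01)])
print(round(standard * 100), round(modified * 100))  # 29 8
```

The pooled values agree with the overall floor effects of 29% and 8% reported for the standard PF and PFww scores.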

Discriminant validity and internal consistency

Table 2 shows that the ability of the SF-36ww to discriminate tetraplegia from paraplegia is similar to that of the standard SF-36. Mean differences and effect sizes in the PF domain and the PCS and MCS composite scores between the groups were similar for the SF-36 and the SF-36ww. Cronbach's α was slightly better for the PFww scores (0.85) than for the standard PF domain scores (0.83), demonstrating good internal consistency for the walk-wheel modified PFww domain.

Table 2 Discriminant validity of SF-36ww domain scores for patients with SCI (paraplegia compared to tetraplegia) at baseline (n=305): mean modified (ww) and standard physical function domain and composite scores

Responsiveness of the SF-36ww to disease-state change

The scores of the 138 patients who went on to develop a UTI were analysed for responsiveness. Table 3 demonstrates that, when compared with the standard SF-36, the SF-36ww modification almost doubled the SRM (our indicator for responsiveness) in the PF domain for all neurological levels (from SRM=0.36 to 0.68) and increased the PCS responsiveness by 24% (from SRM=0.58 to 0.72). A slight decrease in responsiveness in the modified mental composite score (MCSww) was noted. When the sample was stratified into paraplegic and tetraplegic neurological levels, the least responsive domain was the standard PF domain in the tetraplegic group. With the walk-wheel modification, the responsiveness of this group improved more than fivefold (from SRM=0.11 to 0.58, n=77). In contrast, the responsiveness for the paraplegic group increased by only 12% (from SRM=0.77 to 0.86, n=61).

Table 3 Responsiveness: mean SF-36ww domain and composite scores for patients with SCI at baseline and after they developed a UTI (n=138)

We used the SRMs to calculate the sample sizes required to detect a change in health status (as reflected by changes in the PF domain and the PCS and MCS scores) associated with UTI. These sample sizes assume that all subjects will develop a UTI. The results in Table 4 demonstrate the difficulty of detecting disease-state change using the PF domain in tetraplegic persons and the marked reduction in required sample size achieved by the SF-36ww modification (N=611 vs 24). Across all neurological groups, the SF-36ww produced a smaller but still marked reduction in sample size (N=60 vs 17).

Table 4 Sample size (of patients who develop UTI) necessary to detect changes in health status associated with UTI: comparison of standard SF-36 and modified SF-36ww

To determine sample size estimates for a study in which only some of the subjects will develop UTI, it is necessary to divide the numbers in Table 4 by the proportion expected to get UTI. For example, using the above PF and PFww results (all neurological levels) and our 45% UTI rate gives a total standard SF-36 sample size of 133 (60/0.45) compared to 38 (17/0.45) for the SF-36ww modification.

Discussion

The SF-36 is widely used as a health status measure across many disease groups including SCI.16 This is despite criticism of the content validity and floor effects of the physical domain of the SF-36.7, 17 Our SF-36ww modification differs from that of Meyers and Andresen,8 Andresen and Meyers17 and Tate et al.7 in that it alters only the walking items and leaves the stair-climbing items unmodified. Our justification for this decision is that the climbing tasks are more likely to be affected by the environment, whereas the locomotion items are more likely to depend on transportable devices; that is, if a wheelchair is the main mode of locomotion, the subject is likely to travel with it. This reduces problematic situations where scores on health status scales may alter simply because the respondent is away from a suitable environment, such as when travelling on holiday. The SF-36ww modification is simple, contains only one task type and is quick to perform. We acknowledge that, while pragmatic, our solution is not as complete philosophically as that suggested by Andresen and Meyers and Tate et al. Further studies should examine whether modifying the stair-climbing items (3d and e) as well produces any additional improvement in responsiveness.

We found that asking participants both the standard SF-36 and the modified SF-36ww items in sequence (in essence asking the problematic physical questions twice, with modified wording to maintain broader SF-36 compatibility) was less annoying to participants than asking questions about walking in isolation. Participants using the SF-36ww now have a relevant additional response to all of the questions about walking. While these three additional SF-36ww items appeared to enhance the acceptability of the questionnaire, a weakness of this study is that our retrospective analysis of content validity is likely to have underestimated the actual number of participants who experienced problems during its administration. Respondents had to feel strongly enough about a question to complain as this involved recording a comment on a datasheet, with or without assistance. As a result, additional studies are necessary to clarify content validity issues related to the modification. However, overall such minor modifications are likely to be as acceptable and feasible in application as existing standard versions of the SF-36. The additional three questions did not appreciably increase the completion time of the questionnaire.

On the basis of retrospective review of recorded comments, participants completing the SF-36ww questionnaire should have the following additional information made clear in a preface:

  1) That the wheelchair questions are to be answered with reference to the main type of wheelchair used by the participant (for example, if the patient uses both an electric and a manual wheelchair, they should score the chair they use the most at the time of assessment); and

  2) That patients in situations such as complete bed rest should score based on their current restrictions.

Overall, the SF-36ww modification is quick to implement and is attractive in that it goes a good way toward addressing the content validity problems of the standard SF-36 in the SCI population. Including both the standard and the modified versions of the three walking items retains comparability with disease groups external to SCI by the ability to code to standard SF-36 PF, PCS and MCS values as required. This also allows for powerful interpretations of the underlying SF-36 to be maintained such as utility estimates through SF-6D transformation.6, 18

Reassuringly, the discriminant validity between tetraplegic and paraplegic subgroups for the SF-36ww was almost identical to that of published validation assessments using the standard SF-36.3 The slightly better self-reported mental health in the tetraplegic vs paraplegic group has been reported previously and reflects the negative weighting given to the PF domain in calculation of the MCS.1 These problems were not rectified by the SF-36ww modification, so there is no advantage or disadvantage in the area of internal discriminant validity. Likewise, the internal consistency of the PFww domain, as demonstrated by Cronbach's α, remained similar to that of the PF domain in the standard SF-36.

In addition to improved content validity, the major benefit of the SF-36ww modification over its predecessor when used in SCI populations is the impact on the responsiveness of the health status measure to incident disease states. The standard PF domain scores were poorly responsive and most heavily influenced by the floor effect in the tetraplegic group.3 The standard SF-36 PF domain scores should not be expected to detect change in disease states over time, particularly where a significant proportion of a sample are tetraplegic patients who predominantly utilize a wheelchair for locomotion. The SF-36ww (walk-wheel) modification substantially reduced the floor effect in the tetraplegic group, thereby making it a useful tool for detecting within-group clinical change over time.

Accordingly, the SF-36ww health status measure is likely to be useful in studies and clinical management of medical conditions associated with profound physical disability where a significant proportion of the sample are likely to utilize a wheelchair for some or all of their locomotion and where disease-state change is of interest. Further validation studies will be required in populations without SCI, such as those with later-stage neuromuscular disorders.

We have provided a guide to the sample size calculations required in clinical trials and practice where health status change over time is of key concern. The SF-36ww modification demonstrates a clear advantage in study power, particularly for the tetraplegic subgroup. To determine actual sample size estimates from our figures, it is necessary to divide the figures in Table 4 by the proportion expected to get a UTI (or other condition). Given the differences demonstrated between paraplegic and tetraplegic populations, if the proportion of each is likely to differ from our sample (55% tetraplegic), it would be necessary to find the sample size for paraplegics and tetraplegics separately and calculate the actual sample size required after estimating the proportion of each likely to be enrolled.

Conclusion

The SF-36ww is a simple modification, which substantially addresses the known problems of acceptability, content validity and floor effects of the standard SF-36 physical domains within populations with SCI while retaining discriminant validity and internal consistency. We demonstrated improved responsiveness for disease-state change within a sample with SCI that will enhance the power of future studies to assess the effect of disease progression, treatment and prevention. The application of the SF-36ww to other populations with profound physical disabilities warrants investigation.