Introduction

The determination of a work disability pension for patients with long-term medical impairments is of great social and financial importance. Part of the evaluation of a work disability pension is the assessment of the functional limitations of the patient. The literature describes several instruments for assessing functional limitations, together with their psychometric properties, for instance self-report questionnaires and performance-based functional testing [1–4]. In most countries the actual assessment of functional limitations is carried out by a medical doctor [5, 6]. The assessment by physicians can be based on written information (e.g., from the patient or treating physician) or can be conducted by an examination in person. The international literature reports poor agreement among physicians on functional disability [7–11]. A tremendous variation was found in the disability ratings recommended by physicians given the same set of facts [7]. To decrease this variation the United States Social Security Administration (SSA) planned to “develop functional assessment instruments that are standardized, accurately measure an individual’s functional abilities and that are universally accepted by the public, the advocacy community, and health-care professionals” [7, 12, 13].

In the Netherlands, an employer has to pay wages for 2 years if an employee is unable to work due to physical or mental disability. After these 2 years, the patient can apply for a work disability pension. Specialized insurance physicians assess the patient’s functional limitations in work as part of the application for a work disability pension. Their judgment is based on information from treating physicians, along with their own observations, physical examination and an interview with the patient. To a large extent, the assessment is based on the interview in which attention is given to activity limitations and participation, in addition to standard medical history-taking [14]. The assessed functional limitations are registered in a standardized list, the Functional Ability List (FAL) [15].

Although in the Netherlands three semi-structured interview models for the assessment of functional limitations are available, in daily practice insurance physicians do not use a fixed model with claimants applying for disability pension [14, 16]. One of the models available is the Disability Assessment Structured Interview (DASI) [17]. This is a semi-structured interview in which the three levels of functioning—impairment, activity limitation and participation—are mapped in a structured way in accordance with the International Classification of Functioning, Disability and Health (ICF) [18]. In Box 1 the general domains covered in the DASI are described.

Box 1 Domains covered in the DASI interview

Two important characteristics of the DASI are its semi-structured way of interviewing the patient, and its method of inquiring about specific and detailed examples of limitations and concrete activities which the patient still undertakes.

Two important criteria for evaluating work-related assessments are the validity and reliability of the instruments used [19, 20]. Validity is the extent to which an instrument measures what is intended to be measured [21, 22]. Content validity is the degree to which the test items represent the performance domain the test is intended to measure, and it is usually determined by a panel of experts examining the relationship between the test objectives and the test items, or by detailed knowledge of the normal practices used. Concurrent validity examines the correlation between a new measure and an accepted measure given to the same subjects [22, 23]. Reliability involves the extent to which a test or measurement is consistent and free from error [24].

In spite of the fact that assessments of functional limitations in the Netherlands are mainly based on an interview, almost no information is available on the reliability and validity of the interview as an instrument to assess functional limitations. In studies where the assessing physicians interview the patients themselves, a low inter-rater reliability was found [25, 26]. In studies where physicians based their assessments on written reports or on video recordings of DASI interviews, reasonable to good inter-rater reliability was found [27, 28].

Given the immense consequences of functional assessments, it is of importance to examine the psychometric properties of such an instrument. In order to fill this gap, the aim of the present study is to evaluate in a real-life situation:

  1. the inter-rater reliability between physicians with and without DASI training.

  2. the content and concurrent validity of the DASI.

  3. the patient’s opinion of those physicians who used and those who did not use the DASI.

The DASI method was chosen over other methods because it is well described, is based on the ICF, and is the only method whose psychometric qualities have been studied to some extent.

Methods

Physicians

At each of four of the 17 branches of the Dutch Social Security Office, four insurance physicians were invited to participate, resulting in 16 physicians who cooperated voluntarily in this study. In each of the four locations, two insurance physicians were randomly assigned to the intervention group and two were assigned to the control group. There was no significant difference in the average length of time spent in professional practice between the physicians in the intervention group (15.5 years, range 7–28 years) and those in the control group (14.6 years, range 9–21 years).

Training

The intervention group was given a 3-day DASI training session over a three-week period. The first week consisted of 2 days of instruction and practice. After demonstration of an item from the DASI by an instructor and an actress, the eight physicians practiced the items of the DASI in groups of three physicians. The role of the patient, physician and observer alternated. The next week the physicians practiced the method on their regular patients and made a video recording of the DASI interview. On the third day of the training session, in the third week, their video recordings were analyzed and assessed.

The control group did not receive any training and examined patients as usual.

Patients

A total of 443 patients who applied for social disability benefit after 21 months of sick leave were asked to cooperate, of whom 236 agreed (53%). Only patients with at least lower back or lower extremity problems were selected in order to obtain a homogeneous group with sufficient filling of items of the FAL (see Instruments). Of the patients who agreed to cooperate, 26% were included (n = 62), 36% were diagnosed with mental complaints (n = 85), and 38% had another diagnosis such as neck and upper extremity complaints, heart and lung diseases or cancer (n = 89).

Instruments

The Functional Ability List (FAL) [15] is an instrument to record functional limitations and is used in social security assessments in the Netherlands. All Dutch insurance physicians are trained and experienced in using the FAL. The FAL contains six domains in which 70 mental and physical items are addressed, and for each item the seriousness can be indicated. One example is the item “lifting or carrying”, where the insurance physician has to choose from four gradations:

Lifting or carrying

0 normal, can carry or lift about 15 kg (toddler)

1 slightly limited, can carry or lift about 10 kg (small toddler)

2 limited, can carry or lift about 5 kg (bag of potatoes)

3 severely limited, can carry or lift about 1 kg (1 l of milk)

The content validity of the DASI was assessed using a self-structured questionnaire which was filled out by the physicians who had undergone DASI training. The questionnaire contained eight questions with fixed response alternatives on a five-point ordinal rating scale. In addition, it was also possible for the physicians to make additional comments about the DASI. The questionnaire contained questions about whether the instrument was adequate for the intended purpose, whether anything essential was missing or whether any part of the instrument was irrelevant (Table 2).

In addition, the patients filled out a questionnaire that is routinely used by the Dutch Social Security Office to measure patient satisfaction with the behavioral aspects of physicians [29]. Lastly, the patients gave an indication of the duration of the interview.

Procedure

Patients (n = 62) were interviewed and examined independently by two physicians from the same group (intervention group or control group) on the same day, between June and November 2008. The patients were randomly assigned to either the intervention or the control group so as to be able to compare similar groups. The physicians recorded their assessment of those work limitations to be found in the physical items of the Functional Ability List (FAL), and provided a detailed report containing information on the interview, including their judgment and the reasons for their judgment. Furthermore, we examined whether the patients did end up qualifying for a disability benefit.

After using the DASI in daily practice, the physicians were asked to give their opinions of the DASI by filling out the questionnaire. After the interview and examination, the patients were asked their opinions as to how satisfied they were with the behavioral aspects of the physicians, also by filling out a questionnaire.

Analysis

The “linear-weighted observed percentage agreement” on the FAL items was taken as a measurement of inter-rater reliability within each of the two groups of insurance physicians [30, 31]. Because the marginal distributions of the variables were skewed, an agreement index based on Cohen’s kappa could not be used. Kappa requires that the marginals have more or less the same frequency; if they do not, the expected agreement is overestimated [32]. The statistical software package AGREE 7.3 [33] was used for the calculation of these values. In general, a percentage agreement of 60–80% is considered reasonable to good; more than 80% is considered excellent [34].
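As an illustration of this agreement measure, the following minimal Python sketch computes a linear-weighted observed percentage agreement for a single ordinal item. It assumes the conventional linear weights, 1 - |i - j| / (k - 1) for k ordered categories; whether AGREE 7.3 applies exactly this formula is an assumption, and the function name and example ratings are hypothetical and are not study data.

```python
def linear_weighted_agreement(pairs, n_categories):
    """Mean linear weight over pairs of ratings, as a fraction between 0 and 1.

    pairs        -- list of (rating_physician_1, rating_physician_2) tuples,
                    each rating an integer category 0 .. n_categories - 1
    n_categories -- number of ordered categories (4 for a 0-3 FAL gradation)
    """
    if n_categories < 2:
        raise ValueError("need at least two ordered categories")
    if not pairs:
        raise ValueError("need at least one pair of ratings")
    span = n_categories - 1
    # Identical ratings get weight 1; ratings at opposite ends get weight 0.
    weights = [1.0 - abs(a - b) / span for a, b in pairs]
    return sum(weights) / len(weights)


# Hypothetical ratings of four patients on a four-level item (0-3).
example_pairs = [(0, 0), (1, 2), (3, 3), (2, 0)]
agreement = linear_weighted_agreement(example_pairs, n_categories=4)
print(f"{100 * agreement:.1f}%")
```

With these weights, two physicians who differ by one gradation still receive partial credit (2/3 on a four-level item), so the index is sensitive to the size of a disagreement and not only to its presence. Unlike kappa, it applies no correction for chance agreement.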

The concurrent validity was examined by comparing the mean scores on the FAL items of the intervention and the control groups. The Mann–Whitney test, a non-parametric test that is used to compare two independent groups, was used for the between-group differences in the mean scores on the FAL items.
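The between-group comparison can be sketched as follows. This is a minimal, standard-library implementation of the Mann–Whitney U statistic with midranks for ties and a two-sided P value from the normal approximation with tie correction; the study does not report which software or which variant (exact or approximate) was used, so those choices are assumptions, and the scores shown are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for sample x, two-sided P from normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided


# Hypothetical scores on one FAL item (0 = normal ... 3 = severely limited).
intervention_scores = [2, 2, 1, 3, 2, 1, 2]
control_scores = [1, 0, 1, 2, 1, 0, 1]
u, p = mann_whitney_u(intervention_scores, control_scores)
print(u, p)
```

Because FAL items have only a few ordered gradations, ties are frequent, which is why the sketch uses midranks and a tie-corrected variance. For small samples an exact P value would be preferable to the normal approximation.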

Results

A total of 62 patients were assessed by two physicians, 32 in the intervention and 30 in the control group. There were no significant differences between the groups in terms of age, gender, terms of employment and diagnosis. The mean age of the patients in the intervention group was 49.8 years (range 30–64 years), and in the control group, 46.3 years (range 35–63 years). In the intervention group 47% of the patients were female, and in the control group, 37%. Before registering sick, the patients in the intervention group worked for an average of 31.6 h a week (range 8–40 h), and in the control group, 33.0 h a week (range 13–40 h). In the intervention group, nine patients had lower-extremity problems (e.g., fractured ankle, gonarthrosis or peripheral arterial disease), 15 had lower-back problems (e.g., lumbar spinal stenosis, chronic non-specific lower back pain or herniated disc) and eight patients presented more general complaints (e.g., rheumatoid arthritis, fibromyalgia or somatoform disorder). In the control group, eight patients had lower-extremity problems, 14 had lower-back problems and eight had general complaints.

Table 1 presents the “linear weighted percentage agreement” between the physicians and the “mean scores” on the items of the Functional Ability List in the control and intervention groups.

Table 1 “Linear weighted percentage agreement” between the physicians (columns 1–2) and “Mean scores” (columns 3–4) on items of the functional ability list in the intervention (n = 32) and control (n = 30) groups

Physicians from the intervention group showed a mean percentage agreement of 80.6% (range 59–100%), and the control group, 83.6% (range 67–97%). Except for the item “frequent heavy lifting,” there were no differences in agreement percentages between the intervention and control groups.

In 19 out of the 21 items on the FAL the physicians of the intervention group indicated more serious functional limitation scores in their assessments compared to the control group. For nine of these items, the differences were significant (P < 0.05). Concerning the daily number of hours a patient could function, the physicians in the intervention group indicated a limitation in 31% of the patients; in 40% of these patients the physicians were in agreement on this. In the control group, the physicians indicated a limitation in hours of daily functioning in 23% of their patients; in 29% of these patients the physicians were in agreement on this.

In the intervention group, 18 out of 32 patients (56%) qualified for a work disability benefit, while in the control group 13 out of 30 patients (43%) did; this did not represent a significant difference (P = 0.31).
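The reported P value can be checked from the counts given. The test used is not named in the text; the sketch below assumes a Pearson chi-square test on the 2 × 2 table without continuity correction, which is one test that yields a value close to the one reported. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi-square statistic, P value for 1 degree of freedom).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: intervention, control. Columns: qualified, did not qualify.
chi2, p = chi_square_2x2(18, 32 - 18, 13, 30 - 13)
print(f"chi2 = {chi2:.2f}, P = {p:.2f}")
```

Under this assumption the statistic is about 1.03 and the P value about 0.31, consistent with the reported result. A continuity-corrected or exact test would give a somewhat larger P value.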

Table 2 presents the opinion of the eight physicians of the intervention group concerning the DASI. All physicians were in agreement that the DASI was an acceptable tool in daily practice, one which gives an objective view of the patient and enough information to assess functional ability.

Table 2 The physicians’ opinions (n = 8) on the DASI (intervention group) in percentages

As an added value of the DASI, the physicians mentioned in particular the structuring of the interview and collecting detailed information on the functioning of the patient. One physician mentioned that the DASI mainly collected information from the patient, but that the assessment of this information into functional abilities was not addressed.

In their reports the physicians of the intervention group mentioned an average of 6.7 functional limitations as experienced by the patient (range 4–10). In the control group, an average of 4.4 functional limitations were mentioned (range 0–7) (P < 0.05). In the case of functional limitations, 71% of the intervention group indicated the intensity of the limitations experienced, for instance, by giving an example of the limitation in daily life. In the control group, this was 40%.

The patients’ satisfaction report score for physicians of both the intervention and control groups in their interviews was 7.7 on a scale from 1 to 10. Moreover, no differences between the two groups were found in terms of answers to the questions concerning behavioral aspects of the physicians (listening, empathy, meticulousness and professionalism).

According to the patients, the duration of the interview and the physical examination was on average 45–60 min in the intervention as well as in the control group (range <30 min to >60 min).

Discussion

Although accurate determination of work disability status is crucial for the health and well-being of patients and their families, bureaucratic approaches have poor reliability and validity and are cumbersome and extraordinarily expensive. This study of the DASI suggests that a semi-structured interview might hold promise as an inexpensive solution to this problem. We studied inter-rater reliability, both content and concurrent validity, and the patient’s opinion of the DASI.

Reliability

Until now, no real-life studies of inter-observer agreement among physicians in assessing functional limitations had been conducted. We hypothesized that agreement between physicians in the control group would be low and that agreement in the DASI group would be acceptable. In the end, we found an overall inter-rater reliability for the items of the FAL in the intervention group that was reasonable to good, and for some dimensions even excellent. Contrary to our expectations, the agreement in existing practice was satisfactory too, and DASI training did not improve agreement between physicians. One explanation may be that we included patients with relatively straightforward lower back or lower extremity problems. Patients with more complicated problems or with mental problems might produce less satisfactory results. Because agreement between physicians in the international literature is found to be very poor [7–11], another explanation may be that the satisfactory agreement in existing practice is specific to the Dutch context. In the Netherlands, specially trained insurance physicians assess the functional disabilities of patients. These physicians have all had interview training in which they were taught to ask about activity limitations and participation in addition to standard medical history-taking. Such training is not common in all countries and may explain the relatively good agreement between the physicians.

We found a low inter-rater agreement concerning the daily number of hours a patient could function. The daily number of hours a patient can function according to the physician often has very important consequences for a work disability benefit. Therefore, the low inter-rater reliability found is undesirable. Insurance physicians in the Netherlands have a guideline for “reduced working hours” [35] at their disposal, but unfortunately this guideline cannot prevent the differences in outcome between the physicians. The satisfactory inter-rater reliability on the items of the FAL and the low inter-rater agreement concerning the daily number of hours a patient could function which were found in this study are comparable to Dutch studies conducted in a more controlled environment where physicians did not see patients face to face, but made an assessment based on video recordings or written reports of DASI patient interviews [27, 28].

Validity

Preferably, validity is assessed by comparing the measurement studied to a gold standard. For assessing functional limitations, however, no gold standard is available. Different methods for assessment, for instance, self-assessment questionnaires, clinical examination and performance tests, lead to different outcomes [36]. From the reports made by the physicians in this study, it appeared that the same information could lead to different outcomes. One example was the assessment of a 56-year-old patient with depression and lower-back problems as a result of a somatoform disorder. One physician assessed no functional limitations when considering the diagnosis and an absence of objective functional defects. The other physician assessed the same patient and concluded the patient was limited in lifting ability (10 kg maximum), sitting (1 h maximum) and walking (half an hour maximum) because the patient made a genuine impression, and offered a plausible and consistent story. The question might be raised as to whether consistency in a patient’s behavior together with the functional limitations experienced should in fact be leading factors in the assessment, this despite the fact that there might be no actual objective medical findings present. In this light, part of the assessment of functional limitations would seem to lie in the realm of a social rather than a medical concept.

This study showed a satisfactory content validity for the DASI. Without a single exception, the physicians agreed on the fact that the DASI was an acceptable tool in daily practice and one which gave an objective view of the patient and enough information in order to assess functional abilities. Seven out of eight physicians found the DASI to be an even better basis for the assessment of functional limitations than the interview they usually applied.

Concurrent validity is normally assessed by comparing the outcomes of two measurements administered to the same patients. In this study, we instead compared the outcomes of the intervention group and the control group in different groups of patients. Because patients were randomly allocated to the intervention group and the control group, however, the groups were comparable. This is supported by the fact that there were no significant differences between the groups in terms of age, gender, terms of employment and diagnosis. We found that for almost half of the items of the FAL, the physicians using the DASI indicated substantially more severe functional limitations in their assessments than did the control group. But this did not lead to an increased number of patients who qualified for a disability benefit. One explanation for the more severe functional limitations may be that the DASI focuses more attention on problems concerning activities and functional limitations as compared to “care as usual.” That this did not lead to more disability benefits can be explained by the Dutch system for determining the benefit. An occupational expert investigates what jobs the patient is theoretically still able to perform in light of the limitations. The earning capacity then determines the disability benefit. Apparently the more severe functional limitations did not lead to a greater loss in earning capacity. The literature reports that insurance physicians pay limited attention to detailed information on the functional limitations that patients experience [14]. This is in line with the findings of this study, in which the physicians in the intervention group reported significantly more severe functional limitations.

Even though no differences between the intervention and the control groups were found in disability benefit outcome, the difference in outcome for the functional limitations was important because information on functional limitations is needed for reintegration into appropriate work. Because of the lack of a gold standard, it is unknown whether the more severe ratings in the DASI group are more valid than those of the “usual care” physicians. The physicians in the intervention group found the DASI to be a better basis for the assessment than the interview they usually applied. Therefore, we think the DASI contributes to a more thorough assessment of the functional limitations.

Patients’ Opinions

The patients’ mean report score for satisfaction with the DASI was 7.7 on a scale from 1 to 10. The same score was found for the interviews of the physicians in the control group. Apparently the DASI did not improve or worsen patient satisfaction.

Study Limitations

Some limitations of this study should be noted. The physicians knew they were being monitored; this might have influenced their assessments. However, it was practically impossible to conduct a study which would not have had this disadvantage. Furthermore, the assessments were aimed at physically based functional limitations; mentally based functional limitations might well present a rather different outcome. Finally, although the physicians in the intervention group received DASI training, it is possible they did not implement this in daily practice. We studied the reports of the assessments to check whether the physicians who received DASI training actually performed the interview as it was taught. One important characteristic of the DASI is the presence of concrete and detailed information on functional limitations as experienced by the patient. The reports on those physicians who received the training contained more functional limitations and more detailed information on this point, indicating that the intervention group actually performed what they had been trained to do.

The DASI in Daily Practice

Several tools are used to assess the functional abilities of people with medical impairments, but no single currently existing test provides a valid measurement of functional limitations [12, 37]. Functional capacity tests and questionnaires alone cannot properly assess functional limitations without an appraisal of the outcome of these tests. A combination of specialized physicians and instruments such as functional capacity tests and questionnaires looks the most promising. It might be useful to provide self-report questionnaires about function to the patient before the DASI in order to increase the efficiency and specificity of the interview. Then, clinical examination and a semi-structured interview, like the DASI, could be conducted by a physician or, in part, even by a trained nurse [28]. Based on this information, individually selected functional capacity tests could be conducted to confirm or disconfirm the initial results of the interview. Guidelines and protocols might narrow down further the differences in assessment among physicians [25].

Future Research

Further research into the value of guidelines and protocols—especially where the assessment of limitations as to the number of hours a patient can function daily is a factor—as well as additional studies concerning the use of the DASI in mental-function limitations may be useful. Concurrent validity can be assessed by comparing outcomes of the DASI with self-report questionnaires and functional capacity tests. For research into the validity of instruments to assess functional limitations, a gold standard is needed. A gold standard might be approached by looking for consensus among a number of physicians after medical examination, an interview protocol, questionnaires and performance tests.

Conclusion

In conclusion, we would state that the DASI is a tool with a reasonable to good inter-rater reliability and content validity, and that it appears to be acceptable to both patients and physicians. The DASI did not improve inter-observer agreement beyond that of usual interview procedures used in the Netherlands. The DASI would seem to be a worthwhile tool for collecting self-reported information in order to assess functional limitations in claimants. Because the physicians who used the DASI assessed more functional limitations as compared to usual practice, further research into the interpretation of the self-reported information is needed.