What is new?
- The MOSES-Combi questionnaire demonstrates that patient and provider items on mobility and self-care can be integrated into unidimensional scales that meet the requirements of a one-parameter item-response theory (IRT) model and are reliable.
- The instrument can thus be used in various measurement models with different data sources, and it provides measurements that are on one scale and can be compared directly. This idea can be transferred to other applications.
In the measurement of mobility and self-care in chronic diseases, the question arises as to how activity limitations can be measured. Performance tests based on the standardized observation of activities, reports from patients, and external assessments by providers (physicians, therapists, nurses, and others) are possible; the last two methods (e.g., Refs. [1], [2]) are especially widespread, as they allow a wide range of activity limitations to be included and are economically feasible.
Two problems that arise when using patient and provider assessments deserve detailed consideration:
1. Because patient and provider assessments of mobility and self-care generally correlate [3] but do not show good agreement, the question arises of how the two assessments relate to each other and how the differences between the two perspectives can be explained [4].
2. Not all patients with chronic illnesses are capable of giving their own assessment (e.g., there are problems for very old patients or the cognitively impaired [5], [6]), so only a provider assessment is possible for both of these groups. This raises the question of how to resolve the dilemma between selecting a sample (patients without their own assessments are omitted from the analysis) and selecting a method (patient assessments are not available; only the provider is surveyed) in this situation (see also Ref. [7]).
Snow et al. [8] introduced the distinction between a proxy data measurement model (proxy model) and an other rater data measurement model (other rater model). In the proxy model, the provider's assessment is considered a mere substitute for the patient assessment; in the other rater model, the provider's assessment is used to capture another perspective on the construct being studied (e.g., mobility). We assume that in the assessment of mobility and self-care (in the sense of the “International Classification of Functioning, Disability and Health” [ICF]; cf. Refs. [9], [10]), patient data are not a “gold standard”; rather, both patient and provider assessments are indicators that can contribute to an adequate assessment of the latent variables to be measured. Empirical findings supporting this position show that patients and providers take different aspects of the construct into consideration (cf. Refs. [4], [11]): patients, for example, are more likely to consider the effort and pain involved in carrying out an activity or the physical and psychological impact they feel when an activity is limited, whereas providers are more likely to consider the observable capacity to carry out an activity.
We will use the other rater model. In this model, the provider instrument should assess a “proxy–proxy perspective” (cf. Ref. [12]). In other words, providers should report their own view of the patient's limitations of mobility and self-care, not how they think the patient assesses them (i.e., the “proxy-patient perspective,” which corresponds to the proxy model).
If the discrepancy between patient and provider assessments described earlier is to be studied in an other rater model, it must first be ensured that the data, which necessarily have to be acquired with two different instruments (a patient and a provider procedure), are actually comparable. Differences in measurements must not be caused by different method properties of the instruments, but only by different response behavior of the assessors. This means that the patient and provider instruments should measure the same thing and share the same scaling. We assume that this requirement is best met if the patient and the provider items belong to one unidimensional scale that meets the requirements of a 1-p item-response theory (IRT) model [13], [14]. We know of no study that has implemented the idea of such a scale design for assessing mobility and self-care or related constructs (such as quality of life) in a psychometric test of a concrete instrument.
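For orientation, the simplest 1-p IRT model, the dichotomous Rasch model, can be written as below. The notation is the conventional one and is an assumption of this sketch, not taken from the study: $\theta_v$ is the person parameter of person $v$ and $\beta_i$ the difficulty of item $i$. In a joint scale of the kind described here, each patient item and each provider item would receive its own $\beta_i$ on the common metric.

```latex
P(X_{vi} = 1 \mid \theta_v, \beta_i)
  = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}
```

The only person-side quantity is $\theta_v$, which is why a single unidimensional scale is a precondition for comparing the two data sources.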
Studies that have examined discrepancies between patient and provider data on the basis of analogous instruments and have used item-response models (e.g., Refs. [15], [16]) treat the corresponding patient and provider items not as different items to be calibrated jointly in one scale, but as two measurements of one and the same item. The advantage of the approach chosen for this study is that, if integration is successful, the properties of the 1-p IRT model (cf. Refs. [17], [18]) allow different subsets of the items (and thus the patient items on the one hand and the provider items on the other) to be presented independently of one another, while the resulting measurements remain on one scale and directly comparable. The second problem described earlier (the dilemma between selecting a sample and selecting a method) is thus solved: for patients who are able to answer a questionnaire, all items are included (patient and provider items, PAT–PRO measurement model), and for patients who are not able to answer a questionnaire, only the provider items are used (PRO measurement model). The resulting person parameters are still directly comparable. The disadvantage of the PRO measurement model is its higher measurement error.
Likewise, if the provider cannot provide an assessment, it is still possible to acquire data from the patient alone (PAT measurement model) and to combine these data with data sets that include a complete assessment from patient and provider.
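The trade-off between the measurement models can be illustrated with a minimal Python sketch. It assumes a dichotomous Rasch model with item difficulties that are already known from a joint calibration, and it estimates one person parameter by maximum likelihood, once from the provider items only and once from all items. The item difficulties, the responses, and all function names are hypothetical and chosen for illustration; they are not MOSES values, and the study's own estimation procedure may differ.

```python
import math


def rasch_prob(theta, beta):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def information(theta, betas):
    """Test information at theta: sum of p * (1 - p) over the items presented."""
    return sum(p * (1.0 - p) for p in (rasch_prob(theta, b) for b in betas))


def standard_error(theta, betas):
    """Asymptotic standard error of the ML person parameter estimate."""
    return 1.0 / math.sqrt(information(theta, betas))


def estimate_theta(responses, betas, max_iter=50, tol=1e-8):
    """ML person parameter via Newton-Raphson, item difficulties treated as known.

    No finite ML estimate exists for all-0 or all-1 response patterns.
    """
    if sum(responses) in (0, len(responses)):
        raise ValueError("no finite ML estimate for extreme response patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in betas]
        gradient = sum(x - p for x, p in zip(responses, probs))
        step = gradient / information(theta, betas)
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical item difficulties on one jointly calibrated scale (illustrative only).
PAT_ITEMS = [-1.5, -0.5, 0.5, 1.5]  # patient items
PRO_ITEMS = [-1.0, 0.0, 1.0]        # provider items

# Hypothetical dichotomous responses for one person (1 = positive response).
pat_responses = [1, 1, 0, 0]
pro_responses = [1, 1, 0]

# PRO measurement model: provider items only.
theta_pro = estimate_theta(pro_responses, PRO_ITEMS)
se_pro = standard_error(theta_pro, PRO_ITEMS)

# PAT-PRO measurement model: all items.
theta_pat_pro = estimate_theta(pat_responses + pro_responses, PAT_ITEMS + PRO_ITEMS)
se_pat_pro = standard_error(theta_pat_pro, PAT_ITEMS + PRO_ITEMS)

print(f"PRO:     theta = {theta_pro:.2f}, SE = {se_pro:.2f}")
print(f"PAT-PRO: theta = {theta_pat_pro:.2f}, SE = {se_pat_pro:.2f}")
```

Both estimates are expressed on the same logit scale because the same item difficulties are used in either case. At any fixed value of theta, dropping items can only reduce the test information, so the standard error of the provider-only estimate is larger there.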
For the attempt to combine patient and provider items on mobility and self-care in a unidimensional scale, an instrument should be used whose patient and provider versions were developed in accordance with an IRT model and were tested (each on its own) psychometrically. In addition, the instrument should be oriented toward the structure of the ICF, because this allows the contents of questionnaires on activities to be standardized [19], thus allowing a theory-driven generation of items. The provider version should also include a “proxy–proxy perspective” (described earlier).
The MOSES questionnaire, which we developed, meets all of these conditions, and we therefore used it. It is available in an analogous patient version (MOSES-Patient: 58 items) [20] and provider version (MOSES-Provider: 47 items) [21] (see Methods).
The following three hypotheses are examined in this study:
1. The scale-wise integration of patient and provider items of the MOSES questionnaire (referred to from now on as MOSES-Combi) leads to unidimensional scales (possibly after selection of some items) that meet the requirements of a 1-p IRT model (Rasch model, Masters' partial credit model [PCM]), are reliable, and show no differential item functioning (DIF) with respect to age and gender.
2. The person parameters estimated in the PAT–PRO measurement model show at least moderate agreement (ICCs > 0.40) with the person parameters estimated in the PRO measurement model and the PAT measurement model (so that, if one data source is not available, use of the remaining data source can be justified).
3. The PAT–PRO measurement model results in a marked increase in measurement accuracy over the PRO measurement model or the PAT measurement model: the measurement error is reduced in the mid-ranges of the scales by at least 25%. (This means that, if possible, both data sources should be used.)
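The 25% threshold in hypothesis 3 can be restated in terms of test information. The following is a sketch under an assumption that the text does not state, namely that the measurement error is the asymptotic standard error of the person parameter, $\mathrm{SE}(\theta) = 1/\sqrt{I(\theta)}$, where $I(\theta)$ is the test information of the items presented. The subscripts name the measurement models and are notation introduced here.

```latex
\frac{\mathrm{SE}_{\text{PAT--PRO}}(\theta)}{\mathrm{SE}_{\text{PRO}}(\theta)} \le 0.75
\quad\Longleftrightarrow\quad
\frac{I_{\text{PAT--PRO}}(\theta)}{I_{\text{PRO}}(\theta)} \ge \frac{1}{0.75^{2}} \approx 1.78
```

Under this assumption, the combined item set would need to carry roughly 1.8 times the information of the single-source item set in the mid-range of the scale; the same relation holds with the PAT measurement model in place of the PRO measurement model.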