Original Article
Integration of patient and provider assessments of mobility and self-care resulted in unidimensional item-response theory scales

https://doi.org/10.1016/j.jclinepi.2008.11.014

Abstract

Objective

The objective of this study was to develop a questionnaire that could integrate patient and provider items on mobility and self-care into unidimensional scales. The instrument should be suitable for various measurement models (patient and provider data [PAT–PRO], only patient data [PAT], only provider data [PRO]).

Study Design and Setting

The existing instruments, MOSES-Patient and MOSES-Provider, were integrated into the MOSES-Combi and completed by a total of 1,019 neurology, cardiac, or musculoskeletal patients and/or their physicians (MOSES = acronym for “mobility and self-care”).

Results

After removal of 18 items, all 12 scales of the MOSES-Combi (87 items) were largely unidimensional, met the standards for a 1-parameter item-response theory (IRT) model, were sufficiently reliable, and showed no differential item functioning (DIF) for age or gender. The person parameters set in the PAT–PRO measurement model showed at least moderate, but usually substantial, agreement with those set in the PRO and PAT measurement models.

Conclusion

The advantages of the MOSES-Combi are that it can be used for various measurement models and is suitable for studying agreement between patient and provider assessments because of its psychometric properties (same scaling for patient and provider items). Integration of various data sources in an IRT scale can be extended to other assessments.

Introduction

What is new?

  • The MOSES-Combi questionnaire demonstrates that it is possible to integrate patient and provider items on mobility and self-care into unidimensional scales that meet the requirements of a one-parameter item-response theory (IRT) model and are reliable.

  • This is, thus, an instrument that can be used for various measurement models with different data sources and provides measurements that are on one scale and can be compared directly. This idea can be transferred to other applications.

In the measurement of mobility and self-care in chronic diseases, the question arises as to how activity limitations can be measured. Performance tests based on the standardized observation of activities, reports from patients, and external assessments by providers (physicians, therapists, nurses, and others) are possible; the last two methods (e.g., Refs. [1], [2]) are especially widespread, as they allow a wide range of activity limitations to be included and are economically feasible.

Two problems arise when using patient and provider assessments that should be dealt with in detail:

  1. Because patient and provider assessments on mobility and self-care generally correlate [3], but do not show good agreement, the question arises regarding the relationship of the two assessments to each other and how the differences between the two perspectives can be explained [4].

  2. As not all patients with chronic illnesses are capable of giving their own assessment (e.g., there are problems for very old patients or the cognitively impaired [5], [6]), only a provider assessment is possible for both of these groups. This gives rise to the question of how to solve the dilemma between selecting a sample (patients without their own assessments are omitted from the analysis) and selecting a method (patient assessments are not available; only the provider is surveyed) in this situation (see also Ref. [7]).

Snow et al. [8] introduced the differentiation between a proxy data measurement model (proxy model) and an other rater data measurement model (other rater model). In the proxy model, the provider's assessment is considered a mere substitute for the patient assessment; in the other rater model, the provider's assessment is used to consider another perspective with respect to the construct being studied (e.g., mobility). We assume that in the assessment of mobility and self-care (in the sense of the “International Classification of Functioning, Disability and Health” [ICF]; cf. Refs. [9], [10]), patient data are not considered a “gold standard,” but rather, both patient and provider assessments are indicators that can contribute to an adequate assessment of the latent variables to be measured. Empirical findings that support this position show that patients and providers take different aspects of the construct to be assessed into consideration (cf. Refs. [4], [11]): patients, for example, are more likely to consider the effort and pain involved in carrying out an activity or the physical and psychological impact they feel when an activity is limited, whereas providers are more likely to consider the observable capacity to carry out an activity.

We will use the other rater model. In this model, the provider instrument should assess a “proxy–proxy perspective” (cf. Ref. [12]). In other words, the provider should assess his own view of the patient's limitations of mobility and self-care, not as he thinks the patient assesses them (i.e., the “proxy-patient perspective,” which corresponds to the proxy model).

If the problem of the discrepancy between patient and provider assessments described earlier is to be studied in an other rater model, it must first be ensured that the data, which must necessarily be acquired using two different instruments (a patient and a provider procedure), are actually comparable. Differences in measurements must not be caused by different method properties of the instruments, but only by different response behavior of the assessors. This means that the patient and provider instruments used should measure the same things and show the same scaling. We assume that this requirement is best met if the patient and the provider items belong to one unidimensional scale that meets the requirements of a 1-p item-response theory (IRT) model [13], [14]. We know of no study that has implemented the idea of such a scale design for assessing mobility and self-care or related constructs (such as quality of life) in a psychometric test of a concrete instrument.
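For orientation, the 1-p IRT models referred to here can be written in their standard textbook form. The notation below is an assumption chosen for this illustration and is not taken from the article: \theta_n is the person parameter of person n, \beta_i the difficulty of a dichotomous item i, and \delta_{ik} the k-th threshold of a polytomous item i with response categories 0, ..., m_i. The first equation is the dichotomous Rasch model; the second is Masters' partial credit model, which the hypotheses further below name explicitly.

```latex
% Dichotomous Rasch model
P(X_{ni} = 1 \mid \theta_n, \beta_i)
  = \frac{\exp(\theta_n - \beta_i)}{1 + \exp(\theta_n - \beta_i)}

% Partial credit model, with the convention \sum_{k=1}^{0}(\cdot) \equiv 0
P(X_{ni} = x \mid \theta_n, \delta_{i1}, \dots, \delta_{i m_i})
  = \frac{\exp\!\left( \sum_{k=1}^{x} (\theta_n - \delta_{ik}) \right)}
         {\sum_{h=0}^{m_i} \exp\!\left( \sum_{k=1}^{h} (\theta_n - \delta_{ik}) \right)},
  \qquad x = 0, 1, \dots, m_i
```

In both models the person parameter and the item parameters enter only through their difference on a common logit scale, which is why patient and provider items calibrated jointly in one scale share the same scaling.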

Studies that have examined discrepancies between patient and provider data on the basis of analogous instruments and use item-response models (e.g., Refs. [15], [16]) treat the corresponding patient and provider items not as different items to be calibrated jointly in one scale, but as two measurements of one and the same item. The advantage of the approach chosen for this study is that, if integration is successful, because of the properties of the 1-p IRT model (cf. Refs. [17], [18]), different subsets of the items (and thus, the patient items on the one hand and the provider items on the other) can be presented independently of one another, and the resulting measurements can still be on one scale and directly comparable. The second problem described earlier (dilemma between selecting a sample and a method) is, thus, solved: for patients who are able to answer a questionnaire, all items are included (patient and provider items, PAT–PRO measurement model), and for patients who are not able to answer a questionnaire, only the provider items (PRO measurement model) can be used. The resulting person parameters are still directly comparable. The disadvantage of the PRO measurement model is its higher measurement error.

Likewise, if the provider cannot provide an assessment, it is possible to acquire data from the patient at least (PAT measurement model) and still combine the data with data sets that include a complete assessment from patient and provider.
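The subset property described above can be illustrated with a minimal sketch. It assumes dichotomous Rasch items whose difficulties are already calibrated on one joint scale; the item difficulties, the responses, and the function names are hypothetical and are not MOSES-Combi values. The person parameter is estimated by maximum likelihood, once from all items (PAT–PRO) and once from the provider items only (PRO).

```python
import math

def rasch_prob(theta, beta):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def estimate_person(responses, betas, iterations=50):
    """Maximum-likelihood person parameter and its standard error.

    responses: 0/1 answers; betas: known item difficulties in logits.
    Extreme scores (all 0 or all 1) have no finite ML estimate.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("ML estimate undefined for extreme scores")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_prob(theta, b) for b in betas]
        info = sum(p * (1.0 - p) for p in probs)   # test information at theta
        step = (raw - sum(probs)) / info           # Newton-Raphson step
        theta += max(-1.0, min(1.0, step))         # damp very large steps
    probs = [rasch_prob(theta, b) for b in betas]
    info = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(info)

# Hypothetical jointly calibrated difficulties (illustration only)
pat_betas = [-1.5, -0.5, 0.5, 1.5]   # patient items
pro_betas = [-1.0, 0.0, 1.0]         # provider items

# Hypothetical responses for one person
pat_resp = [1, 1, 0, 0]
pro_resp = [1, 1, 0]

theta_combi, se_combi = estimate_person(pat_resp + pro_resp, pat_betas + pro_betas)
theta_pro, se_pro = estimate_person(pro_resp, pro_betas)

print(f"PAT-PRO: theta = {theta_combi:.2f}, SE = {se_combi:.2f}")
print(f"PRO:     theta = {theta_pro:.2f}, SE = {se_pro:.2f}")
```

Both estimates are expressed on the same logit scale because they rely on the same item calibration. The provider-only estimate rests on fewer items, so its test information is lower and its standard error larger, which corresponds to the higher measurement error of the PRO measurement model.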

For the attempt to combine patient and provider items on mobility and self-care in a unidimensional scale, an instrument should be used whose patient and provider versions were developed in accordance with an IRT model and were tested (each on its own) psychometrically. In addition, the instrument should be oriented toward the structure of the ICF, because this allows the contents of questionnaires on activities to be standardized [19], thus allowing a theory-driven generation of items. The provider version should also include a “proxy–proxy perspective” (described earlier).

The MOSES questionnaire, which we developed, meets all of these conditions and was therefore used. It is available in an analogous patient version (MOSES-Patient: 58 items) [20] and provider version (MOSES-Provider: 47 items) [21] (see Methods).

The following three hypotheses are examined in this study:

  1. The scale-wise integration of patient and provider items of the MOSES questionnaire (referred to from now on as MOSES-Combi) leads to unidimensional scales (possibly after selection of some items) that meet the requirements of a 1-p IRT model (Rasch model, Masters' partial credit model [PCM]), are reliable, and show no differential item functioning (DIF) with respect to age and gender.

  2. The person parameters set in the PAT–PRO measurement model show at least moderate agreement (intraclass correlation coefficients [ICCs] > 0.40) with the person parameters set in the PRO measurement model and PAT measurement model (so that in the event one data source is not available, use of the remaining data source can be justified).

  3. The PAT–PRO measurement model results in a marked increase in measurement accuracy over the PRO measurement model or the PAT measurement model. The measurement error is reduced in the mid-ranges of the scales by at least 25%. (This means that, if possible, both data sources should be used.)
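The criteria in the second and third hypotheses can be made concrete with a short sketch. The excerpt does not state which ICC variant was used; the code assumes ICC(2,1) in the Shrout and Fleiss sense (two-way random effects, absolute agreement, single measure). The function names and the example values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: one row per person, one column per measurement model,
    e.g. [theta_PAT_PRO, theta_PRO]. The ICC variant is an assumption.
    """
    n = len(ratings)        # number of persons
    k = len(ratings[0])     # number of measurement models compared
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-person mean square
    msc = ss_cols / (k - 1)                 # between-model mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def se_reduction(se_single, se_combined):
    """Relative reduction in measurement error from combining data sources."""
    return 1.0 - se_combined / se_single

# Hypothetical person parameters: columns are PAT-PRO and PRO
thetas = [[-1.2, -0.9], [0.1, 0.4], [0.8, 0.5], [1.9, 1.6]]
print(f"ICC(2,1) = {icc_2_1(thetas):.2f}")

# Hypothetical standard errors in the mid-range of a scale
print(f"SE reduction = {se_reduction(0.60, 0.42):.0%}")
```

Because the standard error is the reciprocal square root of the test information, a reduction of 25% requires the combined item set to carry about 1.78 times (1/0.75²) the information of the single-source item set at the same point on the scale.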

Sample

In 30 rehabilitation centers in Germany, the MOSES-Patient questionnaire was given to 718 patients with musculoskeletal illnesses, 281 patients with cardiac illnesses, and 599 neurology rehabilitation patients at the beginning of rehabilitation. Of these, 549, 212, and 258 patients, respectively, filled out the questionnaire. The dropout rate was, thus, 23.5%, 24.6%, and 59.6%, respectively. The most important reasons given for not responding were refusal (approximately 60% of the dropouts in

Unidimensionality

Particularly in the scales “use of hands and arms” and “walking (without equipment),” problems arose concerning unidimensionality and, in parts, low loadings on the respective factor. A total of 14 of the 105 items were removed (see Appendix on the journal's website at www.elsevier.com). After item selection, there was a dominant factor in the exploratory factor analyses that explained 40.0–64.7% of the variance (median of the 12 scales = 58.4%). The medians of the loadings of items on the

Discussion

The three hypotheses examined were confirmed. After removing a total of 18 items, all 12 scales of the MOSES-Combi are largely unidimensional, meet the requirements of the 1-p IRT model, are sufficiently reliable, and show no DIF with respect to age and gender. In all scales, the person parameters set in the PAT–PRO measurement model show at least “moderate,” but usually “substantial” agreement with the person parameters set in the PAT measurement model and PRO measurement model. The PAT–PRO

Acknowledgments

The project to develop the MOSES questionnaires was conducted with the financial support of the central associations of statutory health insurers in Germany.

The authors wish to thank Ms. Annette Fleitz for her support in data management and the cooperating clinics for their support in data collection: Kirnitzschtal Klinik Bad Schandau, Reha-Klinik Eisenmoorbad, Sanitas Klinikum Sachsenhof, Klinik am Brunnenberg, Klinik am Tharandter Wald, Klinik am See, Fachklinik Wolletzsee, Klinik Malchower

References (41)

  • A.E. Ball et al. Problems in using health survey questionnaires in older patients with physical disabilities. Gerontology (2001)
  • L. von Essen. Proxy ratings of patient quality of life. Acta Oncol (2004)
  • A.L. Snow et al. Proxies and other external raters: methodological considerations. Health Serv Res (2005)
  • A.M. Jette. Toward a common language for function, disability, and health. Phys Ther (2006)
  • G. Stucki et al. Foreword. Applying the ICF in medicine. J Rehabil Med (2004)
  • C. Neville et al. Learning from discordance in patient and physician global assessments of systemic lupus erythematosus disease activity. J Rheumatol (2000)
  • A.S. Pickard et al. Proxy evaluation of health-related quality of life. Med Care (2005)
  • M. Edelen et al. Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. J Rehabil Med (2007)
  • K.J. Conrad et al. International Conference on Objective Measurement. Applications of Rasch analysis in health care. Med Care (2004)
  • S. Gauggel et al. Patient-staff agreement on Barthel Index scores at admission and discharge in a sample of elderly stroke patients. Rehabil Psychol (2004)