Published: 1 December 2008

# Patient-reported outcomes and the mandate of measurement

Author: Gary Donaldson

Published in: Quality of Life Research | Issue 10/2008

## Abstract

### Purpose

Coherent clinical care depends on answering a basic question: is a patient getting worse, getting better, or staying about the same? This can prove surprisingly difficult to answer confidently. Patient-reported outcomes (PROs) could potentially help by providing quantifiable evidence. But quantifiable evidence is not necessarily good evidence, as this article details.

### Method

The fundamental mandate of measurement requires that errors in making an assessment be smaller than the distinctions to be measured. This mandate implies that numerical observations of patients may be poor measurements.

### Results

Individual assessments require high measurement precision and reliability. Group-averaged comparisons cancel out measurement error, but individual PROs do not. Individual PROs generate numbers, to be sure, but the numbers may fall short of what we should demand of measurements. When typical errors of measurement are large, it is not possible to answer confidently even the modest question of whether a patient is getting worse or getting better.
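
The contrast between individual and group-averaged error can be sketched numerically. The following is a minimal illustration, not the article's own computation. It assumes the classical test theory relation SEM = SD × √(1 − reliability) and independent errors with constant variance. The standard deviation, reliability, and group size are hypothetical values chosen only for the example, and the group figure covers only the measurement-error component of the standard error.

```python
import math

def sem_from_reliability(sd: float, reliability: float) -> float:
    """Standard error of measurement under classical test theory (assumed)."""
    return sd * math.sqrt(1.0 - reliability)

def se_individual_change(sem: float) -> float:
    """SE of the difference between two independent assessments of one patient."""
    return sem * math.sqrt(2.0)

def se_group_mean_change(sem: float, n_patients: int) -> float:
    """Measurement-error part of the SE of a mean change across n patients."""
    return se_individual_change(sem) / math.sqrt(n_patients)

# Hypothetical scale properties, for illustration only.
sd, reliability, n_patients = 10.0, 0.84, 100

sem = sem_from_reliability(sd, reliability)
print(f"SEM for one assessment:        {sem:.2f}")
print(f"SE of one patient's change:    {se_individual_change(sem):.2f}")
print(f"SE of mean change (n={n_patients}):     {se_group_mean_change(sem, n_patients):.2f}")
```

Under these assumptions the error in an individual change score is √n times larger than the measurement-error contribution to the group mean, which is why a scale that serves group comparisons well can still be too coarse for tracking one patient.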

### Conclusion

This article explains some theory behind the mandate of measurement, provides several examples based on clinical research, and suggests strategies to measure and monitor individual patient outcomes more precisely. These include more frequent low-burden assessments, more realistic confidence levels, and strengthened measurement that integrates population data.

## Footnotes

1
Under the “usual” assumptions of constant variance and conditional independence.

2
The order-of-magnitude difference in variability between individual and averaged data captured by Figs. 1 and 2 is completely representative. In 30 years’ experience with patient-reported subjective ratings and standardized questionnaires collected longitudinally, I have never failed to observe it. That the discrepancy still surprises owes to the fact that journals seldom publish individual trajectories, leaving readers with the impression that mean trend lines are representative of individuals.

3
The confidence intervals for rating scales such as these become smaller near the limits of the scales. This issue is related to floor and ceiling effects that pose additional measurement problems beyond the scope of this paper. To illustrate the ideas, I ignore restriction-of-range issues, and assume confidence intervals located in the middle ranges of the scales, where they are largely constant. Similarly, I do not address the issue of discrete (numerical ratings) versus continuous (visual analog) formats.

4
Some scales now available are capable of very precise measurement if length and patient burden are not concerns. Dynamic adaptive testing methods work well to generate efficient measurement while minimizing burden, but would still require several questions to achieve very high levels of precision. Methods based on item response theory are in general more sophisticated and efficient than classical psychometric approaches, but for the purposes of this paper the differences are minor ones and not central to the main points.

5
The linear trend always represents the average rate-of-change, even when the data suggest nonlinearity. Subtle modeling issues notwithstanding, the linear trend is an excellent summary measure when the clinical question concerns whether patients are “getting better” or “getting worse.”
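
As a minimal sketch of the linear trend used as a summary of change, the least-squares slope of one patient's ratings on time can be computed directly. The ratings and visit times below are hypothetical, and the helper name `ols_slope` is introduced here only for illustration.

```python
def ols_slope(times, ratings):
    """Least-squares slope of ratings on time: the average rate of change."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(ratings) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, ratings))
    return sxy / sxx

# Hypothetical 0-10 ratings over five visits: an early drop, then a plateau.
times = [0, 1, 2, 3, 4]
ratings = [8, 5, 4, 4, 3]

slope = ols_slope(times, ratings)
print(f"average change per visit: {slope:.2f}")  # negative: ratings are falling
```

The trajectory here is visibly nonlinear, yet the slope still summarizes the overall direction and average rate of change across the visits.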

6
In general, the standard error for any weighted combination of single assessments is given by the matrix formula $$(c'\Theta c)^{1/2}$$, where c is the vector of weights defining the contrast or difference, and Θ is the sampling error covariance matrix over the repeated assessments of an individual. In the typical case, the diagonal elements of Θ are squared standard errors of measurement (SEMs), and the off-diagonal elements are zero, but more general scenarios are possible (e.g., autocorrelation or heterogeneity in the SEM over time).
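
A short worked example may make the formula concrete. The sketch below evaluates the square root of the quadratic form for a simple difference between two assessments, first with uncorrelated errors and then with correlated errors. The SEM of 2.0 and the error correlation of 0.5 are hypothetical values, and `contrast_se` is an illustrative helper name.

```python
import math

def contrast_se(c, theta):
    """Return sqrt(c' Theta c) for weight vector c and error covariance theta."""
    n = len(c)
    quad = sum(c[i] * theta[i][j] * c[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)

sem = 2.0            # hypothetical SEM of a single assessment
diff = [-1.0, 1.0]   # contrast: second assessment minus first

# Typical case: squared SEMs on the diagonal, zeros elsewhere.
theta_indep = [[sem**2, 0.0],
               [0.0, sem**2]]
se_diff = contrast_se(diff, theta_indep)    # equals sem * sqrt(2)

# More general case: errors correlated across assessments (rho is hypothetical).
rho = 0.5
theta_corr = [[sem**2, rho * sem**2],
              [rho * sem**2, sem**2]]
se_diff_corr = contrast_se(diff, theta_corr)  # equals sem * sqrt(2 * (1 - rho))

print(f"independent errors: {se_diff:.3f}")
print(f"correlated errors:  {se_diff_corr:.3f}")
```

In the uncorrelated case the result reduces to SEM × √2 for a change score. Positive correlation between the errors shrinks the standard error of the difference.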

7
In fact, it is the maximum likelihood estimate. But in what follows I try to rely on ordinary language meaning and to minimize technical statistical vocabulary. In the same vein, I use “likely” as an intuitive term meaning “a good guess” without intending either Bayesian or frequentist subtleties, and use the noun “estimate” to mean an informed guess of a person’s true but unknown value.

8
This particular representation is more natural in a Bayesian than a frequentist interpretation, but the same points can be made equivalently in either framework. The curve simply shows that good guesses for the unknown true value are closer to the sample measurement, while poorer guesses are farther away, on either interpretation.

9
For example, when exceeding a clinical threshold would invoke aggressive and risky therapy, it may be important to be nearly certain that the true value exceeds the threshold.
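
One way to put a number on "nearly certain" is sketched below. It assumes the normal curve described in footnote 8, centered on the observed score with spread equal to the SEM, and treats the area above the threshold as the plausibility that the true value exceeds it. The scores, threshold, and SEM are hypothetical.

```python
from statistics import NormalDist

def prob_true_value_exceeds(observed: float, threshold: float, sem: float) -> float:
    """Area above the threshold under a normal curve centered on the observed score."""
    return 1.0 - NormalDist(mu=observed, sigma=sem).cdf(threshold)

# Hypothetical example: observed score 7, clinical threshold 6, SEM 1.
p = prob_true_value_exceeds(observed=7.0, threshold=6.0, sem=1.0)
print(f"plausibility that the true value exceeds the threshold: {p:.3f}")

# How many SEMs above the threshold the observed score must lie for 99% confidence.
z_99 = NormalDist().inv_cdf(0.99)
print(f"required margin for 99% confidence: {z_99:.2f} SEMs")
```

An observed score one SEM above the threshold leaves substantial doubt, so decisions that call for near certainty need either a larger margin or a more precise measurement.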


## Metadata

Publisher: Springer Netherlands

Print ISSN: 0962-9343

Electronic ISSN: 1573-2649

DOI: https://doi.org/10.1007/s11136-008-9408-4