Introduction
Methods
Study design
Population and setting
Study procedures
Analysis
Results
Study sample
Characteristic | Clinicians (N = 233) | | Researchers (N = 248) | |
---|---|---|---|---|
Age (median, range) | 41 | (25–78) | 43 | (23–73) |
Years in practice (average, std. deviation) | 16.5 | (12.5) | – | – |
Characteristic | N | (%) | N | (%) |
Gender | ||||
Female | 108 | (54.8) | 131 | (63.3) |
Male | 89 | (45.2) | 76 | (36.7) |
Missing | 36 | | 41 | |
Race | ||||
White | 145 | (73.6) | 162 | (78.6) |
Black | 3 | (1.5) | 5 | (2.4) |
Asian | 35 | (17.8) | 29 | (14.1) |
Other | 14 | (7.1) | 10 | (4.9) |
Missing | 36 | | 42 | |
Country | ||||
USA | 98 | (50.0) | 87 | (43.1) |
Netherlands | 0 | (0) | 12 | (5.9) |
UK | 6 | (3.1) | 10 | (5.0) |
Canada | 6 | (3.1) | 16 | (7.9) |
Other | 86 | (43.9) | 77 | (38.1) |
Missing | 37 | | 46 | |
Clinician specialty | ||||
Medical oncology | 108 | (54.5) | – | – |
Radiation oncology | 13 | (6.6) | ||
Surgical/gynecologic/urologic oncology | 17 | (8.6) | ||
Oncology nurse practitioner or assistant | 10 | (5.1) | ||
Other | 50 | (25.3) | ||
Missing | 35 | |||
Researcher expertise (more than one may apply) | ||||
Patient perspective | – | – | 22 | (8.9) |
Clinician | | | 26 | (10.5) |
Clinician scientist | | | 60 | (24.2) |
PRO assessment/psychology/sociology | | | 93 | (37.5) |
Clinical trials methods/analysis | | | 52 | (21.0) |
Psychometrics | | | 55 | (22.2) |
Health policy or public health | | | 37 | (14.9) |
Journal editor | | | 8 | (3.2) |
Frequent journal reviewer | | | 52 | (21.0) |
Regulator or administrator | | | 8 | (3.2) |
Other | | | 23 | (9.3) |
PRO research experience | ||||
Current student | – | – | 23 | (11.1) |
Current post-doc | | | 17 | (8.2) |
< 5 years’ experience | | | 39 | (18.8) |
5–10 years’ experience | | | 50 | (24.2) |
> 10 years’ experience | | | 78 | (37.7) |
Missing | | | 41 | |
Findings for line-graph formats
Measure | Clinicians, “More” line type (N = 78), N (%) | Clinicians, “Normed” line type (N = 77), N (%) | Clinicians, “Better” line type (N = 78), N (%) | Researchers, “More” line type (N = 83), N (%) | Researchers, “Normed” line type (N = 83), N (%) | Researchers, “Better” line type (N = 82), N (%) |
---|---|---|---|---|---|---|
Accuracy of interpretation | ||||||
Physical activities (function) | ||||||
Treatment Xᵃ | 60 (76.9) | 61 (79.2) | 61 (78.2) | 72 (86.7) | 71 (85.5) | 66 (80.5) |
Treatment Y | 0 | 4 (5.2) | 2 (2.6) | 3 (3.6) | 1 (1.2) | 2 (2.4) |
About the same | 8 (10.3) | 5 (6.5) | 11 (14.1) | 5 (6.0) | 5 (6.0) | 8 (9.8) |
Missing | 10 (12.8) | 7 (9.1) | 4 (5.1) | 3 (3.6) | 6 (7.2) | 6 (7.3) |
Pain (symptom domain) | ||||||
Treatment X | 9 (11.5) | 13 (16.9) | 7 (9.0) | 13 (15.7) | 23 (27.7) | 6 (7.3) |
Treatment Yᵃ | 53 (67.9) | 50 (64.9) | 64 (82.1) | 59 (71.1) | 48 (57.8) | 62 (75.6) |
About the same | 3 (3.8) | 6 (7.8) | 3 (3.8) | 5 (6.0) | 5 (6.0) | 6 (7.3) |
Missing | 13 (16.7) | 8 (10.4) | 4 (5.1) | 6 (7.2) | 7 (8.4) | 8 (9.8) |
Number of questions “correct” | ||||||
Both questions | 48 (61.5) | 47 (61.0) | 53 (67.9) | 53 (63.9) | 45 (54.2) | 56 (68.3) |
One question | 17 (21.8) | 17 (22.1) | 19 (24.4) | 25 (30.1) | 29 (34.9) | 16 (19.5) |
Neither question | 13 (16.7) | 13 (16.9) | 6 (7.7) | 5 (6.0) | 9 (10.8) | 10 (12.2) |
Number of questions “incorrect”ᵇ | ||||||
Both questions | 0 | 0 | 0 | 0 | 0 | 0 |
One question | 9 (11.5) | 17 (22.1) | 9 (11.5) | 16 (19.3) | 24 (28.9) | 8 (9.8) |
Neither question | 69 (88.5) | 60 (77.9) | 69 (88.5) | 67 (80.7) | 59 (71.1) | 74 (90.2) |
Clarity ratings: plain line graphs | ||||||
Very clear | 29 (46.8) | 17 (25.4) | 32 (44.4) | 28 (37.3) | 28 (37.3) | 34 (48.6) |
Somewhat clear | 23 (37.1) | 34 (50.7) | 28 (38.9) | 36 (48.0) | 32 (42.7) | 16 (22.9) |
Somewhat confusing | 10 (16.1) | 15 (22.4) | 11 (15.3) | 11 (14.7) | 12 (16.0) | 18 (25.7) |
Very confusing | 0 | 1 (1.5) | 1 (1.4) | 0 | 3 (4.0) | 2 (2.9) |
Missing | 16 | 10 | 6 | 8 | 8 | 12 |
Clarity ratings: lines with confidence limits | ||||||
Very clear | 20 (31.2) | 15 (22.1) | 30 (41.1) | 24 (32.9) | 22 (29.3) | 32 (47.8) |
Somewhat clear | 28 (43.8) | 30 (44.1) | 29 (39.7) | 31 (42.5) | 28 (37.3) | 24 (35.8) |
Somewhat confusing | 15 (23.4) | 22 (32.4) | 10 (13.7) | 16 (21.9) | 20 (26.7) | 8 (11.9) |
Very confusing | 1 (1.6) | 1 (1.5) | 4 (5.5) | 2 (2.7) | 5 (6.7) | 3 (4.5) |
Missing | 14 | 9 | 5 | 10 | 8 | 15 |
Clarity ratings: lines with clinical significance | ||||||
Very clear | 29 (46.8) | 22 (32.8) | 34 (47.2) | 35 (47.9) | 28 (37.8) | 32 (47.1) |
Somewhat clear | 28 (45.2) | 34 (50.7) | 27 (37.5) | 30 (41.1) | 32 (43.2) | 30 (44.1) |
Somewhat confusing | 5 (8.1) | 10 (14.9) | 10 (13.9) | 7 (9.6) | 13 (17.6) | 5 (7.4) |
Very confusing | 0 | 1 (1.5) | 1 (1.4) | 1 (1.4) | 1 (1.4) | 1 (1.5) |
Missing | 16 | 10 | 6 | 10 | 9 | 14 |

Comparison | Accuracy, correct response: OR [95% CI] | p | Accuracy, incorrect response: OR [95% CI] | p | Clarity, rated “somewhat” or “very” clear: OR [95% CI] | p | Clarity, rated “very” clear: OR [95% CI] | p |
---|---|---|---|---|---|---|---|---|
Model for line-graph formatsᵃ | ||||||||
“Normed” vs. “More” | 0.80 [0.54, 1.21] | .30 | 1.81 [1.07, 3.05] | .03 | 0.61 [0.43, 0.86] | .005 | 0.66 [0.50, 0.88] | .005 |
“Better” vs. “More” | 1.25 [0.81, 1.93] | .31 | 0.67 [0.35, 1.27] | .22 | 0.93 [0.65, 1.35] | .72 | 1.27 [0.96, 1.67] | .09 |
“Better” vs. “Normed” | 1.55 [1.01, 2.38] | .04 | 0.37 [0.2, 0.67] | .001 | 1.53 [1.09, 2.14] | .01 | 1.91 [1.44, 2.54] | < .001 |
Confidence limits vs. plain | 0.71 [0.47, 1.07] | .10 | 0.84 [0.48, 1.47] | .53 | 0.73 [0.53, 1.01] | .06 | 0.78 [0.59, 1.03] | .08 |
Asterisks vs. plain | 1.01 [0.66, 1.55] | .95 | 0.99 [0.57, 1.69] | .96 | 1.64 [1.13, 2.38] | .01 | 1.15 [0.87, 1.52] | .31 |
Confidence limits vs. asterisks | 0.7 [0.46, 1.06] | .09 | 0.85 [0.48, 1.49] | .57 | 0.44 [0.31, 0.63] | < .001 | 0.67 [0.51, 0.89] | .006 |
Clinicians vs. researchers | 0.95 [0.67, 1.34] | .77 | 0.75 [0.48, 1.19] | .22 | 1.02 [0.77, 1.35] | .92 | 0.87 [0.69, 1.1] | .24 |
Model for proportions changed formatsᵇ | ||||||||
Pie charts vs. bar graphs | 1.0 [0.85, 1.19] | .96 | 0.35 [0.2, 0.6] | < .001 | 1.12 [0.83, 1.51] | .47 | 1.2 [0.91, 1.59] | .20 |
Clinicians vs. researchers | 1.06 [0.78, 1.44] | .73 | 1.48 [0.82, 2.67] | .20 | 1.02 [0.75, 1.39] | .90 | 1.12 [0.84, 1.49] | .44 |

Preferred format | Clinicians, Regular (N = 78), N (%) | Clinicians, Normed (N = 77), N (%) | Clinicians, Reversed (N = 78), N (%) | Clinicians, Allᵃ, N (%) | Researchers, Regular (N = 83), N (%) | Researchers, Normed (N = 83), N (%) | Researchers, Reversed (N = 82), N (%) | Researchers, Allᵇ, N (%) |
---|---|---|---|---|---|---|---|---|
Plain | 6 (9.8) | 14 (22.2) | 6 (9.0) | 26 (13.6) | 4 (5.9) | 8 (11.1) | 5 (8.2) | 17 (8.5) |
Lines with clinical significance | 27 (44.3) | 20 (31.7) | 19 (28.4) | 66 (34.6) | 32 (47.1) | 28 (38.9) | 26 (42.6) | 86 (42.8) |
Lines with confidence limits | 28 (45.9) | 29 (46.0) | 42 (62.7) | 99 (51.8) | 32 (47.1) | 36 (50.0) | 30 (49.2) | 98 (48.8) |
Missing | 17 | 14 | 11 | 42 | 15 | 11 | 21 | 47 |

Format | Positive comments | Negative comments | Other insights |
---|---|---|---|
Comments on line-graph type | |||
“More” line type | As plain graphs these are quite clear [C01]ᵃ | They are somewhat confusing…whether it’s physical or fatigue is in one graph lower and in one graph higher…requires very close attention to detail [C04] | Regarding if higher is better or worse – need consistency [R]ᵇ |
It’s confusing whether lines going up or down should have negative or positive connotations because it’s mixed on the four graphs [C07] | They would be better if have the same direction (for example up = better) [R] | ||
“Better” line type | Reviewing the graph, I understand the scale now and it was fairly simple to figure out [C09] | This one is more confusing in that severe fatigue is at the bottom as opposed to the top…my inclination would be that as fatigue worsens it would go up [C09] | It’s a bit unusual to have “reverse” scaling on pain and fatigue (i.e., lines going up means less pain or doing better) [R] |
The two lower graphs were harder to digest…I’d expect it was plotting pain level with higher on the y-axis reporting MORE pain [R] | |||
“Normed” line type | The graphs are quite clear and descriptive [C06] | The contrast between treatments is clear, but the magnitude of the effect is absent [R] | |
No anchors on the Y axis makes it hard to tell if differences are meaningful [R] | |||
Comments on line-graph variations | |||
Plain graphs | The graphs themselves are quite clear [C01] | There is no confidence interval, there is no asterisk, so my only reference point was this little thing that says P = 0.01…it’s hard to know at what time points that was determined at [C02] | I like that the treatment is linked to the specific lines (versus a traditional legend) [R] |
They’re a little bit easier to read [C07] | [The other formats] offer more statistical information that is helpful to the clinician [C05] | What does the p value represent? You have repeated measures, is it the difference at the final time point? [R] | |
The graphs are clear and if these were the only graphs provided to me I wouldn’t know what I was missing [C05] | |||
Indication of clinical significance | I believe the asterisk format is the easiest in showing patient results without the confidence intervals [C06] | The legend says that the difference is determined to be clinically important, although I don’t exactly understand what that means [C02] | Presumably, the method of determining a “clinically important difference” would be explained in the article text [R] |
The asterisks were actually helpful at the different months… because then I know if the differences were significant or not at that point in time [C07] | Using asterisk to denote clinical importance was confusing since it is often used in other studies to reflect statistical significance [R] | How am I supposed to determine clinically significant differences based on these graphs … What measure was used? Is there an established MCID? [R] | |
The “clinically important difference” was indicated by an asterisk. That was helpful [R] | They’ve eliminated the confidence intervals so I don’t really have a sense of how statistically different everything is [C04] | The graph does not indicate how the lines were estimated/calculated and as a result no information how the p-values were calculated [R] | |
Indication of confidence limits | The asterisks and the demarcation of the confidence interval at every time point is very helpful [C05] | I think the confidence intervals plus the asterisks are redundant to see which time points are clinically significant...they were a little harder to read [C07] | The * and p value meanings differ, and the curves therefore highlight the ambiguity concerning statistical significance and clinically important differences [R] |
It gave both standard errors to give you an idea of the disbursement of the data as well as the asterisk to demonstrate statistical significanceᶜ in the difference between them [C04] | There are a lot of lines and they’re overlapping so it’s a little bit less easy to interpret, so I think I like the simplicity of the (clinical significance) [C08] | May be difficult for some clinicians to interpret 95% confidence limits as currently explained [R] | |
What’s also quite interesting is that there is a significant clinical change between these two treatments, however, the confidence intervals themselves overlap [C01] | While the error bars are a little bit helpful for understanding the spread of the data better, visually it gets a bit cumbersome [C09] | ||
Comments on formats showing proportions changed | |||
Pie charts | I think it’s just easier for my brain to see and compare the two charts…the bar graph takes me a little bit longer to compare the treatments [C08] | It’s not a format that I’m used to seeing to have the data presented and so it did catch me off guard initially [C09] | At 9 months Treatment Y had 50% improvement, Treatment X had 40% improvement, but the P value is 0.10 demonstrating non-significance. I put down treatments are about the same, but I’m confused as to what the p value truly means [C03] |
A pie chart is always easier on the eye [R] | Bar graph is easier to describe patient results compared to the pie graph [C06] | What I find ambiguous is that…because we have three variables for each comparison, we don’t know if we’re comparing improved versus improved or the ratio of improved to same to worsened [C01] | |
The pie chart gives me percentages of patients stating improvement, worsening or about the same and I think that’s important information in discussions with the patient about treatment decision making [C05] | Please read any graphics design book that you can grab hold of …Never use pie charts! [R] | For the first question on physical function, the answer depended on whether an alpha of 0.05 or 0.10 was being considered. Since none was stated, I assumed the classical alpha = 0.05 [R] | |
Bar charts | (Bar charts)...can show each category, improved, about the same or worsened, head to head against the two treatments…for pie graphs you have to bounce back and forth to see the direct comparisons [C01] | The bar graph takes me a little bit longer to compare the two (treatments) [C08] | We’re given a test of significance but we don’t know what is being compared in that test [C01] |
Beauty is in the eye of the beholder I guess, but the bar graph…when you’re talking with patients I think clearly shows what you’re trying to describe [C06] | These bar charts are very confusing…there is no confidence interval at all…I don’t have a score to determine if patients improved how much did they improve; if patients worsened how much did they worsen [C03] | I think what’s confusing to me is that the p value is in the bottom right of each of the graphs, but it’s just unclear what was being compared [C02] | |
I find this graph to be much easier to read than the pie charts [R] | I find these bar charts to be difficult to interpret. They take more time and likely are going to be more prone to error in interpretation [C05] | Technically the overall p-value that probably derives from a χ² type of test cannot be used to justify statements about individual comparisons as the two statements were phrased…instead the comparative residuals of the expected comparison could be used to highlight where differences occur [R] |
Findings for formats illustrating proportions changed
Measure | Clinicians, pie charts (N = 117): N | (%) | Clinicians, bar charts (N = 116): N | (%) | Researchers, pie charts (N = 123): N | (%) | Researchers, bar charts (N = 125): N | (%) |
---|---|---|---|---|---|---|---|---|
Accuracy: first format seen | ||||||||
Physical activities (function domain) | ||||||||
Treatment X | 0 | (0) | 4 | (3.4) | 0 | (0) | 2 | (1.6) |
Treatment Y | 69 | (59.0) | 69 | (59.5) | 95 | (77.2) | 70 | (56.0) |
About the sameᵃ | 31 | (26.5) | 27 | (23.3) | 13 | (10.6) | 29 | (23.2) |
Missing | 17 | (14.5) | 16 | (13.8) | 15 | (12.2) | 24 | (19.2) |
Pain (symptom domain) | ||||||||
Treatment Xᵃ | 87 | (74.4) | 80 | (69.0) | 101 | (82.1) | 80 | (64.0) |
Treatment Y | 6 | (5.1) | 15 | (12.9) | 2 | (1.6) | 16 | (12.8) |
About the same | 7 | (6.0) | 5 | (4.3) | 5 | (4.1) | 5 | (4.0) |
Missing | 17 | (14.5) | 16 | (13.8) | 15 | (12.2) | 24 | (19.2) |
Number of questions correct | ||||||||
Both questions | 23 | (19.7) | 20 | (17.2) | 8 | (6.5) | 19 | (15.2) |
One question | 72 | (61.5) | 67 | (57.8) | 98 | (79.7) | 71 | (56.8) |
Neither question | 22 | (18.8) | 29 | (25.0) | 17 | (13.8) | 35 | (28.0) |
Number of questions incorrectᵇ | ||||||||
Both questions | 0 | (0) | 1 | (0.9) | 0 | (0) | 2 | (1.6) |
One question | 6 | (5.1) | 17 | (14.7) | 2 | (1.6) | 14 | (11.2) |
Neither question | 111 | (94.9) | 98 | (84.5) | 121 | (98.4) | 109 | (87.2) |
Clarity rating: first format seen | ||||||||
Very clear | 85 | (42.5) | 71 | (35.5) | 80 | (38.1) | 76 | (36.4) |
Somewhat clear | 63 | (31.5) | 68 | (34.0) | 70 | (33.3) | 73 | (34.9) |
Somewhat confusing | 37 | (18.5) | 48 | (24.0) | 49 | (23.3) | 51 | (24.4) |
Very confusing | 15 | (7.5) | 13 | (6.5) | 11 | (5.2) | 9 | (4.3) |
Missing | 33 | | 33 | | 38 | | 39 | |
Format preference | ||||||||
Format preferred | 99 | (50.0) | 98 | (50.0) | 90 | (44.3) | 113 | (55.7) |
Missing | 36 | | | | 45 | | | |