Introduction
A patient-reported outcome (PRO) is any report coming directly from patients, without interpretation by physicians or others, about how they function or feel in relation to a health condition and its therapy. PRO measures (PROMs) are instruments that obtain these patient reports [1]. PROMs capture issues important to patients, such as health-related quality of life (HRQoL), symptoms, or coping. These aspects are distinct from traditional endpoints such as survival, biological response, or observer-rated toxicity because they directly reflect the impact of disease and its treatment from the patient’s perspective [2].
A number of guidelines for the use of PROs have been developed over the last decade. These include minimum standards for the use of PROs in clinical research (International Society for Quality of Life Research [ISOQOL]) [3], guidance on analyzing PRO results (Setting International Standards in Analyzing Patient-Reported Outcomes and Quality of Life Endpoints Data [SISAQOL]) [4] and on reporting them (Consolidated Standards of Reporting Trials [CONSORT]-PRO) [5], and guidance on how to include PROs in protocols (Standard Protocol Items: Recommendations for Interventional Trials [SPIRIT-PRO]) [6] and in drug development [7]. In 2017, a preliminary report described the initial uptake of CONSORT-PRO from publication in 2013 to 2015 as high, with an increasing number of randomized controlled trials (RCTs) citing these guidelines [8]. Whether CONSORT-PRO has continued to contribute to an improvement in the use and quality of PROs in clinical research, including non-randomized studies, remains to be shown.
PROs can measure both the benefits and side effects of treatment. Consequently, they have the potential to facilitate patient involvement in treatment decision-making and discussions of what the patient is willing to tolerate [2]. The use of PROs is particularly relevant to support treatment decisions in trials demonstrating a small or no difference in survival [9] and to support health policy decisions, including prioritization and organization of health services. PROs are also important in the evaluation of treatments and care for elderly patients and patients with chronic diseases, who emphasize the maintenance of quality of life and good function [10].
PROs were included in 27% of trials registered in ClinicalTrials.gov in 2007–2013 [11] and in 45% of trials registered in the Australian New Zealand Clinical Trials Registry (ANZCTR) from 2005 to 2017 [12]. However, most studies included PROs as secondary endpoints. Failure to report PROs may lead to under- or over-estimation of the effect of treatment [9, 13–15]. Many studies using PROs are of insufficient quality, for example because they use PROMs with limited psychometric properties. In addition, poorly reported studies, for instance those failing to explain how the PROMs were administered, using non-representative samples, or lacking information on how missing data were handled, can leave the reader in doubt about the quality of the data [9, 13–21].
The primary aim of this review was to compare published clinical studies using PROs conducted in Europe in 2008 versus 2018, with respect to both their number and their compliance with PRO-specific criteria. Secondary aims were to describe the study designs, sample sizes, PROMs used, patient groups studied, and countries where the studies were conducted for each of the two years. We hypothesized that (1) the inclusion of PROs in clinical studies in Europe was higher in 2018 than in 2008 and that (2) a higher proportion of studies (absolute increase of at least 15 percentage points) complied with the selected PRO-specific criteria in 2018 compared to 2008.
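To make the second hypothesis concrete, the following minimal sketch shows how an absolute difference in compliance between two publication years could be computed and checked against the 15 percentage point threshold. It is an illustration only: the counts are hypothetical and not taken from this review, the function names are invented for the example, and the Wald confidence interval is an assumed choice rather than the method used by the authors.

```python
from math import sqrt
from statistics import NormalDist


def absolute_difference(k_early, n_early, k_late, n_late, level=0.95):
    """Return (difference, lower, upper) for p_late - p_early.

    Uses a simple Wald (normal-approximation) confidence interval,
    which is an assumption made for this illustration.
    """
    p_early = k_early / n_early
    p_late = k_late / n_late
    diff = p_late - p_early
    # Standard error of the difference between two independent proportions
    se = sqrt(p_early * (1 - p_early) / n_early + p_late * (1 - p_late) / n_late)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff, diff - z * se, diff + z * se


def meets_threshold(diff, threshold=0.15):
    """True if the absolute increase is at least `threshold` (0.15 = 15 points)."""
    return diff >= threshold


# Hypothetical counts: 60 of 150 compliant studies in the earlier year,
# 90 of 150 in the later year (NOT data from this review).
diff, lower, upper = absolute_difference(60, 150, 90, 150)
print(f"absolute difference: {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print("meets 15-point threshold:", meets_threshold(diff))
```

Expressing the threshold as an absolute difference in proportions, rather than a relative change, keeps it comparable across criteria with very different baseline compliance.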
Discussion
The main finding in this review was that the overall number of publications with PROs was higher in 2018 than in 2008. This may indicate an increasing interest in including patients’ perspectives in clinical research, which can facilitate patient involvement in treatment decision-making and provide guidance for health-care decisions [2]. This finding supports previous reviews reporting increased numbers of clinical trials with PROs in ClinicalTrials.gov (2007–2013) [11] and ANZCTR (2005–2017) [12]. In the present study, a higher proportion of the identified publications from 2018 were ineligible for inclusion than in 2008. A higher proportion of the studies were “non-clinical studies,” e.g., protocols or methodological studies, and “not using PROM.” This may be due to more focus on assessment of validity and reliability of PROMs and more studies using qualitative research or patient-reported experience measures (PREMs) in 2018 than in 2008. More studies in the sample from 2018 included non-European patients, which may reflect an increase in the number of studies using PROs outside this region.
It is notable that only two RCTs, both published in 2018, complied with all CONSORT-PRO criteria [25, 26]. Several criteria had high compliance in both years, such as 4a (i.e., eligibility criteria for participants), while some had low compliance in both years, such as 14a (i.e., dates defining the periods of recruitment and follow-up). This may indicate that release of the CONSORT-PRO has had limited impact on reporting so far, which was also found in an earlier review on the topic [19]. The reason for this is not clear, but our review revealed some uncertainty or disagreement about the interpretation of the CONSORT-PRO criteria among the reviewers. Clinical researchers may likewise perceive the CONSORT-PRO as too ambiguous or comprehensive and therefore fail to use it. Moreover, almost half of the publications were not the first (main) publication from the RCTs in question, and more information may have been published elsewhere. In addition, data collection started 3–14 years prior to publication in 2018, meaning that some studies were planned prior to the release of the latest version of the CONSORT-PRO in 2010, and this may have affected the possibility of meeting all criteria.
The proportion of all studies complying with the selected five PRO-specific criteria for reporting differed between 2008 and 2018 for only one criterion. Studies citing the CONSORT-PRO were associated with improved PRO reporting in the first years after publication of the CONSORT-PRO extension [8]. However, many studies may have been planned or conducted prior to the release of the CONSORT-PRO in 2013, up to 14 years prior to publication in one study. Still, the concepts described have been central in PRO research for many years. Other guidelines for PRO research [3, 4, 6, 7] are also relatively new and may not have reached their full impact yet.
Almost all studies identified a PRO as an outcome in the abstract, irrespective of publication year. This is consistent with a previous review of RCTs in oncology where 81% identified a PRO [17] and is not surprising given the search terms used in the literature search. A PRO hypothesis, a criterion that only applies to RCTs (15% of the studies in this review), was stated in slightly more than half of the RCTs. This might be due to PROs being secondary aims or explorative endpoints of many studies [17]. Still, stating a PRO hypothesis should be encouraged. Failure to report a pre-specified PRO hypothesis weakens study results, as the reader may be in doubt as to whether there is selective reporting or multiple testing [6].
About three-quarters of all studies published in 2018 reported or cited evidence of PROM validity and reliability, while only about half of the RCTs completely defined pre-specified primary and secondary outcome measures (including how and when they were assessed). The proportion of studies reporting or citing evidence of PROM validity and reliability was similar to that reported for RCTs in oncology [17], which may suggest that researchers regard validity and reliability as important regardless of study design. Having valid and reliable PROMs is a prerequisite to ensure robust study results that can be used in clinical practice [5]. To ensure the readers’ confidence in the results, such information should be made available.
The proportion of RCTs that stated statistical approaches for dealing with missing data was similar in 2008 and 2018, although in both years it was higher than reported for RCTs in oncology [17]. Still, it is surprising that fewer than half of all studies reported this, as missing data reduce power, are a potential source of bias, and can produce misleading results [5].
Discussing PRO-specific limitations and implications for generalizability of study findings and clinical practice was more prevalent in 2018 than in 2008. This may reflect an increased understanding of and focus on methodological issues in PRO research for interpretation of data. It may also reflect that researchers want to improve patient treatment through the use of PROs in clinical research.
In 45% of the studies in both 2008 and 2018, fewer than 100 patients were included. Sample size estimates are usually based on the primary endpoint, which may or may not be a PRO. Small samples may also result from rare patient groups, small study centers, or difficulty in recruiting patients for logistical or other reasons. This is a concern, because small samples can lead to underpowered studies that cannot answer the research question of interest [27], as well as to redundant research and wasted resources. For rare patient groups and small centers, multicenter studies should be encouraged to increase the sample size and statistical power. However, there was no difference in the proportion of multicenter studies between 2008 and 2018.
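The link between sample size and statistical power can be illustrated with a short calculation. The sketch below approximates the power of a two-sided, two-group comparison of a continuous PRO score using a normal approximation. The standardized effect size of 0.5, the equal group sizes, and the function name are assumptions chosen for illustration; none of them come from the studies in this review.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardized mean difference (difference in means
    divided by the common standard deviation). Equal group sizes and a
    normal approximation are assumed for simplicity.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Hypothetical scenario: a moderate standardized effect of 0.5
for n in (20, 50, 100):
    print(f"n per group = {n:3d}: power ~ {approximate_power(0.5, n):.2f}")
```

Under these assumptions, power rises steadily with the number of patients per group, which is why pooling patients across centers can make an otherwise inconclusive study informative.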
The selected studies used many different PROMs. Some studies did not report a specific PROM, but used non-validated or ad hoc single items or questionnaires developed for their study. The use of non-validated questionnaires, or of single items taken from original questionnaires without validation, is not recommended because it is uncertain whether the questionnaire measures what it is intended to measure [2]. Moreover, the findings from such measures may be difficult to compare with other studies. The most commonly used PROMs in this review, such as the EQ-5D and the SF-36, have been rigorously tested for reliability and validity. A large proportion of the included studies used the EQ-5D instrument. This was originally developed for use in health economic analyses [28], but it is also used as a simple and short measure of HRQoL. The more frequent use of EQ-5D in 2018 compared to 2008 may reflect an increase in the number of clinical registries, where this instrument is often included.
More than 70% of the studies included some description of compliance or dropout, such as the number of invited subjects and the number who completed PROMs in cross-sectional studies, or dropouts during the study in RCTs and longitudinal/cohort studies. It is worrying that many studies failed to report the mode of PROM administration, as this may nurture many clinicians’ skepticism about PRO reliability [29].
Several studies used more than one mode of administration, each of which has different advantages and disadvantages. As expected, more studies used electronic PROMs in 2018 than in 2008. Many investigators prefer electronic administration, as this facilitates data entry and reduces missing data compared with paper and pencil. On the other hand, there may be accessibility issues that introduce selection bias with electronic administration, while patients may feel less comfortable disclosing sensitive topics in the clinic [29]. However, a meta-analysis reported that mode of administration does not seem to affect the patients’ response (i.e., increase bias) and that the use of a mix of modes of administration may maximize response rates, because different modes may be suitable for different patients or patient groups [30, 31].
Only one study explicitly reported on any type of user (or patient) representation. Many funders or ethical review committees now request documentation on user representation in applications and protocols, but it is not required to report such collaboration in publications. Many of the reviewed studies were planned and conducted several years prior to publication, when such user representation was less common.
Strengths and limitations
This review has several strengths. It assessed the differences in the number and methodology of clinical studies using PROs in two years that were 10 years apart. Several important guidelines, such as the CONSORT-PRO criteria, were published during those 10 years and could have influenced the reporting of the publications. The review assessed diverse studies with different designs in different countries and in a wide range of patient groups. The findings could thus be used for comparison in the future.
Some limitations should be noted. The decision to use only the five PRO-specific criteria from the CONSORT-PRO extension and not the full checklist could be questioned. PRO-specific elaborations presented in the same publication could have been used in addition. Several other CONSORT-PRO criteria also apply to all study designs. However, the intention of this review was not to assess whether the studies complied with the full CONSORT-PRO checklist, but to evaluate the PRO methodology, and for this purpose, the five items were considered sufficient, together with evaluation of other key characteristics such as number of patients included, inclusion of user representatives, and type of PROM used. Several researchers with different backgrounds were involved in the review process. A number of publications included in part I of the review were excluded in part II after closer scrutiny. A more stringent preparation phase could have resulted in a more concise evaluation of eligibility in part I. Furthermore, the review revealed some uncertainty or disagreement about the interpretation and operationalization of the CONSORT-PRO criteria among the reviewers, and a pragmatic approach was chosen. For example, a study would be eligible if it reported on validity or reliability for one of the several PROMs included in the study, but not necessarily for the patient group in question or in the language of administration. Similarly, statistical approaches for dealing with missing data varied from the exclusion of respondents with missing data to the use of more advanced statistical analyses to accommodate missing data, such as linear mixed models. Due to this uncertainty, there were several clarifying discussions during the review process, which could have been exposed at an earlier stage or avoided if we had included pairwise review in the pilot study or in part I, or conducted a pilot test in part II. 
Finally, the review included studies conducted in European countries, and these may differ from studies conducted in other parts of the world. In addition, publications were identified by a systematic search in only one database, and searches in other databases could have led to different results. Also, our random selection of 300 publications may not be entirely representative.
Conclusion
The number of clinical studies using PROs in Europe was higher in 2018 than in 2008, and a few methodological aspects seemed to have improved. Altogether, there was little difference between 2008 and 2018 in compliance with the PRO-specific criteria for reporting. Therefore, it seems that published guidelines have had limited impact on the reporting of clinical studies using PROs so far. The large variations in the methodology and reporting of published PRO research may prevent PROs from reaching their full potential influence. Greater influence may facilitate the use of PROs to support treatment decisions and health policy and to improve patient care.