Introduction

Biologic therapies represent a recent addition to treatments for inflammatory joint diseases such as rheumatoid arthritis (RA) and psoriatic arthritis (PsA). While their efficacy has been established in a number of clinical trials and cost-effectiveness demonstrated in a number of assessments [1, 2], the evidence base is still associated with substantial uncertainty, and this poses a considerable challenge for decision-making in defining the role of different agents in the sequence of disease-modifying drugs used to manage these chronic diseases. A workshop to explore these challenges took place in 2010, the proceedings of which were disseminated in a series of papers [3–10]. A key finding of the workshop was that, despite the importance of economic models in guiding policy on the adoption of biologic therapies, there was no clear consensus on how the models should be structured, how they should be informed from data, or even which data were the most appropriate. Moreover, the differences between the models were sufficiently substantial to lead to contradictory recommendations. If consensus views were available beforehand on the desirable properties of the economic model, and the data sources that should inform it, this would assist model development and review to inform future policy decisions. With this in mind, a Consensus Working Party on decision models for biologic therapies in RA and PsA was formed to identify the current scope for consensus, and identify gaps in the evidence base where further research is needed to support future consensus.

Methods

The working party was set up to bring together key expertise as comprehensively as possible. Attendees included leading clinical experts, health economists involved in the development of the main existing cost-effectiveness models that have informed policy making in the UK, and key individuals from Health Technology Assessment organizations and significant funders of research (Table 1). Their remit was to: (1) frame and clarify the issues for which consensus needs to be sought; (2) set out, where possible, initial recommendations for consensus approaches for models, based on sound methodology, clinical judgment, and decision-maker preferences; and (3) set out an agenda for the research needed to achieve consensus where existing evidence is inconclusive. Four main topic areas for consensus were identified, with specific issues to address for each area (Table 2). Further details of these issues, and their representation in existing models, are presented elsewhere [8].

Table 1 Members of the Consensus Working Party
Table 2 Overview of topics and issues for consensus

Two 1-day working party meetings were held at the University of Birmingham in November 2011 and March 2012. Position papers describing each of the issues above were circulated prior to each meeting, with members given time to provide feedback and suggest additional considerations. These papers defined the agenda for each meeting, where consensus among participants was sought for each aspect, guided by an understanding of the clinical aspects of RA and PsA, and the principles of evidence-based medicine, as set out in documents such as the Cochrane handbook [11] and the UK National Institute for Health and Care Excellence (NICE) methods guide for technology appraisals [12].

Where divergent opinions remained, participants were asked to identify research programs whose results would lead to greater clarity and consensus in these areas. We present below, for each model aspect listed above, a summary of the consensus recommendations, the outcome of discussions held on this topic at the workshops, and recommendations for further research to enhance future consensus. A more detailed report of the background, process and outcomes for the consensus working party is available in online supplementary material. The report was reviewed by a separate independent panel of clinical experts, and their commentary is also available online.

This article is based on the discussions of the Consensus Working Party and does not involve any new studies of human or animal subjects performed by any of the authors.

Results

Modeling the Initial Response to Treatment

Summary of Consensus View

  • Disease Activity Score 28 (DAS28) should be used to represent initial response to treatment in RA.

  • Models should reflect current guidelines and withdraw treatment from patients with an inadequate response. The timing of this should follow current clinical guidelines, to aid comparison of results among models, although the impact of alternative stopping rules can be explored in sensitivity analysis.

  • Currently, robust evidence for effect modification has not been identified, and effect modification should not be included in evidence synthesis of initial response treatment effects.

  • Models should represent the cause for discontinuation of treatment (i.e., lack of response or adverse events).

  • Estimates of short-term response to biologics should be based on all relevant trials and derived using formal evidence synthesis methodology that respects randomization. Mapping functions should be used within the synthesis so that trials can be included even if they do not report DAS28.

  • Mapping functions should also be used to relate changes in DAS28 to changes in the measure used to represent long-term disease progression.

  • Response rates to the non-biologic comparator can be based on pooling control arms from biologic trials, although the comparability of trial and decision populations should be considered.

  • For PsA, the Psoriatic Arthritis Response Criteria (PsARC) and the Psoriasis Area and Severity Index (PASI) should be used as outcome measures, although disease-specific measures currently in development may be used once they have been validated.

Outcome of Workshop Discussions

Modeling the Initial Treatment Phase

Stopping rules do not fully reflect the complexity of clinical decision-making at a patient level. However, their use within models is required to synthesize trial evidence, link short-term and long-term outcomes, and explore the cost-effectiveness implications of different guidance. Therefore, models should include such stopping rules, as long as it is recognized that they do not fully specify outcomes at a patient level. Currently, for RA, the most appropriate measure to base such stopping rules on is DAS28, because:

  • DAS28 most closely reflects clinical benefit of treatment in the short term.

  • Relatively small changes are still clinically meaningful to patients.

  • It is an absolute scale (although the related European League Against Rheumatism [EULAR] response categories depend on both absolute change in DAS and DAS at endpoint).

  • It is particularly appropriate for the UK, where it has received support from clinical experts in previous NICE appraisals, and is the basis of current NICE guidance.
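
As an illustration of why the EULAR categories depend on both the change in DAS28 and the level reached at the endpoint, the following minimal sketch classifies response using the commonly published EULAR thresholds (improvements of 0.6 and 1.2 points; endpoint scores of 3.2 and 5.1). The function name and example values are illustrative assumptions and are not taken from any of the models reviewed by the working party.

```python
def eular_response(das28_baseline: float, das28_endpoint: float) -> str:
    """Classify EULAR response from baseline and endpoint DAS28 scores."""
    improvement = das28_baseline - das28_endpoint
    if improvement > 1.2:
        # Large improvement: "good" only if low disease activity is reached.
        return "good" if das28_endpoint <= 3.2 else "moderate"
    if improvement > 0.6:
        # Intermediate improvement: no response if activity remains high.
        return "moderate" if das28_endpoint <= 5.1 else "none"
    return "none"


# The first two hypothetical patients share the same absolute improvement
# (1.5 points) but fall into different categories because of the endpoint.
for baseline, endpoint in [(4.5, 3.0), (6.0, 4.5), (6.0, 5.6)]:
    print(baseline, endpoint, eular_response(baseline, endpoint))
```

A model that applies a stopping rule based on EULAR response therefore needs to track the DAS28 level as well as its change.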

American College of Rheumatology (ACR) 20/50/70 was considered problematic because it is a relative measure. However, given that it is commonly reported, there is a clear need for mapping functions to characterize the relationship between the two measures, as it is not appropriate to exclude relevant studies solely because they do not report DAS28. For PsA, both outcomes (skin and joint symptoms) need to be considered when modeling the initial treatment phase. PsA is a heterogeneous condition, and there are types of PsA where DAS28 could be the most appropriate measure of response for joint symptoms. However, disease-specific measures for PsA are in development, so efforts to shift from PsARC are unlikely to be worthwhile.

Effect Modification

A number of factors are potential modifiers for relative effects of treatment on responder status. Mechanisms for effect modification include ‘treatment resistance’ (failure to respond to previous drugs may indicate a lesser chance of responding to the current drug) and ‘accumulated damage’ (disease duration is associated with joint damage). Effect modification may be more influential with ACR 20/50/70 response, as this is a relative response measure, sensitive to baseline disease activity, than with DAS28, which is an absolute measure.

Choice and Use of Evidence to Estimate Effect of Treatment on Initial Response

When performing a synthesis of evidence to inform modeled treatment effects, trials in biologic-naive patients should be analyzed separately from trials in patients with prior biologic exposure, as should trials of biologics with and without concomitant disease-modifying antirheumatic drugs (DMARDs). Formal models for effect modification could be derived from individual patient data (IPD) sourced from trials, or from observational data. A concern with the latter is potential selection bias. Where data are weak, expert elicitation could guide adjustments related to changes in position within the sequence. However, in the absence of convincing evidence for effect modification, the simpler approach of using unadjusted treatment effects is preferable, particularly if an absolute scale such as DAS28 is used for response.
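
A minimal sketch of an unadjusted synthesis that respects randomization is given here: each trial contributes only its within-trial log odds ratio, and these are pooled with inverse-variance weights. A fixed-effect analysis is used purely for illustration; the working party did not prescribe a specific synthesis model, and a full network meta-analysis would extend this to several treatments. The trial counts are made-up numbers.

```python
import math


def log_odds_ratio(resp_trt, n_trt, resp_ctl, n_ctl):
    """Within-trial log odds ratio of response and its variance."""
    a, b = resp_trt, n_trt - resp_trt  # responders / non-responders on biologic
    c, d = resp_ctl, n_ctl - resp_ctl  # responders / non-responders on control
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def pool_fixed_effect(trials):
    """Inverse-variance weighted average of the trial log odds ratios."""
    estimates = [log_odds_ratio(*trial) for trial in trials]
    weights = [1 / variance for _, variance in estimates]
    pooled = sum(w * lo for w, (lo, _) in zip(weights, estimates)) / sum(weights)
    standard_error = math.sqrt(1 / sum(weights))
    return pooled, standard_error


# Hypothetical trials: (responders on biologic, n, responders on control, n)
trials = [(60, 100, 30, 100), (45, 80, 20, 80)]
pooled_log_or, se = pool_fixed_effect(trials)
print(f"Pooled odds ratio: {math.exp(pooled_log_or):.2f} (SE of log OR {se:.3f})")
```

Because only within-trial contrasts are pooled, differences in baseline response between trial populations do not bias the estimate of relative effect.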

Estimating the Baseline Response in the Comparator Treatment

For modeling purposes, relative treatment effects need to be applied to the absolute proportion of (DAS28) responders that would be seen if a conventional DMARD was given instead of a biologic at the relevant point in the sequence. The absolute rate from the control arm of a biologic trial has often been used for this purpose, as have absolute rates from trials of conventional DMARDs. An alternative would be to use registry data, which would match the required patient profile most closely but would be vulnerable to issues such as selection bias. Therefore, the approach of pooling control arms from trials with populations similar to the decision population was preferred.
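
The arithmetic of applying a relative effect to a pooled baseline can be sketched as follows. The sketch assumes that the relative effect is expressed as an odds ratio and that control arms are pooled by simple summation of responders and patients; both are simplifying assumptions made for illustration, and all numbers are hypothetical.

```python
def pooled_baseline(control_arms):
    """Pool control arms as total responders divided by total patients."""
    responders = sum(r for r, _ in control_arms)
    patients = sum(n for _, n in control_arms)
    return responders / patients


def apply_odds_ratio(p_baseline, odds_ratio):
    """Convert a baseline response probability and an odds ratio
    into the absolute response probability on the biologic."""
    odds = odds_ratio * p_baseline / (1 - p_baseline)
    return odds / (1 + odds)


# Hypothetical control arms: (responders, patients)
control_arms = [(30, 100), (20, 80)]
p_control = pooled_baseline(control_arms)
p_biologic = apply_odds_ratio(p_control, odds_ratio=3.5)
print(f"Control: {p_control:.3f}, biologic: {p_biologic:.3f}")
```

Working on the odds scale keeps the resulting probability between 0 and 1 whatever the baseline, which is not guaranteed if a risk difference or risk ratio is applied directly.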

Modeling Adverse Events in the Initial Treatment Phase

The reason for not continuing treatment past the initial phase may have consequences for the choice and efficacy of subsequent treatments, and may also have cost implications. Models should therefore distinguish between adverse events and lack of efficacy as reasons for short-term treatment termination. Information on adverse event rates for different biologics will be reported by most trials. Models should not exclude trials that do not report causes for treatment discontinuation. Such exclusion can be avoided by estimating the overall discontinuation rate and the split between causes, rather than estimating the absolute rate for each cause.
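
One way to read this recommendation is sketched here: the overall discontinuation probability is estimated from all trials, while the share attributable to adverse events is estimated only from the trials that report causes. The data layout, function names, and numbers are hypothetical, and simple pooling of counts is an assumption made for brevity.

```python
def discontinuation_split(trials):
    """Return (overall discontinuation probability, share due to adverse events).

    Each trial is (n_patients, n_discontinued, n_discontinued_for_ae); the last
    element is None when the trial does not report causes of discontinuation.
    """
    total_n = sum(n for n, _, _ in trials)
    total_disc = sum(d for _, d, _ in trials)
    reporting = [(d, ae) for _, d, ae in trials if ae is not None]
    ae_share = sum(ae for _, ae in reporting) / sum(d for d, _ in reporting)
    return total_disc / total_n, ae_share


def cause_specific_probabilities(p_discontinue, ae_share):
    """Split the overall probability into mutually exclusive outcomes."""
    return {
        "adverse_event": p_discontinue * ae_share,
        "lack_of_response": p_discontinue * (1 - ae_share),
        "continue": 1 - p_discontinue,
    }


# Hypothetical trials; the second does not report causes but still informs
# the overall discontinuation probability.
trials = [(200, 40, 10), (150, 30, None), (100, 25, 5)]
p_disc, ae_share = discontinuation_split(trials)
probs = cause_specific_probabilities(p_disc, ae_share)
print(probs)
```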

Current Available Evidence and Further Research Needs

Mapping Between (Change in) DAS28 and ACR 20/50/70

While DAS28 is the preferred measure of short-term response to treatment for the RA consensus model, many trials report ACR 20/50/70 instead. Research is required to develop mappings between ACR 20/50/70 and DAS28, so that DAS28-based models are informed by all relevant trials. Few, if any, data sources collect or report both measures. Therefore, mappings will need to be constructed through indirect comparison with other outcome measures sensitive to disease activity. Since ACR measures are relative, while DAS28 is an absolute scale, mappings should allow for dependence on baseline disease activity. IPD from trials would be the ideal evidence for this, potentially supplemented by registry data (e.g., estimation of the DAS28/Health Assessment Questionnaire [HAQ] change relationship from the British Society for Rheumatology Biologics Register [BSRBR]).

Mapping Between Existing PsA Outcome Measures (PsARC, PASI) and Composite Measures Currently in Development

The evaluation group for the UK NICE appraisal of biologics for PsA developed a Bayesian network meta-analysis to synthesize trial evidence on short-term response to biologics [13]. Treatment effects were estimated on four outcomes: Psoriatic Arthritis Response Criteria (PsARC) and American College of Rheumatology (ACR) response criteria (both for joint symptoms), Psoriasis Area and Severity Index (PASI; for skin symptoms), and HAQ (for functional impact). The version informing the economic model involved a positive correlation between PsARC and PASI response. The analysis, once updated and extended to include newer treatments, satisfies the requirements of the workshop consensus and should inform future PsA models that are based on PsARC and PASI response. The Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) is an international organization actively engaged in the development of response measures in PsA [14]. The GRAppa Composite Exercise (GRACE) study has collected data on multiple PsA dimensions and has recently developed novel composite responder indices [15]. If clinical practice changes as a result of these developments, further research will be required to develop mapping functions between new and existing response measures.

Updating Reviews of Short-Term Adverse Events

The consensus model requires estimation of the proportional split between lack of efficacy and adverse events for those who discontinue treatment at an early stage, based on comprehensive and up-to-date evidence. Systematic reviews of biologic trials undertaken to inform UK NICE technology appraisals can be used to identify this evidence base. There are additional reviews of adverse events in the literature [16]. Systematic reviews of sequential biologic therapy have also assessed the impact on the efficacy of a second biologic of having experienced adverse events on the first biologic [17]. This evidence base needs to be collated, updated and synthesized to inform the consensus model.

Modeling the Long-Term Treatment Phase

Summary of Consensus View

  • HAQ should be used to represent disease progression, although a multidimensional measure which includes pain should be considered for mapping disease progression to health utilities.

  • The source for mappings used between outcome measures should be clearly stated and justified, and be consistent with current applied and methodological research.

  • Survival models may be used to extrapolate beyond the follow-up period of data on the duration of successful long-term treatment. All relevant data should be used to fit such models; this may include open-label trial follow-up and registry data. However, treatment duration differences between biologics should not be assumed based on observational data alone.

  • Assumed rates of HAQ progression should be consistent with observations from longitudinal data.

  • Models should distinguish between adverse events and loss of efficacy as reasons for treatment withdrawal.

  • The rebound in disease progression on treatment withdrawal should be evidence-based as far as possible. Where multiple scenarios are consistent with the available evidence, the impact of alternative plausible assumptions should be explored through sensitivity analysis.

Outcome of Workshop Discussions

HAQ has been widely used in models to represent disease progression, for historical reasons. Several mapping algorithms between HAQ and quality of life (QoL) measures (e.g., EQ-5D) have been developed and used in existing models [18, 19]. However, algorithms for mapping between outcome measures such as HAQ and EQ-5D are an area of active research [20], and the most appropriate algorithm for use in decision models may change over time. For example, recent research has suggested that pain has an important influence on QoL in patients with RA, independent of HAQ [24]. Therefore, models could in future use a multidimensional (HAQ and pain) outcome measure for disease progression.

Observational data have been used to estimate the duration of treatment and the rate of change in HAQ over time while on treatment, and sometimes support assumed differences between biologics. Models should not be ‘hard-wired’ to exclude such differences, but the reference case should only allow differences between drugs of the same class if based on data from randomized studies. The impact of differences inferred from observational data could be explored in supplementary analyses, but estimates should reflect the increased risk of bias. The estimates may be more credible if based on observational data collected in a clearly relevant population, or on a synthesis of multiple sources of non-randomized evidence.

HAQ progression is sometimes assumed to be zero on biologics. This is not biologically credible in the long term in view of the effect of ageing on HAQ. Further long-term data are needed in RA and PsA populations in remission. Current models for non-biologics assume linear progression at a rate which appears to result in too many people reaching the HAQ ceiling too quickly. Registries may give some data on HAQ progression, and elicitation could also be used to incorporate expert opinion on long-term HAQ progression where existing data are insufficient. Mixture models have been fitted to registry data showing distinct sub-populations with different HAQ trajectories. By averaging over these trajectories, a more realistic non-linear model could be developed for HAQ progression over time.
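
The effect of averaging over sub-populations can be sketched as follows. For simplicity, each hypothetical class is given its own linear rate and is capped at the HAQ maximum of 3; published mixture models may use non-linear class trajectories, and the class weights and rates here are invented for illustration.

```python
HAQ_MAX = 3.0  # the HAQ disability index ranges from 0 to 3


def linear_haq(haq_start, annual_rate, years):
    """Linear progression, capped at the HAQ ceiling."""
    return min(HAQ_MAX, haq_start + annual_rate * years)


def mixture_haq(haq_start, classes, years):
    """Average HAQ over latent classes given as (weight, annual_rate) pairs.

    Weights are assumed to sum to 1.
    """
    return sum(w * linear_haq(haq_start, rate, years) for w, rate in classes)


# Hypothetical classes: stable, slow progressors, and rapid progressors.
classes = [(0.5, 0.0), (0.3, 0.05), (0.2, 0.2)]
mean_rate = sum(w * rate for w, rate in classes)  # 0.055 HAQ units per year

for years in (5, 10, 20):
    single = linear_haq(1.5, mean_rate, years)
    averaged = mixture_haq(1.5, classes, years)
    print(years, round(single, 3), round(averaged, 3))
```

In this sketch the two approaches agree until the rapid progressors reach the ceiling, after which the averaged trajectory flattens while the single linear trajectory continues to climb.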

It is important to record the reason for treatment switching, as this can influence the choice and efficacy of subsequent treatment. However, there is an interaction between these factors, since adverse events are more likely to lead to treatment being withdrawn if efficacy is diminished. Estimates of rebound on treatment termination should be based on data, and assumptions should be avoided as far as possible. However, observations rarely coincide with treatment switching decisions, so expert elicitation may be necessary to determine the most appropriate assumption. While rebound may in fact occur over a period of time, a step change is an acceptable simplifying assumption. Rebound effects are likely to differ between RA and PsA patients, and data on the former should not be used as a basis for estimating rebound in the latter.

Current Available Evidence and Further Research Needs

Estimating Duration of Treatment in Responders

Existing models use diverse data sources for estimates of biologic treatment duration, and interpret those data in different ways. None of these approaches was thought to satisfy the requirements of the consensus model, and further research is required to establish treatment duration distributions based on up-to-date and relevant data. Registries have several advantages as the basis for estimating this information: they are often comprehensive, provide detailed patient-level data, and are up-to-date. Registries could also be used to explore the impact of effect modification and the extent to which treatment duration differs between biologics, although, because registries are a non-randomized data source, such analyses should be interpreted with caution.
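
As a sketch of how a fitted survival model could be used to extrapolate treatment duration, the following example assumes a Weibull distribution for time on treatment. The choice of distribution and the parameter values are assumptions made for illustration; in practice they would be estimated from registry or trial follow-up data.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability of remaining on treatment at time t (years)."""
    return math.exp(-((t / scale) ** shape))


def mean_duration(shape, scale):
    """Mean time on treatment over an unlimited horizon."""
    return scale * math.gamma(1 + 1 / shape)


def restricted_mean(shape, scale, horizon, steps=10_000):
    """Mean time on treatment up to the model horizon (trapezoidal rule)."""
    dt = horizon / steps
    total = 0.0
    for i in range(steps):
        s0 = weibull_survival(i * dt, shape, scale)
        s1 = weibull_survival((i + 1) * dt, shape, scale)
        total += 0.5 * (s0 + s1) * dt
    return total


# Hypothetical parameters: shape below 1 implies a withdrawal hazard that
# falls over time, so patients who stay on treatment become less likely to stop.
shape, scale = 0.8, 6.0
print(f"On treatment at 5 years: {weibull_survival(5, shape, scale):.2f}")
print(f"Mean duration within 30 years: {restricted_mean(shape, scale, 30):.2f}")
```

The restricted mean is the relevant quantity when the model horizon is shorter than the tail of the fitted distribution.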

Disease Progression on Long-Term Treatment

The consensus group also felt existing modeling approaches to disease progression were not appropriate for the consensus model. In particular, the assumption of linear HAQ progression leads to patients in the model reaching HAQ ceiling values earlier than is observed with real patients. Research is currently underway exploring non-linear HAQ progression models. Once this research is fully available it may prove an appropriate basis for the consensus approach. If the data available do not provide definitive evidence for long-term HAQ progression, they may be supplemented with elicitation of expert opinion.

Mappings Between Disease-Specific Severity Measures and Health-Related QoL

Mappings between HAQ and QoL scores have been developed using trial and/or observational data. Mappings currently used in models do not account for the independent impact of pain on QoL, and do not draw fully on all currently available evidence. Further work is required to produce definitive mapping functions between HAQ scores (with pain if appropriate) and QoL. This will first involve identifying the appropriate data sources, which may include registries and/or IPD from trials (if available). The appropriate method for deriving mapping algorithms from these data will then need to be identified. For PsA, data collected by the GRACE study may provide information to map combined joint, skin and pain symptoms to QoL scores.
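
The structure of a mapping that includes pain can be sketched as a simple linear function. The linear form and every coefficient in this example are placeholders chosen for illustration; they are not estimates from any published mapping algorithm, and pain is assumed to be recorded on a 0 to 100 visual analogue scale.

```python
def haq_pain_to_utility(haq, pain_vas, intercept=0.85, b_haq=0.20, b_pain=0.003):
    """Hypothetical linear mapping from HAQ and pain to a utility score.

    All coefficients are placeholders, not published estimates.
    """
    if not 0.0 <= haq <= 3.0:
        raise ValueError("HAQ must lie between 0 and 3")
    if not 0.0 <= pain_vas <= 100.0:
        raise ValueError("pain must lie between 0 and 100")
    utility = intercept - b_haq * haq - b_pain * pain_vas
    return min(1.0, utility)  # utilities cannot exceed full health


# Two hypothetical patients with the same HAQ but different pain scores
# receive different utilities, which a HAQ-only mapping cannot represent.
print(haq_pain_to_utility(1.5, 20))
print(haq_pain_to_utility(1.5, 70))
```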

Impact of Treatment Switching on HAQ

Empirical estimates of HAQ rebound on treatment withdrawal are challenging to derive and lacking in existing literature. Such estimates could be derived from registry data, although follow-up visits often do not coincide with treatment withdrawal, limiting the accuracy of estimation. Elicitation techniques could be used to capture clinical judgment on rebound if empirical approaches are unsuccessful. Given the challenges of estimating rebound, the sensitivity of cost-effectiveness findings to alternative assumptions should be explored within the consensus model.
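
A sensitivity analysis of this kind can be sketched by treating rebound as a step change whose size is a fraction of the HAQ improvement gained when treatment started. This parameterization, the scenario names, and the numbers are assumptions made for illustration.

```python
HAQ_MAX = 3.0  # the HAQ disability index ranges from 0 to 3


def haq_after_withdrawal(haq_at_withdrawal, initial_gain, rebound_fraction):
    """Apply rebound as a step change at the time of withdrawal.

    rebound_fraction = 1.0 loses the whole initial gain; 0.0 loses none of it.
    """
    rebound = rebound_fraction * initial_gain
    return min(HAQ_MAX, haq_at_withdrawal + rebound)


# Hypothetical patient: HAQ of 1.2 at withdrawal after an initial gain of 0.6.
scenarios = {"no rebound": 0.0, "half of gain": 0.5, "full gain": 1.0}
results = {
    name: haq_after_withdrawal(1.2, 0.6, fraction)
    for name, fraction in scenarios.items()
}
print(results)
```

Running the full model under each scenario then shows how strongly the cost-effectiveness results depend on the rebound assumption.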

Estimating Lifetime Costs and Benefits

Summary of Consensus View

  • Models should allow for an association between disease severity and mortality.

  • Models should adopt the decision-maker’s chosen perspective for costs included. This may involve assuming health care utilization to be a function of disease severity.

Outcome of Workshop Discussions

There is evidence to suggest disease severity has an impact on age-adjusted mortality risk, but not to suggest that choice of treatment has any additional influence on mortality. For PsA, skin symptoms may be additionally associated with mortality. The cost perspective of a model should reflect the preferences of the decision-maker involved. In the UK, for example, the reference case perspective for NICE technology appraisals is health and personal social care costs only. An acceptable approach to modeling the indirect impact of treatment on such costs is to assume a relationship between disease severity and resource use. For PsA, resource use should be modeled as a function of both joint and skin symptoms (although double counting should be avoided). Where models use discrete time-cycles, cycle duration should be short enough to accurately reflect resource use patterns.
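
A minimal sketch of how severity-dependent mortality and resource use might enter a discrete-time model is shown here. It assumes a constant hazard ratio per unit of HAQ and annual costs defined by HAQ band; both structural choices, and all numerical values, are hypothetical.

```python
import math


def cycle_mortality_prob(annual_prob, haq, hazard_ratio_per_haq, cycle_years):
    """Probability of death in one cycle, scaled by disease severity."""
    base_rate = -math.log(1.0 - annual_prob)  # annual probability to rate
    rate = base_rate * hazard_ratio_per_haq ** haq
    return 1.0 - math.exp(-rate * cycle_years)


def cycle_cost(haq, annual_cost_bands, cycle_years):
    """Cost for one cycle given (upper HAQ bound, annual cost) bands in ascending order."""
    for upper, annual_cost in annual_cost_bands:
        if haq <= upper:
            return annual_cost * cycle_years
    raise ValueError("HAQ above the highest band")


# Hypothetical inputs: 2% annual baseline mortality, hazard ratio of 1.3 per
# HAQ unit, six-month cycles, and three cost bands.
bands = [(1.0, 1000.0), (2.0, 2000.0), (3.0, 4000.0)]
print(cycle_mortality_prob(0.02, 1.5, 1.3, 0.5))
print(cycle_cost(1.5, bands, 0.5))
```

Converting probabilities to rates before rescaling keeps the results consistent when the cycle length is changed.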

Current Available Evidence and Further Research Needs

Arthritis Health Care Utilization

Research is required to collate diverse evidence on the relationship between disease severity in RA and PsA and health care utilization. This research should initially take the form of identifying current literature and appropriate data sources. The relationship between disease severity and health care utilization has been estimated in several published analyses drawing on routine data. Work that has informed existing models includes analysis of registry data from the US [21] and Sweden [22]. More recently, analysis has been published of the total costs for patients with RA and PsA, including productivity losses, using Norwegian registry data [23].

Mortality and Disease Severity

There are conflicting findings in the literature regarding the relationship between mortality and disease severity. Research is therefore required to establish a definitive estimate for the consensus model. Routine data may provide the most appropriate source for this relationship. For example, Lunt et al. have analyzed mortality data in the BSRBR for this relationship [24], and their analysis included covariates such as disease duration and severity. Additional research would identify the full current evidence base and use this to derive the consensus relationship, either through synthesis of multiple evidence sources or establishing clinical consensus on the most appropriate data source.

Structural Modeling Approaches

Summary of Consensus View

  • Models should be able to represent response for each biologic therapy in a sequence, but do not need to model individual post-biologic conventional DMARDs.

  • Individual patient models have several advantages when representing RA and PsA patient histories, but the merits of cohort modeling approaches should also be explored.

Outcome of Workshop Discussions

Models should have the flexibility to explore alternative positions for biologics within the sequence of treatments. While there may be benefit in modeling specific DMARD sequences once biologic therapies have been exhausted, the group felt that treatments have limited effects at this stage in practice. Therefore, it is preferable not to explicitly model sequences of conventional DMARDs following biologic therapy, unless data on such patients become available that credibly challenge this view.

The group noted that both cohort and individual sampling approaches have been adopted by previous models, and there were divergent views over the relative merits of these approaches. Guidance exists in the literature on factors which should influence the choice of model type [25, 26]; as a general principle, models should be as simple as possible whilst remaining consistent with the underlying decision problem and theory of disease [27]. However, the appropriate model structure for the evaluation of biologics in arthritis has not been definitively established in the literature, and remains a question of both practical and methodological interest.
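
The difference between the two approaches can be illustrated with a deliberately simple example in which the only event is treatment withdrawal, occurring with a constant probability per cycle. The cohort calculation gives the expected proportion on treatment directly, while the individual sampling calculation estimates it by simulating patients one at a time. The function names, the fixed random seed, and the parameter values are illustrative assumptions.

```python
import random


def cohort_on_treatment(p_withdraw, cycles):
    """Expected proportion still on treatment after a number of cycles."""
    return (1.0 - p_withdraw) ** cycles


def simulated_on_treatment(p_withdraw, cycles, n_patients, seed=1):
    """Estimate the same proportion by sampling individual patient histories."""
    rng = random.Random(seed)
    still_on = 0
    for _ in range(n_patients):
        on_treatment = True
        for _ in range(cycles):
            if rng.random() < p_withdraw:
                on_treatment = False
                break
        still_on += on_treatment
    return still_on / n_patients


print(cohort_on_treatment(0.1, 5))
print(simulated_on_treatment(0.1, 5, n_patients=20_000))
```

In a problem this simple the two approaches agree up to sampling error. Individual sampling becomes more attractive when patient history, such as the reason for an earlier withdrawal, affects later events, at the cost of added computation and Monte Carlo noise.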

Current Available Evidence and Further Research Needs

Given the alternative approaches to model structure in existing models, future research should involve developing models that follow the consensus approach as closely as possible whilst adopting alternative structures, to evaluate how closely each model structure is able to follow the consensus approach and the impact of structure on model results.

Discussion

Decision-analytic models have become a key resource in health technology assessment. However, models are often developed independently by manufacturers, academic groups and regulatory bodies, leading to a range of models with divergent structures and conclusions, as is the case for biologic therapies for inflammatory joint diseases [8]. This can lead to confusion over the assumptions and data selection choices driving results, and skepticism of the validity of model results. Our aim was to show how a process of bringing together independent modeling and clinical experts could lead to clear consensus guidance for future models, increasing their credibility. It may not be feasible or desirable to require manufacturers or academic experts to follow the consensus approach in every detail. The former might view this as restricting their ability to fairly present the benefits of their product, and the latter might wish to follow their own academic opinion on the appropriate modeling approach for a specific policy question. However, if they were encouraged to set out how their models differed from the consensus approach, and present the impact of this deviation on their results, the resulting transparency would enhance the credibility of recommendations derived from those models, and help decision-makers understand the reasons behind any differences in findings between models.

One limitation of this consensus is that the working group consisted solely of UK-based clinicians, modelers and regulators. Health technology assessment clearly has aspects that are country-specific, and this may mean that certain elements of our consensus would need to be adapted to other contexts. However, the structures we have followed, and many of our findings, are relevant internationally. Our work also provides a case study of a process that can easily be extended to support decision-making in other disease areas. The process of developing consensus, and identifying its current limits, has the added benefit of highlighting areas where further research is most needed to support reimbursement decisions.