Constructing evidence-based treatment strategies using methods from computer science

https://doi.org/10.1016/j.drugalcdep.2007.01.005

Abstract

This paper details a new methodology, instance-based reinforcement learning, for constructing adaptive treatment strategies from randomized trials. Adaptive treatment strategies are operationalized clinical guidelines which recommend the next best treatment for an individual based on his/her personal characteristics and response to earlier treatments. The instance-based reinforcement learning methodology comes from the computer science literature, where it was developed to optimize sequences of actions in an evolving, time-varying system. When applied in the context of treatment design, this method provides the means to evaluate both the therapeutic and diagnostic effects of treatments in constructing an adaptive treatment strategy. The methodology is illustrated with data from the STAR*D trial, a multi-step randomized study of treatment alternatives for individuals with treatment-resistant major depressive disorder.

Introduction

In the treatment of substance abuse and other mental disorders, there is frequently large heterogeneity in response to any one treatment. Clinicians must often try a series of treatments in order to obtain a response. Furthermore, these disorders are often chronic, requiring clinical treatment over the long term, with options for altering or switching treatment when side effects or relapse on treatment occur. Consequently, the best clinical care requires adaptive changes in the duration, dose, or type of treatment over time. Clinical guidelines individualize treatment by recommending treatment type, dosage, or duration depending on the patient's history of treatment response, adherence, and burden. Adaptive treatment strategies (Lavori and Dawson, 1998, Lavori and Dawson, 2003, Lavori et al., 2000, Murphy, 2003, Murphy and McKay, 2004, Collins et al., 2004) are operationalized clinical guidelines. These strategies consist of a sequence of “decision rules” which tailor the sequence of treatments to the individual patient. The decision rules are defined through two components: an input and an output. The input includes information about a patient (e.g. baseline features such as age, concurrent disorders), as well as the outcomes of present or prior treatments (e.g. severity of side effects, etc.). The output consists of one or more recommended treatment options.
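A decision rule of this form can be sketched as a simple function from patient information (input) to recommended treatment options (output). The features, cut-off values, and treatment names below are hypothetical illustrations, not those used in any actual trial:

```python
# A decision rule maps patient information (input) to one or more
# recommended treatment options (output). All features, thresholds,
# and treatment names here are hypothetical.

def decision_rule(age, side_effect_severity, prior_response):
    """Recommend next-step treatment options for one patient.

    age: baseline feature, in years
    side_effect_severity: 0-10 rating under the current treatment
    prior_response: percent symptom improvement on the current treatment
    """
    if prior_response >= 50:
        # Adequate response: stay the course.
        return ["continue current treatment"]
    if side_effect_severity >= 7:
        # Poorly tolerated: switch rather than augment.
        return ["switch to alternative medication"]
    # Insufficient response but tolerable: augmenting is also an option.
    return ["augment current treatment", "switch to alternative medication"]

# Example: a tolerating non-responder receives both options.
options = decision_rule(age=45, side_effect_severity=3, prior_response=20)
```

Note that the output may contain several options; a strategy constrains but need not fully determine the clinician's choice.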

Adaptive treatment strategies are different from standard treatments in two ways. First, such strategies consider treatment sequences (as opposed to a single treatment). Such a consideration is essential when the initial treatments lack sufficient efficacy or are not tolerated, or when relapse is common. Second, such strategies consider time varying outcomes to determine which of several possible next treatments is best for which patients. The overall goal is to improve longer term outcomes, as opposed to focusing on only short-term benefits, for patients with chronic disorders.

This paper introduces and describes a novel methodology, instance-based reinforcement learning (Ormoneit and Sen, 2002, Sutton and Barto, 1998), for constructing useful adaptive treatment strategies from data collected during randomized trials. Reinforcement learning was originally inspired by the trial-and-error learning studied in the psychology of animal learning (thus the term “learning”). In this setting, good actions by the animal are positively reinforced and poor actions are negatively reinforced (thus the term “reinforcement”). Reinforcement learning was formalized in computer science and operations research by researchers interested in sequential decision-making for artificial intelligence and robotics, where there is a need to estimate the usefulness of taking sequences of actions in an evolving, time-varying system (Sutton and Barto, 1998).

Reinforcement learning methods differ from standard statistical methods in that they evaluate a given treatment based on both the immediate and the longer-term effects of that treatment within a treatment sequence. Furthermore, these methods provide the means to evaluate both the therapeutic and diagnostic effects of a treatment. The diagnostic effect of a treatment is its ability to elicit informative patient responses that permit the clinician to better match the subsequent treatment to the patient. Both diagnostic and therapeutic effects are crucial when evaluating the usefulness of a treatment within an adaptive treatment strategy.
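The idea of crediting a first-step treatment with both its immediate effect and the best outcomes achievable afterwards is the core of the value (Q-value) recursion in reinforcement learning. A minimal two-step sketch, with all outcome numbers and transition probabilities invented for illustration:

```python
# Two-step toy example: the value (Q-value) of a first-step treatment is
# its immediate outcome plus the expected best outcome achievable at the
# second step, given the patient states it tends to produce.
# All numbers below are hypothetical.

# Immediate mean outcome (e.g. symptom reduction) of each first treatment.
immediate = {"A": 6.0, "B": 4.0}

# Probability that each first treatment leaves the patient in a
# "responsive" vs. "resistant" state at step two.
transition = {
    "A": {"responsive": 0.3, "resistant": 0.7},
    "B": {"responsive": 0.8, "resistant": 0.2},
}

# Best achievable second-step outcome in each state (already maximized
# over the second-step treatment options).
best_second_step = {"responsive": 8.0, "resistant": 2.0}

def q_value(treatment):
    """Immediate effect plus expected best future effect."""
    future = sum(p * best_second_step[state]
                 for state, p in transition[treatment].items())
    return immediate[treatment] + future

q_a = q_value("A")  # 6.0 + 0.3*8.0 + 0.7*2.0 = 9.8
q_b = q_value("B")  # 4.0 + 0.8*8.0 + 0.2*2.0 = 10.8
```

In this constructed example, treatment B is inferior on immediate outcome yet superior over the two steps, because it more often leaves the patient in a state where an effective second-step treatment exists; a myopic, one-step analysis would miss this.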

We consider the use of reinforcement learning to analyze data from studies in which patients are randomized multiple times in sequence (see Stone et al., 1995, Tummarello et al., 1997, Schneider et al., 2001, Fava et al., 2003, Stroup et al., 2003 for examples). Such studies are known as sequential multiple assignment randomized trials (SMART) (Murphy, 2005). Readers unfamiliar with SMART studies are encouraged to first read Murphy et al. (2007a) for an introduction.

In the following we introduce reinforcement learning and illustrate the concepts using a simple hypothetical SMART study on alcohol dependence. We then provide early results from its use in constructing treatment strategies from a recently completed SMART study called the sequenced treatment alternatives to relieve depression (STAR*D) trial (www.star-d.org) (Fava et al., 2003, Rush et al., 2004). STAR*D is the largest clinical study of major depressive disorder to date. It was designed as a sequenced multi-step randomized clinical trial of patients with major depressive disorder, with the specific goal of comparing treatments for depression that has not remitted after one, two, or even three antidepressant treatments. Finally we discuss the methodological and practical challenges of applying similar techniques for constructing treatment strategies to a broader class of chronic disorders, including substance abuse.

Section snippets

Adaptive treatment strategies and reinforcement learning

To discuss reinforcement learning, we first review the definition of an adaptive treatment strategy. Throughout we use terms likely to be familiar to clinician researchers; for those who would like to follow up this introduction in the more technical literature, we provide the analogous terms used by the computer science community in parentheses.

Adaptive treatment strategies (policies in computer science) are composed of a series of decision rules, one per treatment step. Decision rules are

Instance-based reinforcement learning

To understand instance-based reinforcement learning, it is useful to conceptualize the data from the SMART trial as a databank. When a new patient presents, one searches the databank for similar patients and selects the decision rule that produced the highest value (i.e. worked best) for these similar patients (hence the term “instance-based”: one learns what to do by considering similar instances). The important issue is the choice of an appropriate measure of similarity
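The databank lookup described above can be sketched as a nearest-neighbour search over recorded patient instances. The patient features, similarity weights, and databank records below are hypothetical and only illustrate the mechanics:

```python
# Instance-based idea: find recorded patients similar to the new patient
# and recommend the treatment that worked best among them. The records,
# features (age, baseline severity), and similarity weights are hypothetical.

databank = [
    # (features: (age, baseline_severity), treatment given, observed outcome)
    ((30, 20.0), "A", 9.0),
    ((32, 22.0), "B", 5.0),
    ((60, 35.0), "A", 3.0),
    ((58, 33.0), "B", 8.0),
]

def distance(x, y, weights=(1.0, 2.0)):
    """Weighted Euclidean distance; the weights encode the similarity measure."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)) ** 0.5

def recommend(new_patient, k=2):
    """Average outcomes per treatment over the k most similar instances."""
    nearest = sorted(databank,
                     key=lambda rec: distance(rec[0], new_patient))[:k]
    totals, counts = {}, {}
    for _, treatment, outcome in nearest:
        totals[treatment] = totals.get(treatment, 0.0) + outcome
        counts[treatment] = counts.get(treatment, 0) + 1
    means = {t: totals[t] / counts[t] for t in totals}
    return max(means, key=means.get)

# A younger, mildly ill patient resembles the first two records,
# among whom treatment "A" had the better observed outcome.
choice = recommend((31, 21.0))
```

As the snippet above notes, the entire approach hinges on the similarity measure: the weights in `distance` determine which past patients count as "similar", and a poor choice can pool patients whose treatment responses differ systematically.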

Case study

We now perform a case study of the concepts explored above using data from the STAR*D study.

Conclusion

Reinforcement learning provides a set of analysis tools that can be used with data to optimize the sequential decisions that must be made in clinical practice. The methodology is general, and can be applied to constructing adaptive treatment strategies for many chronic disorders, including drug and alcohol dependence. While the type of treatments and strategies considered in drug and alcohol dependence differ from those used in treating depression, the methodology is sufficiently general to be

Acknowledgements

We gratefully acknowledge the contribution of the STAR*D team, in particular investigators at the Texas Southwestern Medical Center and the University of Pittsburgh School of Public Health, who supplied the data necessary for this work. STAR*D was funded in part with Federal funds from the National Institute of Mental Health, National Institutes of Health, under Contract N01MH90003 to UT Southwestern Medical Center at Dallas (P.I.: A.J. Rush). Susan Murphy and Joelle Pineau were funded

References (31)

  • P.W. Lavori et al. Developing and comparing treatment strategies: an annotated portfolio of designs. Psychopharmacol. Bull. (1998)
  • P.W. Lavori et al. Dynamic treatment regimes: practical design considerations. Clin. Trials (2003)
  • B.S. Linn et al. Cumulative illness rating scale. J. Am. Geriatr. Soc. (1968)
  • S.A. Murphy. Optimal dynamic treatment regimes. J. R. Stat. Soc. Ser. B (2003)
  • S.A. Murphy. An experimental design for the development of adaptive treatment strategies. Stat. Med. (2005)