Exercise therapy for chronic fatigue syndrome

This is not the most recent version

Background

Chronic fatigue syndrome (CFS) is characterised by persistent, medically unexplained fatigue, as well as symptoms such as musculoskeletal pain, sleep disturbance, headaches and impaired concentration and short‐term memory. CFS presents as a common, debilitating and serious health problem. Treatment may include physical interventions, such as exercise therapy, which was last reviewed in 2004.

Objectives

The objective of this review was to determine the effects of exercise therapy (ET) for patients with CFS as compared with any other intervention or control.

• Exercise therapy versus 'passive control' (e.g. treatment as usual, waiting‐list control, relaxation, flexibility).

• Exercise therapy versus other active treatment (e.g. cognitive‐behavioural therapy (CBT), cognitive treatment, supportive therapy, pacing, pharmacological therapy such as antidepressants).

• Exercise therapy in combination with other specified treatment strategies versus other specified treatment strategies (e.g. exercise combined with pharmacological treatment vs pharmacological treatment alone).

Search methods

We searched The Cochrane Collaboration Depression, Anxiety and Neurosis Controlled Trials Register (CCDANCTR), the Cochrane Central Register of Controlled Trials (CENTRAL) and SPORTDiscus up to May 2014 using a comprehensive list of free‐text terms for CFS and exercise. We located unpublished or ongoing trials through the World Health Organization (WHO) International Clinical Trials Registry Platform (to May 2014). We screened reference lists of retrieved articles and contacted experts in the field for additional studies.

Selection criteria

Randomised controlled trials involving adults with a primary diagnosis of CFS who were able to participate in exercise therapy. Studies had to compare exercise therapy with passive control, psychological therapies, adaptive pacing therapy or pharmacological therapy.

Data collection and analysis

Two review authors independently performed study selection, risk of bias assessments and data extraction. We combined continuous measures of outcomes using mean differences (MDs) and standardised mean differences (SMDs). We combined serious adverse reactions and drop‐outs using risk ratios (RRs). We calculated an overall effect size with 95% confidence intervals (CIs) for each outcome.

Main results

We have included eight randomised controlled studies and have reported data from 1518 participants in this review. Three studies diagnosed individuals with CFS using the 1994 criteria of the Centers for Disease Control and Prevention (CDC); five used the Oxford criteria. Exercise therapy lasted from 12 to 26 weeks. Seven studies used variations of aerobic exercise therapy such as walking, swimming, cycling or dancing provided at mixed levels in terms of intensity of the aerobic exercise from very low to quite rigorous, whilst one study used anaerobic exercise. Control groups consisted of passive control (eight studies; e.g. treatment as usual, relaxation, flexibility) or CBT (two studies), cognitive therapy (one study), supportive listening (one study), pacing (one study), pharmacological treatment (one study) and combination treatment (one study). Risk of bias varied across studies, but within each study, little variation was found in the risk of bias across our primary and secondary outcome measures.

Investigators compared exercise therapy with 'passive' control in eight trials, which enrolled 971 participants. Seven studies consistently showed a reduction in fatigue following exercise therapy at end of treatment, even though the fatigue scales used different scoring systems: an 11‐item scale with a scoring system of 0 to 11 points (MD ‐6.06, 95% CI ‐6.95 to ‐5.17; one study, 148 participants; low‐quality evidence); the same 11‐item scale with a scoring system of 0 to 33 points (MD ‐2.82, 95% CI ‐4.07 to ‐1.57; three studies, 540 participants; moderate‐quality evidence); and a 14‐item scale with a scoring system of 0 to 42 points (MD ‐6.80, 95% CI ‐10.31 to ‐3.28; three studies, 152 participants; moderate‐quality evidence). Serious adverse reactions were rare in both groups (RR 0.99, 95% CI 0.14 to 6.97; one study, 319 participants; moderate‐quality evidence), but sparse data made it impossible for review authors to draw conclusions. Study authors reported a positive effect of exercise therapy at end of treatment with respect to sleep (MD ‐1.49, 95% CI ‐2.95 to ‐0.02; two studies, 323 participants), physical functioning (MD 13.10, 95% CI 1.98 to 24.22; five studies, 725 participants) and self‐perceived changes in overall health (RR 1.83, 95% CI 1.39 to 2.40; four studies, 489 participants). It was not possible for review authors to draw conclusions regarding the remaining outcomes.

Investigators compared exercise therapy with CBT in two trials (351 participants). One trial (298 participants) reported little or no difference in fatigue at end of treatment between the two groups using an 11‐item scale with a scoring system of 0 to 33 points (MD 0.20, 95% CI ‐1.49 to 1.89). Both studies measured differences in fatigue at follow‐up, but neither found differences between the two groups using an 11‐item fatigue scale with a scoring system of 0 to 33 points (MD 0.30, 95% CI ‐1.45 to 2.05) and a nine‐item Fatigue Severity Scale with a scoring system of 1 to 7 points (MD 0.40, 95% CI ‐0.34 to 1.14). Serious adverse reactions were rare in both groups (RR 0.67, 95% CI 0.11 to 3.96). We observed little or no difference in physical functioning, depression, anxiety and sleep, and we were not able to draw any conclusions with regard to pain, self‐perceived changes in overall health, use of health service resources and drop‐out rate.

With regard to other comparisons, one study (320 participants) suggested a general benefit of exercise over adaptive pacing, and another study (183 participants) a benefit of exercise over supportive listening. The available evidence was too sparse to draw conclusions about the effect of pharmaceutical interventions.

Authors' conclusions

Patients with CFS may generally benefit and feel less fatigued following exercise therapy, and no evidence suggests that exercise therapy may worsen outcomes. A positive effect with respect to sleep, physical function and self‐perceived general health has been observed, but no conclusions for the outcomes of pain, quality of life, anxiety, depression, drop‐out rate and health service resources were possible. The effectiveness of exercise therapy seems greater than that of pacing but similar to that of CBT. Randomised trials with low risk of bias are needed to investigate the type, duration and intensity of the most beneficial exercise intervention.

Exercise as treatment for patients with chronic fatigue syndrome

Who may be interested in this review?

• People with chronic fatigue syndrome and their family and friends.

• Professionals working in specialist chronic fatigue services.

• Professionals working in therapeutic exercise.

• General practitioners.

Why is this review important?

Chronic fatigue syndrome (CFS) is sometimes called myalgic encephalomyelitis (ME). Research estimates that between 2 in 1000 and 2 in 100 adults in the USA are affected by CFS. People with CFS often have long‐lasting fatigue, joint pain, headaches, sleep problems, and poor concentration and short‐term memory. These symptoms cause significant disability and distress for people affected by CFS. There is no clear medical cause for CFS, so people who are affected often deal with misunderstanding of their condition from family, friends and healthcare professionals. National Institute for Health and Care Excellence (NICE) guidelines recommend exercise therapy for individuals with CFS, and a previous review of the evidence suggested that exercise therapy was a promising approach to the treatment. It is thought that exercise therapy can help management of CFS symptoms by helping people gradually reintroduce physical activity into their daily lives.

This review is an update of a previous Cochrane review from 2004, which showed that exercise therapy was a promising treatment for adults with CFS. Since the review, additional studies investigating the effectiveness and safety of exercise therapy for patients with CFS have been published.

What questions does this review aim to answer?

• Is exercise therapy more effective than ‘passive’ treatments (e.g. waiting list, treatment as usual, relaxation, flexibility)?

• Is exercise therapy more effective than other ‘active’ therapies (e.g. cognitive‐behavioural therapy (CBT), pacing, medication)?

• Is exercise therapy more effective when combined with another treatment than when given alone?

• Is exercise therapy safer than other treatments?

Which studies were included in the review?

We searched databases to find all high‐quality studies of exercise therapy for CFS published up to May 2014. To be included in the review, studies had to be randomised controlled trials and include adults over 18 years of age, more than 90% of whom had a clear diagnosis of CFS. We included eight studies with a total of 1518 participants in the review. Seven studies used aerobic exercise therapy such as walking, swimming, cycling or dancing; the remaining study used non‐aerobic exercise. Most studies asked participants to exercise at home, between three and five times per week, with a target duration of 5 to 15 minutes per session using different means of incrementation.

What does evidence from the review tell us?

Moderate‐quality evidence showed exercise therapy was more effective at reducing fatigue compared to ‘passive’ treatment or no treatment. Exercise therapy had a positive effect on people’s daily physical functioning, sleep and self‐ratings of overall health.

One study suggested that exercise therapy was more effective than pacing strategies for reducing fatigue. However, exercise therapy was no more effective than CBT.

Exercise therapy did not worsen symptoms for people with CFS. Serious side effects were rare in all groups, but limited information makes it difficult to draw firm conclusions about the safety of exercise therapy.

Evidence was not sufficient to show effects of exercise therapy on pain, use of other healthcare services, or to allow assessment of rates of drop‐out from exercise therapy programmes.

What should happen next?

Researchers suggest that further studies should be carried out to discover what type of exercise is most beneficial for people affected by CFS, which intensity is best, the optimal length, as well as the most beneficial delivery method.

Authors' conclusions

Implications for practice

Encouraging evidence suggests that exercise therapy can contribute to alleviation of some symptoms of CFS, especially fatigue. Exercise therapy seems to perform better than no intervention or pacing and seems to lead to results similar to those seen with cognitive behavioural therapy. Reported results were obtained from patients who were able to participate (not from those too disabled to attend clinics); these results were inconclusive as to type of exercise therapy and showed heterogeneity. Few serious adverse reactions were reported. We think the evidence suggests that exercise therapy might be an effective and safe intervention for patients able to attend clinics as outpatients.

Implications for research

Further randomised controlled studies are needed to clarify the most effective type, intensity and duration of exercise therapy. These studies should report contextual characteristics of the exercise therapy provided, such as deliverer of the intervention, schedule, explanation and materials, supervision and monitoring. It is important that these trials measure health service use alongside the primary outcomes of fatigue and adverse effects, as well as alongside relevant secondary outcomes. Researchers should take care to describe which set of diagnostic criteria they have used and how they operationalised the diagnostic process.

Summary of findings

Summary of findings for the main comparison.

Exercise therapy for chronic fatigue syndrome

Patient or population: males and females over 18 years of age with chronic fatigue syndrome

Intervention: exercise therapy

Comparison: standard care, waiting list or relaxation/flexibility

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk (control) | Illustrative comparative risks* (95% CI): corresponding risk (exercise) | Relative effect (95% CI) | Number of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|---|
| Fatigue [a]: FS, Fatigue Scale (0 to 11 points) (end of treatment) | Mean fatigue in the control groups was 10.4 points | Mean fatigue in the intervention groups was 6.06 points lower (6.95 to 5.17 lower) | | 148 (1 study) | ⊕⊕⊝⊝ Low [b,c] | Lower score indicates less fatigue |
| Fatigue [a]: FS, Fatigue Scale (0 to 33 points) (end of treatment) | Mean fatigue ranged across control groups from 15.3 to 26.3 points | Mean fatigue in the intervention groups was 2.82 points lower (4.07 to 1.57 lower) | | 540 (3 studies) | ⊕⊕⊕⊝ Moderate [b] | Lower score indicates less fatigue |
| Fatigue [a]: FS, Fatigue Scale (0 to 42 points) (end of treatment) | Mean fatigue ranged across control groups from 24.4 to 31.6 points | Mean fatigue in the intervention groups was 6.80 points lower (10.31 to 3.28 lower) | | 152 (3 studies) | ⊕⊕⊕⊝ Moderate [b] | Lower score indicates less fatigue |
| Participants with serious adverse reactions | Study population: 13 per 1000 | Study population: 12 per 1000 (2 to 87) | RR 0.99 (0.14 to 6.97) | 319 (1 study) | ⊕⊕⊕⊝ Moderate [d,e] | |
| Quality of Life (QOL) Scale (16 to 112 points) (follow‐up) | Mean QOL score in the control group was 72 points | Mean QOL score in the intervention groups was 9.00 points lower (19.00 lower to 1.00 higher) | | 44 (1 study) | ⊕⊝⊝⊝ Very low [b,f] | Higher score indicates improved QOL |
| Physical functioning: SF‐36 subscale (0 to 100 points) (end of treatment) | Mean physical functioning score ranged from 31.1 to 55.2 points across control groups | Mean physical functioning score in the intervention groups was 13.10 points higher (1.98 to 24.22 higher) | | 725 (5 studies) | ⊕⊕⊝⊝ Low [b,g] | Higher score indicates improved physical function |
| Depression: HADS depression score (0 to 21 points) (end of treatment) | Mean depression score ranged across control groups from 5.2 to 11.2 points | Mean depression score in the intervention groups was 1.63 points lower (3.50 lower to 0.23 higher) | | 504 (5 studies) | ⊕⊝⊝⊝ Very low [b,g,h] | Lower score indicates fewer depressive symptoms |
| Sleep: Jenkins Sleep Scale (0 to 20 points) (end of treatment) | Mean sleep score ranged across control groups from 11.7 to 12.2 points | Mean sleep score in the intervention groups was 1.49 points lower (2.95 to 0.02 lower) | | 323 (2 studies) | ⊕⊕⊝⊝ Low [b,h] | Lower score indicates improved sleep quality |
| Self‐perceived changes in overall health (end of treatment) | Study population: 218 per 1000. Medium‐risk population: 238 per 1000 | Study population: 399 per 1000 (303 to 523). Medium‐risk population: 436 per 1000 (331 to 571) | RR 1.83 (1.39 to 2.40) | 489 (4 studies) | ⊕⊕⊕⊝ Moderate [b] | RR higher than 1 means that more participants in exercise groups reported improvement |
| Drop‐out (end of treatment) | Study population: 70 per 1000. Medium‐risk population: 89 per 1000 | Study population: 114 per 1000 (54 to 241). Medium‐risk population: 145 per 1000 (69 to 305) | RR 1.63 (0.77 to 3.43) | 843 (6 studies) | ⊕⊕⊝⊝ Low [b,g] | RR higher than 1 means that more participants in exercise groups dropped out from treatment |


*The basis for the assumed risk (e.g. median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval; RR: Risk ratio.

GRADE Working Group grades of evidence.
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

[a] We choose to present effect estimates as measured on the original scales rather than to transform them to standardised units. As 3 different scoring systems for fatigue were used, the outcome is presented over 3 rows.
[b] Risk of bias (‐1): All studies were at risk of performance bias, as they were unblinded.
[c] Inconsistency (‐1): shows inconsistencies with other available trials when meta‐analysis based on standardised mean differences is performed. Subgroup analyses could not explain variation due to diagnostic criteria, treatment strategy or type of control.
[d] Risk of bias (0): This outcome is unlikely to have been affected by detection or performance bias.
[e] Imprecision (‐1): low numbers of events and wide confidence intervals.
[f] Imprecision (‐2): very low numbers of participants and wide confidence intervals, which encompass benefit and harm.
[g] Inconsistency (‐1): variation in effect size and direction of effect across available studies.
[h] Imprecision (‐1): Confidence interval fails to exclude negligible differences in favour of the intervention.
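
The note beneath the summary of findings table states that each corresponding risk is based on the assumed risk in the comparison group and the relative effect of the intervention. A minimal sketch of that arithmetic follows, using the study‐population figures for self‐perceived changes in overall health. The function name is ours and purely illustrative. Rows whose published figures were derived from unrounded inputs may differ from this calculation by a unit or more.

```python
def corresponding_risk_per_1000(assumed_per_1000, rr, rr_low, rr_high):
    """Scale an assumed control-group risk by a risk ratio and its 95% CI limits."""
    point = round(assumed_per_1000 * rr)
    low = round(assumed_per_1000 * rr_low)
    high = round(assumed_per_1000 * rr_high)
    return point, low, high

# Self-perceived changes in overall health, study population:
# assumed risk 218 per 1000, RR 1.83 (1.39 to 2.40)
print(corresponding_risk_per_1000(218, 1.83, 1.39, 2.40))  # (399, 303, 523)
```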

Background

Description of the condition

Chronic fatigue syndrome (CFS) is an illness characterised by persistent, medically unexplained fatigue. Symptoms include severe, disabling fatigue, as well as musculoskeletal pain, sleep disturbance, headaches, and impaired concentration and short‐term memory (Prins 2006). Individuals experience significant disability and distress, which may be exacerbated by lack of understanding from others, including healthcare professionals. The term 'myalgic encephalomyelitis (ME)' is often used, but 'CFS' is the term that has been adopted and clearly defined for research purposes, and it will be used in this review. The diagnosis can be made only after all alternative diagnoses have been excluded (Reeves 2003; Reeves 2007); several sets of criteria are currently used to diagnose CFS (Carruthers 2011; Fukuda 1994; NICE 2007; Reeves 2003; Sharpe 1991). The Centers for Disease Control and Prevention (CDC) 1994 diagnostic criteria for CFS (Fukuda 1994) are the most widely cited for research purposes (Fonhus 2011), resulting in prevalence of CFS of between 0.24% (Reyes 2003) and 2.55% (Reeves 2007) among US adults. Practical application of diagnostic criteria may help to explain some of the observed variation in prevalence estimates (Johnston 2013). In practice, most patients visit their local general practitioner (GP) for assessment. A minority of patients may be referred to specialist clinics (e.g. neurology, infectious diseases, psychiatry, endocrinology or general medicine) for exclusion of alternative underlying disorders.

Description of the intervention

Exercise therapy is often included as part of a treatment programme for individuals with CFS. 'Exercise' is defined as "planned structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness" (ACSM 2001); 'therapy' is defined as "treatment intended to relieve or heal a disorder" (Oxford English Dictionary). We define 'exercise therapy' as a "regimen or plan of physical activity designed and prescribed [and] intended to relieve or heal a disorder," and 'therapeutic exercise' or 'exercise therapy' can be described as "planned exercise performed to attain a specific physical benefit, such as maintenance of the range of motion, strengthening of weakened muscles, increased joint flexibility, or improved cardiovascular and respiratory function" (Mosby 2009). Aerobic exercise such as walking, jogging, swimming or cycling is included, along with anaerobic exercise such as strength or stabilising exercises. Graded exercise therapy is characterised by establishment of a baseline of achievable exercise or physical activity, followed by a negotiated, incremental increase in the duration of time spent physically active followed by an increase in intensity (White 2011).

How the intervention might work

Physical activity can improve health and quality of life for patients with chronic disease (Blair 2009). The causal pathway for CFS is unknown; however, several hypotheses have been proposed as to why exercise therapy might be a viable treatment. The 'deconditioning model' assumes that the syndrome is perpetuated by reversible physiological changes of deconditioning and avoidance of activity; therefore exercise should improve deconditioning and thus the condition of patients with CFS (Clark 2005; White 2011). However, mediation studies suggest that improved conditioning is not associated with better outcomes (Fulcher 1997; Moss‐Morris 2005). Some graded exercise therapy (GET) programmes are designed to gradually reintroduce the patient to the avoided stimulus of physical activity or exercise, which may involve a conditioned response leading to fatigue (Clark 2005; Fulcher 2000; White 2011). Mediation studies suggest that reduced symptom focus may mediate outcomes with GET, consistent with this model (Clark 2005; Moss‐Morris 2005). Evidence has also been found for central sensitisation contributing to hyperresponsiveness of the central nervous system to a variety of visceral inputs (Nijs 2011). The most replicated finding in patients with CFS is an increased sense of effort during exercise, which is consistent with this model (Fulcher 2000; Paul 2001). Graded exercise therapy may reduce this extra sense of effort, perhaps by reducing central sensitisation (Fulcher 1997).

Further research is needed to verify these hypotheses, but effective treatments may be discovered without knowledge of the effective pathway or underlying cause.

Why it is important to do this review

The previous Cochrane review (Edmonds 2004) suggested that exercise therapy was a promising treatment but that larger studies were needed to address the safety of this therapy. Such studies have been completed and their findings published, so that the present time is propitious for an updated review. Exercise therapy is often used as treatment for individuals with CFS and is recommended by treatment guidelines (NICE 2007). People with CFS should have the opportunity to make informed decisions about their care and treatment based on robust research evidence. This review will examine the effectiveness of exercise therapy, provided as a stand‐alone intervention or as part of a treatment plan. The Cochrane Collaboration has reviewed multiple aspects of treatment for patients with CFS. A review on CBT was published in 2008 (Price 2008), and one on traditional Chinese herbal medicine in 2009 (Adams 2009); also, a protocol on pharmacological treatments was submitted (Hard 2009).

This review, which is an update of a Cochrane review first published in 2004, will update the evidence base that serves as a resource for informed decision making by healthcare personnel and patients. A protocol for an accompanying individual patient data review on chronic fatigue syndrome and exercise therapy has been published (Larun 2014).

Objectives

The objective of this review was to determine the effects of exercise therapy (ET) for patients with chronic fatigue syndrome (CFS) as compared with any other intervention or control.

  • Exercise therapy versus 'passive control' (e.g. treatment as usual, waiting‐list control, relaxation, flexibility).

  • Exercise therapy versus other active treatment (e.g. cognitive‐behavioural therapy (CBT), cognitive treatment, supportive therapy, pacing, pharmacological therapy such as antidepressants).

  • Exercise therapy in combination with other specified treatment strategies versus other specified treatment strategies (e.g. exercise combined with pharmacological treatment vs pharmacological treatment alone).

Methods

Criteria for considering studies for this review

Types of studies

We included randomised controlled trials, as well as cluster‐randomised trials and cross‐over trials.

Types of participants

We included trials of male and female participants over the age of 18, irrespective of cultures and settings. Investigators currently use several sets of criteria to diagnose CFS (Carruthers 2011; Fukuda 1994; NICE 2007; Reeves 2003; Sharpe 1991); therefore we decided to include trials in which participants fulfilled the following diagnostic criteria for CFS or ME.

  • Fatigue, or a symptom synonymous with fatigue, was a prominent symptom.

  • Fatigue was medically unexplained (i.e. other diagnoses known to cause fatigue such as anorexia nervosa or sleep apnoea could be excluded).

  • Fatigue was sufficiently severe to significantly disable or distress the participant.

  • Fatigue persisted for at least six months.

We included trials that included participants with disorders other than CFS provided that > 90% of participants had been given a primary diagnosis of CFS based on the criteria discussed above. We included in the analysis of this review trials in which less than 90% of participants had a primary diagnosis of CFS only if data on CFS were reported separately.

Co‐morbidity

Studies involving participants with co‐morbid physical or common mental disorders were eligible for inclusion only if the co‐morbidity did not provide an alternative explanation for fatigue.

Types of interventions

Experimental intervention

Both aerobic and anaerobic interventions aimed at exercising big muscle groups, for example, walking, swimming, jogging and strength or stabilising exercises, could be included. Both individual and group treatment modalities were eligible, but interventions had to be clearly described and supported by appropriate references.

'Exercise therapy' is an umbrella term for the different types of exercise provided; it is based on the American College of Sports Medicine definition (ACSM 2001). We categorised exercise therapies in this review in accordance with descriptions of the interventions provided by individual studies. We prepared a table of Interventions with detailed information on exercise therapy reported by the included studies, as definitions vary across time and context. As a point of reference, we used the following empirical definitions, as derived from descriptions of the interventions.

  • Graded exercise therapy (GET): exercise in which the incremental increase in exercise was mutually set.

  • Exercise with pacing: exercise in which the incremental increase in exercise was personally set.

  • Anaerobic exercise: exercise that requires a high level of exertion by the participant, in a brief spurt or over a short duration, and that can be gradually increased over time with practice.

We did not impose restrictions with regard to the duration of each treatment session, the number of sessions or the time between sessions. Trials presenting data from one of the following comparisons were eligible for inclusion.

Comparator interventions

  • ‘Passive control’: treatment as usual/waiting‐list control/relaxation/flexibility.

    • 'Treatment as usual' comprises medical assessments and advice given on a naturalistic basis. 'Relaxation' consists of techniques that aim to increase muscle relaxation (e.g. autogenic training, listening to a relaxation tape). 'Flexibility' includes stretches performed according to selected exercises given.

  • Psychological therapies: cognitive‐behavioural therapy (CBT)/cognitive treatment/supportive therapy/behavioural therapies/psychodynamic therapies.

  • Adaptive pacing therapy.

  • Pharmacological therapy (e.g. antidepressants).

Types of outcome measures

Primary outcomes

1. Fatigue: measured using any validated scale (e.g. Fatigue Scale (FS) (Chalder 1993), Fatigue Severity Scale (FSS) (Krupp 1989)).

2. Adverse outcomes: measured using any reporting system (e.g. serious adverse reactions (SARs) (European Union Clinical Trials Directive 2001)).

Secondary outcomes

3. Pain: measured using any validated scale (e.g. Brief Pain Inventory (Cleeland 1994)).

4. Physical functioning: measured using any validated scale (e.g. Short Form (SF)‐36, physical functioning subscale (Ware 1992)).

5. Quality of life (QOL): measured using any validated scale (e.g. Quality of Life Scale (Burckhardt 2003)).

6. Mood disorders: measured using validated instruments (e.g. Hospital Anxiety and Depression Scale (Zigmond 1983)).

7. Sleep duration and quality: measured by self‐report on a validated scale, or objectively by polysomnography (e.g. Pittsburgh Sleep Quality Index (Buysse 1989)).

8. Self‐perceived changes in overall health: measured by self‐report on a validated scale (e.g. Global Impression Scale (Guy 1976)).

9. Health service resource use (e.g. primary care consultation rate, secondary care referral rate, use of alternative practitioners).

10. Drop‐outs (any reason).

Timing of outcome assessment

We extracted from all studies data on each outcome for end of treatment and end of follow‐up.

Search methods for identification of studies

Electronic searches

The Cochrane Collaboration's Depression, Anxiety and Neurosis (CCDAN) Review Group's Trials Search Coordinator (TSC) searched their Group's Specialized Register (CCDANCTR‐Studies and CCDANCTR‐References) (all years to 9 May 2014). This register is created from routine generic searches of MEDLINE (1950‐ ), EMBASE (1974‐ ) and PsycINFO (1967‐ ). Details of CCDAN's generic search strategies, used to inform the CCDANCTR, can be found on the Group's website.

The CCDANCTR‐Studies Register was searched using the following terms:
Diagnosis = ("Chronic Fatigue Syndrome" or fatigue) and Free Text = (exercise or sport* or relaxation or "multi convergent" or "tai chi")

The CCDANCTR‐References Register was searched using a more sensitive list of free‐text search terms to identify additional untagged/uncoded references, e.g. fatigue*, myalgic encephalomyelitis*, exercise, physical active* and taiji. The full search strategy is listed in Appendix 1.

A complementary search of the following bibliographic databases and international trial registers was also conducted to 9 May 2014 (see Appendix 2):

  • SPORTDiscus (1985 ‐ );

  • The Cochrane Central Register of Controlled Trials (CENTRAL, all years ‐); and

  • WHO International Clinical Trials Portal.

Searching other resources

We contacted the authors of included studies and screened reference lists to identify additional published or unpublished data. We conducted citation searches using the Institute for Scientific Information (ISI) Science Citation Index on the Web of Science.

Data collection and analysis

Selection of studies

Two of three review authors (LL, JO‐J, KGB) inspected identified studies, using eligibility criteria to select relevant studies. In cases of disagreement, they consulted a third review author (JRP).

Data extraction and management

Melissa Edmonds and Jonathan R Price independently extracted data from included studies for the 2004 version of this review, and LL and JO‐J did so for this review update, using a standardised extraction sheet. They extracted mean scores at endpoint, the standard deviation (SD) or standard error (SE) of these values and the number of participants included in these analyses. When only the SE was reported, review authors converted it to the SD. For dichotomous outcomes, such as drop‐outs, we extracted the number of events. We sought clarification, when necessary, from investigators involved in the following trials: Fulcher 1997, Moss‐Morris 2005, Wallman 2004, Wearden 2009, Wearden 2010 and White 2011. We resolved disagreement between review authors by discussion.
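
The text does not spell out how an SE was converted to an SD. The standard relationship for a single group mean is SD = SE × √n, and a minimal sketch under that assumption is given here. The function name and the example numbers are illustrative and are not data from any included trial.

```python
import math

def se_to_sd(se, n):
    """Convert the standard error of a group mean to a standard deviation."""
    if n <= 0:
        raise ValueError("n must be a positive number of participants")
    return se * math.sqrt(n)

# Hypothetical group: SE of 0.5 points among 100 participants
print(se_to_sd(0.5, 100))  # 5.0
```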

Main comparisons

  • Exercise therapy versus 'passive control'.

  • Exercise therapy versus psychological treatment.

  • Exercise therapy versus adaptive pacing therapy.

  • Exercise therapy versus pharmacological therapy (e.g. antidepressants).

  • Exercise therapy as an adjunct to other treatment versus other treatment alone.

Assessment of risk of bias in included studies

Working independently, LL and JO‐J, KGB or Jane Dennis (JD) assessed risk of bias using The Cochrane Collaboration risk of bias tool which was published in the most recent version of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). This tool encourages consideration of how the sequence was generated, how allocation was concealed, the integrity of blinding at outcome, the completeness of outcome data, selective reporting and other potential sources of bias. We classified all items in the risk of bias assessment as low risk, high risk or unclear risk by the extent to which bias was prevented.

Measures of treatment effect

Continuous data

For continuous outcomes, we calculated the mean difference (MD) when the same scale was used in a similar manner across studies. When results for continuous outcomes were presented using different scales or different versions of the same scale, we used the standardised mean difference (SMD).
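As a rough sketch of the two effect measures, the code below computes an MD with its standard error and an SMD for two independent groups. The SMD shown is Hedges' adjusted g; the review does not state which SMD estimator was applied, so this choice is an assumption. All function names and example values are hypothetical.

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Return (MD, SE of MD) for two independent groups."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, se


def standardised_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Return Hedges' adjusted g (assumed estimator, see lead-in).

    The raw difference is divided by the pooled SD and multiplied by a
    small-sample correction factor.
    """
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return (m1 - m2) / pooled_sd * correction


# Hypothetical fatigue scores: exercise arm vs control arm.
md, se_md = mean_difference(20.0, 6.0, 30, 24.0, 6.0, 30)
smd = standardised_mean_difference(20.0, 6.0, 30, 24.0, 6.0, 30)
print(md, round(se_md, 3), round(smd, 3))
```

Because the SMD is expressed in SD units, it allows pooling across the 11‐item and 14‐item fatigue scales, at the cost of a less directly interpretable estimate.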

Dichotomous data

For dichotomous outcomes, we expressed effect size in terms of risk ratio (RR).
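A risk ratio and its confidence interval could be computed along these lines. This is a minimal sketch using the usual large‐sample interval on the log scale; the function name and the drop‐out counts are hypothetical.

```python
import math


def risk_ratio(events_1, total_1, events_2, total_2, z=1.96):
    """Return (RR, lower, upper) comparing group 1 with group 2.

    The interval is computed on the log scale with
    SE(ln RR) = sqrt(1/a - 1/n1 + 1/c - 1/n2).
    """
    rr = (events_1 / total_1) / (events_2 / total_2)
    se_log = math.sqrt(
        1 / events_1 - 1 / total_1 + 1 / events_2 - 1 / total_2
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical drop-outs: 10 of 50 in one arm vs 5 of 50 in the other.
rr, lower, upper = risk_ratio(10, 50, 5, 50)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

The formula is undefined when either arm has zero events; meta‐analysis software typically applies a continuity correction in that case, which this sketch omits.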

Unit of analysis issues

Studies with multiple treatment groups

We extracted data from relevant arms of the included studies, and we compared the experimental condition (exercise therapy) versus each individual comparator intervention: 'Passive control' (treatment as usual/waiting‐list control/relaxation/flexibility); 'Psychological treatment' (cognitive‐behavioural therapy (CBT)/cognitive treatment/supportive therapy/behavioural therapies/psychodynamic therapies); 'Adaptive pacing therapy'; and 'Pharmacological therapy' (e.g. antidepressants). This meant that data from the exercise arm could be included in a separate univariate analysis for more than one comparison. Under Differences between protocol and review, we describe planned methods that proved redundant because we did not include studies requiring their use.

Dealing with missing data

When possible, we calculated missing standard deviations from reported standard errors, P values or confidence limits using the methods described in Chapter 7 (Sections 7.7.3.2 and 7.7.3.3) of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). We approached trial investigators to obtain other types of missing data.
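The kind of back‐calculation described above can be sketched as follows for a between‐group difference in means. The sketch assumes large samples (normal rather than t distribution) and a common SD in both groups; function names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Recover the SE of a difference from its confidence limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def se_from_p(mean_diff, p_two_sided):
    """Recover the SE of a difference from a two-sided P value."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(mean_diff) / z


def sd_from_se_of_difference(se, n1, n2):
    """Convert the SE of a difference in means to a common within-group SD."""
    return se / math.sqrt(1 / n1 + 1 / n2)


# Hypothetical report: MD with 95% CI -6.0 to -2.0, 50 participants per arm.
se = se_from_ci(-6.0, -2.0)
print(round(sd_from_se_of_difference(se, 50, 50), 2))
```

For small trials, the normal quantile should be replaced by the corresponding t quantile, which requires the degrees of freedom of the original analysis.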

Assessment of heterogeneity

For this update, we assessed heterogeneity in keeping with the recommendations of the Cochrane Handbook for Systematic Reviews of Interventions (I² values of 0% to 40%: might not be important; 30% to 60%: may represent moderate heterogeneity; 50% to 90%: may represent substantial heterogeneity; 75% to 100%: show considerable heterogeneity; Higgins 2011). In addition to the I² value (Higgins 2003), we present the P value of the Chi² test, and we considered the direction and magnitude of treatment effects when making judgements about statistical heterogeneity. We deemed that no analyses were inappropriate as a result of the presence of statistical heterogeneity, as the measures and statistics used have low power and are unstable when based on few and small studies. A P value < 0.1 from the Chi² test was used as an indicator of statistically significant heterogeneity because of the low power of provided measures.
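For readers unfamiliar with these statistics, the following sketch shows how Cochran's Q (the Chi² heterogeneity statistic) and I² relate to study‐level estimates. The function name and the three study estimates are hypothetical and serve only to illustrate the arithmetic.

```python
def cochran_q_and_i2(effects, standard_errors):
    """Return (Q, degrees of freedom, I-squared in percent).

    Q is the inverse-variance weighted sum of squared deviations from the
    fixed-effect pooled estimate; I-squared = max(0, (Q - df) / Q) * 100.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical mean differences from three trials, each with SE = 1.0.
q, df, i2 = cochran_q_and_i2([-1.0, -3.0, -5.0], [1.0, 1.0, 1.0])
print(q, df, i2)
```

Because the Handbook's interpretation bands overlap, the same I² value can fall into two categories, which is why the direction and magnitude of effects are also considered.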

Assessment of reporting biases

We planned at the protocol stage to construct funnel plots when sufficient numbers of trials allowed a meaningful presentation, to establish whether other potential biases could be present. Asymmetry of these plots may indicate publication bias, although it also may represent a true relationship between trial size and effect size. We identified an insufficient number of studies to use this approach in the present version of the review (Egger 1997). We considered clinical diversity of the studies as a possible explanation for some of the heterogeneity apparent between studies.

Data synthesis

As the result of expected clinical heterogeneity (slightly different interventions, populations and comparators) among studies, we chose the random‐effects model as the default method of analysis because the alternative fixed‐effect model assumes that the true treatment effect in each trial is the same, and that observed differences are due to chance.
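The difference between the two models can be illustrated with a short sketch. It assumes the DerSimonian and Laird estimator of between‐study variance, which the review does not name explicitly; the function name and input values are hypothetical.

```python
import math


def pool_inverse_variance(effects, standard_errors):
    """Return fixed-effect and random-effects (DerSimonian-Laird) results.

    Output: dict with pooled estimates, their SEs and tau-squared.
    """
    w = [1 / se ** 2 for se in standard_errors]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study.
    w_re = [1 / (se ** 2 + tau2) for se in standard_errors]
    random = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1 / sum(w)),
        "random": random,
        "random_se": math.sqrt(1 / sum(w_re)),
        "tau2": tau2,
    }


# Hypothetical mean differences from three trials, each with SE = 1.0.
result = pool_inverse_variance([-1.0, -3.0, -5.0], [1.0, 1.0, 1.0])
print(result)
```

In this example both models give the same point estimate, but the random‐effects standard error is about twice as large, reflecting the added between‐study variance.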

We performed analyses using Review Manager 5.0.

Subgroup analysis and investigation of heterogeneity

We planned no subgroup analyses a priori. To explore possible differences between studies that used different strategies (e.g. exercise therapy), control conditions and diagnostic criteria, we performed post hoc subgroup analyses. We describe results of these subgroup analyses in the text of the review.

Sensitivity analysis

We planned no sensitivity analyses a priori. To explore the possible impact of our pooling strategy (e.g. the impact of using SMD vs MD), we performed post hoc sensitivity analyses. In addition, we performed sensitivity analyses in which studies with outlying results were excluded. We describe results of these sensitivity analyses in the text of the review.

Results

Description of studies

Results of the search

Our searches identified 908 unique records. Of these, we retrieved 50 records and read the full text. Along with the five included studies from the 2004 version of this review (Fulcher 1997; Moss‐Morris 2005; Powell 2001; Wallman 2004; Wearden 1998), we have included three additional studies in this update (Jason 2007; Wearden 2010; White 2011; see Figure 1).


PRISMA flow diagram.


Included studies

A total of eight studies (Fulcher 1997; Jason 2007; Moss‐Morris 2005; Powell 2001; Wallman 2004; Wearden 1998; Wearden 2010; White 2011) met our inclusion criteria for this review (23 reports in all). All included studies were written in English and were published in peer‐reviewed journals.

Design

All included studies were described as randomised controlled trials.

Three studies included two arms (Fulcher 1997; Moss‐Morris 2005; Wallman 2004) comparing exercise versus relaxation/flexibility, waiting list or standard care, respectively.

Four studies had four arms. For Powell 2001, we combined the three intervention arms and used these as comparators versus treatment as usual. We considered two arms (exercise + drug placebo vs exercise placebo + drug placebo) in Wearden 1998 as relevant for this review. For Jason 2007 and White 2011, all four arms were used, as were three arms in Wearden 2010.

The eight studies randomly assigned a total of 1518 participants. Samples included in this review ranged from 49 (Moss‐Morris 2005) to 641 participants (White 2011).

Setting

Two studies took place in primary care settings: one in the United Kingdom (Wearden 2010) and one in Australia (Wallman 2004). Two studies were performed in secondary care facilities: one in the United Kingdom (Fulcher 1997) and one in New Zealand (Moss‐Morris 2005). One study recruited from a variety of sources but took place at a hospital in the USA (Jason 2007). Three studies were conducted at secondary/tertiary care settings in the United Kingdom (Powell 2001; Wearden 1998; White 2011).

Participants

Three studies used the Centers for Disease Control and Prevention (CDC) 1994 criteria (Fukuda 1994) as inclusion criteria (Jason 2007; Moss‐Morris 2005; Wallman 2004), and five (Fulcher 1997; Powell 2001; Wearden 1998; Wearden 2010; White 2011) used the Oxford criteria (Sharpe 1991). Wearden 2010 and White 2011 showed an overlap between Oxford criteria (Sharpe 1991) and London ME criteria (The National Task Force on CFS) of 31% and 51%, respectively. More female than male participants were included (range 71% to 84% when all arms were included), and mean ages across studies were between 33 and 44.6 years (confirmation of age data was requested from a trial investigator in one case (Wallman 2009)). The studies reported median illness duration of between 2.3 and 7 years. All but one study (Wallman 2004) reported depression, which ranged from 18% (Wearden 2010) of those with a depression diagnosis to 39% among participants with a current Axis I disorder (Jason 2007). Three studies did not report work and employment information (Wallman 2004; Wearden 2010; White 2011). Fulcher 1997 and Jason 2007 reported that 39% and 46% of participants were working or studying on at least a part‐time basis, 22% of participants in Moss‐Morris 2005 were unemployed and were unable to work because of disability and 42% of participants in Powell 2001 were receiving disability pensions (Table 1).

Table 1. Study demographics

| Study ID | N | Gender | Duration of illness | Depression co‐morbidity | Use of antidepressants (ADs) | Work and employment status |
| --- | --- | --- | --- | --- | --- | --- |
| Fulcher 1997 | 66 | 49F/17M (65% female) | 2.7 years | 20 (30%) possible cases of depression (HADS) | 30 (45%) on full‐dose AD (n = 20) or low‐dose AD (n = 10) | 26 (39%) working or studying at least part time |
| Jason 2007 | 114 | 95F/19M (83% female) | > 5.0 years | 44 (39%) with a current Axis I disorder (depression and anxiety most common) | Not stated | 52 (46%) working or studying at least part time, 24% unemployed, 6% retired, 25% on disability |
| Moss‐Morris 2005 | 49 | 34F/15M (69% female) | 3.1 years | 14 (29%) possible or probable cases of depression (HADS) | Not stated | 11 (22%) were unemployed and were unable to work because of disability |
| Powell 2001 | 148 | 116F/32M (78% female) | 4.3 years | 58 (39%) possible or probable cases of depression (HADS) | 27 (18%) used AD | 50 (34%) were working, 64 (43%) were on disability |
| Wallman 2004 | 61 | 47F/14M (77% female) | Not stated | Not stated | 16 (26%) used AD | Not stated |
| Wearden 1998 | 136 | 97F/39M (71% female) | 2.3 years | 46 (34%) with depressive disorder according to DSM‐III‐R criteria | Not stated | 114 (84%) had recently changed occupation |
| Wearden 2010 | 296 | 230F/66M (78% female) | 7.0 years | 53 (18%) had a depression diagnosis | 160 (54%) were prescribed AD in the past 6 months | Not stated |
| White 2011 | 641 | 495F/146M (77% female) | 2.7 years | 219 (34%) with any depressive disorder | 260 (41%) used AD | Not stated |

Intervention characteristics

The exercise therapy regimen lasted between 12 and 26 weeks. Seven studies used variations of aerobic exercise therapy such as walking, swimming, cycling or dancing at mixed levels in terms of intensity of the aerobic activity ranging from very low to quite rigorous; the remaining study used anaerobic exercise (Jason 2007). Scheduled therapist meetings could be conducted face‐to‐face or by telephone and varied from every second week to weekly; some sessions involved talking, and some exercise. Most of the included studies asked participants to exercise at home, most often between three and five times per week, with a target duration of 5 to 15 minutes per session using different means of incrementation (Fulcher 1997; Moss‐Morris 2005; Powell 2001; Wallman 2004; Wearden 1998; Wearden 2010; White 2011). Participants were asked to perform self‐monitoring by using such tools as heart monitors, the Borg Scale or a diary including an exercise log to measure adherence to treatment (Table 2). Control interventions included treatment as usual, relaxation plus flexibility and a waiting‐list control group.

Table 2. Characteristics of exercise interventions

| Study ID | Deliverer of intervention | Explanation and materials | Type of exercise | Schedule therapist | Schedule home | Duration of activity | Initial exercise level | Increment steps | Participant self‐monitoring | Criteria for (non)‐increment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fulcher 1997 | Exercise physiologist | Verbal explanation of deconditioning and reconditioning | Walking (encouraged to take other modes such as cycling and swimming) | Weekly (1 hour), talking only | 5 days/wk | 5 to 15 minutes increasing to 30 minutes/d | 5 to 15 minutes at 40% of peak O2 consumption (target HR of resting + 50% of HRR) | Duration increased 1 to 2 minutes per week up to 30 minutes; then intensity increased | Ambulatory heart rate monitors | If increased fatigue, continue at the same level for an extra week |
| Wearden 1998 | Physiotherapist, fitness focus | Minimal explanation; no written materials | Preferred activity (walking/jogging, some did cycling, swimming) | At week 0, 1, 2, 4, 8, 12*, 20, 26*, talking only (*evaluation visits) | 3 days/wk | 20 minutes | 75% of VO2max from bike test | Intensity increased | Borg Exertion Scale chart, before and after HR | Increase if: 10 beats/min drop post exercise and 2‐point drop in Borg Scale score |
| Powell 2001 | Senior clinical therapist | Explanations for GET, circadian dysrhythmia, deconditioning, sleep; "educational information pack" | Aerobic exercise; own choice but mostly exercise bike | 9 face‐to‐face (1.5 hours each) | Tailored | Tailored to functional abilities | Tailored to functional abilities: “a level which you are capable of doing on a BAD DAY” | Varying daily increase (e.g. "5 second increase each day for the rest of the second week") to 30 minutes twice/d | Duration of exercise | Discouraged, but restart at lower level and rapidly reincrease |
| Wallman 2004 | Single physical therapist | Small laminated Borg Scale and heart rate monitor | Walking/jogging, swimming or cycling | Phone contact every 2 weeks | Every second day | From 5 to 15 minutes, increasing to 30 minutes | Initial exercise duration was between 5 and 15 minutes, and intensity was based on the mean HR value achieved midpoint during submaximal exercise tests | Duration increased by 2 to 5 minutes/2 wk | Heart rate monitoring, Borg Exertion Scale | Keep Borg within 11 to 14. Adjust every 2 weeks. Average peak HR when exercising comfortably at a typical day represents patient’s target heart rate (± 3 bpm) for future sessions |
| Moss‐Morris 2005 | Health psychology MSc student, researcher | Focused on the "downward spiral of activity reduction, deconditioning" | Walking (but could also do other preferred exercise, e.g. jogging, swimming) | Weekly for 12 weeks, talking only | 4 to 5 days/wk | Set collaboratively approx 5 to 15 minutes | HR at 40% of VO2max | Duration 3 to 5 minutes/wk; intensity increased after 6 weeks 5 bpm/wk | Ambulatory heart rate monitors | If increased fatigue, continue at the same level for an extra week |
| Jason 2007 | Registered nurses supervised by exercise physiologist | "Behavioral goals explained, energy system education, redefining exercise" | "individualized, constructive and pleasurable activities" | Every 2 weeks (45 minutes), 13 sessions | 3 per week | Tailored | Flexibility tests; strength test (hand grip) | "Gradually increasing anaerobic activity levels" | Self‐monitoring daily exercise diary | New targets only after habituation, or if goals achieved for 2 weeks |
| Wearden 2010 | Nurses with 16 half‐days of training and supervision | Explanation of physiological symptoms and training in first session | Wide choice: walking, stairs, bicycle, dance, jog | 10 sessions over 18 weeks | Several times per day | First 90 minutes, then alternating 60 and 30 minutes | Determined collaboratively with the participant | "Increased very gradually," examples show 50% increase per day | Diary of progress on exercise programme, with note of daily activities | On "bad days," try to do same as day before |
| White 2011 | Exercise therapist/physiotherapist (8 to 10 days training + ongoing supervision) | 142‐page manual: benefits of exercise and "how to" of GET; some got pedometers | Wide choice: walking, cycling, swimming, Tai Chi. Aim to build into daily activities | Weekly × 4, then fortnightly; total of 15 sessions | 5 to 6 days/wk | Negotiated, goal to get to 30 minutes per session | Test of fitness (step test and 6‐minute walking test), perceived physical exertion, actigraphy data | "20% increases" per fortnight; increase duration to 30 minutes, then increase intensity | Exercise diary + Borg scale + “Use non‐symptoms to monitor” and heart rate monitor (for intensity increases) | Do not increase if global increase in symptoms |

© 9. March 2012, Paul Glasziou, Bond University, Australia

Outcomes

The main outcomes were symptom levels measured by rating scales at end of treatment (12 to 26 weeks) and at follow‐up (52 to 70 weeks). Fatigue was measured by the Fatigue Scale (FS) (Chalder 1993) in seven studies (Fulcher 1997; Moss‐Morris 2005; Powell 2001; Wallman 2004; Wearden 1998; Wearden 2010; White 2011) and by the Fatigue Severity Scale (FSS) (Krupp 1989) in one study (Jason 2007). Another study (White 2011) reported adverse outcomes according to serious adverse reaction (SAR) categories (European Union Clinical Trials Directive 2001).

The Jason 2007 study measured pain using the Brief Pain Inventory (Cleeland 1994). Physical functioning was measured by the SF‐36 physical functioning subscale (Ware 1992) in seven studies (Fulcher 1997; Jason 2007; Moss‐Morris 2005; Powell 2001; Wearden 1998; Wearden 2010; White 2011). Quality of life was measured by the Quality of Life Scale (QOLS) (Burckhardt 2003) in another study (Jason 2007).

Seven studies (Fulcher 1997; Jason 2007; Moss‐Morris 2005; Powell 2001; Wallman 2004; Wearden 2010; White 2011) reported self‐perceived changes in overall health using the Global Impression Scale (Guy 1976).

Of the seven studies that reported mood disorder, six (Fulcher 1997; Powell 2001; Wallman 2004; Wearden 1998; Wearden 2010; White 2011) used the Hospital Anxiety and Depression Scale (HADS) (Zigmond 1983), and one (Jason 2007) used the Beck Depression Inventory (BDI‐II) (Beck 1996) and the Beck Anxiety Inventory (BAI) (Hewitt 1993). Three studies (Powell 2001; Wearden 2010; White 2011) measured sleep problems by using a questionnaire (Jenkins 1988), and two (Fulcher 1997; Powell 2001) by using the Pittsburgh Sleep Quality Index (PSQI) (Buysse 1989).

One study reported health service resource use (White 2011).

Drop‐out was calculated by the review authors.

Included studies reported several outcomes in addition to those reported in this review, such as work capacity by oxygen consumption (VO2), the six‐minute walking test and illness beliefs. See Characteristics of included studies for more detailed information.

Ethics approval

Ethics approval was obtained for all listed studies, and sponsorship or funding was listed.

Excluded studies

Two studies were excluded in 2004, as the diagnoses used were Gulf War veterans' illness (Guarino 2001) and subclinical chronic fatigue (Ridsdale 2004). The study awaiting assessment from 2004 was also excluded (Stevens 1999), as exercise therapy was a minor part of a combination treatment.

The current version excluded 14 studies (Evering 2008; Gordon 2010; Guarino 2001; Nunez 2011; Ridsdale 2004; Ridsdale 2012; Russel 2001; Stevens 1999; Taylor 2004; Taylor 2006; Thomas 2008; Tummers 2012; Viner 2004; Wright 2005). In addition to the two studies excluded from the 2004 version because of the population included (Guarino 2001; Ridsdale 2004), another with the diagnosis of chronic fatigue was excluded (Ridsdale 2012), as were two in which participants were younger than 18 years (Viner 2004; Wright 2005). Along with the one study excluded in 2004 (Stevens 1999), another five studies (Evering 2008; Nunez 2011; Russel 2001; Taylor 2004; Tummers 2012) were excluded in this review update because exercise therapy was a minor part of the intervention. One study was excluded because investigators compared two exercise interventions (Gordon 2010). Two studies were excluded because they were not RCTs (Taylor 2006; Thomas 2008).

Ongoing studies

We identified five ongoing studies in trial registers (Broadbent 2012; Kos 2012; Marques 2012; Vos‐Vromans 2008; White 2012).

Studies awaiting classification

Studies identified from searches run to 9 May 2014 were assessed for eligibility and were classified accordingly. Three studies identified in the search are awaiting assessment for possible inclusion, as the available information is too sparse for conclusions about eligibility. One abstract seems to refer to an unpublished study (Hatcher 1998), but we have not been able to contact the study authors for clarification. Additionally, two citations refer to studies that are available only in Chinese (Liu 2010; Zhuo 2007). Again, we have not been able to contact the study authors to clarify their relevance, and we have not had the resources to perform translation.

New studies found at this update

Three new studies have been added in this updated review (Jason 2007; Wearden 2010; White 2011).

Risk of bias in included studies

Summaries of the risk of bias assessments are presented in Figure 2 and Figure 3.


Risk of bias summary: review authors' judgements about each risk of bias item for each included study.



Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.


Allocation

All but one of the studies had adequate sequence generation (Wallman 2004). We judged five reported methods of allocation concealment as 'adequate' and found that methods described by the remaining three were unclear (Jason 2007; Powell 2001; Wallman 2004).

Blinding

As the intervention did not allow for blinding of participants or personnel delivering the exercise‐based interventions, and as all measures were performed by self‐report, blinding was impossible. This inevitably puts the review at some risk of bias, and all of the included studies were rated as having high risk of bias.

Incomplete outcome data

Risk of bias due to incomplete outcomes was low in five of the eight included studies, reflecting the fact that loss to follow‐up was low, and that participants who were lost to follow‐up were evenly distributed between intervention and control groups (Fulcher 1997; Moss‐Morris 2005; Powell 2001; Wallman 2004; White 2011). One trial was associated with unclear risk of attrition bias (Wearden 2010). The drop‐out rate in the intervention groups in this trial was relatively high, but most of the participants who dropped out from treatment were still available for follow‐up assessments and were analysed within the groups to which they were randomly assigned (Wearden 2010). Two trials were associated with high risk of attrition bias (Jason 2007; Wearden 1998). Wearden 1998 reported large drop‐out rates in all intervention groups as compared with control groups, and many participants were lost to follow‐up. In Jason 2007, the conservatively defined drop‐out rate (i.e. "attending four or fewer sessions or stopping therapy prior to satisfactory completion of therapy") on average was 25%. Study authors used the best linear unbiased predictor to avoid taking missing data into account, but as loss to follow‐up for various intervention groups was not reported, we assessed the risk of attrition bias as high for this trial.

Selective reporting

Two studies (Wearden 2010; White 2011) referenced published protocols, and when we checked these against the published results, we found that reporting was adequate. In one study (Wearden 1998), trial investigators reported numerical data for only one subscale (health perception) of the Medical Outcomes Survey (MOS) scale (Ware 1992), for which data favour the intervention group; no numerical data were given for the five other subscales, nor for another scale (anxiety), as data were "similar in trial completers." It was not possible to check the other studies for selective reporting bias; therefore their risk of bias is considered unclear.

Other potential sources of bias

Seven of the eight studies seem to be free of other sources of bias, and one showed a baseline difference across groups for several variables (Jason 2007). These were not discussed when results were presented in the paper. In addition this study had 25 outcome measures; because of this large number, one significant measure would be expected to occur by chance (Jason 2007). Wallman 2004 showed differences between groups for anxiety and mental fatigue at baseline, and this might have influenced the results.
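The multiplicity concern can be made concrete with a short calculation. The sketch assumes a conventional significance threshold of 0.05 and independent outcome measures; neither is stated for Jason 2007, so the numbers are illustrative only.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of significant results if no true effects exist."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of one or more false positives across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# 25 outcome measures, as reported for Jason 2007; alpha = 0.05 is assumed.
print(expected_false_positives(25))
print(round(prob_at_least_one(25), 3))
```

Under these assumptions, about one spurious significant finding is expected, and the chance of at least one is roughly 72%.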

Effects of interventions

See: Summary of findings for the main comparison

Exercise therapy versus control

Comparison 1. Exercise therapy versus treatment as usual, relaxation or flexibility

All included studies (Fulcher 1997; Jason 2007; Moss‐Morris 2005; Powell 2001; Wallman 2004; Wearden 1998; Wearden 2010; White 2011) contributed data for this comparison.

1.1 Fatigue

Powell 2001 (148 participants) assessed fatigue by dichotomised scoring of an 11‐item Fatigue Scale (FS, 0 to 11 points) (Chalder 1993) and reported results clearly in favour of exercise therapy (mean difference (MD) ‐6.06, 95% confidence interval (CI) ‐6.95 to ‐5.17; Analysis 1.1). Three studies (Wallman 2004; Wearden 2010; White 2011) measured fatigue among a total of 540 participants using the same 11‐item FS with a different scoring system (0 to 33 points) (Chalder 1993) (Analysis 1.1). The pooled estimate suggests that exercise therapy was significantly more effective than treatment as usual (MD ‐2.82, 95% CI ‐4.07 to ‐1.57) – a result that was not associated with heterogeneity (I² = 0%, P value 0.54). Three studies (Fulcher 1997; Moss‐Morris 2005; Wearden 1998) with a total of 152 participants measured fatigue using a 14‐item FS (0 to 42 points) (Chalder 1993). Pooling shows a significant decrease in fatigue in the exercise group when compared with treatment as usual (MD ‐6.80 points, 95% CI ‐10.31 to ‐3.28), and the analysis was associated with low heterogeneity (I² = 20%, P value 0.29).

At follow‐up, small strengthening of the effect was observed on the 11‐point FS (Chalder 1993) as reported by Powell 2001 (MD ‐7.13, 95% CI ‐7.97 to ‐6.29; 148 participants; Analysis 1.2). Pooling of the two studies (Wearden 2010; White 2011) that measured fatigue on the 33‐point scale resulted in almost the same effect estimate at follow‐up as at end of treatment (MD ‐2.87, 95% CI ‐4.18 to ‐1.55; 472 participants; Analysis 1.2). The latter analysis was not associated with any unexplained heterogeneity (I² = 0%, P value 0.46). Jason 2007 (50 participants) did not report results at end of treatment but showed little or no difference in fatigue between anaerobic exercise and treatment as usual at follow‐up, as measured on the Fatigue Severity Scale (FSS) (Krupp 1989) (MD 0.15, 95% CI ‐0.55 to 0.85; Analysis 1.2).

Sensitivity analysis

Investigating heterogeneity

At end of treatment, fatigue was measured and reported on different scales, and we performed a sensitivity analysis in which all available studies were pooled using an SMD method. This strategy led to a pooled random‐effects estimate of ‐0.68 (95% CI ‐1.02 to ‐0.35), but the analysis suffered from considerable heterogeneity (I² = 78%, P value < 0.0001; Analysis 1.19). The observed heterogeneity was caused mainly by the deviating results presented in Powell 2001. Exclusion of Powell 2001 gave rise to a pooled SMD of ‐0.46 (95% CI ‐0.63 to ‐0.29) – an estimate that was not associated with heterogeneity (I² = 13%, P value 0.33).

At follow‐up, the four available studies (Jason 2007; Powell 2001; Wearden 2010; White 2011) measured and reported fatigue on different scales, and we performed a sensitivity analysis in which all available studies were pooled using an SMD method. The pooled SMD estimate is ‐0.63 (95% CI ‐1.32 to 0.06), but heterogeneity was extensive (I² = 93%, P value < 0.00001). Exclusion of Powell 2001 gave rise to a new pooled SMD of ‐0.29 (95% CI ‐0.55 to ‐0.03) and reduced heterogeneity (I² = 46%, P value 0.16).

Subgroup analysis

To explore the possible impact of our pooling strategy (e.g. the impact of pooling studies adhering to different exercise strategies and control conditions), we performed post hoc subgroup analyses within Analysis 1.1 and Analysis 1.2.

Type of exercise

Post hoc subgroup analysis based on treatment strategy could not establish differences (I² = 0%, P value 0.60) between studies of graded exercise therapy (Fulcher 1997; Moss‐Morris 2005; Powell 2001; Wearden 1998; Wearden 2010; White 2011) and studies testing exercise with self‐pacing (Wallman 2004) (SMD ‐0.71, 95% CI ‐1.09 to ‐0.32; I² = 82% vs SMD ‐0.54, 95% CI ‐1.05 to ‐0.02, respectively) (Analysis 1.19).

At follow‐up, post hoc subgroup analysis resulted in statistically significant subgroup differences (I² = 73.7%, P value 0.05) between the three studies (Powell 2001; Wearden 2010; White 2011) comparing graded exercise versus treatment as usual (SMD ‐0.86, 95% CI ‐1.67 to ‐0.05; I² = 95%) and Jason 2007, in which anaerobic activity was compared with relaxation (SMD 0.12, 95% CI ‐0.44 to 0.67).

Type of control

We cannot establish a subgroup difference (I² = 0%, P value 0.88) between the five studies with treatment as usual as control (Moss‐Morris 2005; Powell 2001; Wearden 1998; Wearden 2010; White 2011) and the two studies prescribing relaxation or flexibility to participants in the control arm (Fulcher 1997; Wallman 2004) (SMD ‐0.70, 95% CI ‐1.14 to ‐0.25 vs SMD ‐0.65, 95% CI ‐1.02 to ‐0.28).
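A test for subgroup differences between two subgroups can be approximated from the summary estimates reported above. The sketch below recovers standard errors from the rounded 95% confidence limits, so it only approximates the reported P value of 0.88; the function name is hypothetical and the exact method used in the review software may differ.

```python
import math


def subgroup_difference_test(est_1, ci_1, est_2, ci_2):
    """Return (Q, P value) for the difference between two subgroup estimates.

    Each SE is recovered from a 95% CI as (upper - lower) / 3.92. With two
    subgroups, Q follows a chi-squared distribution with 1 degree of
    freedom, whose upper tail is erfc(sqrt(Q / 2)).
    """
    se_1 = (ci_1[1] - ci_1[0]) / 3.92
    se_2 = (ci_2[1] - ci_2[0]) / 3.92
    q = (est_1 - est_2) ** 2 / (se_1 ** 2 + se_2 ** 2)
    p_value = math.erfc(math.sqrt(q / 2))
    return q, p_value


# Treatment as usual vs relaxation/flexibility, using the SMDs quoted above.
q, p = subgroup_difference_test(-0.70, (-1.14, -0.25), -0.65, (-1.02, -0.28))
print(round(q, 3), round(p, 2))
```

A Q value below its degrees of freedom corresponds to an I² of 0% for subgroup differences, consistent with the value reported for this comparison.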

Diagnostic criteria

As the use of various diagnostic criteria is often emphasised as particularly important with regard to treatment response, we also performed subgroup analyses based on diagnostic criteria. Comparison of the two studies using 1994 CDC criteria (Moss‐Morris 2005; Wallman 2004) and the five studies using the Oxford criteria (Fulcher 1997; Powell 2001; Wearden 1998; Wearden 2010; White 2011) revealed no differences between subgroups (I² = 0%, P value 0.84) (SMD ‐0.73, 95% CI ‐1.17 to ‐0.28 vs SMD ‐0.66, 95% CI ‐1.09 to ‐0.24).

1.2 Adverse effects

White 2011 reported two serious adverse reactions (SARs) (European Union Clinical Trials Directive 2001) possibly related to treatment among the 160 participants (i.e. deterioration in mobility and self‐care and worse CFS symptoms and function) in the exercise group and two SARs among the 159 participants in the control group (i.e. worse CFS symptoms and function and increased depression and incapacity) (odds ratio (OR) 0.99, 95% CI 0.14 to 7.1; Analysis 1.3). Participants in the Wearden 2010 trial reported no SARs to therapy.
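The odds ratio quoted for White 2011 can be reproduced from the counts given above. The sketch assumes a standard large‐sample (Woolf) confidence interval on the log odds scale; the review does not state which method was used, and the function name is hypothetical.

```python
import math


def odds_ratio(events_1, total_1, events_2, total_2, z=1.959964):
    """Return (OR, lower, upper) using a log-scale (Woolf) interval."""
    a, b = events_1, total_1 - events_1
    c, d = events_2, total_2 - events_2
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


# Two SARs among 160 exercise participants vs two among 159 controls.
or_value, lower, upper = odds_ratio(2, 160, 2, 159)
print(round(or_value, 2), round(lower, 2), round(upper, 1))
```

With only two events per arm, the interval is very wide, so the data are compatible with both a substantial reduction and a substantial increase in the odds of a serious adverse reaction.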

1.3 Pain

Wearden 1998 reported that all treated groups scored similarly on the pain subscale of SF‐36 (Ware 1992), but measured values were not reported.

One trial, Jason 2007 (43 participants), assessed pain using the Brief Pain Inventory (Cleeland 1994) at follow‐up (Analysis 1.4) and observed an MD of ‐0.97 (95% CI ‐2.44 to 0.50) on pain severity and ‐0.69 on the pain interference subscale (95% CI ‐2.48 to 1.10). The wide confidence interval implies that the results were inconclusive.

1.4 Physical functioning

Five trials (Fulcher 1997; Moss‐Morris 2005; Powell 2001; Wearden 2010; White 2011) with a total of 725 participants assessed physical functioning according to the physical functioning subscale of SF‐36 (Ware 1992) at end of treatment. The pooled estimate for these studies (Analysis 1.5) suggests that mean improvement for participants randomly assigned to exercise therapy was 13.10 points higher (95% CI 1.98 to 24.22) than for the treatment as usual group, but heterogeneity was considerable (I² = 89%, P value < 0.00001).

Four trials (669 participants) contributed data for evaluation of physical functioning at follow‐up (Jason 2007; Powell 2001; Wearden 2010; White 2011). Jason 2007 observed better results among participants in the relaxation group (MD 21.48, 95% CI 5.81 to 37.15). However, results were distorted by large baseline differences in physical functioning between the exercise and relaxation groups (39/100 vs 54/100); therefore we decided not to include these results in the meta‐analysis. Pooling of the three remaining trials (621 participants) showed a mean improvement on the SF‐36 physical functioning subscale that was 16.33 points higher for exercise than for treatment as usual (95% CI ‐4.08 to 36.74; Analysis 1.6), but heterogeneity was excessive (I² = 96%, P value < 0.00001); therefore little or no difference cannot be ruled out.

Sensitivity analysis

Investigating heterogeneity

Extensive heterogeneity in Analysis 1.5 was largely driven by the remarkably positive effect of exercise therapy reported by Powell 2001. Heterogeneity (I²) dropped to 52% (P value 0.10) following exclusion of Powell 2001, and the pooled mean difference still showed better improvement for participants in the exercise group (MD 7.37, 95% CI 1.23 to 13.51). The remaining heterogeneity may reflect the large variation in baseline physical functioning observed across studies, ranging from 29.8 (Wearden 2010) to 53.1 (Moss‐Morris 2005), but the number of available studies was low; it is therefore difficult to explore this association further.

Also at follow‐up, observed heterogeneity was driven by remarkably positive results in favour of exercise as reported by Powell 2001. If Powell 2001 was excluded, heterogeneity dropped to 0% (P value 0.50), and the two remaining trials (Wearden 2010; White 2011) reported a smaller but statistically significant difference in favour of exercise therapy (MD ‐5.79, 95% CI ‐10.53 to ‐1.06).

Subgroup analysis

To explore the possible impact of varying exercise strategies and control conditions, we performed post hoc subgroup analyses within Analysis 1.5 and Analysis 1.6.

Type of exercise

All studies included in Analysis 1.5 and Analysis 1.6 offered graded exercise therapy. Jason 2007 observed better results among participants in the relaxation group than among those in the anaerobic exercise group (MD 21.48, 95% CI 5.81 to 37.15) at follow‐up. As stated above, these results were distorted by large baseline differences in physical functioning between exercise and relaxation groups (39 of 100 vs 54 of 100) and were not included in Analysis 1.6.

Type of control

At end of treatment, post hoc subgroup analysis did not establish a subgroup difference (I² = 0%, P value 0.92) between the four studies (Moss‐Morris 2005; Powell 2001; Wearden 2010; White 2011) using treatment as usual as control (MD ‐12.96, 95% CI ‐26.63 to 0.72; I² = 92%) and Fulcher 1997, in which relaxation or flexibility was used as a control (MD ‐13.87, 95% CI ‐24.31 to ‐3.43). All studies available for analysis at follow‐up adhered to the treatment as usual control condition, hence no subgroup analyses were performed within Analysis 1.6.

Diagnostic criteria

We found no evidence of subgroup differences (I² = 0%, P value 0.91) between one study diagnosing participants according to the 1994 CDC criteria (MD ‐14.05, 95% CI ‐27.48 to ‐0.62; Moss‐Morris 2005) and four studies diagnosing participants according to the Oxford criteria (MD ‐12.92, 95% CI ‐25.99 to 0.14). All studies available for analysis at follow‐up recruited participants in keeping with the Oxford criteria, thus no subgroup analyses were performed within Analysis 1.6.

1.5 Quality of life

None of the included studies reported quality of life at end of treatment. At follow‐up, an estimate of effect suggested improvement towards better quality of life (Burckhardt 2003) among participants in the control group (MD 9.00, 95% CI ‐1.00 to 19.00; P value 0.08) compared with those given exercise therapy (Jason 2007; Analysis 1.7; 44 participants), but little or no effect cannot be ruled out. This estimate is biased in favour of the control arm because of baseline differences between groups.

1.6.1 Depression

Five studies (Fulcher 1997; Powell 2001; Wallman 2004; Wearden 1998; Wearden 2010) with a total of 504 participants contributed information on depression at end of treatment (12 to 26 weeks), all utilising the depression subscale of the Hospital Anxiety and Depression Scale (HADS) (Zigmond 1983). Pooling study results yielded an estimate of effect that suggested improvement in depression scores among participants allocated to exercise therapy compared with controls (MD 1.6 points, 95% CI ‐0.23 to 3.5; Analysis 1.8), but the results were highly heterogeneous (I² = 84%, P value < 0.0001), and little or no difference cannot be ruled out.

At follow‐up (Analysis 1.9), Jason 2007 (45 participants) assessed depression using the Beck Depression Inventory (BDI‐II) (Beck 1996) and observed no difference in depression scores (MD 3.44, 95% CI ‐3.00 to 9.88)—an estimate that favours controls because of baseline differences between groups. Three trials reported HADS depression subscale values (Zigmond 1983) at follow‐up (Powell 2001; Wearden 2010; White 2011; 609 participants). The pooled estimate of effect suggests that exercise therapy improved depression more than treatment as usual (MD ‐2.26, 95% CI ‐5.09 to 0.56), but heterogeneity was considerable (I² = 92%, P value < 0.00001), and little or no difference cannot be ruled out.

Sensitivity analysis

Investigating heterogeneity

At end of treatment, Powell 2001 again reported very positive results and contributed greatly to the total heterogeneity. Exclusion of Powell 2001 led to a reduction in observed effect size (MD 0.80, 95% CI ‐0.21 to 1.82), but heterogeneity was also greatly reduced (I² = 36%, P value 0.20).

Also at follow‐up, Powell 2001 reported a substantial benefit of exercise therapy compared with results described by the other trials. Exclusion of Powell 2001 from the meta‐analysis was associated with a great reduction in heterogeneity, as I² dropped from 92% to 9% (P value 0.30). Exclusion of Powell 2001 was also associated with a change in the observed effect estimate (MD ‐0.77, 95% CI ‐1.64 to 0.09). Hence, we still see an effect estimate suggesting modest benefit associated with exercise therapy, but little or no difference cannot be ruled out.

Standardised mean difference (SMD)

At longer‐term follow‐up, depression was measured and reported on different measurement scales; therefore we performed a sensitivity analysis in which all available studies were pooled using an SMD method. The four available studies (Jason 2007; Powell 2001; Wearden 2010; White 2011) yielded a pooled standardised estimate of SMD ‐0.35 (95% CI ‐0.93 to 0.23) in an analysis that was associated with considerable heterogeneity (I² = 91%, P value < 0.00001).
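To illustrate why an SMD allows results from different instruments to be pooled, the sketch below computes one common SMD estimator (Hedges' adjusted g) for two groups. The group summaries are hypothetical and are not data from any included trial; the function name is likewise an assumption made for illustration.

```python
import math


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardised mean difference (Hedges' adjusted g) between two groups."""
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    d = (mean_a - mean_b) / pooled_sd   # difference in pooled-SD units
    correction = 1 - 3 / (4 * df - 1)   # small-sample correction factor
    return d * correction


# Hypothetical group summaries (NOT taken from any included trial).
g_scale_1 = hedges_g(15.0, 10.0, 22, 12.0, 9.0, 23)
# The same groups scored on a hypothetical instrument with twice the range.
g_scale_2 = hedges_g(30.0, 20.0, 22, 24.0, 18.0, 23)
print(round(g_scale_1, 2), round(g_scale_2, 2))
```

The two values agree because dividing by the pooled standard deviation removes the unit of measurement. This is what makes it possible to combine, for example, BDI‐II and HADS scores in one analysis, at the cost of an estimate that is no longer expressed in scale points.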

Subgroup analysis

To explore the possible impact of varying exercise strategies and control conditions, we performed post hoc subgroup analyses within Analysis 1.8 and Analysis 1.9.

Type of exercise

No statistical subgroup differences (I² = 0%, P value 0.75) were observed between the four studies offering graded exercise therapy (Fulcher 1997; Powell 2001; Wearden 1998; Wearden 2010) and Wallman 2004, which offered exercise with personal pacing.

At longer‐term follow‐up, four available studies (Jason 2007; Powell 2001; Wearden 2010; White 2011) provided a pooled standardised estimate of SMD ‐0.35 (95% CI ‐0.93 to 0.23) in an analysis that was associated with considerable heterogeneity (I² = 91%, P value < 0.00001). Post hoc subgroup analysis suggested, but did not firmly establish, a subgroup difference (I² = 71.2%, P value 0.06) between the three studies (Powell 2001; Wearden 2010; White 2011) comparing graded exercise therapy versus treatment as usual (SMD ‐0.53, 95% CI ‐1.20 to 0.13) and Jason 2007, which compared anaerobic activity versus relaxation (SMD 0.31, 95% CI ‐0.28 to 0.90).
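The subgroup test reported above can be approximated from the two subgroup estimates alone. The sketch below is a simplified reconstruction and not the review's own computation: it assumes that standard errors can be recovered from the rounded 95% confidence intervals (z = 1.96) and that the two subgroups are compared with a chi‐square test on one degree of freedom. The function names are illustrative.

```python
import math

Z_95 = 1.96  # assumed normal quantile for a two-sided 95% interval


def se_from_ci(low, high):
    """Standard error recovered from a symmetric 95% confidence interval."""
    return (high - low) / (2 * Z_95)


def two_subgroup_test(est_1, ci_1, est_2, ci_2):
    """Chi-square test (1 df) for a difference between two subgroup estimates."""
    var_1 = se_from_ci(*ci_1) ** 2
    var_2 = se_from_ci(*ci_2) ** 2
    q = (est_1 - est_2) ** 2 / (var_1 + var_2)
    p_value = math.erfc(math.sqrt(q / 2))  # upper tail of chi-square, 1 df
    i_squared = max(0.0, (q - 1) / q) * 100 if q > 0 else 0.0
    return q, p_value, i_squared


# Subgroup estimates quoted in the text (SMD with 95% CI).
q, p_value, i_squared = two_subgroup_test(
    -0.53, (-1.20, 0.13),  # graded exercise therapy vs treatment as usual
    0.31, (-0.28, 0.90),   # anaerobic activity vs relaxation (Jason 2007)
)
print(round(q, 2), round(p_value, 2), round(i_squared, 1))
```

Run on the rounded figures, this gives a P value of about 0.06 and an I² of about 71%, close to the reported values; small discrepancies are expected because the inputs are rounded.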

Type of control

At end of treatment, the post hoc subgroup analysis did not establish a subgroup difference (I² = 0%, P value 0.61) between the three studies (Powell 2001; Wearden 1998; Wearden 2010) using treatment as usual as the control (MD ‐2.01, 95% CI ‐5.12 to 1.10; I² = 91%) and the two studies (Fulcher 1997; Wallman 2004) using relaxation or flexibility as the control (MD ‐1.05, 95% CI ‐2.95 to 0.84; I² = 59%).

1.6.2 Anxiety

Five trials (Fulcher 1997; Powell 2001; Wallman 2004; Wearden 1998; Wearden 2010) assessed anxiety at end of treatment using the anxiety subscale of the HADS (Zigmond 1983). Three studies (387 participants) reported data in a way that facilitated comparison in a meta‐analysis (Powell 2001; Wallman 2004; Wearden 2010), resulting in a pooled MD of ‐1.48 points (95% CI ‐3.58 to 0.61; Analysis 1.10). The meta‐analysis was associated with heterogeneity (I² = 79%, P value 0.008), but some of this heterogeneity can be explained by uncorrected baseline differences in HADS anxiety score in included trials. Wearden 1998 (68 participants) stated that no significant changes were observed on the HADS anxiety score at end of treatment. Fulcher 1997 (58 participants) did not observe changes in median HADS anxiety score in the exercise group, whereas an increase in median HADS anxiety score from 4 to 7 was observed in the control group. However, the difference between exercise and control groups did not reach statistical significance in non‐parametric statistical analysis.

Four trials assessed anxiety at longer‐term follow‐up (52 to 70 weeks; Analysis 1.11). Jason 2007 (45 participants) reported a mean difference on the Beck Anxiety Inventory (BAI) (Beck 1996) of 0.70 points (95% CI ‐4.52 to 5.92), and the wide confidence interval implies inconclusive results. Three trials (607 participants) assessed follow‐up changes in anxiety using the HADS anxiety subscale (Powell 2001; Wearden 2010; White 2011). The pooled MD suggests greater improvement in HADS anxiety score in the exercise group compared with the group given treatment as usual (MD 1.01, 95% CI ‐0.74 to 2.75), but heterogeneity was considerable (I² = 78%, P value 0.01), and little or no difference cannot be ruled out.

Sensitivity analysis

Investigating heterogeneity

At follow‐up, Powell 2001 reported very positive results and contributed to increased heterogeneity. Exclusion of Powell 2001 reduced heterogeneity to 63% (P value 0.10), and the pooled MD for White 2011 and Wearden 2010 was reduced to 0.24 (95% CI ‐1.27 to 1.74).

Standardised mean difference (SMD)

At longer‐term follow‐up, anxiety was measured and reported on different measurement scales; therefore we performed a sensitivity analysis in which all available studies were pooled using an SMD method. Four available studies (Jason 2007; Powell 2001; Wearden 2010; White 2011) yielded a pooled standardised estimate of SMD ‐0.17 (95% CI ‐0.50 to 0.15), but the analysis was associated with heterogeneity (I² = 71%, P value 0.02).

Subgroup analysis

To explore the possible impact of varying exercise strategies and control conditions, we performed post hoc subgroup analyses within Analysis 1.10 and Analysis 1.11.

Type of exercise and control

At end of treatment, post hoc subgroup analysis did not establish a subgroup difference (I² = 0%, P value 0.64) between the two studies (Powell 2001; Wearden 2010) comparing graded exercise therapy versus treatment as usual (MD ‐1.22, 95% CI ‐4.51 to 2.07; I² = 88%) and Wallman 2004, which compared exercise with personal pacing versus flexibility and relaxation (MD ‐2.10, 95% CI ‐3.86 to ‐0.34).

At follow‐up, four available studies (Jason 2007; Powell 2001; Wearden 2010; White 2011) yielded a pooled standardised estimate of SMD ‐0.17 (95% CI ‐0.50 to 0.15), but the analysis was associated with heterogeneity (I² = 71%, P value 0.02). We could not establish a statistically significant subgroup difference (I² = 0%, P value 0.40) between the three studies (Powell 2001; Wearden 2010; White 2011) comparing graded exercise therapy versus treatment as usual (SMD ‐0.23, 95% CI ‐0.61 to 0.16) and Jason 2007, which compared anaerobic activity versus relaxation (SMD 0.08, 95% CI ‐0.51 to 0.66).

1.7 Sleep

Two trials (Powell 2001; Wearden 2010), with a total of 323 participants, suggested that sleep assessed by the Jenkins Sleep Scale (Jenkins 1988) had improved more among participants in the exercise group at end of treatment (MD ‐1.49 points, 95% CI ‐2.95 to ‐0.02; P value 0.05; Analysis 1.12). Fulcher 1997, with 59 participants at end of treatment, observed a reduction in median sleep score, as assessed by the Pittsburgh Sleep Quality Index, from 7 to 5 in the exercise group, whereas median sleep score remained 6 in the control group; this group difference did not reach statistical significance in non‐parametric statistical analysis.

At follow‐up, three included trials (Powell 2001; Wearden 2010; White 2011) (610 participants) showed effects in favour of exercise therapy when they were pooled (MD ‐2.04 points, 95% CI ‐3.84 to ‐0.23; P value 0.03; Analysis 1.13), but the three studies showed heterogeneous results: a large positive effect in Powell 2001 (MD ‐4.05, 95% CI ‐6.08 to ‐2.02) and a moderate effect in White 2011 (MD ‐2.00, 95% CI ‐3.84 to ‐0.23), with Wearden 2010 reporting no observed statistically significant differences between the two groups (MD ‐0.31, 95% CI ‐1.97 to 1.35).

Subgroup analysis

All available studies compared graded exercise therapy versus treatment as usual. All studies recruited participants according to the Oxford criteria, thus no subgroup analyses were performed within Analysis 1.12 and Analysis 1.13.

1.8 Self‐perceived changes in overall health

Seven trials assessed changes in overall health at end of treatment or at follow‐up by using a self‐rated Global Impression Change Scale with scores ranging from 1 (very much better) to 7 (very much worse). We performed analysis of the numbers of participants reporting improvement. Four trials (523 participants) reported changes in overall health after end of treatment (Fulcher 1997; Moss‐Morris 2005; Wallman 2004; White 2011) and consistently showed a larger number of participants with some degree of improvement in the exercise group (RR 1.83, 95% CI 1.39 to 2.40; Analysis 1.14).

Three trials (518 participants) reporting self‐perceived changes in overall health at follow‐up were more inconsistent (Jason 2007; Powell 2001; White 2011). The point estimate for the risk ratio favoured exercise therapy (RR 1.88, 95% CI 0.76 to 4.64; Analysis 1.15), but the confidence interval implies inconclusive results, and heterogeneity was substantial (I² = 85%). Jason 2007 showed no significant differences between exercise and relaxation (RR 0.83, 95% CI 0.44 to 1.56) and White 2011 suggested a positive effect of exercise therapy compared with treatment as usual (RR 1.63, 95% CI 1.16 to 2.29), whereas Powell 2001 indicated a large positive effect for exercise (RR 5.96, 95% CI 2.36 to 15.09).
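As a rough illustration of how three such discordant trial results combine, the sketch below pools the risk ratios quoted above with a DerSimonian‐Laird random‐effects model on the log scale and reports I². It is a simplified reconstruction under stated assumptions: standard errors are recovered from the rounded confidence intervals (z = 1.96) and inverse‐variance weights are used. The review's own analysis was run on the underlying event counts, so exact agreement is not expected; the function name is illustrative.

```python
import math

Z_95 = 1.96  # assumed normal quantile for a two-sided 95% interval


def pool_random_effects(estimates):
    """DerSimonian-Laird pooling of (RR, CI low, CI high) tuples on the log scale."""
    y = [math.log(rr) for rr, _, _ in estimates]
    v = [((math.log(hi) - math.log(lo)) / (2 * Z_95)) ** 2
         for _, lo, hi in estimates]
    w = [1 / vi for vi in v]  # fixed-effect inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance estimate
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (math.exp(pooled),
            math.exp(pooled - Z_95 * se),
            math.exp(pooled + Z_95 * se),
            i_squared)


# Trial-level risk ratios quoted in the text.
trials = [
    (0.83, 0.44, 1.56),   # Jason 2007
    (1.63, 1.16, 2.29),   # White 2011
    (5.96, 2.36, 15.09),  # Powell 2001
]
rr, low, high, i_squared = pool_random_effects(trials)
print(round(rr, 2), round(low, 2), round(high, 2), round(i_squared))
```

The result lands near the reported pooled RR of 1.88 and I² of 85% without matching them exactly. The sketch also shows why the interval is so wide: once between‐study variance is added, the three trials receive nearly equal weight, so the outlying Powell 2001 estimate is not down‐weighted by its imprecision.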

Subgroup analysis

To explore the potential impact of varying exercise strategies and control conditions, we performed a post hoc subgroup analysis within Analysis 1.14 and Analysis 1.15.

Type of control

At end of treatment, the pooled RR for all available studies was 1.83 (95% CI 1.39 to 2.40; I² = 0%) compared with 1.99 (95% CI 1.38 to 2.86; I² = 0%) in the treatment as usual subgroup (Moss‐Morris 2005; White 2011) and 1.64 (95% CI 1.09 to 2.48; I² = 0%) in the relaxation/flexibility subgroup (Fulcher 1997; Wallman 2004). Tests for subgroup differences did not establish differences between the two groups (I² = 0%, P value 0.50).

Type of exercise

Three studies offering graded exercise therapy (Fulcher 1997; Moss‐Morris 2005; White 2011) tended towards a greater chance of improvement (RR 2.01, 95% CI 1.46 to 2.77) than the study offering exercise with personal pacing (RR 1.43, 95% CI 0.85 to 2.41; Wallman 2004), but statistical tests did not establish a subgroup difference (I² = 13.6%, P value 0.28).

At follow‐up, the pooled RR for the three available studies was 1.88 (95% CI 0.76 to 4.64) in an analysis associated with extensive heterogeneity (I² = 85%, P value 0.001). The post hoc subgroup analysis did not firmly establish a subgroup difference (I² = 63%, P value 0.10) between the two studies (Powell 2001; White 2011) comparing graded exercise therapy versus treatment as usual (RR 2.92, 95% CI 0.75 to 11.35; I² = 87%) and Jason 2007, which compared anaerobic activity versus relaxation (RR 0.83, 95% CI 0.44 to 1.56).

1.9 Health service resources

Data on health service resources are available for one of the included studies with a total of 320 participants (White 2011). During the 12‐month post‐randomisation period, participants in the treatment as usual group had a higher mean number of specialist medical care contacts than those allocated to exercise therapy (MD ‐1.40, 95% CI ‐1.87 to ‐0.93; Analysis 1.16). Use of primary care resources (i.e. general practitioner or practice nurse), other doctor contacts (i.e. neurologist, psychiatrist or other specialists), accident and emergency contacts, medication (i.e. hypnotics, anxiolytics, antidepressants or analgesics), contacts with other healthcare professionals (i.e. dentist, optician, pharmacist, psychologist, physiotherapist, community mental health nurse or occupational therapist), inpatient contacts and other contacts with healthcare/social services (e.g. social worker, support worker, nutritionist, magnetic resonance imaging (MRI), computed tomography (CT), electroencephalography (EEG)) did not differ significantly between the two groups (Analysis 1.16; Analysis 1.17).

1.10 Drop‐out

Six studies (Fulcher 1997; Moss‐Morris 2005; Powell 2001; Wearden 1998; Wearden 2010; White 2011), with a total of 843 participants, reported drop‐out rates (Analysis 1.18). The pooled RR for drop‐out was 1.63 (95% CI 0.77 to 3.43). The confidence interval implies that these results were inconclusive, and heterogeneity was moderate (I² = 50%).

Subgroup analysis

The main analysis pooled studies using treatment as usual (Moss‐Morris 2005; Powell 2001; Wearden 1998; Wearden 2010; White 2011) and studies using flexibility (Fulcher 1997) into the same comparison. The pooled RR for all available studies was 1.63 (95% CI 0.77 to 3.43; I² = 50%) compared with 1.77 (95% CI 0.71 to 4.38; I² = 61%) in the treatment as usual subgroup and 1.33 (95% CI 0.32 to 5.50) in the flexibility subgroup (Fulcher 1997). Tests for subgroup differences did not establish differences between the two groups (I² = 0%, P value 0.74).

Exercise therapy versus other treatments

Comparison 2. Exercise therapy versus psychological treatment

Three trials (Jason 2007; White 2011; Wearden 2010) contributed data to this comparison, which included cognitive‐behavioural therapy (CBT) (Jason 2007; White 2011), cognitive therapy treatment (COG) (Jason 2007) and supportive listening (Wearden 2010). We decided not to pool the results in meta‐analyses because of clinical and contextual heterogeneity.

2.1 Fatigue

End of treatment

White 2011 (298 participants) showed little or no difference in fatigue between exercise therapy and CBT (MD 0.20, 95% CI ‐1.49 to 1.89; Analysis 2.1).

Compared with 97 participants randomly assigned to supportive listening (Wearden 2010), 85 participants in the graded exercise therapy group experienced greater improvement in fatigue (MD ‐4.03, 95% CI ‐6.24 to ‐1.82; P value < 0.001; Analysis 2.1).

Follow‐up

Jason 2007 assessed fatigue using a 7‐point Fatigue Severity Scale (Krupp 1989) and showed an MD of ‐0.10 (95% CI ‐0.79 to 0.59) for anaerobic exercise versus COG (49 participants; Analysis 2.2). The wide confidence interval implies imprecise and inconclusive results.

Wide confidence intervals and imprecise results also apply to the comparison of anaerobic exercise versus CBT as reported by Jason 2007 (49 participants) with an MD of 0.40 (95% CI ‐0.34 to 1.14; Analysis 2.2). White 2011 compared graded exercise therapy versus CBT (302 participants) by assessing fatigue on a 33‐point Fatigue Scale (Chalder 1993) and observed little or no difference between the two groups (MD 0.30, 95% CI ‐1.45 to 2.05; Analysis 2.3).

Wearden 2010 (182 participants) assessed fatigue on a 33‐point Fatigue Scale (Chalder 1993) and reported differences between rehabilitation and supportive listening that favoured graded exercise therapy (MD ‐2.72, 95% CI ‐5.14 to ‐0.30; P value 0.03; Analysis 2.3).

Sensitivity analysis

At follow‐up, the available studies (Jason 2007; White 2011) measured and reported fatigue on different scales, and we performed a sensitivity analysis in which the two studies were pooled using an SMD method. The resulting pooled SMD estimate is 0.07 (95% CI ‐0.13 to 0.28) with no unexplained heterogeneity (I² = 0%, P value 0.40).

Subgroup analysis

Post hoc subgroup analysis did not establish a subgroup difference (I² = 0%, P value 0.40) between White 2011, which compared graded exercise therapy versus CBT (SMD 0.04, 95% CI ‐0.19 to 0.26), and Jason 2007, which compared anaerobic activity versus CBT (SMD 0.30, 95% CI ‐0.26 to 0.86).

2.2 Adverse effects

White 2011 reported the number of serious adverse reactions (SARs) (European Union Clinical Trials Directive 2001) observed in each treatment group (Analysis 2.4). Two adverse reactions possibly related to treatment were observed among the 160 participants in the exercise group (one participant with deterioration in mobility and self‐care, and one with worse CFS symptoms and function), and three participants reporting a total of four SARs were described among 161 participants in the CBT group (one incident of self‐harm, one incident of low mood with an episode of self‐harm, one episode of worsened mood and CFS symptoms and one incident of threatened self‐harm). Thus, the observed RR was 0.67 (95% CI 0.11 to 3.96), implying that these results were inconclusive.

Wearden 2010 stated that no participants in the rehabilitation or supportive listening group demonstrated SARs with a probable relation to therapy (Analysis 2.4).

2.3 Pain

Jason 2007 (43 participants) reported differences in pain at follow‐up (52 weeks), as assessed by the Brief Pain Inventory (Cleeland 1994). When anaerobic exercise was compared with CBT, results were imprecise for pain severity (MD 0.07, 95% CI ‐1.52 to 1.66; Analysis 2.5) and for pain interference (MD ‐0.35, 95% CI ‐2.29 to 1.59; Analysis 2.6). As the result of baseline differences between groups, these estimates, to some extent, are biased in favour of exercise.

Jason 2007 also compared anaerobic exercise versus COG (44 participants). Here, inconclusive results were observed in pain severity (MD 0.51, 95% CI ‐0.92 to 1.94; Analysis 2.5) and pain interference (MD 0.39, 95% CI ‐1.37 to 2.15; Analysis 2.6).

2.4 Physical functioning

End of treatment

White 2011 (298 participants) reported changes in physical functioning between participants randomly assigned to exercise and CBT at end of treatment by using the SF‐36 physical functioning subscale (Ware 1992). Scores on this scale range from 0 to 100, and study authors observed little or no difference in physical function between the two groups (MD ‐1.20, 95% CI ‐6.30 to 3.90; Analysis 2.7).

Wearden 2010 (181 participants) suggested greater improvement in physical function among participants in the graded exercise therapy group than in the supportive listening group (MD ‐6.66 points, 95% CI ‐13.7 to 0.40; P value 0.06; Analysis 2.7), but little or no difference cannot be ruled out.

Follow‐up

Both Jason 2007 and White 2011 reported physical function at 52‐week follow‐up. Whereas White 2011 (302 participants) observed little or no difference between graded exercise therapy and CBT (MD 0.50, 95% CI ‐4.89 to 5.89; Analysis 2.8), Jason 2007 (46 participants) reported a significant difference favouring CBT (MD 18.92, 95% CI 2.12 to 35.72; Analysis 2.8) when compared with anaerobic exercise. However, results of the latter study are skewed because of uncorrected baseline differences in physical function between the two groups (39 vs 46 points), and this explains some of the observed heterogeneity.

Jason 2007 (47 participants) also compared anaerobic exercise versus COG, suggesting a large difference in favour of COG (MD 21.37, 95% CI 6.61 to 36.13; Analysis 2.8). It should be noted, however, that the latter estimate is probably biased in favour of COG because of uncorrected baseline differences in physical function between the two groups (39 vs 46 points).

Wearden 2010 (171 participants) suggested greater improvement in physical function among participants in the graded exercise therapy group than in the supportive listening group (MD ‐7.55 points, 95% CI ‐15.57 to 0.47; Analysis 2.8), but little or no difference cannot be ruled out.

2.5 Quality of life

Study authors provided no data.

2.6.1 Depression

End of treatment

In Wearden 2010 (182 participants), graded exercise therapy was associated with greater improvement on the HADS depression subscale (Zigmond 1983) than was seen with supportive listening (MD ‐1.57, 95% CI ‐2.74 to ‐0.40; P value 0.008; Analysis 2.9). We did not identify trials reporting depression for exercise versus CBT or for exercise versus COG at end of treatment.

Follow‐up

Jason 2007 assessed depression using the Beck Depression Inventory (BDI‐II) (Beck 1996). When comparing anaerobic exercise versus COG (45 participants), study authors saw a trend towards greater improvement among participants in the COG group (MD 5.08, 95% CI ‐0.77 to 10.93; Analysis 2.10), but little or no difference cannot be ruled out.

Two trials compared exercise therapy versus CBT (Jason 2007; White 2011), with neither showing statistically significant differences between the two groups. Jason 2007 (44 participants) assessed depression using the BDI‐II (Beck 1996) and reported imprecise results (MD 2.99, 95% CI ‐4.37 to 10.35; Analysis 2.10); interpretation of these results is further complicated by baseline differences between groups. On the other hand, White 2011 (287 participants) assessed depression using the HADS depression subscale (Zigmond 1983) and found little or no difference between graded exercise therapy and CBT (MD ‐0.10, 95% CI ‐1.00 to 0.80; Analysis 2.11).

Wearden 2010 compared graded exercise therapy and supportive listening. At end of treatment, results favoured exercise, but this effect was not sustained at 70 weeks' follow‐up (171 participants; MD ‐0.79, 95% CI ‐2.31 to 0.55; Analysis 2.11).

Sensitivity analysis

As depression was measured and reported on two different scales in Jason 2007 and White 2011, we performed a sensitivity analysis in which the two studies were pooled using an SMD method. The resulting pooled SMD estimate is 0.01 (95% CI ‐0.21 to 0.22) with no unexplained heterogeneity (I² = 0%, P value 0.42).

Subgroup analysis

Post hoc subgroup analysis did not establish a subgroup difference (I² = 0%, P value 0.42) between White 2011, which compared graded exercise therapy versus CBT (SMD ‐0.03, 95% CI ‐0.26 to 0.21) and Jason 2007, which compared anaerobic exercise versus CBT (SMD 0.23, 95% CI ‐0.36 to 0.83).

2.6.2 Anxiety

End of treatment

Wearden 2010 (182 participants) found little or no difference on the HADS anxiety subscale (Zigmond 1983) between graded exercise therapy and supportive listening (MD ‐0.48, 95% CI ‐1.85 to 0.89; Analysis 2.12). We did not identify trials reporting anxiety for exercise therapy versus CBT or for exercise therapy versus COG at end of treatment.

Follow‐up

Jason 2007 (45 participants) assessed anxiety using the Beck Anxiety Inventory (BAI) (Beck 1996). When comparing anaerobic exercise versus COG, study authors did not observe statistically significant differences between groups, but results were imprecise (MD 3.15, 95% CI ‐1.17 to 7.47; Analysis 2.13).

Two trials compared exercise therapy versus CBT (Jason 2007; White 2011), with neither showing statistically significant differences between the two groups. Jason 2007 (44 participants) assessed anxiety using the BAI (Beck 1996), with imprecise and statistically non‐significant results (MD 0.66, 95% CI ‐4.68 to 6.00; Analysis 2.13). White 2011 (287 participants) found little or no difference between graded exercise therapy and CBT using the HADS anxiety subscale (MD 0.30, 95% CI ‐0.71 to 1.31; Analysis 2.14).

Wearden 2010 (171 participants) did not observe statistically significant differences on the HADS anxiety subscale between graded exercise therapy and supportive listening at 70 weeks (MD ‐0.08, 95% CI ‐1.52 to 1.36; Analysis 2.14).

Sensitivity analysis

As anxiety was measured and reported on two different scales in Jason 2007 and White 2011, we performed a sensitivity analysis in which the two studies were pooled using an SMD method. The resulting pooled SMD estimate is 0.07 (95% CI ‐0.15 to 0.28) with no unexplained heterogeneity (I² = 0%, P value 0.99).

Subgroup analysis

Post hoc subgroup analysis did not establish a subgroup difference (I² = 0%, P value 0.99) between White 2011, which compared graded exercise therapy versus CBT (SMD 0.07, 95% CI ‐0.16 to 0.30) and Jason 2007, which compared anaerobic activity versus CBT (SMD 0.07, 95% CI ‐0.52 to 0.66).

2.7 Sleep

End of treatment

Wearden 2010 observed that the 83 participants in the graded exercise therapy group experienced greater improvement on the 20‐point Jenkins Sleep Scale (Jenkins 1988) as compared with the 97 participants in the supportive listening group (MD ‐2.46 points, 95% CI ‐4.01 to ‐0.91; P value 0.002; Analysis 2.15). We did not identify trials reporting sleep for exercise therapy versus CBT or for exercise therapy versus COG at end of treatment.

Follow‐up

White 2011 (287 participants) assessed sleep using the Jenkins Sleep Scale (Jenkins 1988) and found little or no difference between graded exercise therapy and CBT (MD ‐0.90, 95% CI ‐2.07 to 0.27; Analysis 2.16). Wearden 2010 (171 participants) also used the Jenkins Sleep Scale and found little or no difference between graded exercise therapy and supportive listening (MD ‐0.86, 95% CI ‐2.56 to 0.84; Analysis 2.16).

2.8 Self‐perceived changes in overall health

Two trials (Jason 2007; White 2011) assessed changes in overall health by using a self‐rated Global Impression Change Scale with scores ranging from 1 (very much better) to 7 (very much worse) (Guy 1976). We performed analysis of the numbers of participants reporting improvement.

End of treatment

White 2011 (320 participants) reported changes in overall health following graded exercise therapy versus CBT, but results were inconclusive (RR 0.96, 95% CI 0.71 to 1.31; Analysis 2.17).

Follow‐up

At follow‐up, self‐perceived changes in overall health were reported by Jason 2007 and White 2011.

For the comparison of COG versus anaerobic exercise, Jason 2007 (50 participants) showed that more participants in the COG group than in the exercise group tended to report improvement, but little or no difference between COG and exercise therapy cannot be ruled out (RR 0.63, 95% CI 0.36 to 1.10; Analysis 2.18).

Both Jason 2007 (47 participants) and White 2011 (321 participants) compared exercise therapy versus CBT. Pooling resulted in an RR of 0.71 (95% CI 0.33 to 1.54; Analysis 2.18), implying imprecise and inconclusive results. The meta‐analysis was associated with considerable heterogeneity (I² = 86%) as the result of inconsistency between effect estimates reported by Jason 2007, which compared anaerobic exercise versus CBT (RR 0.46, 95% CI 0.28 to 0.77), and White 2011, which compared graded exercise therapy versus CBT (RR 1.02, 95% CI 0.77 to 1.35).

2.9 Health service resources

Data on health service resources were provided by one of the included studies with a total of 321 participants (White 2011). During the 12‐month post‐randomisation period, participants in the CBT group showed lower mean numbers of contacts with neurologist, psychiatrist or other specialists (MD 0.60, 95% CI 0.05 to 1.15; Analysis 2.19) and lower mean numbers of inpatient days (MD 0.80, 95% CI 0.41 to 1.19; Analysis 2.19) when compared with participants in the exercise group. However, these group differences were not seen when data were analysed at a dichotomous level (Analysis 2.20).

2.10 Drop‐out

White 2011 (321 participants) reported drop‐out from treatment. Drop‐out rates were not significantly different between graded exercise therapy and CBT (RR 0.59, 95% CI 0.28 to 1.25; Analysis 2.21), but these results were imprecise and inconclusive because few events were reported.

Wearden 2010 reported that more participants discontinued graded exercise therapy (12 of 92 participants) than supportive listening (7 of 91 participants) (RR 1.70, 95% CI 0.70 to 4.11; Analysis 2.21), but the confidence interval implies that these results were imprecise and inconclusive.

Comparison 3. Exercise therapy versus adaptive pacing therapy

One trial contributed data on 319 participants for this comparison (White 2011).

3.1 Fatigue

Fatigue assessed by a 33‐point Fatigue Scale (Chalder 1993) improved more among participants allocated to graded exercise therapy than adaptive pacing (MD ‐2.00, 95% CI ‐3.57 to ‐0.43; P value 0.01) when measured at end of treatment (24 weeks; 305 participants). This positive effect was sustained at 52 weeks' follow‐up (307 participants; MD ‐2.50, 95% CI ‐4.16 to ‐0.84; P value 0.003; Analysis 3.1).
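The P values quoted alongside these mean differences are consistent with the reported confidence intervals. The sketch below shows the relationship under a normal approximation: the standard error is recovered from the width of the 95% interval, and a two-sided P value is derived from the resulting z statistic. This is an illustration only. The trial's own analysis may have used adjusted models or a t distribution, and the function name `p_from_ci` is hypothetical.

```python
from statistics import NormalDist


def p_from_ci(md, lower, upper, level=0.95):
    """Two-sided P value implied by a mean difference and its CI (normal approximation)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = (upper - lower) / (2 * z_crit)             # standard error from CI width
    z = md / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Fatigue, graded exercise therapy versus adaptive pacing (values from the text)
p_end = p_from_ci(-2.00, -3.57, -0.43)       # end of treatment (24 weeks)
p_followup = p_from_ci(-2.50, -4.16, -0.84)  # follow-up (52 weeks)
print(f"End of treatment: P = {p_end:.3f}; follow-up: P = {p_followup:.3f}")
```

Both values round to the figures given in the text (0.01 and 0.003), which suggests that the reported intervals and P values derive from the same underlying estimates.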

3.2 Adverse effects

White 2011 reported the number of SARs (European Union Clinical Trials Directive 2001) observed in each treatment group (Analysis 3.2). Two SARs possibly related to treatment were observed among the 160 participants in the graded exercise therapy group (one incident of deterioration in mobility and self‐care, and one episode of worse CFS symptoms and function) compared with two in the adaptive pacing group (159 participants) (one incident of suicidal thoughts, and one episode of worsened depression). Thus, results were inconclusive, with an RR of 0.99 (95% CI 0.14 to 6.97).

3.3 Pain

No data were provided.

3.4 Physical functioning

The graded exercise therapy group (150 participants) experienced significant improvement in physical functioning compared with the adaptive pacing group (155 participants) (Analysis 3.3). At end of treatment, participants in the graded exercise therapy group scored a mean of 12.2 points better (95% CI ‐17.23 to ‐7.17) on the SF‐36 physical functioning subscale (Ware 1992) than those in the adaptive pacing group—a difference that was sustained at 52 weeks' follow‐up (307 participants; MD ‐11.8, 95% CI ‐17.5 to ‐6.05).

3.5 Quality of life

No data were provided.

3.6.1 Depression

The change on the HADS depression subscale (Zigmond 1983) at end of treatment was not reported (White 2011). At follow‐up, participants in the graded exercise therapy group (144 participants) had improved by a mean of 1.10 points (95% CI ‐2.09 to ‐0.11) on the HADS depression subscale when compared with the 149 participants in the pacing group (Analysis 3.4).

3.6.2 Anxiety

White 2011 did not report the change on the HADS anxiety subscale (Zigmond 1983) at end of treatment. At 52 weeks' follow‐up, investigators observed little or no difference between the two groups (293 participants; MD ‐0.40, 95% CI ‐1.40 to 0.60; Analysis 3.5).

3.7 Sleep

White 2011 did not report change in sleep at end of treatment as assessed by the 20‐point Jenkins Sleep Scale (Jenkins 1988). At follow‐up, participants in the graded exercise therapy group (144 participants) had improved by a mean of 1.60 points (95% CI ‐2.70 to ‐0.50) when compared with the 150 participants in the adaptive pacing group (Analysis 3.6).

3.8 Self‐perceived changes in overall health

White 2011 assessed changes in overall health by using a self‐rated Global Impression Change Scale with scores ranging from 1 (very much better) to 7 (very much worse) (Guy 1976). Comparisons of the numbers of participants reporting improvement showed that a larger fraction of participants in the graded exercise therapy group experienced improvement at end of treatment (319 participants; RR 1.45, 95% CI 1.02 to 2.07; Analysis 3.7). At follow‐up, an estimate of effect that suggested improvement favouring graded exercise therapy was still observed, but little or no effect cannot be ruled out (319 participants; RR 1.31, 95% CI 0.96 to 1.79).

3.9 Health service resources

One of the included studies with a total of 319 participants provided data on health service resources (White 2011). During the 12‐month post‐randomisation period, participants in the pacing group showed lower mean numbers of contacts with complementary healthcare resources (MD 3.80, 95% CI 1.42 to 6.18; Analysis 3.8), lower mean numbers of contacts with other doctors (neurologist, psychiatrist and other specialists) (MD 0.70, 95% CI 0.14 to 1.26; Analysis 3.8), lower mean numbers of accidents and emergencies (MD 0.50, 95% CI 0.31 to 0.69; Analysis 3.8) and higher mean numbers of inpatient days (MD 1.00, 95% CI 0.46 to 1.54; Analysis 3.8) than were seen among participants in the exercise group. However, these group differences were not seen when data were analysed at a dichotomous level (Analysis 3.9).

3.10 Drop‐out

In the PACE trial (White 2011), 10 of the 160 participants in the graded exercise therapy group and 11 of the 160 participants in the adaptive pacing group withdrew, thus the results were inconclusive (RR 0.91, 95% CI 0.40 to 2.08; Analysis 3.10).

Comparison 4. Exercise therapy versus antidepressants

One trial contributed data on a total of 69 participants to this comparison (Wearden 1998). In this trial, investigators combined graded exercise therapy with antidepressant placebo, and the antidepressant used was fluoxetine.

4.1 Fatigue

Investigators assessed fatigue on a 42‐point Fatigue Scale (Chalder 1993; 48 participants) at end of treatment, but the results were inconclusive (MD ‐1.99, 95% CI ‐8.28 to 4.30; Analysis 4.1).

4.2 Adverse effects

Study authors provided no data.

4.3 Pain

Study authors provided no data.

4.4 Physical functioning

Study authors provided no data.

4.5 Quality of life

Study authors provided no data.

4.6.1 Depression

Researchers assessed depression among 48 participants at end of treatment using the HADS depression subscale (Zigmond 1983), but they found little or no difference between the exercise and fluoxetine groups (MD 0.15, 95% CI ‐2.11 to 2.41; Analysis 4.2).

4.6.2 Anxiety

Study authors provided no data.

4.7 Sleep

Study authors provided no data.

4.8 Self‐perceived changes in overall health

Study authors provided no data.

4.9 Health service resources

Study authors provided no data.

4.10 Drop‐out

Wearden 1998 observed similar drop‐out rates in both groups, with 11 drop‐outs reported among the 34 participants in the exercise group and 10 drop‐outs among the 35 participants in the antidepressant group (RR 1.13, 95% CI 0.55 to 2.31; Analysis 4.3), implying that the results were inconclusive.

Exercise therapy adjunctive to other treatment versus the other treatment alone

Comparison 5. Exercise therapy plus antidepressants versus antidepressants

One trial contributed data for a total of 68 participants to this comparison (Wearden 1998). In this trial, investigators combined graded exercise therapy with use of the antidepressant fluoxetine.

5.1 Fatigue

Researchers assessed fatigue on a 42‐point Fatigue Scale (Chalder 1993; 43 participants) at end of treatment, but the results were inconclusive (MD ‐3.66, 95% CI ‐10.41 to 3.09; Analysis 5.1).

5.2 Adverse effects

Study authors provided no data.

5.3 Pain

Study authors provided no data.

5.4 Physical functioning

Study authors provided no data.

5.5 Quality of life

Study authors provided no data.

5.6.1 Depression

Researchers assessed depression at end of treatment among 43 participants using the HADS depression subscale (Zigmond 1983), but the results were inconclusive (MD ‐0.52, 95% CI ‐2.68 to 2.14; Analysis 5.2).

5.6.2 Anxiety

Study authors provided no data.

5.7 Sleep

Study authors provided no data.

5.8 Self‐perceived changes in overall health

Study authors provided no data.

5.9 Health service resources

Study authors provided no data.

5.10 Drop‐out

Wearden 1998 observed similar drop‐out rates in both groups, with 14 drop‐outs reported among the 33 participants in the exercise plus antidepressant group, and 10 drop‐outs among the 35 participants in the antidepressant group (RR 1.48, 95% CI 0.77 to 2.87; Analysis 5.3). The confidence interval implies that the results were inconclusive.

Discussion

Summary of main results

We have included eight studies with a total of 1518 participants in this review.

When exercise therapy was compared with 'passive control,' fatigue was significantly reduced at end of treatment (Analysis 1.1). Data on serious adverse reactions (SARs) were available from only one trial, and SARs were rare, but too few events were reported to allow any conclusions to be drawn (Analysis 1.3). A positive effect of exercise therapy was observed both at end of treatment and at follow‐up with respect to sleep (Analysis 1.12; Analysis 1.13), physical functioning (Analysis 1.5; Analysis 1.6) and self‐perceived changes in overall health (Analysis 1.14; Analysis 1.15). For the remaining outcomes, we were not able to draw any conclusions.

When exercise therapy was compared with cognitive‐behavioural therapy (CBT), little or no difference in fatigue was noted between the two groups (Analysis 2.1; Analysis 2.2). Serious adverse reactions were rare and were reported at similar rates in the two groups. Events were few; therefore results were too imprecise to allow any conclusions to be drawn (Analysis 2.4). Little or no difference was observed between exercise therapy and CBT for physical functioning (Analysis 2.7; Analysis 2.8), depression (Analysis 2.10; Analysis 2.11), anxiety (Analysis 2.13; Analysis 2.14) and sleep (Analysis 2.16). It was not possible to draw any conclusions regarding pain (Analysis 2.5; Analysis 2.6), self‐perceived changes in overall health (Analysis 2.17; Analysis 2.18) or drop‐out (Analysis 2.21).

When exercise therapy was compared with pacing, fatigue (Analysis 3.1), physical functioning (Analysis 3.3), depression (Analysis 3.4), sleep (Analysis 3.6) and self‐perceived changes in overall health at end of treatment (Analysis 3.7) were significantly better in the exercise therapy group. Data on SARs were available from only one trial, and SARs were rare, but events were too few to allow any conclusions to be drawn (Analysis 3.2). For anxiety, little or no difference between groups was reported (Analysis 3.5).

Overall completeness and applicability of evidence

The evidence base was limited to patients able to participate in exercise therapy, and all studies were conducted in developed countries (Australia, New Zealand, North America and the United Kingdom). Settings varied from primary to tertiary care, which suggests easy generalisation. Most of the outcomes investigated were reported in the included studies, apart from health service resources. Most studies used aerobic exercise, but it would be preferable if we had found studies that used different types of exercise therapy, as this would reflect clinical practice.

Quality of the evidence

Risk of bias across studies was relatively low. We were able to identify pre‐published protocols for only two studies (Wearden 2010; White 2011) and have identified a risk of unpublished outcomes.

One limitation is that formal blinding of participants and clinicians to treatment arm is inherently not possible in trials of exercise therapy. This increases risk of bias, as instructors' and participants' knowledge of group assignment might have influenced the observed effect. In addition, outcomes were measured subjectively (e.g. questionnaires, visual analogue scales), which carries a risk of inflating the effect estimate. Against this, many patient charities are opposed to exercise therapy for chronic fatigue syndrome (CFS), and this may in contrast reduce the effect. Six of the seven studies reported that investigators used intention‐to‐treat analysis, but this was done in different ways, which might have influenced the effect estimate. One study (Jason 2007) reported baseline differences, used a best linear unbiased predictor to avoid taking missing data into account and described 25 outcomes, with none stated as primary.

Several methodological challenges have become evident during the review process. An obvious topic of discussion is the between‐study variation observed with regard to type of exercise, intensity of exercise and incremental procedures used (Table 2). We acknowledge that an effect of exercise therapy is likely to depend on how training is conducted, thus inclusion of trials using different exercise regimens is likely to introduce some heterogeneity into the analysis. Possibly equally important, the treatment provided to participants in the control group was not uniform across included trials. Whereas the difference between waiting list, relaxation and treatment as usual is rather obvious, it is important to recognise that the actual ingredients of ‘treatment as usual’ differ widely among the included trials, and this may contribute to variation in observed effect estimates. With regard to participants and their health status, it is important to realise that substantial differences in baseline illness severity were noted, as illustrated by the wide range in baseline physical functioning, depression co‐morbidity and illness duration shown in Table 1. Some trials applied narrow selection criteria, whereas others seem to have included more heterogeneous sample populations; these differences might cause variation in the observed effect estimate. Our finding of similar outcomes with different definitions of CFS mitigates this risk.

All potential sources of heterogeneity mentioned above could have contributed to variation in results derived from the aggregate analysis presented in the present review and might have reduced our ability to draw firm conclusions. It is easy to imagine a potential correlation between observed treatment effect and factors such as exercise characteristics, control conditions, participant recruitment strategies, participant characteristics and baseline differences. We aimed to explore these associations in subgroup analyses. However, the number of potential heterogeneity factors is high and the number of available trials is low; therefore we were limited in our ability to explore heterogeneity in a sensible way at the aggregate level.

Potential biases in the review process

The strength of this review lies in its rigorous methods, which include thorough searching for evidence, systematic appraisal of study quality and systematic and well‐defined data synthesis. Even though we tried to search as extensively as possible, we may have missed eligible trials, such as trials reported only in dissertations or in non‐indexed journals.

The table of interventions (Table 2) includes published and unpublished information regarding types of interventions, but not effect estimates. For this updated review, we have not collected unpublished data for our outcomes but have used data from the 2004 review (Edmonds 2004) and from published versions of included articles.

The authors of this review had to make a cutoff regarding what kind of exercise should be included. We decided to exclude traditional Chinese exercise such as Tai Chi and Qigong, but to include pragmatic rehabilitation for which the type of exercise is described as walking, walking stairs, bicycling, dancing or jogging. The cutoff might be contentious, and discussion regarding what type of exercise should be included should be ongoing.

One of the included studies (Powell 2001) is an outlier, reporting very positive results in favour of exercise therapy; we decided post hoc to perform a sensitivity analysis from which Powell 2001 was excluded to learn what the results would be if this study was not included.

Review authors noted potential bias regarding how the comparators in this review were categorised and pooled. We decided to report diverse comparators such as cognitive‐behavioural therapy (CBT), cognitive therapy treatment (COG) and supportive therapy together as a single comparator called 'psychological treatments' (however, because of clinical and contextual heterogeneity, we decided not to pool the results in meta‐analyses). These different psychological treatments do have similar elements, for example, both CBT and COG use cognitive approaches and goal setting; however they differ in certain respects (e.g. CBT tries to change unhelpful thoughts, while COG aims to accept them (Jason 2007)). Our approach of combining these comparators might be considered contentious, and discussion about what should be lumped together and what should be split into different comparators should be ongoing.

Meta‐analysis of individual patient data (IPD) constitutes an alternative approach to meta‐analysis of aggregate data. Analysis based on individual patient data in general will enable us to use a wider range of statistical and analytical approaches (Higgins 2011). In particular, by utilising IPD, it is possible to explore the relative importance of the various heterogeneity factors mentioned above more thoroughly, and to ensure that missing data and baseline differences are dealt with in standardised ways. With access to IPD, it is also possible to perform subgroup analyses that have not been previously reported. A project aimed at undertaking IPD analyses of the trials included in the present review has been initiated, and when the IPD analyses are presented, they are likely to shed some new light on the aggregate level analyses presented in the current systematic review.

Agreements and disagreements with other studies or reviews

This review is an updated version of a review that was originally published in 2004 (Edmonds 2004); the revised version offers major additions and changes. In line with recent updates provided in the Cochrane Handbook for Systematic Reviews of Interventions, we have implemented several methodological improvements, including a thorough risk of bias assessment for all included studies (Higgins 2011). Also, the updated search for literature led to the inclusion of three new trials with a total of 1051 participants (Jason 2007; Wearden 2010; White 2011), thus the number of included participants has more than tripled since the 2004 version. The inclusion of new trials has important implications. First, statistical power has been increased by the addition of new data. Second, the most recent trials offered longer follow‐up times; therefore we can provide clearer conclusions about follow‐up treatment effects in this update than were provided in the original review. Third, the most recent trials involve comparisons beyond exercise therapy versus treatment as usual, for example, comparisons of exercise therapy versus other active treatment strategies such as CBT and adaptive pacing therapy.

This update provides valuable additional information when compared with the original review, and results reported in the original review are largely confirmed in this update. Moreover, the results reported here correspond well with those of other systematic reviews (Bagnall 2002; Larun 2011; Prins 2006) and with existing guidelines (NICE 2007). One meta‐analysis of CBT and GET suggests that the two treatments are equally efficacious, especially for patients with co‐morbid anxiety or depressive symptoms (Castell 2011).

A recent randomised trial comparing quality of life among participants randomly assigned to group CBT plus graded exercise therapy plus conventional pharmacological treatment or exercise counselling plus conventional pharmacological treatment found no differences between the two groups at 12 months' follow‐up (Nunez 2011). This trial did not meet our a priori inclusion criteria and was excluded from our review. As the comparison used in Nunez 2011 differs from the comparisons reported in our review, it is difficult to compare the results directly; this comparison was complicated further by the fact that Nunez 2011 did not measure outcomes viewed as primary outcomes in our review. Consequently, our view is that the conclusions presented in our review correspond well with those of other relevant studies and reviews, but further research is needed to explore the considerable heterogeneity observed across available trials.

Figure 1. PRISMA flow diagram.

Figure 2. Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Figure 3. Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Analysis 1.1. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 1 Fatigue (end of treatment).

Analysis 1.2. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 2 Fatigue (follow‐up).

Analysis 1.3. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 3 Participants with serious adverse reactions.

Analysis 1.4. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 4 Pain (follow‐up).

Analysis 1.5. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 5 Physical functioning (end of treatment).

Analysis 1.6. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 6 Physical functioning (follow‐up).

Analysis 1.7. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 7 Quality of life (follow‐up).

Analysis 1.8. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 8 Depression (end of treatment).

Analysis 1.9. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 9 Depression (follow‐up).

Analysis 1.10. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 10 Anxiety (end of treatment).

Analysis 1.11. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 11 Anxiety (follow‐up).

Analysis 1.12. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 12 Sleep (end of treatment).

Analysis 1.13. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 13 Sleep (follow‐up).

Analysis 1.14. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 14 Self‐perceived changes in overall health (end of treatment).

Analysis 1.15. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 15 Self‐perceived changes in overall health (follow‐up).

Analysis 1.16. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 16 Health resource use (follow‐up) [Mean no. of contacts].

Analysis 1.17. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 17 Health resource use (follow‐up) [No. of users].

Analysis 1.18. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 18 Drop‐out.

Analysis 1.19. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 19 Subgroup analysis for fatigue.

Analysis 2.1. Comparison 2 Exercise therapy versus psychological treatment, Outcome 1 Fatigue at end of treatment (FS; 11 items/0 to 33 points).

Analysis 2.2. Comparison 2 Exercise therapy versus psychological treatment, Outcome 2 Fatigue at follow‐up (FSS; 1 to 7 points).

Analysis 2.3. Comparison 2 Exercise therapy versus psychological treatment, Outcome 3 Fatigue at follow‐up (FS; 11 items/0 to 33 points).

Analysis 2.4. Comparison 2 Exercise therapy versus psychological treatment, Outcome 4 Participants with serious adverse reactions.

Analysis 2.5. Comparison 2 Exercise therapy versus psychological treatment, Outcome 5 Pain at follow‐up (BPI, pain severity subscale; 0 to 10 points).

Analysis 2.6. Comparison 2 Exercise therapy versus psychological treatment, Outcome 6 Pain at follow‐up (BPI, pain interference subscale; 0 to 10 points).

Analysis 2.7. Comparison 2 Exercise therapy versus psychological treatment, Outcome 7 Physical functioning at end of treatment (SF‐36, physical functioning subscale; 0 to 100 points).

Analysis 2.8. Comparison 2 Exercise therapy versus psychological treatment, Outcome 8 Physical functioning at follow‐up (SF‐36, physical functioning subscale; 0 to 100 points).

Analysis 2.9. Comparison 2 Exercise therapy versus psychological treatment, Outcome 9 Depression at end of treatment (HADS depression score; 7 items/21 points).

Analysis 2.10. Comparison 2 Exercise therapy versus psychological treatment, Outcome 10 Depression at follow‐up (BDI; 0 to 63 points).

Analysis 2.11. Comparison 2 Exercise therapy versus psychological treatment, Outcome 11 Depression at follow‐up (HADS depression score; 7 items/21 points).

Analysis 2.12. Comparison 2 Exercise therapy versus psychological treatment, Outcome 12 Anxiety at end of treatment (HADS anxiety; 7 items/21 points).

Analysis 2.13. Comparison 2 Exercise therapy versus psychological treatment, Outcome 13 Anxiety at follow‐up (BAI; 0 to 63 points).

Analysis 2.14. Comparison 2 Exercise therapy versus psychological treatment, Outcome 14 Anxiety at follow‐up (HADS anxiety; 7 items/21 points).

Analysis 2.15. Comparison 2 Exercise therapy versus psychological treatment, Outcome 15 Sleep at end of treatment (Jenkins Sleep Scale; 0 to 20 points).

Analysis 2.16. Comparison 2 Exercise therapy versus psychological treatment, Outcome 16 Sleep at follow‐up (Jenkins Sleep Scale; 0 to 20 points).

Analysis 2.17. Comparison 2 Exercise therapy versus psychological treatment, Outcome 17 Self‐perceived changes in overall health at end of treatment.

Analysis 2.18. Comparison 2 Exercise therapy versus psychological treatment, Outcome 18 Self‐perceived changes in overall health at follow‐up.

Analysis 2.19. Comparison 2 Exercise therapy versus psychological treatment, Outcome 19 Health resource use (follow‐up) [Mean no. of contacts].

Analysis 2.20. Comparison 2 Exercise therapy versus psychological treatment, Outcome 20 Health resource use (follow‐up) [No. of users].

Analysis 2.21. Comparison 2 Exercise therapy versus psychological treatment, Outcome 21 Drop‐out.

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 1 Fatigue.
Figures and Tables -
Analysis 3.1

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 1 Fatigue.

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 2 Participants with serious adverse reactions.
Figures and Tables -
Analysis 3.2

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 2 Participants with serious adverse reactions.

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 3 Physical functioning.
Figures and Tables -
Analysis 3.3

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 3 Physical functioning.

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 4 Depression.
Figures and Tables -
Analysis 3.4

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 4 Depression.

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 5 Anxiety.
Figures and Tables -
Analysis 3.5

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 5 Anxiety.

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 6 Sleep.
Figures and Tables -
Analysis 3.6

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 6 Sleep.

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 7 Self‐perceived changes in overall health.
Figures and Tables -
Analysis 3.7

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 7 Self‐perceived changes in overall health.

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 8 Health resource use (follow‐up) [Mean no. of contacts].
Figures and Tables -
Analysis 3.8

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 8 Health resource use (follow‐up) [Mean no. of contacts].

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 9 Health resource use (follow‐up) [No. of users].
Figures and Tables -
Analysis 3.9

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 9 Health resource use (follow‐up) [No. of users].

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 10 Drop‐out.
Figures and Tables -
Analysis 3.10

Comparison 3 Exercise therapy versus adaptive pacing, Outcome 10 Drop‐out.

Comparison 4 Exercise therapy + antidepressant placebo versus antidepressant + exercise placebo, Outcome 1 Fatigue.
Figures and Tables -
Analysis 4.1

Comparison 4 Exercise therapy + antidepressant placebo versus antidepressant + exercise placebo, Outcome 1 Fatigue.

Comparison 4 Exercise therapy + antidepressant placebo versus antidepressant + exercise placebo, Outcome 2 Depression.
Figures and Tables -
Analysis 4.2

Comparison 4 Exercise therapy + antidepressant placebo versus antidepressant + exercise placebo, Outcome 2 Depression.

Comparison 4 Exercise therapy + antidepressant placebo versus antidepressant + exercise placebo, Outcome 3 Drop‐out.
Figures and Tables -
Analysis 4.3

Comparison 4 Exercise therapy + antidepressant placebo versus antidepressant + exercise placebo, Outcome 3 Drop‐out.

Comparison 5 Exercise therapy + antidepressant versus antidepressant + exercise placebo, Outcome 1 Fatigue.
Figures and Tables -
Analysis 5.1

Comparison 5 Exercise therapy + antidepressant versus antidepressant + exercise placebo, Outcome 1 Fatigue.

Comparison 5 Exercise therapy + antidepressant versus antidepressant + exercise placebo, Outcome 2 Depression.
Figures and Tables -
Analysis 5.2

Comparison 5 Exercise therapy + antidepressant versus antidepressant + exercise placebo, Outcome 2 Depression.

Comparison 5 Exercise therapy + antidepressant versus antidepressant + exercise placebo, Outcome 3 Drop‐out.
Figures and Tables -
Analysis 5.3

Comparison 5 Exercise therapy + antidepressant versus antidepressant + exercise placebo, Outcome 3 Drop‐out.

Exercise therapy for chronic fatigue syndrome

Patient or population: males and females over 18 years of age with chronic fatigue syndrome

Intervention: exercise therapy

Comparison: standard care, waiting list or relaxation/flexibility

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk (control) | Illustrative comparative risks* (95% CI): corresponding risk (exercise) | Relative effect (95% CI) | Number of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|---|
| Fatigue [a]: FS, Fatigue Scale (0 to 11 points) (end of treatment) | Mean fatigue in the control groups was 10.4 points | Mean fatigue in the intervention groups was 6.06 points lower (6.95 to 5.17 lower) | | 148 (1 study) | ⊕⊕⊝⊝ Low [b,c] | Lower score indicates less fatigue |
| Fatigue [a]: FS, Fatigue Scale (0 to 33 points) (end of treatment) | Mean fatigue ranged across control groups from 15.3 to 26.3 points | Mean fatigue in the intervention groups was 2.82 points lower (4.07 to 1.57 lower) | | 540 (3 studies) | ⊕⊕⊕⊝ Moderate [b] | Lower score indicates less fatigue |
| Fatigue [a]: FS, Fatigue Scale (0 to 42 points) (end of treatment) | Mean fatigue ranged across control groups from 24.4 to 31.6 points | Mean fatigue in the intervention groups was 6.80 points lower (10.31 to 3.28 lower) | | 152 (3 studies) | ⊕⊕⊕⊝ Moderate [b] | Lower score indicates less fatigue |
| Participants with serious adverse reactions (study population) | 13 per 1000 | 12 per 1000 (2 to 87) | RR 0.99 (0.14 to 6.97) | 319 (1 study) | ⊕⊕⊕⊝ Moderate [d,e] | |
| Quality of Life (QOL) Scale (16 to 112 points) (follow‐up) | Mean QOL score in the control group was 72 points | Mean QOL score in the intervention groups was 9.00 points lower (19.00 lower to 1.00 higher) | | 44 (1 study) | ⊕⊝⊝⊝ Very low [b,f] | Higher score indicates improved QOL |
| Physical functioning: SF‐36 subscale (0 to 100 points) (end of treatment) | Mean physical functioning score ranged from 31.1 to 55.2 points across control groups | Mean physical functioning score in the intervention groups was 13.10 points higher (1.98 to 24.22 higher) | | 725 (5 studies) | ⊕⊕⊝⊝ Low [b,g] | Higher score indicates improved physical function |
| Depression: HADS depression score (0 to 21 points) (end of treatment) | Mean depression score ranged across control groups from 5.2 to 11.2 points | Mean depression score in the intervention groups was 1.63 points lower (3.50 lower to 0.23 higher) | | 504 (5 studies) | ⊕⊝⊝⊝ Very low [b,g,h] | Lower score indicates fewer depressive symptoms |
| Sleep: Jenkins Sleep Scale (0 to 20 points) (end of treatment) | Mean sleep score ranged across control groups from 11.7 to 12.2 points | Mean sleep score in the intervention groups was 1.49 points lower (2.95 to 0.02 lower) | | 323 (2 studies) | ⊕⊕⊝⊝ Low [b,h] | Lower score indicates improved sleep quality |
| Self‐perceived changes in overall health (end of treatment), study population | 218 per 1000 | 399 per 1000 (303 to 523) | RR 1.83 (1.39 to 2.40) | 489 (4 studies) | ⊕⊕⊕⊝ Moderate [b] | RR higher than 1 means that more participants in exercise groups reported improvement |
| Self‐perceived changes in overall health (end of treatment), medium‐risk population | 238 per 1000 | 436 per 1000 (331 to 571) | | | | |
| Drop‐out (end of treatment), study population | 70 per 1000 | 114 per 1000 (54 to 241) | RR 1.63 (0.77 to 3.43) | 843 (6 studies) | ⊕⊕⊝⊝ Low [b,g] | RR higher than 1 means that more participants in exercise groups dropped out from treatment |
| Drop‐out (end of treatment), medium‐risk population | 89 per 1000 | 145 per 1000 (69 to 305) | | | | |

*The basis for the assumed risk (e.g. median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval; RR: Risk ratio.
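
The note above says that each corresponding risk is derived from the assumed risk and the relative effect. A minimal sketch of that arithmetic follows, assuming the corresponding risk is simply the assumed risk multiplied by the risk ratio (and by each CI bound), rounded to whole participants per 1000. The function name `corresponding_risk` is hypothetical and is not taken from the review.

```python
def corresponding_risk(assumed_per_1000, rr, rr_low, rr_high):
    """Scale an assumed risk (per 1000) by a risk ratio and its 95% CI bounds.

    Returns (point, lower, upper) as whole numbers per 1000, capped at 1000.
    Assumption: simple multiplication, as described in the table note.
    """
    def scale(ratio):
        return min(1000, round(assumed_per_1000 * ratio))

    return scale(rr), scale(rr_low), scale(rr_high)


# Self-perceived change in overall health, study population row:
# assumed risk 218 per 1000, RR 1.83 (1.39 to 2.40).
print(corresponding_risk(218, 1.83, 1.39, 2.40))
```

Because the printed assumed risks are themselves rounded, some rows (for example, the serious adverse reactions row) may not be reproduced exactly by this recomputation.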

GRADE Working Group grades of evidence.
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

[a] We chose to present effect estimates as measured on the original scales rather than transforming them to standardised units. As 3 different scoring systems for fatigue were used, the outcome is presented over 3 rows.

[b] Risk of bias (‐1): All studies were at risk of performance bias, as they were unblinded.
[c] Inconsistency (‐1): This result is inconsistent with other available trials when meta‐analysis based on standardised mean differences is performed. Subgroup analyses could not explain variation due to diagnostic criteria, treatment strategy or type of control.
[d] Risk of bias (0): This outcome is unlikely to have been affected by detection or performance bias.
[e] Imprecision (‐1): low numbers of events and wide confidence intervals.
[f] Imprecision (‐2): very low numbers of participants and wide confidence intervals, which encompass benefit and harm.
[g] Inconsistency (‐1): variation in effect size and direction of effect across available studies.
[h] Imprecision (‐1): Confidence interval fails to exclude negligible differences in favour of the intervention.
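
The footnotes above record how many levels each concern removes from the quality rating. The sketch below shows how those steps combine into the ratings printed in the table, assuming (as is usual in GRADE for randomised trials) that every outcome starts at High and that only downgrades apply. The names `FOOTNOTE_STEPS` and `rating_from_footnotes` are hypothetical helpers for illustration.

```python
# GRADE levels ordered from lowest to highest.
GRADE_LEVELS = ["Very low", "Low", "Moderate", "High"]

# Downgrade steps per footnote letter, copied from the footnotes above.
FOOTNOTE_STEPS = {"b": 1, "c": 1, "d": 0, "e": 1, "f": 2, "g": 1, "h": 1}


def rating_from_footnotes(letters, start="High"):
    """Subtract the downgrade steps for the given footnote letters.

    Assumption: evidence starts at `start` and cannot fall below 'Very low'.
    """
    total = sum(FOOTNOTE_STEPS[letter] for letter in letters)
    index = GRADE_LEVELS.index(start) - total
    return GRADE_LEVELS[max(index, 0)]


# Depression row carries footnotes b, g and h (three single-step downgrades).
print(rating_from_footnotes("bgh"))
```

Under these assumptions the footnote letters attached to each row reproduce the rating shown for that row.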

Table 1. Study demographics

| Study ID | N | Gender | Duration of illness | Depression co‐morbidity | Use of antidepressants (ADs) | Work and employment status |
|---|---|---|---|---|---|---|
| Fulcher 1997 | 66 | 49F/17M, 65% female | 2.7 years | 20 (30%) possible cases of depression (HADS) | 30 (45%) on full‐dose AD (n = 20) or low‐dose AD (n = 10) | 26 (39%) working or studying at least part time |
| Jason 2007 | 114 | 95F/19M, 83% female | > 5.0 years | 44 (39%) with a current Axis I disorder (depression and anxiety most common) | Not stated | 52 (46%) working or studying at least part time, 24% unemployed, 6% retired, 25% on disability |
| Moss‐Morris 2005 | 49 | 34F/15M, 69% female | 3.1 years | 14 (29%) possible or probable cases of depression (HADS) | Not stated | 11 (22%) were unemployed and were unable to work because of disability |
| Powell 2001 | 148 | 116F/32M, 78% female | 4.3 years | 58 (39%) possible or probable cases of depression (HADS) | 27 (18%) used AD | 50 (34%) were working, 64 (43%) were on disability |
| Wallman 2004 | 61 | 47F/14M, 77% female | Not stated | Not stated | 16 (26%) used AD | Not stated |
| Wearden 1998 | 136 | 97F/39M, 71% female | 2.3 years | 46 (34%) with depressive disorder according to DSM‐III‐R criteria | Not stated | 114 (84%) had recently changed occupation |
| Wearden 2010 | 296 | 230F/66M, 78% female | 7.0 years | 53 (18%) had a depression diagnosis | 160 (54%) were prescribed AD in the past 6 months | Not stated |
| White 2011 | 641 | 495F/146M, 77% female | 2.7 years | 219 (34%) with any depressive disorder | 260 (41%) used AD | Not stated |

Table 2. Characteristics of exercise interventions

| Study ID | Deliverer of intervention | Explanation and materials | Type of exercise | Schedule therapist | Schedule home | Duration of activity | Initial exercise level | Increment steps | Participant self‐monitoring | Criteria for (non)‐increment |
|---|---|---|---|---|---|---|---|---|---|---|
| Fulcher 1997 | Exercise physiologist | Verbal explanation of deconditioning and reconditioning | Walking (encouraged to take other modes such as cycling and swimming) | Weekly (1 hour), talking only | 5 days/wk | 5 to 15 minutes increasing to 30 minutes/d | 5 to 15 minutes at 40% of peak O2 consumption (target HR of resting + 50% of HRR) | Duration increased 1 to 2 minutes per week up to 30 minutes; then intensity increased | Ambulatory heart rate monitors | If increased fatigue, continue at the same level for an extra week |
| Wearden 1998 | Physiotherapist, fitness focus | Minimal explanation; no written materials | Preferred activity (walking/jogging, some did cycling, swimming) | At week 0, 1, 2, 4, 8, 12*, 20, 26*, talking only (*evaluation visits) | 3 days/wk | 20 minutes | 75% of VO2max from bike test | Intensity increased | Borg Exertion Scale chart, before and after HR | Increase if: 10 beats/min drop post exercise and 2‐point drop in Borg Scale score |
| Powell 2001 | Senior clinical therapist | Explanations for GET, circadian dysrhythmia, deconditioning, sleep; "educational information pack" | Aerobic exercise; own choice but mostly exercise bike | 9 face‐to‐face (1.5 hours each) | Tailored | Tailored to functional abilities | Tailored to functional abilities: “a level which you are capable of doing on a BAD DAY” | Varying daily increase (e.g. "5 second increase each day for the rest of the second week") to 30 minutes twice/d | Duration of exercise | Discouraged, but restart at lower level and rapidly reincrease |
| Wallman 2004 | Single physical therapist | Small laminated Borg Scale and heart rate monitor | Walking/jogging, swimming or cycling | Phone contact every 2 weeks | Every second day | From 5 to 15 minutes, increasing to 30 minutes | Initial exercise duration was between 5 and 15 minutes, and intensity was based on the mean HR value achieved midpoint during submaximal exercise tests | Duration increased by 2 to 5 minutes/2 wk | Heart rate monitoring, Borg Exertion Scale | Keep Borg within 11 to 14. Adjust every 2 weeks. Average peak HR when exercising comfortably at a typical day represents patient’s target heart rate (± 3 bpm) for future sessions |
| Moss‐Morris 2005 | Health psychology MSc student, researcher | Focused on the "downward spiral of activity reduction, deconditioning" | Walking (but could also do other preferred exercise, e.g. jogging, swimming) | Weekly for 12 weeks, talking only | 4 to 5 days/wk | Set collaboratively approx 5 to 15 minutes | HR at 40% of VO2max | Duration 3 to 5 minutes/wk; intensity increased after 6 weeks 5 bpm/wk | Ambulatory heart rate monitors | If increased fatigue, continue at the same level for an extra week |
| Jason 2007 | Registered nurses supervised by exercise physiologist | "Behavioral goals explained, energy system education, redefining exercise" | "individualized, constructive and pleasurable activities" | Every 2 weeks (45 minutes), 13 sessions | 3 per week | Tailored | Flexibility tests; strength test (hand grip) | "Gradually increasing anaerobic activity levels" | Self‐monitoring daily exercise diary | New targets only after habituation, or if goals achieved for 2 weeks |
| Wearden 2010 | Nurses with 16 half‐days of training and supervision | Explanation of physiological symptoms and training in first session | Wide choice: walking, stairs, bicycle, dance, jog | 10 sessions over 18 weeks | Several times per day | First 90 minutes, then alternating 60 and 30 minutes | Determined collaboratively with the participant | "Increased very gradually," examples show 50% increase per day | Diary of progress on exercise programme, with note of daily activities | On "bad days," try to do same as day before |
| White 2011 | Exercise therapist/physiotherapist (8 to 10 days training + ongoing supervision) | 142‐page manual: benefits of exercise and "how to" of GET; some got pedometers | Wide choice: walking, cycling, swimming, Tai Chi. Aim to build into daily activities | Weekly × 4, then fortnightly; total of 15 sessions | 5 to 6 days/wk | Negotiated, goal to get to 30 minutes per session | Test of fitness (step test and 6‐minute walking test), perceived physical exertion, actigraphy data | "20% increases" per fortnight; increase duration to 30 minutes, then increase intensity | Exercise diary + Borg scale + “Use non‐symptoms to monitor” and heart rate monitor (for intensity increases) | Do not increase if global increase in symptoms |

© 9. March 2012, Paul Glasziou, Bond University, Australia
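
The Fulcher 1997 row in Table 2 gives the initial target heart rate as resting heart rate plus 50% of heart rate reserve (HRR). A minimal sketch of that calculation follows, assuming HRR is defined in the usual way as maximal minus resting heart rate. The function name and the example heart rates are hypothetical, and how each trial determined maximal heart rate is not stated in the table.

```python
def target_heart_rate(resting_hr, max_hr, fraction=0.5):
    """Return a target heart rate in beats per minute.

    Assumption: heart rate reserve (HRR) = max_hr - resting_hr, and the
    target is resting_hr plus the chosen fraction of that reserve.
    """
    if max_hr <= resting_hr:
        raise ValueError("max_hr must exceed resting_hr")
    reserve = max_hr - resting_hr
    return resting_hr + fraction * reserve


# Hypothetical participant: resting 70 bpm, maximal 180 bpm.
print(target_heart_rate(70, 180))
```

The `fraction` parameter is left adjustable only to show where the 50% figure enters; the table does not describe other fractions for this trial.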

Comparison 1. Exercise therapy versus treatment as usual, relaxation or flexibility

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Fatigue (end of treatment) | 7 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 1.1 Fatigue Scale, FS (11 items/0 to 11 points) | 1 | 148 | Mean Difference (IV, Random, 95% CI) | ‐6.06 [‐6.95, ‐5.17] |
| 1.2 Fatigue Scale, FS (11 items/0 to 33 points) | 3 | 540 | Mean Difference (IV, Random, 95% CI) | ‐2.82 [‐4.07, ‐1.57] |
| 1.3 Fatigue Scale, FS (14 items/0 to 42 points) | 3 | 152 | Mean Difference (IV, Random, 95% CI) | ‐6.80 [‐10.31, ‐3.28] |
| 2 Fatigue (follow‐up) | 4 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 2.1 Fatigue Scale, FS (11 items/0 to 11 points) | 1 | 148 | Mean Difference (IV, Random, 95% CI) | ‐7.13 [‐7.97, ‐6.29] |
| 2.2 Fatigue Scale, FS (11 items/0 to 33 points) | 2 | 472 | Mean Difference (IV, Random, 95% CI) | ‐2.87 [‐4.18, ‐1.55] |
| 2.3 Fatigue Severity Scale, FSS (9 items/1 to 7 points) | 1 | 50 | Mean Difference (IV, Random, 95% CI) | 0.15 [‐0.55, 0.85] |
| 3 Participants with serious adverse reactions | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 4 Pain (follow‐up) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 4.1 Brief Pain Inventory, pain severity subscale (0 to 10 points) | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 4.2 Brief Pain Inventory, pain interference subscale (0 to 10 points) | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 5 Physical functioning (end of treatment) | 5 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 5.1 SF‐36, physical functioning subscale (0 to 100 points) | 5 | 725 | Mean Difference (IV, Random, 95% CI) | ‐13.10 [‐24.22, ‐1.98] |
| 6 Physical functioning (follow‐up) | 3 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 6.1 SF‐36, physical functioning subscale (0 to 100 points) | 3 | 621 | Mean Difference (IV, Random, 95% CI) | ‐16.33 [‐36.74, 4.08] |
| 7 Quality of life (follow‐up) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 7.1 Quality of Life Scale (16 to 112 points) | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8 Depression (end of treatment) | 5 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 8.1 HADS, depression score (7 items/21 points) | 5 | 504 | Mean Difference (IV, Random, 95% CI) | ‐1.63 [‐3.50, 0.23] |
| 9 Depression (follow‐up) | 4 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 9.1 Beck Depression Inventory (0 to 63 points) | 1 | 45 | Mean Difference (IV, Random, 95% CI) | 3.44 [‐1.00, 9.88] |
| 9.2 HADS, depression subscale (0 to 21 points) | 3 | 609 | Mean Difference (IV, Random, 95% CI) | ‐2.26 [‐5.09, 0.56] |
| 10 Anxiety (end of treatment) | 3 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 10.1 HADS, anxiety score (0 to 21 points) | 3 | 387 | Mean Difference (IV, Random, 95% CI) | ‐1.48 [‐3.58, 0.61] |
| 11 Anxiety (follow‐up) | 4 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 11.1 Beck Anxiety Inventory (0 to 63 points) | 1 | 45 | Mean Difference (IV, Random, 95% CI) | 0.70 [‐4.52, 5.92] |
| 11.2 HADS, anxiety score (0 to 21 points) | 3 | 607 | Mean Difference (IV, Random, 95% CI) | ‐1.01 [‐2.75, 0.74] |
| 12 Sleep (end of treatment) | 2 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 12.1 Jenkins Sleep Scale (0 to 20 points) | 2 | 323 | Mean Difference (IV, Random, 95% CI) | ‐1.49 [‐2.95, ‐0.02] |
| 13 Sleep (follow‐up) | 3 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 13.1 Jenkins Sleep Scale (0 to 20 points) | 3 | 610 | Mean Difference (IV, Random, 95% CI) | ‐2.04 [‐3.84, ‐0.23] |
| 14 Self‐perceived changes in overall health (end of treatment) | 4 | 489 | Risk Ratio (M‐H, Random, 95% CI) | 1.83 [1.39, 2.40] |
| 15 Self‐perceived changes in overall health (follow‐up) | 3 | 518 | Risk Ratio (M‐H, Random, 95% CI) | 1.88 [0.76, 4.64] |
| 16 Health resource use (follow‐up) [Mean no. of contacts] | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 16.1 Primary care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16.2 Other doctor | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16.3 Healthcare professional | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16.4 Inpatient | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16.5 Accident and emergency | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16.6 Other health/social services | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16.7 Complementary health care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16.8 Standardised medical care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17 Health resource use (follow‐up) [No. of users] | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 17.1 Primary care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17.2 Other doctor | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17.3 Healthcare professional | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17.4 Inpatient | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17.5 Accident and emergency | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17.6 Medication | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17.7 Complementary health care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17.8 Other health/social services | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17.9 Standardised medical care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 18 Drop‐out | 6 | 843 | Risk Ratio (M‐H, Random, 95% CI) | 1.63 [0.77, 3.43] |
| 19 Subgroup analysis for fatigue | 7 | 840 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.68 [‐1.02, ‐0.35] |
| 19.1 Graded exercise therapy | 6 | 779 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.71 [‐1.09, ‐0.32] |
| 19.2 Exercise with self‐pacing | 1 | 61 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.54 [‐1.05, ‐0.02] |
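
The mean differences listed for Comparison 1 are reported with 95% confidence intervals. As a reading aid, the sketch below recovers an approximate standard error from a printed interval and checks whether the interval is centred on its estimate, assuming a normal approximation in which the interval is the estimate plus or minus about 1.96 standard errors. The function names and the tolerance are hypothetical choices; the review does not describe this check.

```python
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def se_from_ci(lower, upper):
    """Approximate standard error implied by a 95% CI (normal assumption)."""
    return (upper - lower) / (2 * Z_95)


def is_centred(estimate, lower, upper, tol=0.02):
    """True if the CI midpoint matches the estimate within rounding tolerance."""
    return abs((lower + upper) / 2 - estimate) <= tol


# Row 1.1, Fatigue Scale (0 to 11 points): -6.06 [-6.95, -5.17]
print(round(se_from_ci(-6.95, -5.17), 3), is_centred(-6.06, -6.95, -5.17))

# Row 9.1, Beck Depression Inventory: 3.44 [-1.00, 9.88] as printed
print(is_centred(3.44, -1.00, 9.88))
```

With these assumptions, row 9.1 as printed is not centred on its estimate, which may point to a transcription issue in one of the three numbers; the sketch only flags this and cannot say which value is affected.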

Comparison 2. Exercise therapy versus psychological treatment

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Fatigue at end of treatment (FS; 11 items/0 to 33 points) | 2 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 1.1 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 1.2 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 2 Fatigue at follow‐up (FSS; 1 to 7 points) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 2.1 CT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 2.2 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 3 Fatigue at follow‐up (FS; 11 items/0 to 33 points) | 2 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 3.1 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 3.2 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 4 Participants with serious adverse reactions | 2 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 4.1 CBT | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 4.2 Supportive listening | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 5 Pain at follow‐up (BPI, pain severity subscale; 0 to 10 points) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 5.1 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 5.2 CT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 6 Pain at follow‐up (BPI, pain interference subscale; 0 to 10 points) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 6.1 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 6.2 CT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 7 Physical functioning at end of treatment (SF‐36, physical functioning subscale; 0 to 100 points) | 2 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 7.1 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 7.2 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8 Physical functioning at follow‐up (SF‐36, physical functioning subscale; 0 to 100 points) | 3 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 8.1 CBT | 2 | 348 | Mean Difference (IV, Random, 95% CI) | 7.92 [‐9.79, 25.63] |
| 8.2 CT | 1 | 47 | Mean Difference (IV, Random, 95% CI) | 21.37 [6.61, 36.13] |
| 8.3 Supportive listening | 1 | 171 | Mean Difference (IV, Random, 95% CI) | ‐7.55 [‐15.57, 0.47] |
| 9 Depression at end of treatment (HADS depression score; 7 items/21 points) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 9.1 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 10 Depression at follow‐up (BDI; 0 to 63 points) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 10.1 CT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 10.2 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 11 Depression at follow‐up (HADS depression score; 7 items/21 points) | 2 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 11.1 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 11.2 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 12 Anxiety at end of treatment (HADS anxiety; 7 items/21 points) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 12.1 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 13 Anxiety at follow‐up (BAI; 0 to 63 points) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 13.1 CT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 13.2 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 14 Anxiety at follow‐up (HADS anxiety; 7 items/21 points) | 2 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 14.1 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 14.2 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 15 Sleep at end of treatment (Jenkins Sleep Scale; 0 to 20 points) | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 15.1 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16 Sleep at follow‐up (Jenkins Sleep Scale; 0 to 20 points) | 2 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 16.1 CBT | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 16.2 Supportive listening | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 17 Self‐perceived changes in overall health at end of treatment | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 17.1 CBT | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 18 Self‐perceived changes in overall health at follow‐up | 2 | | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only |
| 18.1 CT | 1 | 50 | Risk Ratio (M‐H, Random, 95% CI) | 0.63 [0.36, 1.10] |
| 18.2 CBT | 2 | 368 | Risk Ratio (M‐H, Random, 95% CI) | 0.71 [0.33, 1.54] |
| 19 Health resource use (follow‐up) [Mean no. of contacts] | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 19.1 Primary care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 19.2 Other doctor | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 19.3 Healthcare professional | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 19.4 Inpatient | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 19.5 Accident and emergency | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 19.6 Other health/social services | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 19.7 Complementary health care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 19.8 Standardised medical care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20 Health resource use (follow‐up) [No. of users] | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 20.1 Primary care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20.2 Other doctor | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20.3 Healthcare professional | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20.4 Inpatient | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20.5 Accident and emergency | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20.6 Medication | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20.7 Complementary health care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20.8 Other health/social services | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 20.9 Standardised medical care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 21 Drop‐out | 2 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 21.1 CBT | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 21.2 Supportive listening | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |

Comparison 3. Exercise therapy versus adaptive pacing

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Fatigue | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 1.1 Fatigue Scale, FS (11 items/33 points), end of treatment | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 1.2 Fatigue Scale, FS (11 items/33 points), follow‐up | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 2 Participants with serious adverse reactions | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 3 Physical functioning | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 3.1 SF‐36, physical functioning subscale (0 to 100), end of treatment | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 3.2 SF‐36, physical functioning subscale (0 to 100), follow‐up | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 4 Depression | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 4.1 HADS, depression score (7 items/21 points), follow‐up | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 5 Anxiety | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 5.1 HADS, anxiety score (0 to 21 points), follow‐up | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 6 Sleep | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 6.1 Jenkins Sleep Scale (0 to 20 points), follow‐up | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 7 Self‐perceived changes in overall health | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 7.1 End of treatment | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 7.2 Follow‐up | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8 Health resource use (follow‐up) [Mean no. of contacts] | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 8.1 Primary care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8.2 Other doctor | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8.3 Healthcare professional | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8.4 Inpatient | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8.5 Accident and emergency | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8.6 Other health/social services | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8.7 Complementary health care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 8.8 Standardised medical care | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9 Health resource use (follow‐up) [No. of users] | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
| 9.1 Primary care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9.2 Other doctor | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9.3 Healthcare professional | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9.4 Inpatient | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9.5 Accident and emergency | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9.6 Medication | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9.7 Complementary health care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9.8 Other health/social services | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 9.9 Standardised medical care | 1 | | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 10 Drop‐out | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |

Comparison 4. Exercise therapy + antidepressant placebo versus antidepressant + exercise placebo

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Fatigue | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 1.1 Fatigue Scale, FS (14 items/0 to 42 points), end of treatment | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 2 Depression | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 2.1 HADS, depression score (7 items/21 points), end of treatment | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 3 Drop‐out | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |

Comparison 5. Exercise therapy + antidepressant versus antidepressant + exercise placebo

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Fatigue | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 1.1 Fatigue Scale, FS (14 items/0 to 42 points), end of treatment | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 2 Depression | 1 | | Mean Difference (IV, Random, 95% CI) | Totals not selected |
| 2.1 HADS, depression score (7 items/21 points), end of treatment | 1 | | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 3 Drop‐out | 1 | | Risk Ratio (M‐H, Random, 95% CI) | Totals not selected |
