Introduction

Fatigue, a subjective symptom, is a major source of health care utilization and one of the most widespread symptoms in the general population (Lloyd 1998). Prolonged fatigue forms the basis of, among others, chronic fatigue syndrome (Lloyd 1998). Reasonable evidence currently exists to justify the assumption that psychological factors (e.g. chronic stress), mediated by biological factors, are involved in the development of many somatic complaints and disorders (Papousek et al. 2002). This apparently applies to prolonged fatigue as well. Research indicates that chronic fatigue syndrome is frequently preceded by negative life events or chronic stressors, sometimes in combination with viral infections (Theorell et al. 1999; van Houdenhoven et al. 2001; Ware and Kleinman 1992). When overactivation of the stress systems is sustained, chronic stress may in some cases result in long-term negative effects on biological factors (e.g. the autonomic nervous system) (McEwen 1998; Clements and Turpin 2000; Cohen et al. 2000). The direct relationship between imbalances in the autonomic nervous system and prolonged fatigue has also been studied (Pagani et al. 1994; Stewart 2000).

Heart rate variability (HRV) is a marker that can be used as a non-invasive method to reflect autonomic activity (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology 1996). The analysis of HRV allows the deduction of the effects of complex variability in biological pathways (Friedman and Thayer 1998). Cardiovascular processes interact with respiration to meet the highly variable metabolic demands of the organism and to maintain homeostasis (Wientjes 1992). A number of closely interacting neural pools in the brain stem, the formatio reticularis of the medulla oblongata (together often referred to as the respiratory center) control the depth and rate of breathing (Wientjes 1992). In addition to HRV, therefore, respiration rate (RR) may be interesting as a measure of autonomic nervous system functioning in people with prolonged fatigue.

Before HRV and RR can be used in a clinical population of fatigued subjects, it is of great importance to assess the reproducibility of such measurements in a population with prolonged fatigue. Should these measurements remain stable over time and under similar conditions, they would be ideal for tracking modifications in clinical state when treatment plans are started. In this case, changes in the variables would have a high probability of truly representing either alterations in the clinical state or the effects of the experimental condition (Stein et al. 1995). Sandercock et al. (2005a, b) recently reviewed the current literature on the reliability of short-term HRV measurements. They emphasized the need for further studies to assess the reliability of HRV, particularly in clinical populations.

The present study has two goals. The primary goal is to evaluate the reproducibility of HRV and RR (measured with a device that is easy to use in practice) in participants with prolonged fatigue complaints during rest and light physical activity. Because previous research (Guijt et al. 2007) with the same device has yielded reproducible measurements in healthy subjects, good reproducibility can be expected. Should measurements of HRV and respiration appear reproducible, the second goal of the study is to assess the concurrent validity of HRV and RR measurements as indicators of the degree of fatigue. Good concurrent validity can be expected for HRV, as earlier studies have shown diminished HRV in subjects with chronic fatigue (Pagani et al. 1994; Stewart 2000). No expectations were expressed for RR, as no data were found on the effects of chronic stressors on RR, even though increased RR is associated with situational perceived stressors (Grossman 1983).

Methods

Participants

All participants were recruited from among the clients of two outpatient clinics for rehabilitation and medical fitness in the Netherlands. The parameters were evaluated within a heterogeneous convenience sample of participants who had subjectively reported prolonged fatigue, which had resulted in functional impairments in their daily lives.

A power analysis using nQuery Advisor (Elashoff 2000) was performed in advance. Results of this analysis showed that 23 participants were needed in order to find intra-class correlations with a 95% confidence interval between 0.80 and 0.95, a power of 0.80 and an α of 0.05. With respect to concurrent validity, 19 subjects were needed in order to find a correlation of 0.75 with a one-sided 95% confidence interval with a lower bound of 0.50, a power of 0.80 and an α of 0.05.

Twenty-seven patients aged 18–65 years were asked to participate in this study. Prior to participation, all participants were informed about the study. They also completed the Physical Activity Readiness Questionnaire (Takken 2006) to determine whether performing any physical exercise was safe without a doctor’s approval.

Procedure and design

The study was structured according to a test–retest within subjects design, using HRV and RR as dependent variables and time as an independent variable.

Between March and July 2006, all subjects underwent evaluations of HRV and RR on two occasions, with an interval of 3–4 days between assessments. Two to 3 days before the first assessment of HRV and RR, the subjects completed three questionnaires to measure the extent of their fatigue complaints, subjective health complaints and functional impairment. The questionnaires were completed under the guidance of the test leader. A diagram of the procedure is presented in Fig. 1.

Fig. 1 Schematic presentation of the protocol

On both assessment days the participants visited the outpatient clinic. The protocol (Guijt et al. 2007) was performed in a separate room, starting at approximately the same time of day on each occasion. The protocol took 30 min. After the explanation, the subjects were seated in a resting position for 5 min for adaptation purposes, after which they reclined in a supine position for 10 min (reclining). They subsequently performed light exercise for 12 min (cycling), cycling on a bicycle ergometer using a single load of 50 W with a pedal frequency between 60 and 65 min⁻¹ (the posture of the subjects was the same on both occasions).

Parameters

Variation in heart rate (HRV) was evaluated by means of time-domain measures. A continuous electrocardiographic (ECG) record shows the QRS complexes. The R wave peaks of the QRS complexes were detected and the so-called normal-to-normal (NN) intervals were determined. Time-domain measures were calculated from these NN intervals and from the differences between adjacent NN intervals. HRV was assessed as the standard deviation of the NN intervals (SDNN) and the square root of the mean squared differences of successive NN intervals (RMSSD). RR was assessed by means of chest extension, defining the breath frequency per minute.
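As a rough illustration of the two time-domain measures defined above, the following Python sketch computes SDNN and RMSSD from a list of NN intervals. The function names and the example intervals are hypothetical, and the use of the sample standard deviation (n − 1 denominator) for SDNN is an assumption; the text does not state which denominator the analysis software uses.

```python
import math


def sdnn(nn_ms):
    """Standard deviation of the NN intervals (ms).

    Assumption: sample standard deviation with an n - 1 denominator.
    """
    n = len(nn_ms)
    mean = sum(nn_ms) / n
    return math.sqrt(sum((x - mean) ** 2 for x in nn_ms) / (n - 1))


def rmssd(nn_ms):
    """Square root of the mean squared differences of successive NN intervals (ms)."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up NN intervals in milliseconds, for illustration only.
example_nn = [800, 810, 790, 805, 795]
print(f"SDNN  = {sdnn(example_nn):.2f} ms")
print(f"RMSSD = {rmssd(example_nn):.2f} ms")
```

SDNN reflects the overall spread of the intervals within the selected period, whereas RMSSD depends only on beat-to-beat changes, so the two values can differ considerably for the same recording.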

Measurement device

Heart rate variability and RR were recorded using the Co2ntrol (Decon Medical Systems, Weesp, the Netherlands). The Co2ntrol uses a Polar HR “detection board” (PCBA receiver) to register R–R intervals (the intervals between successive R waves, not to be confused with the respiration rate, RR). The QRS detection timing accuracy and detection reliability of the detector system were tested with an artificially generated ECG signal. The tests indicated that timing errors of less than 1 ms can be detected in real measurements, even under noisy conditions (Ruha et al. 1997). The device is attached to an elastic belt. The belt contains a stable case with heart rate electrodes and a Polar HR transmitter (Polar T31™ transmitter, Polar Electro, Almere, the Netherlands). The Co2ntrol is built to detect QRS complexes and to determine RR during normal activities. ‘Normal-to-normal’ (NN) intervals (i.e. intervals between adjacent QRS complexes) are defined with an accuracy of 1 ms. To measure RR, inhalation and exhalation times (which are assessed by means of the chest extension) are logged with a resolution of 1 ms. The amplitude resolution of the Co2ntrol analog-to-digital conversion is 10 bits (i.e. 1,024 points).

Data reduction

To define HRV, the raw data were transferred to the Lifestylemanager, a specially developed software package (Decon Medical Systems, Weesp, the Netherlands). First, the last seven of the 10 min of reclining were selected to define resting values and the last nine of the 12 min of cycling were selected to define the light physical activity values. Data recorded at heart rates below 30 beats/min and above 220 beats/min were filtered out. The two HRV parameters, SDNN (ms) and RMSSD (ms), were defined in the Lifestylemanager for each of the selected time periods. To define RR, data were transferred to the Co2ntrol software (Decon Medical Systems, Weesp, the Netherlands). Breath frequency per minute was defined for the same time selection that was used to calculate the HRV parameters.
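The selection and filtering steps described above can be sketched as follows. This is a minimal Python illustration, not the Lifestylemanager implementation: the data layout (pairs of a timestamp in seconds and an NN interval in milliseconds), the function names and the example beats are all hypothetical, and the instantaneous heart rate is assumed to be 60,000 divided by the NN interval in ms.

```python
def select_window(beats, start_s, end_s):
    """Keep (time_s, nn_ms) pairs whose timestamp lies in [start_s, end_s)."""
    return [(t, nn) for t, nn in beats if start_s <= t < end_s]


def filter_by_heart_rate(beats, min_bpm=30.0, max_bpm=220.0):
    """Drop beats whose instantaneous heart rate falls outside [min_bpm, max_bpm].

    Assumption: instantaneous heart rate (beats/min) = 60000 / NN interval (ms).
    """
    return [(t, nn) for t, nn in beats if min_bpm <= 60000.0 / nn <= max_bpm]


# Hypothetical beats: (seconds since the start of reclining, NN interval in ms).
beats = [(100.0, 820), (200.0, 2500), (300.0, 810), (400.0, 200), (599.0, 805)]

# Last seven of the 10 min of reclining: 180 s up to 600 s.
resting = filter_by_heart_rate(select_window(beats, 180.0, 600.0))
print(resting)
```

In this made-up example the beat at 100 s falls outside the selected window, and the 2,500-ms and 200-ms intervals are removed because they correspond to heart rates of 24 and 300 beats/min, respectively.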

Questionnaires

Checklist Individual Strength (CIS)

Subjects completed the CIS in order to measure the extent of their fatigue complaints (Vercoulen et al. 1999). This questionnaire has shown good reliability and validity for measuring the extent of fatigue complaints in subjects with chronic fatigue syndrome and within a working population (Beurskens et al. 2000; Vercoulen et al. 1994). The checklist consists of 20 items concerning several aspects of fatigue that the subjects have experienced during the last 2 weeks. Each item is scored on a seven-point Likert scale. The total score ranges from 20 to 140, with higher scores representing more fatigue (Vercoulen et al. 1999).

Subjective Health Complaints (SHC) questionnaire

Participants also completed the SHC (Eriksen et al. 1999) to measure fatigue. The SHC, which was developed to determine the degree of subjective health complaints based on the sustained arousal theory of Eriksen and Ursin (2004), consists of 29 items (five subscales) concerning (the severity of) subjective somatic and psychological complaints experienced during the last 30 days. The subscale Pseudoneurology (PN) (63 points maximum), which measures fatigue, was used in this study. The score for each complaint is calculated as the product of the duration of the complaint divided by 10 and the severity of the complaint. A higher score represents a higher degree of fatigue (Eriksen et al. 1999).
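The scoring rule for a single complaint can be written out as a short Python sketch. The function names are hypothetical. The sketch further assumes that duration is expressed in days (0 to 30), that severity is rated from 0 to 3, and that the PN subscale contains seven items; these assumptions are not stated in the text above, but they are consistent with the stated maximum of 63 points.

```python
def complaint_score(duration_days, severity):
    """Score for one complaint: (duration / 10) multiplied by severity."""
    return (duration_days / 10) * severity


def subscale_score(items):
    """Sum of the complaint scores; items is a list of (duration_days, severity)."""
    return sum(complaint_score(d, s) for d, s in items)


# Assumed worst case: seven items, each present for 30 days with severity 3.
worst_case = [(30, 3)] * 7
print(subscale_score(worst_case))
```

Under these assumptions each item contributes at most 9 points, which gives the 63-point maximum when summed over seven items.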

MOS 36-item Short-Form Health Survey (SF-36)

To determine the actual level of functional impairment, each participant completed the SF-36 (Ware and Sherbourne 1992), which assesses functional status or quality of life. This study uses scores on the four scales that measure functional status: (1) physical functioning, (2) role limitation due to physical health problems, (3) social functioning and (4) role limitation due to emotional problems (Ware et al. 1993, 1994). The score on each scale can range from 0 to 100 and indicates the percentage of the total possible score. Lower scores indicate more impairment (Ware et al. 1993, 1994).

Statistical analyses

All statistical analyses were conducted using SPSS version 12.0 for Windows (SPSS Inc., Chicago, IL, USA). First, the means and standard deviations of the scores from the CIS, the SF-36 and of the subscale PN of the SHC were calculated. Second, the means and standard deviations of the HRV parameters and RR were calculated for each selected time period. The reproducibility of HRV and RR measurements was subsequently quantified, using each of two available methods: first by calculating reliability and second by calculating agreement.

Reliability

Measures of reliability relate the variation between persons to the total variance of the measurements. This provides information on whether a measurement device can distinguish between persons (de Vet 1998). The intra-class correlation coefficients (ICCs) and the ICC 95% limits of agreement (ICC 95% LoA) of the mean HRV and mean RR, as measured with the Co2ntrol, were computed to determine test–retest reliability. Model 3.1 was used for all intra-class correlations, as this is recommended for reliability analyses (Shrout and Fleiss 2006). Good reproducibility was defined as intra-class correlations ranging from 0.60 to 0.81. Intra-class correlations above 0.81 were considered to indicate excellent reproducibility (Landis and Koch 1977; Marks and Lightfoot 1999; Pitzalis et al. 1996).
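For readers unfamiliar with model 3.1, the following Python sketch shows one common way to compute it from a two-way analysis of variance: ICC(3,1) = (BMS − EMS) / (BMS + (k − 1) EMS), where BMS is the between-subjects mean square, EMS the error mean square and k the number of sessions. This is an illustration under stated assumptions rather than the SPSS procedure used in the study; the function names and the example values are hypothetical, and the label "poor" for values below 0.60 is inferred from the wording of the Results section.

```python
def icc_3_1(data):
    """ICC(3,1) for data[i][j] = value of subject i at session j."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_sessions

    bms = ss_subjects / (n - 1)           # between-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))  # error (residual) mean square
    return (bms - ems) / (bms + (k - 1) * ems)


def reproducibility_label(icc):
    """Classify an ICC with the thresholds given in the text."""
    if icc > 0.81:
        return "excellent"
    if icc >= 0.60:
        return "good"
    return "poor"  # assumed label for values below 0.60


# Made-up SDNN values (ms) for four subjects at T1 and T2.
example = [[40, 42], [30, 29], [55, 57], [20, 24]]
icc = icc_3_1(example)
print(f"ICC(3,1) = {icc:.3f} ({reproducibility_label(icc)})")
```

Because the between-session sum of squares is removed before the error term is computed, a constant shift between the two sessions does not lower ICC(3,1).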

Agreement

Measures of agreement refer to the absolute measurement error that is associated with a single measurement taken from a single individual. Agreement provides information on whether a measurement device is able to achieve the same value for the same subject over repeated measurements (de Vet 1998). The standard error of measurement (SEM), the square root of the error-mean-square, was calculated as a measure of agreement (Bland and Altman 1996).
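The SEM as defined here can be illustrated with a small Python sketch that takes the square root of the error mean square from a two-way (subjects by sessions) analysis of variance. The function name and the example data are hypothetical, and this form of the SEM leaves out any systematic difference between sessions, which is an assumption about how the error mean square was obtained.

```python
import math


def sem_from_error_mean_square(data):
    """SEM = sqrt(error mean square) for data[i][j] = subject i, session j."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Residual after removing the subject effect and the session effect.
    ss_error = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ems = ss_error / ((n - 1) * (k - 1))
    return math.sqrt(ems)


# Made-up SDNN values (ms) for four subjects at T1 and T2.
example = [[40, 42], [30, 29], [55, 57], [20, 24]]
print(f"SEM = {sem_from_error_mean_square(example):.2f} ms")
```

The SEM is expressed in the unit of the measurement itself, which makes it easier to interpret against observed changes in an individual than a dimensionless ICC.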

Concurrent validity

Concurrent validity, a component of criterion-related validity, examines the correlation between two constructs that are assessed for the same subject at approximately the same time. The new measure is compared to an existing valued measure or ‘gold standard’ (Innes and Straker 1999). Pearson correlations were calculated to determine the concurrent validity. The Pearson correlation between mean HRV and mean RR at measurement 1 during the conditions of reclining and cycling (as measured with the Co2ntrol) and fatigue (as measured with the CIS) was calculated. Next, the Pearson correlation between mean HRV and mean RR at measurement 1 during the conditions of reclining and cycling (as measured with the Co2ntrol) and the degree of fatigue (as measured with the subscale PN of the SHC) was calculated. Concurrent validity was considered moderate when the Pearson correlation exceeded 0.50 and good when the Pearson correlation exceeded 0.75 (Innes and Straker 1999).
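A minimal Python sketch of this step is given below. The function names are hypothetical. Applying the thresholds to the absolute value of the correlation is an assumption, made because a negative association between HRV and fatigue would also count as evidence of validity; the text only states that the correlation should exceed 0.50 or 0.75.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validity_label(r):
    """Classify concurrent validity with the thresholds from the text.

    Assumption: the thresholds are applied to the magnitude of r.
    """
    magnitude = abs(r)
    if magnitude > 0.75:
        return "good"
    if magnitude > 0.50:
        return "moderate"
    return "lower than moderate"


# Correlations of the size reported in the Results section.
for r in (0.12, -0.21):
    print(r, validity_label(r))
```

Both example values fall in the "lower than moderate" category, in line with how correlations of this size are described in the Results.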

Results

Participants

Of the 27 subjects who were asked to participate in this study, 25 [2 males and 23 females aged 43 ± 3 and 48 ± 8 years (means ± SD), respectively] completed the questionnaires and both HRV and RR assessments. Two subjects dropped out, as they found the procedure to be overly burdensome. Data from the CIS and SF-36 questionnaires were available for all 25 subjects. The SHC yielded usable data from 24 subjects. Data on the HRV parameters (SDNN and RMSSD) for both conditions (reclining and cycling) were available for 24 subjects. For the cycling condition, RR data were available from 25 subjects; for the reclining condition, data were available from 23 subjects.

Questionnaires

Table 1 shows the number of subjects completing the questionnaires as well as the means and the standard deviations of the total score on the CIS, the scores on four subscales of the MOS 36-item Short-Form Health Survey (SF-36) and the score on the subscale PN of the SHC questionnaire.

Table 1 Number of subjects (N) completing the questionnaires and the means and standard deviations of the total score on the Checklist Individual Strength (CIS), the scores on four subscales of the MOS 36-item Short-Form Health Survey (SF-36) and the score on the subscale Pseudoneurology (PN) of the Subjective Health Complaint (SHC) questionnaire

The mean total score of all subjects on the CIS was 100.7. The mean scores on the four SF-36 subscales ranged from 16.0 to 75.8. Finally, the mean score on the subscale PN of the SHC was 15.7.

Parameters

The number of subjects is presented in Table 2, along with the means and standard deviations of the HRV parameters SDNN, RMSSD and RR.

Table 2 Number of measurements (N) used for analysis and the means and standard deviations for heart rate variability [SDNN (ms) and RMSSD (ms)] and respiration rate [RR (breaths/min)] acquired at measurement 1 (T1) and measurement 2 (T2)

The mean SDNN was approximately 18 ms for the cycling condition and approximately 41 ms for the reclining condition. The mean values for RMSSD were approximately 7 ms for cycling and approximately 16 ms for reclining. The mean RR values were approximately 18 breaths/min while cycling and approximately 9 breaths/min while reclining.

Reproducibility

The number of measurements used for analysis, ICC, ICC 95% LoA and SEM values for both HRV parameters (SDNN and RMSSD) and for RR are presented in Table 3.

Table 3 Number of measurements (N) used for analysis, intra-class correlation coefficients (ICCs), ICC 95% limits of agreement, standard error of measurement values [SEM (ms)] for the heart rate variability (HRV) parameters (SDNN and RMSSD) and for respiration rate

Both SDNN and RMSSD showed excellent ICC values (ICC values ranged from 0.86 to 0.93) during both cycling and reclining. The lower bounds of the ICC 95% LoA were good for RMSSD during cycling and for RMSSD and SDNN during reclining (lower bounds between 0.71 and 0.79). The lower bound of the ICC 95% LoA was excellent (0.84) for SDNN during cycling. The ICC value for RR during cycling (0.85) was excellent. For RR during reclining the ICC value (0.65) was good. The lower bound of the ICC 95% LoA was good (0.69) for RR during cycling and poor (0.34) for RR during reclining. The SEM values for cycling were 2.34 and 1.08 ms for SDNN and RMSSD, respectively. For reclining they were 7.71 and 2.50 ms for SDNN and RMSSD, respectively. The SEM values for RR were 1.99 and 1.82 breaths/min for cycling and reclining, respectively.

Concurrent validity

The number of measurements used for analysis, Pearson correlation coefficients between SDNN and RMSSD and fatigue scores on the CIS and the SHC subscale PN are presented in Table 4.

Table 4 Number of measurements used for analysis (N), Pearson correlation coefficients and significance scores between HRV (SDNN and RMSSD) and RR and the CIS total score, and Pearson correlation coefficients and significance scores between HRV (SDNN and RMSSD) and RR and the score on the subscale PN of the SHC

The concurrent validity of HRV (SDNN and RMSSD), for both cycling and reclining, with the CIS score was lower than moderate (non-significant correlations between 0.07 and 0.12). The concurrent validity of RR, for both cycling and reclining, with the CIS score was also lower than moderate (for cycling r = 0.15, P = 0.484 and for reclining r = −0.05, P = 0.813). The concurrent validity of SDNN and RMSSD, for both cycling and reclining, with the score on the subscale PN was also lower than moderate (correlations between −0.21 and 0.19). Finally, the concurrent validity of RR for cycling and reclining, with the score on the subscale PN was also lower than moderate (for cycling r = 0.10, P = 0.639 and for reclining r = −0.21, P = 0.351).

Discussion

The results of this study partly supported and partly contradicted the hypotheses formulated beforehand. Good reproducibility was found for measurements of HRV and RR. Measurements of HRV and RR had lower than moderate concurrent validity for determining fatigue, as assessed with the CIS and the SHC subscale PN.

The mean total CIS score of the subjects in this study is much higher than the mean total score of a healthy group, as reported by Vercoulen et al. (1999). This implies that the subjects in this study did indeed suffer from severe fatigue problems, as confirmed by the fact that 84% of the sample scored higher than the established cut-off point for chronic fatigue of >76 (Bultmann et al. 2000). Reeves et al. (2005) reported significantly lower scores on all eight subscales of the SF-36 in subjects with chronic fatigue syndrome, as compared to a healthy control group. Consistent differences between the SF-36 scores of patients with chronic fatigue syndrome and those of control subjects (Buchwald et al. 1996; Schmaling et al. 1998) have been found before, and our subjects scored even lower on the four subscales of the SF-36 than did the fatigued subjects in Reeves et al. (2005). We conclude that, although subjects were not selected on the basis of chronic fatigue syndrome (CFS) criteria, they did indeed suffer from substantial functional impairments and considerable fatigue levels.

To our knowledge, this is the first time that the reproducibility of HRV and RR has been studied in a sample of subjects with prolonged fatigue problems. Earlier reproducibility studies have focused on healthy subjects and other kinds of patient populations (Carrasco et al. 2003; Marks and Lightfoot 1999; Pardo et al. 1996; Sandercock et al. 2004; Schroeder et al. 2004; Sinnreich et al. 1998; Tarkiainen et al. 2005). This study is a sequel to an earlier study that used the same device to measure HRV and RR in healthy subjects (Guijt et al. 2007). The measurement device generated reliable HRV and RR measurements in a sample of healthy subjects and in a sample of subjects with prolonged fatigue complaints. This means that the Co2ntrol is a suitable device to distinguish between individuals, both among healthy subjects and among subjects with prolonged fatigue complaints. Both studies showed good agreement between repeated HRV and RR measurements.

A number of interesting findings emerged from a comparison of the findings of the present study with those of the earlier study, which evaluated the reliability of HRV and RR measurements with the Co2ntrol in healthy subjects (Guijt et al. 2007). As expected, the sample of healthy subjects in the earlier study showed higher SDNN and RMSSD values (HRV parameters) for cycling and reclining than did the fatigued subjects in this study. The findings for RR are even more interesting. The sample of fatigued participants in the present study showed lower RRs for both cycling and reclining than the healthy subjects had shown. Although many studies have evaluated breathing patterns in relation to perceived situational stressors (Grossman 1983), none to date have evaluated breathing patterns in chronic stress. As mentioned before, perceived situational stressors are associated with higher RRs (Grossman 1983). Recently, however, Anderson and Chesney (2002) reported an association between an inhibited breathing pattern and sustained stress (perceived stress over the past month). According to these authors, an inhibited breathing pattern might explain the contribution of chronic stress to the development of hypertension. Comparison of the RR values of the sample of subjects in the present study with those of the healthy subjects suggests a decreased RR in subjects with prolonged fatigue, in accordance with the findings of Anderson and Chesney (2002).

No studies are available that evaluate the validity of HRV and RR measurements to determine fatigue. Gurbaxani et al. (2006) correlated questionnaires and biological variables with case classifications of chronic fatigue syndrome. Among other conclusions, they established that the SF-36 correlated highly with the case classification. They further state that biological correlates of chronic fatigue syndrome (e.g. heart rate and HRV) require further investigation. In the present study, HRV and RR measurements did not correlate significantly with either CIS scores or scores on the subscale PN of the SHC. This means that HRV and RR cannot be used to determine fatigue. This does not mean, however, that these subjects with fatigue complaints do not have lowered HRV and/or a higher or lower RR compared to their healthier states before they became fatigued. This should be confirmed in a study with an appropriate design.

A limitation of the present study should be taken into account with respect to its comparability to other studies that measure HRV. The Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996) published guidelines for HRV measurements, which specify that 5-min recordings using frequency domain analyses are preferred for short-term HRV measurements. In this study, the HRV parameters, SDNN and RMSSD, were calculated using time selections of 7 min for reclining and 9 min for cycling. Because the Lifestylemanager software that was used in this study requires 300 data points to calculate SDNN and RMSSD, data selections of more than 5 min were needed for subjects whose heart rates were below 60 beats/min. This practical consideration was the reason for this deviation from the guidelines.
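The arithmetic behind this consideration can be made explicit with a short Python sketch. It assumes that each data point corresponds to one heartbeat and that the heart rate is roughly constant over the selection; the function name is hypothetical.

```python
def minutes_needed(heart_rate_bpm, required_beats=300):
    """Recording time (min) needed to collect required_beats at a constant heart rate."""
    return required_beats / heart_rate_bpm


for bpm in (75, 60, 50):
    print(f"{bpm} beats/min -> {minutes_needed(bpm):.1f} min")
```

At 60 beats/min exactly 5 min are needed, and at 50 beats/min 6 min, which is why a selection longer than 5 min accommodates subjects with low resting heart rates.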

Conclusions

We conclude from our findings that measurements of time-domain HRV (SDNN and RMSSD) and RR are reproducible in this sample of fatigued participants. The results of the repeated measurements do not differ much from each other and the measurement device is capable of discriminating between subjects. Prior to this study, we had suggested that HRV and RR could be suitable for determining the degree of fatigue complaints. This suggestion was not supported in this sample. In addition to a number of earlier studies, however, Broderick et al. (2006) recently reported new evidence indicating the presence of an impaired sympatho-vagal balance in prolonged fatigue. Because HRV and RR have shown good reliability and agreement in both healthy and fatigued subjects, these parameters could be useful for tracking modifications in clinical state when a treatment plan is started. It is possible that fatigued people do have a lowered HRV and/or a higher or lower RR. Examining long-term changes in HRV, RR and fatigue in subjects with prolonged fatigue as they recover from their fatigue through treatment could therefore be an interesting topic for a future longitudinal study.