Introduction

In addition to visual interpretation of myocardial perfusion scintigraphy (MPS) data, quantitative software packages are used to support semiquantitative analysis of the data. Generally, these tools provide volumetric parameters such as left ventricular ejection fraction (LVEF), end-systolic volume (ESV) and end-diastolic volume (EDV). Additionally, regional parameters such as wall thickening, wall motion and perfusion scores can be produced by most software packages. The introduction of these quantitative parameters has enabled researchers to compare parameters in one patient over time, or between groups of patients [1].

Although quantitative analysis improves accuracy and reduces inter- and intraobserver variability, several important factors can influence the performance of these software packages. For instance, the ability of a software tool to produce accurate and reproducible parameter values relies predominantly on its ability to determine the myocardial borders. In acquisitions in healthy humans, detection of the myocardial border is challenging because of the relatively low resolution of the gamma camera compared to the thickness of the myocardium. Additionally, factors such as Compton photon scatter, attenuation, extracardiac activity and natural variations in orientation of the heart contribute to the complexity of border detection. Acquisitions in patients can also include extensive perfusion and functional abnormalities. Thus, a quantitative software package should provide a robust framework that produces correct estimates of the various parameter values in almost all situations.

Within the Nuclear Department of the Leiden University Medical Centre two software systems, quantitative gated SPECT (QGS) from Cedars-Sinai Medical Center [2] and the 4D-MSPECT software package of Invia Medical Imaging Solutions [3] (University of Michigan), are available. Both quantitative software packages are commonly used and are well validated for their clinical use [4–7]. Although previous studies have indicated a good correlation for EDV, ESV and LVEF between these two packages, it seems that the values are not exchangeable. The main objective of this study was therefore to compare the two software packages with respect to their performance on clinical patient data. In order to study the differences between the two software packages, various clinical factors were identified that could influence the relationship between the packages. A phantom study was performed to provide an absolute reference frame to compare the two systems with respect to linearity.

Materials and methods

Patient population

Consecutive patients referred for 99mTc-tetrofosmin MPS with adenosine- or exercise-induced stress during the period November 2008 to January 2009 were prospectively included (n = 150). All acquisitions were analysed by the QGS and 4D-MSPECT packages. Poststress and rest acquisitions were also evaluated visually and scored as ‘normal’, ‘reversible’, ‘persistent’ or ‘combined reversible and persistent’. Clinical information with respect to relevant medication, risk factors and reason for MPS was gathered.

Gated-SPECT acquisition and analysis

In the present study we used a 2-day stress/rest protocol, with the stress test performed on day 1 and the rest test performed on day 2. Patients were instructed to stop β-blockers and calcium antagonists 48 h and caffeine-containing products 12 h before stress testing. The patients underwent physical exercise limited by symptoms, or, when contraindications to exercise were present, adenosine-induced stress using a standard infusion rate of 140 µg/kg per minute. Injection of the radiopharmaceutical was done at peak exercise, or in the third minute of pharmacological stress induction. Acquisition, both after stress and at rest, was performed with a triple-headed camera system (CGA 9300; Toshiba, Tokyo, Japan) 45 min after injection of 500 MBq of 99mTc-tetrofosmin. Images were acquired over 360° (6° per step, 40 s per step). Automatic ECG gating was applied on the R-R interval with 16 frames per cardiac cycle, with a tolerance window of 50%.

Data were prefiltered using a Butterworth filter (eighth order, cut-off frequency 0.26 Hz), and reconstructed as transaxial images using a filtered back-projection algorithm. No attenuation or scatter correction was applied. Acquisitions were projected as tomographic slices in the short-axis and vertical/horizontal long-axis directions for visual analysis. Quantitative values were automatically generated for EDV, ESV and LVEF by QGS (version 2.0 rev A) and 4D-MSPECT (version 5.2 SP 2). Heart size was estimated using the left ventricular end-diastolic volume.

Phantom study

The Amsterdam gated (AGATE) dynamic cardiac phantom (model 830.100; VANDERWILT techniques, Boxtel, The Netherlands) was used as a reference for the LVEF. The phantom consists of a water-filled torso in which a twin membrane lumen filled with water and 10 MBq 99mTc-pertechnetate is placed. The cardiac volume is automatically regulated by a computer-controlled pumping system (Fig. 1). The system produces ECG signals in agreement with the diastolic and systolic phases of the heart. In this study the cardiac output was varied at a constant heart rate of 80 beats per minute to provide a range of LVEF values. The MPS acquisitions of the phantom were analysed by the QGS and 4D-MSPECT packages.

Fig. 1

AGATE heart phantom with twin membrane lumen representing the myocardial wall and LV lumen placed in a water-filled Plexiglas torso. The LV volume is adjusted by a computer-controlled pumping system to produce different LVEF values

Statistical methods

All statistical analyses were performed using SPSS v16.0. Differences in EDV, ESV and LVEF between the packages were tested for significance using the paired Student’s t-test, with p < 0.05 considered significant. The agreement between the systems was evaluated using Bland-Altman graphs. Linear regression analysis was performed on the phantom data to provide information about the linearity between the two software packages over a range of LVEF values. Multiple linear regression analysis was performed to identify the main factors explaining differences in ESV, EDV and LVEF derived by the two software packages. Significant factors were used to further identify the relationship between the QGS and 4D-MSPECT packages.
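The two paired analyses described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study; the function names and the LVEF values are hypothetical and serve only to show how the bias, the ±1.96×SD limits and the paired t statistic are derived from the paired differences.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for the paired differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def paired_t_statistic(a, b):
    """Paired Student's t statistic: mean difference divided by its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical LVEF values (%) for six patients; NOT data from this study.
lvef_4dm = [62.0, 55.0, 48.0, 70.0, 35.0, 58.0]
lvef_qgs = [57.0, 52.0, 45.0, 63.0, 33.0, 53.0]

bias, lower, upper = bland_altman(lvef_4dm, lvef_qgs)
t_stat = paired_t_statistic(lvef_4dm, lvef_qgs)
print(f"bias {bias:.2f}, limits {lower:.2f} to {upper:.2f}, t = {t_stat:.2f}")
```

The t statistic would then be compared with the t distribution with n − 1 degrees of freedom to obtain the p value.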

Results

One patient had to be excluded during the study because 4D-MSPECT could not provide accurate border detection, and another patient had to be excluded because neither package was able to provide accurate border detection. Details of the included population (n = 148) can be found in Table 1.

Table 1 Patient characteristics

Differences and agreement based on clinical patient data

The clinical patient dataset was used to calculate the EDV, ESV and LVEF using 4D-MSPECT and QGS (Table 2). Mean parameter values are shown, supplemented with the results of a paired Student’s t-test analysis calculating the mean differences between the values calculated by the two packages. Although the correlations were high (correlation coefficients ranging from 0.886 to 0.980 for the different parameters), the EDV, ESV and LVEF calculated by the two software packages differed significantly (p < 0.001). Additionally, a Bland-Altman analysis was performed for rest and poststress acquisitions (Fig. 2). The limits for EDV, ESV and LVEF in the rest acquisitions were −11.8–66.4 ml (mean 27.3 ml), −29.3–46.9 ml (mean 8.8 ml) and −11.7–19.7% (mean 4.0%), respectively. The limits for EDV, ESV and LVEF in the poststress acquisitions were −15.3–69.7 ml (mean 27.2 ml), −29.7–47.3 ml (mean 8.5 ml) and −10.5–19.1% (mean 4.3%), respectively. A positive shift of the mean line from the line of equality is apparent in all figures, indicating that 4D-MSPECT provided higher values than QGS for all parameters. Furthermore, differences between the software packages increased with increasing average heart size.

Table 2 EDV, ESV and LVEF calculated by 4D-MSPECT and QGS
Fig. 2

Bland-Altman analyses of EDV, ESV and LVEF calculated by the two software systems for the rest and poststress acquisitions. Bland-Altman analysis indicates the difference of the estimates obtained by the two systems (4D-MSPECT − QGS) in relation to the average of these estimates. The red dashed lines represent the Bland-Altman limits (±1.96×SD), the red solid lines represent the mean differences, and the blue dashed lines are the lines of equality

Multivariate analysis

Stepwise multiple regression analysis identified the most important factors explaining the differences in EDV, ESV and LVEF between the packages. Factors included in the analysis were relevant risk factors, reason for MPS, stressor type, and average volumetric measurements. The average volumetric measurements were determined by calculating the means of the QGS and 4D-MSPECT values.

The factors that explained the difference between the ESV values determined by the two packages were average ESVrest (p < 0.001), gender (p = 0.034) and BMI (rest and poststress, p = 0.036 and p = 0.018, respectively). The difference in the EDV values was explained by BMI (rest and poststress, p = 0.002 and p = 0.005, respectively) and average ESVrest (p < 0.001). The difference in LVEF values was explained by average ESVrest (p = 0.012), average EDVrest (p = 0.048), small heart size (rest and poststress, p = 0.001 and p = 0.002, respectively) and stressor type (p = 0.032). These factors were used to produce subgroups for further analysis.

Influence of various factors

Previous studies have shown that QGS overestimates LVEF in smaller hearts (EDV <70 ml), so the study population was subdivided using this cut-off value [8, 9]. The eventual distribution of the population resulted in the following subgroups: gender (104 male, 44 female), stressor type (77 adenosine, 71 exercise), heart size (15 EDV ≤70 ml, 133 EDV >70 ml), defect type (31 normal, 15 reversible, 63 persistent, 39 combination) and BMI (105 <30 kg/m², 43 ≥30 kg/m²). The paired differences between the values in each group determined by the two packages are presented in Table 3. Since the observed differences were significant over the entire population, the nonsignificant differences (p > 0.05) are marked in this table, as they indicate a deviation from the observed trend. Overall, we found a significant difference between the EDV values (p < 0.001) determined by the two packages in all subgroups and in both rest and poststress acquisitions. A relatively smaller difference was found between the values determined by the two packages in the group with a small heart (EDVrest 15.1 ± 5.3 ml, EDVstress 14.5 ± 5.4 ml) as well as in the group with normal perfusion (EDVrest 19.5 ± 10.4 ml, EDVstress 17.9 ± 11.1 ml), compared to the difference in the overall population (EDVrest 27.3 ± 19.5 ml, EDVstress 27.2 ± 21.3 ml).

Table 3 Influence of various factors on differences in EDV, ESV and LVEF determined by QGS and 4D-MSPECT (means±SD)

Regarding ESV, the difference between the values determined by QGS and 4D-MSPECT was not significant in the normal and reversible perfusion subgroups. In addition, we found a smaller difference in ESV values in the group with a small heart (ESVrest and ESVstress 3.1 ± 4.6 ml), compared to the overall population (ESVrest 8.8 ± 19.1 ml, ESVstress 8.5 ± 19.1 ml). Finally, the LVEF values determined by the two packages were significantly different in almost all groups, with the exception of the groups with a small heart, normal perfusion and female gender.

Phantom study

There was a strong linear relationship between the software package estimates and the phantom data over the entire range of volumes (Fig. 3). Regression between the phantom (P) and 4D-MSPECT for the LVEF data was 1.041P + 15.976 (R² = 0.968). Regression between the phantom and QGS for the LVEF data was 1.140P + 2.344 (R² = 0.985). The relationship between the LVEF data determined by the two software packages was also linear, with a regression of QGS = 1.057(4D-MSPECT) − 12.932 (R² = 0.947). The mean difference in LVEF between the 4D-MSPECT and QGS estimates was 9.61 ± 4.35 (p < 0.001).
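As a worked illustration of these regression lines, the sketch below evaluates both fitted equations at a phantom LVEF of 50%. It assumes that P is expressed in LVEF percentage points; the function names and the chosen value of P are hypothetical and not taken from the study.

```python
def lvef_4dmspect(p):
    """Fitted line reported for 4D-MSPECT versus phantom LVEF P."""
    return 1.041 * p + 15.976


def lvef_qgs(p):
    """Fitted line reported for QGS versus phantom LVEF P."""
    return 1.140 * p + 2.344


phantom_lvef = 50.0  # hypothetical phantom setting (assumed to be in %)
est_4dm = lvef_4dmspect(phantom_lvef)
est_qgs = lvef_qgs(phantom_lvef)
difference = est_4dm - est_qgs

print(f"4D-MSPECT {est_4dm:.3f}, QGS {est_qgs:.3f}, difference {difference:.3f}")
```

Because the two slopes differ, the difference predicted by the fitted lines is not constant: it equals 13.632 − 0.099P and therefore shrinks slowly as the phantom LVEF increases.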

Fig. 3

Linear relationship between the phantom data and QGS estimates (triangles, solid line) and between the phantom data and the 4D-MSPECT estimates (circles, dashed line). The dotted line is the line of equality representing complete agreement between the phantom data and the software package estimates

Discussion

In contrast to QGS, which has been compared frequently to other imaging techniques such as blood-pool imaging and MRI, 4D-MSPECT seems to be a less well validated tool. In the present study the performances of QGS and 4D-MSPECT on both clinical patient data and phantom data were studied to gain a better understanding of the relationship between these software packages.

The degree of association

A high degree of correlation and linearity between the estimates derived from the two software packages has been found in other studies and by the manufacturers [7, 10–13]. These high correlation values have sometimes been incorrectly interpreted as a measure of the interchangeability between QGS and 4D-MSPECT. It therefore has to be stated that correlation is a direct measure of association rather than of agreement between the packages. If two methods are compared in a dataset drawn from a population with large variation between individuals, correlation analysis should be used with caution. When the variation between two methods is small compared to the variation between individuals, the correlation is mainly determined by the variation within the patient population. So, in addition to a correlation analysis, a Bland-Altman analysis was performed to provide a visual representation of the level of agreement and the presence of bias [14, 15]. If two systems or methods show total agreement, all measured values should be situated on the mean line in a Bland-Altman graph, and this line should coincide with the line of equality. For excellent, but not total, agreement, the measured values should be located around the mean line within a narrow 95% confidence interval. In the Bland-Altman graphs of the agreement between QGS and 4D-MSPECT the mean line was located above the line of equality (4D-MSPECT = QGS) for all parameters, indicating higher parameter estimates for 4D-MSPECT than for QGS. The high level of correlation found in earlier studies can be explained by the distribution of the calculated values within wide 95% confidence intervals.
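The distinction between association and agreement can be made concrete with a small sketch. The EDV values and the constant 27 ml offset below are hypothetical and chosen only for illustration: two methods that differ by a fixed offset correlate perfectly, yet never agree.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical EDV values (ml) with large variation between individuals.
method_a = [60.0, 90.0, 120.0, 150.0, 200.0]
# Second method reads a constant 27 ml higher (illustrative offset only).
method_b = [v + 27.0 for v in method_a]

r = pearson_r(method_a, method_b)
bias = mean([b - a for a, b in zip(method_a, method_b)])

print(f"r = {r:.3f}, mean difference = {bias:.1f} ml")
```

The correlation coefficient is insensitive to the offset, whereas the mean difference shown in a Bland-Altman graph exposes it directly.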

Relationship between QGS and 4D-MSPECT

There were excellent linear relationships between the estimates from the two software packages, and between the software estimates and the phantom data. Linearity over an entire range of LVEF values is an important characteristic of a robust software package. The phantom study further indicated overestimation of LVEF by both QGS and 4D-MSPECT, which was more pronounced in the 4D-MSPECT data.

In a study by Lipke et al. [7], both 4D-MSPECT and QGS were compared with cardiac MRI. They found a significant underestimation of EDV by 4D-MSPECT, and an even higher degree of underestimation by QGS. However, they found no significant differences between the software estimates and cardiac MRI for the ESV values. This result may be explained by the effects of improved count statistics in systole due to wall thickening. For the LVEF values, there was a minimal nonsignificant deviation between 4D-MSPECT and cardiac MRI, whereas QGS provided a significant underestimation. These results are supported by the study of Stegger et al. [13] and Schaefer et al. [11]. Comparing the two packages with each other, Lipke et al. found significant differences for the EDV values between the packages, but only small insignificant differences for the ESV values. In this study there was a high correlation between the packages for all three parameters, which is also in agreement with the studies of Stegger et al. [13] and Schaefer et al. [11]. Nakajima et al. [16] compared data from both a mathematical model of the myocardium and gated blood pool studies with MPS data obtained by QGS and 4D-MSPECT. Although high correlations between the software-determined values and gated blood pool studies were found for LVEF and EDV, no clear significant difference was found between the 4D-MSPECT and QGS values. These results are in contrast with our findings and those of Lipke et al., but may be explained by the relatively small number of patients included (n = 30) and the relatively small heart sizes of these patients.

Important differences between the software algorithms

The general trend to higher estimates provided by 4D-MSPECT than by QGS found in this study may be explained by the underlying differences between the software algorithms. To provide an insight into the differences, we briefly discuss the basic principles of operation for the QGS software. This algorithm consists of three main steps: (1) segmentation of the myocardium, (2) extracting the mid-myocardial line, and (3) determining the myocardial borders. During segmentation the 3-D position of the LV within the ungated SPECT acquisition is located, and a binary mask is made of the heart. A mask is a rough black and white model indicating the location of the myocardium in a 3-D space. This binary mask is then used to locate the mid-myocardial line (the line with the highest count distribution). An ellipsoid model is fitted and adjusted to the mid-myocardial line so that it resembles the myocardial shape. The eventual ellipsoid will serve as a sampling profile for processing the individual intervals with an asymmetric Gaussian count profile. For every image in the cardiac cycle the LV cavity is confined by the mitral valve plane, which is estimated at a 25% threshold of heart activity in each frame. Eventually an estimate of the myocardial borders containing holes that represent the valve plane is obtained for each frame. A more detailed description of the QGS algorithms is beyond the scope of this article, but more information can be found in the literature [17–19].

The 4D-MSPECT software uses the same basic principles of segmentation, mid-myocardium determination and border estimation as QGS; however, there are also some crucial differences [3, 20–22]. First, 4D-MSPECT uses a cylindrical sampling profile within the basal and mid-ventricular parts of the myocardium and a spherical profile in the apical part, whereas QGS uses an ellipsoidal profile throughout the myocardium. This difference may account for deviations in the estimated LV shape, especially within the basal regions. Second, 4D-MSPECT defines the valve plane perpendicular to the long axis, thus including a part of the outflow tract in the LV volume. QGS, on the other hand, allows a somewhat oblique valve plane towards the septum on the long axis, thus excluding a part of the LV volume near the septum. Additionally, 4D-MSPECT allows a basal plane motion of 5–20 mm towards the apex in systole, whereas QGS uses a fixed basal plane with a maximum motion that is confined to one slice. These differences may account for the overestimation of LVEF, and the relatively smaller difference between the two software packages during the systolic phase.

Effects of heart size

The multivariate analysis indicated that the factors gender, BMI, stressor type, average EDVrest and average ESVrest could explain the differences in parameter values between the software packages. For this reason the factors gender, BMI, defect type, heart size and stressor type were used to create subgroups within the population.

In this study, BMI and stressor type had no particular effect on the relationship between the software packages. The factors that did influence the relationship were gender, defect type and heart size. In the subgroups female gender, normal perfusion, reversible perfusion defects and small heart, the differences in LVEF determined by QGS and 4D-MSPECT were smaller than in the other subgroups tested, but still significant. This result could possibly be explained by the smaller sizes of these subgroups. Nonetheless, it remains apparent that all these subgroups had a relatively smaller heart size than the other groups. Overall, patients with a smaller heart size showed a reduced, but still significant, difference in EDV and ESV, and a small nonsignificant difference in LVEF, between the two packages. This may indicate that size affects the calculation of parameter values differently in the two software packages.

The effects of a small heart size on the values determined by the QGS software package have also been reported by others [8–10, 23]. The subdivision of the population in this study was based on heart size (EDV <70 ml), and presence or absence of perfusion defects. Although the distribution method differed from our classification, Lum and Coel [10] also did not find significant mean differences between the values determined by the two software packages among patients with a small heart. In the groups with present or absent perfusion defects, a significant overestimation of values calculated by 4D-MSPECT in comparison with those calculated by QGS was found, as in this study.

The partial volume effect, in which one voxel that should only describe the myocardial wall in fact contains a mixture of myocardial, LV and extracardiac values, is an important factor in small hearts. The combination of all these voxels will result in an overestimation of the actual size of the myocardial wall, reducing the estimated size of the LV cavity. As EDVs, and especially ESVs, are underestimated, a significant overestimation of LVEF values will occur with decreasing heart size. Factors such as photon scatter and the relatively low camera resolution may limit the ability of many software packages to obtain good border estimates, especially in smaller hearts. In small hearts each voxel will represent a substantial part of the myocardial wall, particularly during the systolic phase. Photon scatter also affects image quality in both small and larger hearts, resulting in blurring of the myocardial borders. Observations by Hambye et al. indicate that the presence of scattered photons in the main photopeak is of greater importance in smaller hearts than in large hearts [24]. So scatter compensation techniques, such as methods based on the energy window, deconvolution or reconstruction, are especially useful for improving image quality in smaller hearts [25].
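The arithmetic behind this size dependence can be sketched with the standard definition LVEF = (EDV − ESV)/EDV × 100. The volumes below are hypothetical, and the assumption that both EDV and ESV are underestimated by the same absolute amount (10 ml) is a simplification made only for illustration.

```python
def lvef(edv, esv):
    """Left ventricular ejection fraction (%) from end-diastolic and end-systolic volume (ml)."""
    return 100.0 * (edv - esv) / edv


UNDERESTIMATE_ML = 10.0  # assumed identical cavity underestimation in diastole and systole

# Hypothetical small and large hearts with the same true LVEF of 60%.
small_true = lvef(60.0, 24.0)
small_measured = lvef(60.0 - UNDERESTIMATE_ML, 24.0 - UNDERESTIMATE_ML)

large_true = lvef(150.0, 60.0)
large_measured = lvef(150.0 - UNDERESTIMATE_ML, 60.0 - UNDERESTIMATE_ML)

print(f"small heart: true {small_true:.1f}%, measured {small_measured:.1f}%")
print(f"large heart: true {large_true:.1f}%, measured {large_measured:.1f}%")
```

Under these assumptions the same absolute error inflates the LVEF of the small heart by 12 percentage points but that of the large heart by only about 4, because the error is a much larger fraction of the small cavity.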

Clinical consequences

Quantitative functional parameters are generally used to classify a patient as having normal or abnormal myocardial function in the clinical setting. Some studies have provided normal limits for QGS, but there are very few studies that have provided such limits for 4D-MSPECT. Normal limits for LVEF calculated by QGS are, according to Lomsky et al. [26] and Ababneh et al. [27], in the range of 51–53% for women and 43–47% for men. Based on our findings, 4D-MSPECT overestimated these values by 4% on average in poststress and rest acquisitions.

Although estimates of lower normal limits were provided and good correlations between the software packages were found, it remains inadvisable to process clinical MPS data with different software packages. Despite the linearity of the relationship between the values determined by the two packages, substantial patient-specific variations in parameter estimates were also present. Additionally, the discrepancies between the values provided by these software packages may have been influenced by heart size. Consequently, it is extremely difficult to convert the parameter values for individual patients after transition to the other software.

In a recent study, Lavender et al. [28] found that other factors, such as filtering, may also influence the LVEF. They compared QGS and 4D-MSPECT in 101 patients and found a good correlation, but there was a clinically significant difference between the LVEF calculated by the two packages. In addition, the LVEF was overestimated using Butterworth filters at cut-off frequencies ≤0.8 cycles/cm in 26 of 30 patients. The authors concluded that changing the cut-off frequency by as little as 0.1 cycles/cm can cause clinically significant differences in the LVEF estimate. These aspects were not part of our study, but should definitely be taken into account in clinical practice.

Cardiac MRI is being increasingly used for functional imaging of the myocardium in clinical practice. In a recent study by Wang et al. [29], MRI was used as a reference method to evaluate the software-specific characteristics of QGS, 4D-MSPECT and the Emory cardiac toolbox (ECTB) in patients with dilated cardiomyopathy. EDV and ESV assessed by QGS did not differ significantly from those assessed by cardiac MRI, whereas ECTB and 4D-MSPECT overestimated EDV and ESV. In addition, all software packages overestimated the LVEF compared with MRI. These findings underline the fact that all modalities and tools available for quantitative analysis of myocardial studies will produce different results, which means that the results are not interchangeable.

Finally, in the present study, administration of adenosine resulted in a significant mean decrease in LVEF. This phenomenon has been reported before [30], but the exact mechanism involved is still unknown; regarding the data presented in the tables, it seems to be related to a slight increase in ESV. Whether this is a direct effect of adenosine or is related to other factors has to be studied. Nevertheless, since both QGS and 4D-MSPECT demonstrated this phenomenon, there seems to be no relationship with the software packages used in the present study.

Conclusion

In this study we sought a common trend for the differences in parameter values obtained by 4D-MSPECT and QGS, and factors that may induce a bias in this relationship. Although the software packages showed a good correlation, 4D-MSPECT overestimated the values by 4% on average for the poststress and rest acquisitions compared to values calculated by QGS. There were excellent linear relationships between the values determined by the two systems and between the values determined by the two systems and the phantom data. The differences between the software package estimates were especially apparent in patients with normal and increased heart sizes, since deviations between the system values were greatly influenced by heart size. The discrepancy between the values obtained by the software packages was clinically relevant, and should be taken into account when a new quantitative software system is introduced, or when multiple software systems are used in the same institution.