Elsevier

NeuroImage

Volume 51, Issue 4, 15 July 2010, Pages 1334-1344

Reliability and validity of MRI-based automated volumetry software relative to auto-assisted manual measurement of subcortical structures in HIV-infected patients from a multisite study

https://doi.org/10.1016/j.neuroimage.2010.03.033

Abstract

The automated volumetric output of FreeSurfer and Individual Brain Atlases using Statistical Parametric Mapping (IBASPM), two widely used and well-published software packages, was examined for accuracy and consistency relative to auto-assisted manual (AAM) tracings (i.e., manual correction of automated output) when measuring the caudate, putamen, amygdala, and hippocampus in the baseline scans of 120 HIV-infected patients (86.7% male, 47.3 ± 6.3 y.o., mean HIV duration 12.0 ± 6.3 years) from the NIH-funded HIV Neuroimaging Consortium (HIVNC) cohort. Construct validity was assessed by correlating automated and AAM volumetric measures with relevant clinical measures of HIV progression. When results were averaged across all patients in the eight structures examined (the four structures in each hemisphere), FreeSurfer achieved lower absolute volume difference in five, higher sensitivity in seven, and higher spatial overlap in all eight structures. Additionally, FreeSurfer results exhibited less variability in all measures. Output from both methods yielded correlations with clinical measures of HIV progression that were discrepant from those obtained with AAM-segmented data. Overall, FreeSurfer proved more effective in the context of subcortical volumetry in HIV patients, particularly in a multisite cohort study such as this. These findings emphasize that regardless of the automated method used, visual inspection of segmentation output, along with manual correction if necessary, remains critical to ensuring the validity of reported results.

Introduction

Magnetic resonance imaging (MRI) based brain volumetry is a valuable technique for identifying subcortical morphometric changes in vivo and determining the regional neurological impact of psychopathology, disease progression, and advancing therapeutic regimens. This approach has been useful for characterizing the effects of dementia (Carmichael et al., 2005, Teipel et al., 2008, Thompson et al., 2001), psychiatric disorders (Csernansky et al., 1998, Hickie et al., 2005, Konarski et al., 2008, Styner et al., 2004), and normal aging (Brickman et al., 2008, Elderkin-Thompson et al., 2008, Walhovd et al., 2005), as well as uncovering regional and global neurological consequences of systemic diseases such as the Human Immunodeficiency Virus (HIV) (Carmichael et al., 2007, Sporer et al., 2005, Stout et al., 1998, Thompson et al., 2005, Thompson et al., 2006), diabetes (Jongen and Biessels, 2008, Perantie et al., 2007, Tiehuis et al., 2008, Wessels et al., 2007), and scoliosis (Liu et al., 2008). As techniques in MRI continue to advance, in vivo volumetric measurement will become increasingly valuable in the drive to understand the evolution and progression of injury for CNS disorders as well as typical aging.

The range of clinical applications for MRI volumetry has generated intense interest in maximizing the accuracy and efficiency of automated segmentation techniques. For years, manual delineation by trained experts has remained the “gold standard” of accuracy in volumetric analyses. Yet while it remains the current reference standard for segmentation, the accuracy of manual volumetry relative to true structure volume is still widely debated, as results can be influenced by factors such as anatomical protocols, tracer experience, scan acquisition parameters, image quality, and even the computer hardware employed in the tracing procedure (Jack et al., 1990, Jack et al., 1995, Warfield et al., 2004). Moreover, manual tracings are time-consuming, taking up to 2 h per structure (though this time may vary depending on structure complexity, slice thickness, and rater experience). Thus, the time, financial, and personnel resources required render manual volumetry impractical in large cohort studies.

Multiple automated methods have been developed to reduce tracing time while ensuring excellent reliability (Andersen et al., 2002, Heckemann et al., 2006, Powell et al., 2008). In particular, the FreeSurfer software package (Martinos Center, Boston, MA) and Individual Brain Atlases toolbox (IBASPM; Cuban Neuroscience Center, Havana, Cuba) of the popular Statistical Parametric Mapping package (SPM; Wellcome Trust Centre for Neuroimaging, UK) are widely used and have well-published methods. Both packages are fully automated, employing an atlas-based segmentation approach to generate an individualized anatomical label map for a spatially normalized patient image, based on an atlas composed of manually traced reference scans (Alemán-Gómez et al., 2006, Ashburner and Friston, 1997, Ashburner et al., 1999, Ashburner and Friston, 2005, Fischl et al., 2002, Han and Fischl, 2007, Tzourio-Mazoyer et al., 2002).

While both of these packages have been validated by their creators, their accuracy and/or consistency may vary depending on image quality, scan parameters, and scanning hardware (Jovicich et al., 2009, Han and Fischl, 2007, Tae et al., 2008). Additionally, previous comparisons of competing automated methods have shown notable differences in their performance relative to manual segmentation, despite examining only a limited number of structures (Cherbuin et al., 2009, Klauschen et al., 2009, Morey et al., 2009, Shen et al., 2009, Tae et al., 2008). Some have suggested that the patient composition of the source atlas, particularly the inclusion of healthy or diseased subjects, may in fact influence how robust each software package will be with diseased patients or otherwise morphologically different brains (Csapo et al., 2009, Tae et al., 2008, Zhang, 1996). Beyond atlas composition, the FreeSurfer and IBASPM processing pipelines also differ in their registration algorithms and in how they statistically apply the information contained in the atlases. These differences underscore the importance of re-validating these packages prior to analyzing data obtained with scan parameters or patient populations that are distinct from those of previous validation studies, especially in the case of a large sample size or multisite study.

The purpose of this study was to address previously described inconsistencies in FreeSurfer and IBASPM subcortical segmentation results by examining the automated volumetric measurement of several clinically relevant subcortical structures from a large multisite consortium study of HIV infection. We compared the accuracy and consistency of volumetric results for the caudate, putamen, hippocampus, and amygdala obtained using three methods: AAM segmentation, FreeSurfer (Martinos Center for Biomedical Imaging, Boston, MA), and IBASPM (Cuban Neuroscience Center, Havana, Cuba). Cognitive decline is a well-described feature of HIV progression, and a small number of studies have linked this to atrophy of subcortical structures (González-Scarano and Martín-García, 2005, Hall et al., 1996, Paul et al., 2002, Ragin et al., 2005, Robertson et al., 2007, Stout et al., 1998). Future investigations of this relationship will call for large-scale studies that will rely on automated volumetric procedures to efficiently obtain data. To ensure the data is interpreted correctly, it will be crucial to anticipate and thereby minimize the possible shortcomings of these automated methods. To this end, we will attempt to characterize the accuracy and variability of these methods, as well as examine the ability of each to uncover significant, valid relationships when correlated with clinical measures of HIV progression.

Section snippets

Subjects

One hundred twenty HIV-infected patients were examined in this study (86.7% male; mean age 47.3 ± 7.2 years). Patients were recruited as part of the ongoing multisite NIH-funded MRS (magnetic resonance spectroscopy) HIV Neuroimaging Consortium (HIVNC) study based on the following inclusion criteria: HIV-positive, age ≥ 18 years, duration of HAART > 12 weeks, nadir CD4 count < 100 cells/ml during HIV history. Patients were considered to be on stable treatment (highly active antiretroviral therapies

Spatial overlap with AAM segmentation

As measured by the Dice coefficient, FreeSurfer (FS) segmentations exhibited significantly higher (paired t-test, p < 0.001) mean spatial overlap in all structures (Fig. 1). This difference was most pronounced in the right amygdala (FS 0.740 ± 0.071; IBASPM 0.259 ± 0.114) and right hippocampus (FS 0.749 ± 0.069; IBASPM 0.374 ± 0.112). While the difference in Dice coefficients was smallest in the right caudate (FS 0.813 ± 0.065; IBASPM 0.721 ± 0.128), the difference was nonetheless significant (p < 0.001).
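For readers unfamiliar with the overlap metric, the following is a minimal sketch of how a Dice coefficient could be computed for one structure, alongside two other measures named in the abstract (sensitivity and absolute volume difference). It is not the authors' implementation: the function names, the representation of a segmentation as a set of voxel coordinates, the definition of sensitivity as the fraction of AAM voxels recovered, and the expression of volume difference as a percentage of AAM volume are all assumptions made for illustration.

```python
# Illustrative sketch only; names and formula choices are assumptions,
# not taken from the study's methods.

def dice_coefficient(auto, ref):
    """Dice overlap: 2 * |auto & ref| / (|auto| + |ref|), in [0, 1]."""
    if not auto and not ref:
        return 1.0  # two empty label maps agree trivially
    return 2 * len(auto & ref) / (len(auto) + len(ref))


def sensitivity(auto, ref):
    """Fraction of reference (AAM) voxels also labeled by the automated method."""
    if not ref:
        raise ValueError("reference segmentation is empty")
    return len(auto & ref) / len(ref)


def abs_volume_diff_pct(auto, ref):
    """Absolute volume difference as a percentage of the reference volume."""
    if not ref:
        raise ValueError("reference segmentation is empty")
    return abs(len(auto) - len(ref)) / len(ref) * 100


def block(x0, x1, y0, y1, z0, z1):
    """Hypothetical helper: set of voxel coordinates in a rectangular block."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}


# Toy example: a 4x4x4 reference block and an automated label map of the
# same size shifted by one voxel along x (48 of 64 voxels overlap).
ref_mask = block(0, 4, 0, 4, 0, 4)
auto_mask = block(1, 5, 0, 4, 0, 4)

print(dice_coefficient(auto_mask, ref_mask))     # 0.75
print(sensitivity(auto_mask, ref_mask))          # 0.75
print(abs_volume_diff_pct(auto_mask, ref_mask))  # 0.0
```

The toy example shows why the metrics need to be read together: the shifted label map has exactly the same volume as the reference (zero volume difference) yet overlaps it only partially.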

Performance characteristics of FreeSurfer and IBASPM

Past validation studies examining automated segmentation methods have varied widely in the measures they have used. The analyses in this study were chosen in an attempt to apply the full range of metrics that have appeared in various combinations in prior publications. Moreover, each metric characterizes a slightly different aspect of segmentation performance, so the metrics must be considered in relation to one another in order to adequately interpret the results of an analysis. For example, absolute

Acknowledgments

We gratefully acknowledge the following HIV Neuroimaging Consortium sites for the data used in this study: Stanford University, University of California Los Angeles, UCLA Harbor, University of California San Diego, University of Colorado, University of Pittsburgh, University of Rochester. We also acknowledge the support of the following funding sources: R01 NS036524 and K23 MH073416.

References (64)

  • J. Jovicich et al.

    MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths

    Neuroimage

    (2009)
  • R.A. Morey et al.

    A comparison of automated segmentation and manual tracing for quantifying hippocampal and amygdala volumes

    Neuroimage

    (2009)
  • S. Mori et al.

    Stereotaxic white matter atlas based on diffusion tensor imaging in an ICBM template

    Neuroimage

    (2008)
  • R. Paul et al.

    Relationships between cognition and structural neuroimaging findings in adults with human immunodeficiency virus type-1

    Neurosci Biobehav Rev

    (2002)
  • S. Powell et al.

    Registration and machine learning-based automated segmentation of subcortical and cerebellar brain structures

    Neuroimage

    (2008)
  • D.T. Pulsipher et al.

    MRI volume loss of subcortical structures in unilateral temporal lobe epilepsy

    Epilepsy Behav

    (2007)
  • D.W. Shattuck et al.

    Construction of a 3D probabilistic atlas of human cortical structures

    Neuroimage

    (2008)
  • M. Styner et al.

    Boundary and medial shape analysis of the hippocampus in schizophrenia

    Medical Image Analysis

    (2004)
  • P.M. Thompson et al.

    3D mapping of ventricular and corpus callosum abnormalities in HIV/AIDS

    Neuroimage

    (2006)
  • N. Tzourio-Mazoyer et al.

    Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain

    Neuroimage

    (2002)
  • J.K. Udupa et al.

    A framework for evaluating image segmentation algorithms

    Comput Med Imaging Graph

    (2006)
  • K.B. Walhovd et al.

    Effects of age on volumes of cortex, white matter and subcortical structures

    Neurobiol Aging

    (2005)
  • H. Zaidi et al.

    Comparative assessment of statistical brain MR image segmentation algorithms and their impact on partial volume correction in PET

    Neuroimage

    (2006)
  • Y.J. Zhang

    A survey on evaluation methods for image segmentation

    Pattern Recognition

    (1996)
  • Y. Alemán-Gómez et al.

    IBASPM: toolbox for automatic parcellation of brain structures

  • S.L. Archibald et al.

    Correlation of in vivo neuroimaging abnormalities with postmortem human immunodeficiency virus encephalitis and dendritic loss

    Arch Neurol

    (2004)
  • K.R. Beutner et al.

    Estimating uncertainty in brain region delineations

    Information Processing in Medical Imaging. In Lecture Notes in Computer Science

    (2009)
  • A.M. Brickman et al.

    Brain morphology in older African Americans, Caribbean Hispanics, and whites from northern Manhattan

    Archives of Neurology

    (2008)
  • O.T. Carmichael et al.

    Cerebral ventricular changes associated with transitions between normal cognitive function, mild cognitive impairment, and dementia

    Alzheimer Disease and Associated Disorders

    (2007)
  • N. Cherbuin et al.

    In vivo hippocampal measurement and memory: a comparison of manual tracing and automated segmentation in a large community-based sample

    PLoS ONE

    (2009)
  • I. Csapo et al.

    Effect of patient population specific atlases on automatic segmentation of subcortical structures in FreeSurfer [Abstract]

  • J.G. Csernansky et al.

    Hippocampal morphometry in schizophrenia by high dimensional brain mapping

    Proc Natl Acad Sci USA

    (1998)