Reliability and validity of MRI-based automated volumetry software relative to auto-assisted manual measurement of subcortical structures in HIV-infected patients from a multisite study
Introduction
Magnetic resonance imaging (MRI)-based brain volumetry is a valuable technique for identifying subcortical morphometric changes in vivo and determining the regional neurological impact of psychopathology, disease progression, and advancing therapeutic regimens. This approach has been useful for characterizing the effects of dementia (Carmichael et al., 2005, Teipel et al., 2008, Thompson et al., 2001), psychiatric disorders (Csernansky et al., 1998, Hickie et al., 2005, Konarski et al., 2008, Styner et al., 2004), and normal aging (Brickman et al., 2008, Elderkin-Thompson et al., 2008, Walhovd et al., 2005), as well as uncovering regional and global neurological consequences of systemic diseases such as human immunodeficiency virus (HIV) infection (Carmichael et al., 2007, Sporer et al., 2005, Stout et al., 1998, Thompson et al., 2005, Thompson et al., 2006), diabetes (Jongen and Biessels, 2008, Perantie et al., 2007, Tiehuis et al., 2008, Wessels et al., 2007), and scoliosis (Liu et al., 2008). As MRI techniques continue to advance, in vivo volumetric measurement will become increasingly valuable in the drive to understand the evolution and progression of injury in central nervous system (CNS) disorders as well as in typical aging.
The range of clinical applications for MRI volumetry has generated intense interest in maximizing the accuracy and efficiency of automated segmentation techniques. For years, manual delineation by trained experts has remained the “gold standard” of accuracy in volumetric analyses. Yet while it remains the current reference standard for segmentation, the accuracy of manual volumetry relative to true structure volume is still widely debated, as results can be influenced by factors such as anatomical protocols, tracer experience, scan acquisition parameters, image quality, and even the computer hardware employed in the tracing procedure (Jack et al., 1990, Jack et al., 1995, Warfield et al., 2004). Moreover, manual tracings are time-consuming, taking up to 2 h per structure (though this time may vary depending on structure complexity, slice thickness, and rater experience). Thus, the time, financial, and personnel resources required render manual volumetry impractical in large cohort studies.
Multiple automated methods have been developed to reduce tracing time while ensuring excellent reliability (Andersen et al., 2002, Heckemann et al., 2006, Powell et al., 2008). In particular, the FreeSurfer software package (Martinos Center, Boston, MA) and the Individual Brain Atlases toolbox (IBASPM; Cuban Neuroscience Center, Havana, Cuba) for the popular Statistical Parametric Mapping package (SPM; Wellcome Trust Centre for Neuroimaging, UK) are widely used and have well-documented methods. Both packages are fully automated, employing an atlas-based segmentation approach to generate an individualized anatomical label map for a spatially normalized patient image, based on an atlas composed of manually traced reference scans (Alemán-Gómez et al., 2006, Ashburner and Friston, 1997, Ashburner et al., 1999, Ashburner and Friston, 2005, Fischl et al., 2002, Han and Fischl, 2007, Tzourio-Mazoyer et al., 2002).
While both of these packages have been validated by their creators, their accuracy and/or consistency may vary depending on image quality, scan parameters, and scanning hardware (Jovicich et al., 2009, Han and Fischl, 2007, Tae et al., 2008). Additionally, previous comparisons of competing automated methods have shown notable differences in their performance relative to manual segmentation, despite examining only a limited number of structures (Cherbuin et al., 2009, Klauschen et al., 2009, Morey et al., 2009, Shen et al., 2009, Tae et al., 2008). Some have suggested that the patient composition of the source atlas, particularly the inclusion of healthy or diseased subjects, may in fact influence how robust each software package will be with diseased patients or otherwise morphologically different brains (Csapo et al., 2009, Tae et al., 2008, Zhang, 1996). Beyond atlas composition, the FreeSurfer and IBASPM processing pipelines also differ in their registration algorithms and in how they statistically apply the information contained in the atlases. These differences underscore the importance of re-validating both packages before analyzing data obtained with scan parameters or patient populations that are distinct from those of previous validation studies, especially in the case of a large sample size or multisite study.
The purpose of this study was to address previously described inconsistencies in FreeSurfer and IBASPM subcortical segmentation results by examining the automated volumetric measurement of several clinically relevant subcortical structures from a large multisite consortium study of HIV infection. We compared the accuracy and consistency of volumetric results for the caudate, putamen, hippocampus, and amygdala obtained using three methods: auto-assisted manual (AAM) segmentation, FreeSurfer (Martinos Center for Biomedical Imaging, Boston, MA), and IBASPM (Cuban Neuroscience Center, Havana, Cuba). Cognitive decline is a well-described feature of HIV progression, and a small number of studies have linked this to atrophy of subcortical structures (González-Scarano and Martín-García, 2005, Hall et al., 1996, Paul et al., 2002, Ragin et al., 2005, Robertson et al., 2007, Stout et al., 1998). Future investigations of this relationship will call for large-scale studies that rely on automated volumetric procedures to obtain data efficiently. To ensure that the data are interpreted correctly, it will be crucial to anticipate and thereby minimize the possible shortcomings of these automated methods. To this end, we attempt to characterize the accuracy and variability of these methods, as well as examine the ability of each to uncover significant, valid relationships when correlated with clinical measures of HIV progression.
Subjects
One hundred twenty HIV-infected patients were examined in this study (86.7% male; mean age 47.3 ± 7.2 years). Patients were recruited as part of the ongoing multisite NIH-funded MRS (magnetic resonance spectroscopy) HIV Neuroimaging Consortium (HIVNC) study based on the following inclusion criteria: HIV-positive, age ≥ 18 years, duration of HAART > 12 weeks, nadir CD4 count < 100 cells/ml during HIV history. Patients were considered to be on stable treatment (highly active antiretroviral therapies
Spatial overlap with AAM segmentation
As measured by the Dice coefficient, FreeSurfer (FS) segmentations exhibited significantly higher mean spatial overlap with AAM segmentations than IBASPM segmentations did in all structures (paired t-test, p < 0.001; Fig. 1). This difference was most pronounced in the right amygdala (FS 0.740 ± 0.071; IBASPM 0.259 ± 0.114) and right hippocampus (FS 0.749 ± 0.069; IBASPM 0.374 ± 0.112). Although the difference in Dice coefficients was smallest in the right caudate (FS 0.813 ± 0.065; IBASPM 0.721 ± 0.128), it was nonetheless significant (p < 0.001).
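For readers unfamiliar with the overlap measure, the Dice coefficient between two segmentations A and B of the same structure is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no shared voxels) to 1 (identical segmentations). The following is a minimal sketch of that calculation, not the pipeline used in this study. It assumes both label maps have already been resampled to a common voxel grid and flattened to one integer label per voxel; the function name `dice_coefficient`, the label value 17, and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def dice_coefficient(labels_a: Sequence[int], labels_b: Sequence[int], label: int) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for one structure label.

    labels_a and labels_b are flattened label maps of equal length
    (one integer label per voxel) defined on the same voxel grid.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label maps must cover the same voxel grid")
    size_a = size_b = overlap = 0
    for a, b in zip(labels_a, labels_b):
        in_a = a == label
        in_b = b == label
        size_a += in_a              # voxels carrying the label in segmentation A
        size_b += in_b              # voxels carrying the label in segmentation B
        overlap += in_a and in_b    # voxels carrying the label in both
    if size_a + size_b == 0:
        raise ValueError("label is absent from both segmentations")
    return 2.0 * overlap / (size_a + size_b)


# Toy 1-D "images"; 17 is a hypothetical structure code, not a real atlas label.
manual = [0, 17, 17, 17, 17, 0, 0, 0]
automated = [0, 0, 17, 17, 17, 17, 0, 0]
print(dice_coefficient(manual, automated, 17))
```

In this toy case each segmentation labels four voxels and three of them coincide, so the overlap is 2 × 3 / (4 + 4) = 0.75. Because the measure depends on voxel positions as well as counts, two segmentations with identical volumes can still have a low Dice coefficient if they are spatially shifted.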
The
Performance characteristics of FreeSurfer and IBASPM
Past validation studies examining automated segmentation methods have varied widely in the measures they have used. The analyses in this study were chosen in an attempt to apply the full range of metrics that have appeared in various combinations in prior publications. Moreover, each metric characterizes a slightly different aspect of segmentation performance and must be considered in relation to one another in order to adequately interpret the results of an analysis. For example, absolute
Acknowledgments
We gratefully acknowledge the following HIV Neuroimaging Consortium sites for the data used in this study: Stanford University, University of California Los Angeles, UCLA Harbor, University of California San Diego, University of Colorado, University of Pittsburgh, University of Rochester. We also acknowledge the support of the following funding sources: R01 NS036524 and K23 MH073416.
References (64)
- et al. (2002). Automated segmentation of multispectral brain MR images. J Neurosci Methods.
- et al. (1997). Multimodal image coregistration and partitioning - a unified framework. Neuroimage.
- et al. (2005). Unified segmentation. Neuroimage.
- et al. (1999). High-dimensional image registration using symmetric priors. Neuroimage.
- et al. (2005). Atlas-based hippocampus segmentation in Alzheimer's disease and mild cognitive impairment. Neuroimage.
- et al. (2002). Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron.
- et al. (2006). Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. Neuroimage.
- et al. (1997). An automated registration algorithm for measuring MRI subcortical brain structures. Neuroimage.
- et al. (1995). MRI-based hippocampal volumetrics: data acquisition, normal ranges, and optimal protocol. Magnetic Resonance Imaging.
- et al. (2008). Structural brain imaging in diabetes: a methodological perspective. European Journal of Pharmacology.
- MRI-derived measurements of human subcortical, ventricular and intracranial brain volumes: reliability effects of scan sessions, acquisition sequences, data analyses, scanner upgrade, scanner vendors and field strengths. Neuroimage.
- A comparison of automated segmentation and manual tracing for quantifying hippocampal and amygdala volumes. Neuroimage.
- Stereotaxic white matter atlas based on diffusion tensor imaging in an ICBM template. Neuroimage.
- Relationships between cognition and structural neuroimaging findings in adults with human immunodeficiency virus type-1. Neurosci Biobehav Rev.
- Registration and machine learning-based automated segmentation of subcortical and cerebellar brain structures. Neuroimage.
- MRI volume loss of subcortical structures in unilateral temporal lobe epilepsy. Epilepsy Behav.
- Construction of a 3D probabilistic atlas of human cortical structures. Neuroimage.
- Boundary and medial shape analysis of the hippocampus in schizophrenia. Medical Image Analysis.
- 3D mapping of ventricular and corpus callosum abnormalities in HIV/AIDS. Neuroimage.
- Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage.
- A framework for evaluating image segmentation algorithms. Comput Med Imaging Graph.
- Effects of age on volumes of cortex, white matter and subcortical structures. Neurobiol Aging.
- Comparative assessment of statistical brain MR image segmentation algorithms and their impact on partial volume correction in PET. Neuroimage.
- A survey on evaluation methods for image segmentation. Pattern Recognition.
- IBASPM: toolbox for automatic parcellation of brain structures.
- Correlation of in vivo neuroimaging abnormalities with postmortem human immunodeficiency virus encephalitis and dendritic loss. Arch Neurol.
- Estimating uncertainty in brain region delineations. Information Processing in Medical Imaging, Lecture Notes in Computer Science.
- Brain morphology in older African Americans, Caribbean Hispanics, and whites from northern Manhattan. Archives of Neurology.
- Cerebral ventricular changes associated with transitions between normal cognitive function, mild cognitive impairment, and dementia. Alzheimer Disease and Associated Disorders.
- In vivo hippocampal measurement and memory: a comparison of manual tracing and automated segmentation in a large community-based sample. PLoS ONE.
- Effect of patient population specific atlases on automatic segmentation of subcortical structures in freesurfer [Abstract].
- Hippocampal morphometry in schizophrenia by high dimensional brain mapping. Proc Natl Acad Sci USA.
Cited by (112)
Altered white matter integrity in the corpus callosum in adults with HIV: a systematic review of diffusion tensor imaging studies
2022, Psychiatry Research - Neuroimaging
Validity of automated FreeSurfer segmentation compared to manual tracing in detecting prenatal alcohol exposure-related subcortical and corpus callosal alterations in 9- to 11-year-old children
2020, NeuroImage: Clinical
Citation Excerpt: While FreeSurfer has been shown to perform reasonably well in this regard in patients with Alzheimer’s Disease, demonstrating volume reductions (Lehmann et al., 2010; Shen et al., 2010) and hippocampal atrophy rates (Mulder et al., 2014) similar to manual segmentation, automated methods have been less successful in distinguishing between groups or identifying associations with behavioural/clinical outcomes in other pathologies. For example, in patients with HIV, the association of caudate, putamen, amygdala, and hippocampal volumes with clinical measures of disease progression differed for outputs generated by FreeSurfer, IBASPM (Individual Brain Atlases using Statistical Parametric Mapping) and auto-assisted manual tracings (Dewey et al., 2010). Depression-related hippocampal volume reductions were detected with FreeSurfer but not FSL-FIRST (Morey et al., 2009), and in former National Football League (NFL) players with neurobehavioral symptoms, automated FreeSurfer segmentation identified group differences relative to age-matched controls in 4 of 11 regions, compared to 8 of 11 with manual correction, as well as different regions showing associations with neurobehavioral factors (Guenette et al., 2018).
Automated MRI volumetry as a diagnostic tool for Alzheimer's disease: Validation of icobrain dm
2020, NeuroImage: Clinical