Implications of Dataset Heterogeneity on Deep Learning Performance in Medical Image Segmentation
This thesis investigates deep learning for medical image segmentation, with a particular focus on the influence of the training data. The performance of deep learning algorithms depends on the quality and heterogeneity of the training set. The sources of heterogeneity are grouped here into three categories: technical image quality, reference segmentations, and study populations.
Different training strategies are compared for parotid gland segmentation in CT data. All yield robust segmentation results, even in the presence of artifacts, and outperform non-deep-learning methods on public data. Typical errors coincide with regions of high inter-observer variability. Training on contours from clinical routine and training on curated contours yield similar accuracy.
The bladder, rectum, and uterus are segmented in cone-beam CT data, which are noisier and less well calibrated than CT data. Augmenting the anatomical variability with CT data is proposed and found to improve performance. Prior knowledge about the presence of typical artifacts is integrated into the data sampling. Curriculum learning appears promising for increasing the robustness to a specific artifact.
The hippocampus is segmented in CT data. A CT-only approach for generating the training contours could facilitate data collection, but MRI-based training contours are found to yield significantly higher performance and lower uncertainty.
White matter hyperintensity lesions are an imaging biomarker linked to stroke and cognitive decline. It is shown that a single neural network can segment these lesions in heterogeneous MRI data with varying image quality and lesion loads, and for a wide range of training set compositions, generated by pooling and systematic sampling. A challenge is the co-occurrence of stroke lesions. An approach that uses stroke segmentations for guiding the sampling, but not for optimizing the training loss, is proposed and found to outperform other sampling approaches with respect to false positive detections.
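The idea of letting stroke segmentations guide the sampling without entering the training loss can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the function names, the `stroke_fraction` parameter, the patch size, and the uniform fallback sampling are all assumptions made for illustration. The stroke mask only decides where patches are drawn, while the training target returned for each patch contains the white matter hyperintensity labels alone.

```python
import numpy as np


def sample_patch_centers(stroke_mask, n_patches, stroke_fraction=0.5, seed=0):
    """Draw patch centers, biased towards stroke lesions (hypothetical helper).

    A share of the centers (stroke_fraction, an assumed parameter) is drawn
    from voxels inside the stroke mask; the remainder is drawn uniformly
    from the whole volume. Stroke-guided centers come first in the result.
    """
    rng = np.random.default_rng(seed)
    stroke_voxels = np.argwhere(stroke_mask > 0)

    # Without stroke voxels, fall back to purely uniform sampling.
    n_stroke = int(round(n_patches * stroke_fraction)) if len(stroke_voxels) else 0
    n_uniform = n_patches - n_stroke

    guided = stroke_voxels[rng.integers(0, len(stroke_voxels), size=n_stroke)] \
        if n_stroke else np.empty((0, stroke_mask.ndim), dtype=np.int64)
    uniform = np.stack(
        [rng.integers(0, s, size=n_uniform) for s in stroke_mask.shape], axis=1
    )
    return np.concatenate([guided, uniform], axis=0)


def extract_training_pair(image, wmh_mask, center, half=2):
    """Return (image patch, WMH target patch) around a center.

    The stroke mask is deliberately absent here: it influences where
    patches are taken, not what the loss is computed against.
    """
    slices = tuple(slice(max(int(c) - half, 0), int(c) + half + 1) for c in center)
    return image[slices], wmh_mask[slices]


if __name__ == "__main__":
    # Synthetic toy volume, for illustration only.
    image = np.random.default_rng(1).normal(size=(8, 8, 8))
    wmh = np.zeros((8, 8, 8), dtype=np.uint8)
    stroke = np.zeros((8, 8, 8), dtype=np.uint8)
    stroke[2:4, 2:4, 2:4] = 1

    centers = sample_patch_centers(stroke, n_patches=10, stroke_fraction=0.5)
    patch, target = extract_training_pair(image, wmh, centers[0])
    print(centers.shape, patch.shape, target.shape)
```

In this sketch, patches containing stroke lesions appear more often during training, so the network sees more examples of tissue it should not label as white matter hyperintensity, which is one plausible way such sampling could reduce false positive detections.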