Purpose This study compares three deep learning models (UNet, TransUNet, and MIST) for left atrium (LA) segmentation in cardiac computed tomography (CT) images from patients with congenital heart disease (CHD). It also investigates how architectural variations in the MIST model, such as spatial squeeze-and-excitation attention, affect the Dice score and the 95th-percentile Hausdorff distance (HD95).
Methods We analyzed 108 publicly available, de-identified CT volumes from the ImageCHD dataset. Volumes underwent resampling, intensity normalization, and data augmentation. Of the 108 cases, 11 were reserved for testing; the remaining 97 were split 80%/20% into training and validation sets for the UNet, TransUNet, and MIST models. Performance was evaluated using the Dice score (measuring overlap accuracy) and HD95 (reflecting boundary accuracy). Statistical comparisons were performed using one-way repeated-measures analysis of variance.
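As an illustration of the two evaluation metrics, the following is a minimal sketch rather than the evaluation code used in this study. It assumes masks are represented as sets of voxel index tuples, and it takes HD95 as the larger of the two directed 95th-percentile surface distances; other HD95 conventions exist, and the helper names (`dice_score`, `surface`, `hd95`, `cube`) are hypothetical.

```python
import math
from itertools import product

def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

def surface(mask):
    """Voxels of the mask that have at least one face neighbour outside it."""
    out = set()
    for v in mask:
        for axis in range(len(v)):
            for step in (-1, 1):
                n = v[:axis] + (v[axis] + step,) + v[axis + 1:]
                if n not in mask:
                    out.add(v)
    return out

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def directed_distances(src, dst, spacing):
    """Distance (in mm) from each point in src to its nearest point in dst."""
    def dist(a, b):
        return math.sqrt(sum(((x - y) * s) ** 2 for x, y, s in zip(a, b, spacing)))
    return [min(dist(a, b) for b in dst) for a in src]

def hd95(pred, truth, spacing):
    """Larger of the two directed 95th-percentile surface distances."""
    sp, st = surface(pred), surface(truth)
    if not sp or not st:
        return math.inf  # undefined when either mask is empty
    return max(percentile(directed_distances(sp, st, spacing), 95),
               percentile(directed_distances(st, sp, spacing), 95))

def cube(origin, size):
    """Toy mask: a solid cube of voxels (illustrative data only)."""
    return {tuple(o + d for o, d in zip(origin, delta))
            for delta in product(range(size), repeat=3)}

truth = cube((0, 0, 0), 4)
pred = cube((1, 0, 0), 4)  # the same cube shifted by one voxel along x
print(dice_score(pred, truth))                     # 0.75
print(hd95(pred, truth, spacing=(1.0, 1.0, 1.0)))  # 1.0
```

In this toy case the one-voxel shift leaves 48 of 64 voxels overlapping (Dice 0.75) and displaces the boundary by 1 mm (HD95 1.0), which shows how the two metrics capture overlap and boundary error separately. The brute-force nearest-point search is practical only for small masks; array-based distance transforms are the usual choice for full CT volumes.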
Results MIST achieved the highest mean Dice score (0.74; 95% confidence interval, 0.67–0.81), significantly outperforming TransUNet (0.53; P<0.001) and UNet (0.49; P<0.001). For HD95, both MIST (5.77 mm) and TransUNet (9.09 mm) outperformed UNet (27.49 mm; P<0.0001). In ablation experiments, adding spatial attention did not further improve the MIST model’s performance, suggesting redundancy with its existing attention mechanisms. In contrast, integrating multi-scale features and refined skip connections consistently improved segmentation accuracy and boundary delineation.
Conclusion MIST demonstrated superior LA segmentation, highlighting the benefits of its integrated multi-scale features and optimized architecture. Nevertheless, its computational overhead complicates practical clinical deployment. Our findings underscore the value of advanced hybrid models in cardiac imaging, providing improved reliability for CHD evaluation. Future studies should balance segmentation accuracy with feasible clinical implementation.