Perspective-Unet

[MICCAI2024] This repo holds the official code for the paper "Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields".


Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields 🚀

Abstract: Precise segmentation of medical images is fundamental for extracting critical clinical information, which plays a pivotal role in enhancing the accuracy of diagnoses, formulating effective treatment plans, and improving patient outcomes. Although Convolutional Neural Networks (CNNs) and non-local attention methods have achieved notable success in medical image segmentation, they either struggle to capture long-range spatial dependencies due to their reliance on local features, or face significant computational and feature integration challenges when attempting to address this issue with global attention mechanisms. To overcome existing limitations in medical image segmentation, we propose a novel architecture, Perspective+ Unet. This framework is characterized by three major innovations: (i) It introduces a dual-pathway strategy at the encoder stage that combines the outcomes of traditional and dilated convolutions. This not only maintains the local receptive field but also significantly expands it, enabling better comprehension of the global structure of images while retaining detail sensitivity. (ii) The framework incorporates an efficient non-local transformer block, named ENLTB, which utilizes kernel function approximation for effective long-range dependency capture with linear computational and spatial complexity. (iii) A Spatial Cross-Scale Integrator strategy is employed to merge global dependencies and local contextual cues across model stages, meticulously refining features from various levels to harmonize global and local information. Experimental results on the ACDC and Synapse datasets demonstrate the effectiveness of our proposed Perspective+ Unet.
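The linear complexity mentioned in (ii) comes from reordering the attention product. The sketch below is a generic kernelized attention in plain Python, not the ENLTB code from this repo: the feature map `phi` (ELU + 1) and every function name here are assumptions chosen for illustration. It shows that computing phi(Q) (phi(K)^T V) gives the same output as (phi(Q) phi(K)^T) V without ever building the N x N attention matrix.

```python
import math


def phi(x):
    """Positive feature map ELU(x) + 1 (an assumed kernel choice)."""
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(m):
    return [[phi(x) for x in row] for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def quadratic_attention(q, k, v):
    """Reference form: builds the full N x N similarity matrix."""
    fq, fk = feature_map(q), feature_map(k)
    scores = matmul(fq, transpose(fk))          # N x N
    out = matmul(scores, v)                     # N x d_v
    result = []
    for out_row, score_row in zip(out, scores):
        z = sum(score_row)                      # row normalizer
        result.append([x / z for x in out_row])
    return result


def linear_attention(q, k, v):
    """Same output using only d x d_v and d-sized summaries of K and V."""
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)               # d x d_v, independent of N x N
    k_sum = [sum(col) for col in zip(*fk)]      # d
    out = matmul(fq, kv)                        # N x d_v
    result = []
    for out_row, q_row in zip(out, fq):
        z = sum(a * b for a, b in zip(q_row, k_sum))
        result.append([x / z for x in out_row])
    return result


if __name__ == "__main__":
    q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
    k = [[0.1, 0.4], [-0.7, 1.2], [0.9, -0.5]]
    v = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
    a = quadratic_attention(q, k, v)
    b = linear_attention(q, k, v)
    diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    print("max difference between the two orderings:", diff)
```

With N tokens of dimension d, the second ordering costs on the order of N·d·d_v operations instead of N²·d, which is where the linear scaling in sequence length comes from.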



1. Dependencies and Installation

  • Clone this repo:
git clone https://github.com/tljxyys/Perspective-Unet.git
cd Perspective-Unet
  • Create a conda virtual environment and activate it:
conda create -n perspective_unet python=3.7 -y
conda activate perspective_unet
  • Install packages:
pip install -r requirements.txt

2. Data Preparation

  • The Synapse dataset we used is provided by the TransUnet authors. If you would like to use the preprocessed data, please use it for research purposes only and do not redistribute it (following TransUnet's license). The ACDC dataset can be obtained from .
  • I am not the owner of these two preprocessed datasets. Please follow the instructions and regulations set by their official releasers. The directory structure of the whole project is as follows:
.
├── datasets
│   └──
├── lists
│   └── 
├── data
│   ├── Synapse
│   │   ├── train_npz
│   │   │   ├── case0005_slice000.npz
│   │   │   └── *.npz
│   │   └── test_vol_h5
│   │       ├── case0001.npy.h5
│   │       └── *.npy.h5
│   └── ACDC
│       ├── train
│       │   ├── case_001_sliceED_0.npz
│       │   └── *.npz
│       ├── test
│       │   ├── case_002_volume_ED.npz
│       │   └── *.npz
│       └── train
│           ├── case_019_sliceED_0.npz
│           └── *.npz
├── networks
│   └── 
├── train
├── test
└── trainer
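Before training, it can help to confirm that the data folders match the tree above. The following is a minimal sketch using only the standard library; `check_layout` and the `EXPECTED` mapping are hypothetical helpers written for this README, not part of the repo, and they only count files by the patterns shown in the tree.

```python
from pathlib import Path

# Sub-directories and file patterns copied from the directory tree above.
EXPECTED = {
    "data/Synapse/train_npz": "*.npz",
    "data/Synapse/test_vol_h5": "*.npy.h5",
    "data/ACDC/train": "*.npz",
    "data/ACDC/test": "*.npz",
}


def check_layout(root="."):
    """Return {relative_dir: file_count}; 0 means the folder is missing or empty."""
    root = Path(root)
    counts = {}
    for rel_dir, pattern in EXPECTED.items():
        folder = root / rel_dir
        counts[rel_dir] = len(list(folder.glob(pattern))) if folder.is_dir() else 0
    return counts


if __name__ == "__main__":
    for rel_dir, n in check_layout().items():
        status = "ok" if n else "MISSING OR EMPTY"
        print(f"{rel_dir}: {n} file(s) [{status}]")
```

Run it from the project root; any folder reported as missing or empty should be fixed before starting a training run.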

3. Training

  • Run the training script on the Synapse dataset. We used a batch size of 12 and 600 epochs.
python train.py --dataset Synapse --output_dir './model_output_Synapse' --max_epochs 600 --batch_size 12
  • Run the training script on the ACDC dataset. We used a batch size of 12 and 1000 epochs.
python train.py --dataset ACDC --output_dir './model_output_ACDC' --max_epochs 1000 --batch_size 12
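To keep the dataset name, output directory, and epoch count in sync, the two training commands can be generated from one table of settings. This is a small sketch, not a script shipped with the repo: `RUNS` and `train_command` are hypothetical names, and the only flags used are the ones shown in the commands above.

```python
import shlex
import sys

# Settings as stated above: batch size 12; 600 epochs for Synapse, 1000 for ACDC.
RUNS = {
    "Synapse": {"max_epochs": 600, "batch_size": 12},
    "ACDC": {"max_epochs": 1000, "batch_size": 12},
}


def train_command(dataset):
    """Build the argument list that runs train.py for one dataset."""
    cfg = RUNS[dataset]
    return [
        sys.executable, "train.py",
        "--dataset", dataset,
        "--output_dir", f"./model_output_{dataset}",
        "--max_epochs", str(cfg["max_epochs"]),
        "--batch_size", str(cfg["batch_size"]),
    ]


if __name__ == "__main__":
    for name in RUNS:
        # Print the command; pass the list to subprocess.run(..., check=True) to launch it.
        print(" ".join(shlex.quote(arg) for arg in train_command(name)))
```

Deriving the output directory from the dataset name avoids pairing one dataset with the other dataset's folder by accident.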

4. Testing

  • Download the pretrained model for inference [Get pretrained model in this link]. Save the .pth file in ./model_output_Synapse or ./model_output_ACDC, then run the test script for the corresponding dataset:
python test.py --dataset Synapse --is_saveni True --output_dir './model_output_Synapse' --max_epoch 600 --batch_size 12 --test_save_dir './model_output_Synapse/predictions'
python test.py --dataset ACDC --is_saveni True --output_dir './model_output_ACDC' --max_epoch 1000 --batch_size 12 --test_save_dir './model_output_ACDC/predictions'

5. Results

  • Segmentation accuracy of different methods on the Synapse multi-organ CT dataset. The best results are shown in bold.
| Methods | DSC↑ | HD↓ | Aorta | Gallbladder | Kidney(L) | Kidney(R) | Liver | Pancreas | Spleen | Stomach |
|---|---|---|---|---|---|---|---|---|---|---|
| U-Net | 76.85 | 39.70 | 89.07 | 69.72 | 77.77 | 68.60 | 93.43 | 53.98 | 86.67 | 75.58 |
| R50 Att-UNet | 75.57 | 36.97 | 55.92 | 63.91 | 79.20 | 72.71 | 93.56 | 49.37 | 87.19 | 74.95 |
| Att-UNet | 77.77 | 36.02 | 89.55 | 68.88 | 77.98 | 71.11 | 93.57 | 58.04 | 87.30 | 75.75 |
| R50 ViT | 71.29 | 32.87 | 73.73 | 55.13 | 75.80 | 72.20 | 91.51 | 45.99 | 81.99 | 73.95 |
| TransUnet | 77.48 | 31.69 | 87.23 | 63.13 | 81.87 | 77.02 | 94.08 | 55.86 | 85.08 | 75.62 |
| SwinUNet | 79.12 | 21.55 | 85.47 | 66.53 | 83.28 | 79.61 | 94.29 | 56.58 | 90.66 | 76.60 |
| AFTer-UNet | 81.02 | - | **90.91** | 64.81 | 87.90 | 85.30 | 92.20 | 63.54 | 90.99 | 72.48 |
| ScaleFormer | 82.86 | 16.81 | 88.73 | **74.97** | 86.36 | 83.31 | 95.12 | 64.85 | 89.40 | 80.14 |
| MISSFormer | 81.96 | 18.20 | 86.99 | 68.65 | 85.21 | 82.00 | 94.41 | 65.67 | 91.92 | 80.81 |
| FCT | 83.53 | - | 89.85 | 72.73 | **88.45** | **86.60** | **95.62** | 66.25 | 89.77 | 79.42 |
| MSAANet | 82.85 | 18.54 | 89.40 | 73.20 | 84.31 | 78.53 | 95.10 | 68.85 | 91.60 | 81.78 |
| Perspective+ (Ours) | **84.63** | **11.74** | 89.38 | 70.80 | 87.57 | 85.78 | 95.30 | **70.71** | **94.41** | **83.06** |
  • Segmentation accuracy of different methods on the ACDC dataset. The best results are shown in bold.
| Methods | DSC↑ | RV | Myo | LV |
|---|---|---|---|---|
| R50 U-Net | 87.55 | 87.10 | 80.63 | 94.92 |
| R50 Att-UNet | 86.75 | 87.58 | 79.20 | 93.47 |
| R50 ViT | 87.57 | 86.07 | 81.88 | 94.75 |
| TransUNet | 89.71 | 88.86 | 84.53 | 95.73 |
| SwinUNet | 90.00 | 88.55 | 85.62 | 95.83 |
| ScaleFormer | 90.17 | 87.33 | 88.16 | 95.04 |
| UNETR | 88.61 | 85.29 | 86.52 | 94.02 |
| MCTE | 91.31 | 89.14 | 89.51 | 95.27 |
| MISSFormer | 91.19 | 89.85 | 88.38 | 95.34 |
| nnFormer | 92.06 | **90.94** | 89.58 | 95.65 |
| Perspective+ (Ours) | **92.54** | 90.92 | **90.49** | **96.20** |
  • Ablation study on the impact of modules. BPRB: Bi-Path Residual Block, ENLTB: Efficient Non-Local Transformer Block, SCSI: Spatial Cross-Scale Integrator.
| Model | BPRB | SCSI | ENLTB | DSC↑ | HD↓ |
|---|---|---|---|---|---|
| Setting 1 | ✖ | ✖ | ✖ | 84.04 | 16.63 |
| Setting 2 | ✖ | ✖ | ✔ | 84.55 | 13.60 |
| Setting 3 | ✖ | ✔ | ✖ | 83.79 | 21.71 |
| Setting 4 | ✖ | ✔ | ✔ | 84.35 | 17.88 |
| Setting 5 | ✔ | ✖ | ✖ | 83.36 | 14.70 |
| Setting 6 | ✔ | ✖ | ✔ | 85.07 | 14.60 |
| Setting 7 | ✔ | ✔ | ✖ | 83.92 | 13.94 |
| Setting 8 | ✔ | ✔ | ✔ | 84.63 | 11.74 |
  • Visualized segmentation results of different methods on the Synapse multi-organ CT dataset. Our method (the last column) exhibits the smoothest boundaries and the most accurate segmentation outcomes.


  • Visualization of attention heat maps from the intermediate layers of the network. The highlighted areas align closely with the segmentation labels, demonstrating the accuracy of Perspective+ Unet in feature identification and localization.

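The DSC columns in the tables above report the Dice similarity coefficient, 2|A∩B| / (|A| + |B|), as a percentage. The sketch below computes it for one class on flattened label sequences in plain Python. It is for illustration only and is not the evaluation code of this repo; the function name `dice` and the convention of returning 1.0 when the class is absent from both inputs are assumptions.

```python
def dice(pred, target, label):
    """Dice similarity coefficient for one class over two equal-length label sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    pred_mask = [p == label for p in pred]
    target_mask = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    total = sum(pred_mask) + sum(target_mask)
    if total == 0:
        return 1.0  # assumed convention: class absent from both masks
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Toy flattened label maps with classes 0 (background), 1, and 2.
    pred = [0, 1, 1, 2, 2, 2]
    target = [0, 1, 2, 2, 2, 0]
    scores = [dice(pred, target, c) for c in (1, 2)]
    print("per-class DSC (%):", [round(100 * s, 2) for s in scores])
    print("mean DSC (%):", round(100 * sum(scores) / len(scores), 2))
```

Averaging the per-class scores over the foreground classes gives a single summary value comparable in form to the DSC column.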

6. Reference

7. Bibtex

@misc{hu2024perspectiveunetenhancingsegmentation,
      title={Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields}, 
      author={Jintong Hu and Siyan Chen and Zhiyi Pan and Sen Zeng and Wenming Yang},
      year={2024},
      eprint={2406.14052},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2406.14052}, 
}