[MICCAI2024] This repo holds the official code for work "Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields"
Python
Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fieldsπ
Abstract: Precise segmentation of medical images is fundamental for extracting critical clinical information, which plays a pivotal role
in enhancing the accuracy of diagnoses, formulating effective treatment plans, and improving patient outcomes. Although Convolutional Neural
Networks (CNNs) and non-local attention methods have achieved notable success in medical image segmentation, they either struggle to capture
long-range spatial dependencies due to their reliance on local features, or face significant computational and feature integration challenges
when attempting to address this issue with global attention mechanisms. To overcome existing limitations in medical image segmentation, we propose
a novel architecture, Perspective+ Unet. This framework is characterized by three major innovations: (i) It introduces a dual-pathway strategy
at the encoder stage that combines the outcomes of traditional and dilated convolutions. This not only maintains the local receptive field but
also significantly expands it, enabling better comprehension of the global structure of images while retaining detail sensitivity. (ii) The framework
incorporates an efficient non-local transformer block, named ENLTB, which utilizes kernel function approximation for effective long-range dependency
capture with linear computational and spatial complexity. (iii) A Spatial Cross-Scale Integrator strategy is employed to merge global dependencies
and local contextual cues across model stages, meticulously refining features from various levels to harmonize global and local information.
Experimental results on the ACDC and Synapse datasets demonstrate the effectiveness of our proposed Perspective+ Unet.
1. Dependencies and Installation
Clone this repo:
https://github.com/tljxyys/Perspective-Unet.git
cd Perspective-Unet
The Synapse dataset we used are provided by TransUnet's authors. . If you would like to use the preprocessed data, please use it for research purposes and do not redistribute it (following the TransUnet's License). The ACDC dataset can be obtained from .
I am not the owner of these two preprocessed datasets. Please follow the instructions and regulations set by the official releaser of these two datasets. The directory structure of the whole project is as follows:
Download the pretrained model for inference. [Get pretrained model in this link] . Please save the .pth file in the ./model_output_Synapse or ./model_output_ACDC.
Segmentation accuracy of different methods on the Synapse multi-organ CT
dataset. The best results are shown in bold.
Methods
DSCβ¬οΈ
HDβ¬οΈ
Aorta
Gallbladder
Kidney(L)
Kidney(R)
Liver
Pancreas
Spleen
Stomach
U-Net
76.85
39.70
89.07
69.72
77.77
68.60
93.43
53.98
86.67
75.58
R50 Att-UNet
75.57
36.97
55.92
63.91
79.20
72.71
93.56
49.37
87.19
74.95
Att-UNet
77.77
36.02
89.55
68.88
77.98
71.11
93.57
58.04
87.30
75.75
R50 ViT
71.29
32.87
73.73
55.13
75.80
72.20
91.51
45.99
81.99
73.95
TransUnet
77.48
31.69
87.23
63.13
81.87
77.02
94.08
55.86
85.08
75.62
SwinUNet
79.12
21.55
85.47
66.53
83.28
79.61
94.29
56.58
90.66
76.60
AFTer-UNet
81.02
-
90.91
64.81
87.90
85.30
92.20
63.54
90.99
72.48
ScaleFormer
82.86
16.81
88.73
74.97
86.36
83.31
95.12
64.85
89.40
80.14
MISSFormer
81.96
18.20
86.99
68.65
85.21
82.00
94.41
65.67
91.92
80.81
FCT
83.53
-
89.85
72.73
88.45
86.60
95.62
66.25
89.77
79.42
MSAANet
82.85
18.54
89.40
73.20
84.31
78.53
95.10
68.85
91.60
81.78
Perspective+ (Ours)
84.63
11.74
89.38
70.80
87.57
85.78
95.30
70.71
94.41
83.06
Segmentation accuracy of different methods on the ACDC dataset. The best
results are shown in bold.
Methods
DSCβ¬οΈ
RV
Myo
LV
R50 U-Net
87.55
87.10
80.63
94.92
R50 Att-UNet
86.75
87.58
79.20
93.47
R50 ViT
87.57
86.07
81.88
94.75
TransUNet
89.71
88.86
84.53
95.73
SwinUNet
90.00
88.55
85.62
95.83
ScaleFormer
90.17
87.33
88.16
95.04
UNETR
88.61
85.29
86.52
94.02
MCTE
91.31
89.14
89.51
95.27
MISSFormer
91.19
89.85
88.38
95.34
nnFormer
92.06
90.94
89.58
95.65
Perspective+ (Ours)
92.54
90.92
90.49
96.20
Ablation study on the impact of modules. BPRB: Bi-Path Residual Block, ENLTB: Efficient Non-Local Transformer Block, SCSI: Spatial Cross-Scale Integrator.
Model
BPRS
SCSI
ENLTB
DSCβ¬οΈ
HDβ¬οΈ
Setting 1
βοΈ
βοΈ
βοΈ
84.04
16.63
Setting 2
βοΈ
βοΈ
βοΈ
84.55
13.60
Setting 3
βοΈ
βοΈ
βοΈ
83.79
21.71
Setting 4
βοΈ
βοΈ
βοΈ
84.35
17.88
Setting 5
βοΈ
βοΈ
βοΈ
83.36
14.70
Setting 6
βοΈ
βοΈ
βοΈ
85.07
14.60
Setting 7
βοΈ
βοΈ
βοΈ
83.92
13.94
Setting 8
βοΈ
βοΈ
βοΈ
84.63
11.74
Visualized segmentation results of different methods on the Synapse multi-organ CT dataset. Our method (the last column) exhibits the smoothest boundaries
and the most accurate segmentation outcomes.
Visualization of attention heat maps from the intermediate layers of the network. Highlighting areas are closely aligned with segmentation labels, demonstrating
our Perspective+ Unetβs accuracy in feature identification and localization.
@misc{hu2024perspectiveunetenhancingsegmentation,
title={Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields},
author={Jintong Hu and Siyan Chen and Zhiyi Pan and Sen Zeng and Wenming Yang},
year={2024},
eprint={2406.14052},
archivePrefix={arXiv},
primaryClass={eess.IV},
url={https://arxiv.org/abs/2406.14052},
}