We are excited to announce that our paper was accepted for publication at IEEE TMI 2024! 🥳🥳🥳
This repository contains the implementation of our paper. You can access the paper here.
This project introduces a new setting in surgical image segmentation, termed Referring Surgical Video Instrument Segmentation (RSVIS). RSVIS aims to automatically identify and segment the target surgical instruments in each video frame, as referred to by a given language expression, which offers a more natural and flexible form of human-computer interaction.
Fig. 1. Comparison of (a) existing instrument segmentation task and (b) our referring surgical video instrument segmentation (RSVIS).
conda create --name RSVIS --file requirements.txt
conda activate RSVIS
python main.py -rm train -c configs/RS17.yaml -ws 3 -bs 5 -gids 1
-rm sets the running mode (e.g. train), -c the config file, -ws the window size, -bs the training batch size per GPU, and -gids the GPU id. Two pretrained weights need to be placed in their respective folders: swin_tiny_patch244_window877_kinetics400_1k.pth in 'pretrained_swin_transformer' and pytorch_model.bin in 'roberta-base'. You can download them from Google Drive.
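As a convenience before launching training, the weight placement described above can be checked with a short script. This is only a sketch: the folder and file names come from the paragraph above, while the helper name `missing_weights` and the assumption that both folders sit directly under the repository root are ours, not part of the released code.

```python
from pathlib import Path

# Folder -> file names are taken from the README text; placing both folders
# directly under the repository root is an assumption.
EXPECTED_WEIGHTS = {
    "pretrained_swin_transformer": "swin_tiny_patch244_window877_kinetics400_1k.pth",
    "roberta-base": "pytorch_model.bin",
}


def missing_weights(repo_root):
    """Return the expected weight files that are not present under repo_root."""
    root = Path(repo_root)
    return [
        root / folder / filename
        for folder, filename in EXPECTED_WEIGHTS.items()
        if not (root / folder / filename).is_file()
    ]


if __name__ == "__main__":
    # Report every weight file that still has to be downloaded.
    for path in missing_weights("."):
        print(f"missing pretrained weight: {path}")
```

Running it from the repository root prints one line per missing file and nothing when both weights are in place.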
For object detection, please refer to YoloV5, DETR, and DINO. We also referenced parts of the MTTR code (an excellent project), and we acknowledge the contributions of the above projects. Since the code dates from a long time ago and we tried many variations, I have uploaded a preliminary version first and will clean it up and verify it later.
I acknowledge that, from my perspective, the work isn't perfect and there's room for improvement. To satisfy the demands of the major revision period, the paper has also become longer and more tedious. However, work is only part of life and everyone needs to eat. I've done my best with the open-source code and data, so let's show a little patience so we can all thrive together.
Revisiting data and code from long ago isn't a walk in the park (the paper takes months to publish). Got questions? Just ping me and let's make improvements; no gripes, skip the scolding, please! 🫡 Email: hongqiuwang16@gmail.com (WeChat: whqqq7).
The datasets have been organized!
Please contact Hongqiu (hongqiuwang16@gmail.com) for the dataset. One step is needed to download it: use your Google email to apply for download permission (Google Drive or BaiduPan). We will get back to you within three days, so please do not send the request multiple times. We only handle real-name emails, and your email suffix must match your affiliation. The email should contain the following information:
Name/Homepage/Google Scholar: (Tell us who you are.)
Primary Affiliation: (The name of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, Ph.D., etc.)
Affiliation Email: (The password will be sent to this email; we only reply to addresses ending in "edu".)
How to use: (Only for academic research, not for commercial use or second-development.)
The dataset is stored as follows:
RSVIS/
└── EndoVis-RS18/
    ├── train/
    │   ├── JPEGImages/
    │   │   └── */ (video folders)
    │   │       └── *.png (frame image files)
    │   └── Annotations/
    │       └── */ (video folders)
    │           └── *.png (mask annotation files)
    ├── valid/
    │   ├── JPEGImages/
    │   │   └── */ (video folders)
    │   │       └── *.png (frame image files)
    │   └── Annotations/
    │       └── */ (video folders)
    │           └── *.png (mask annotation files)
    └── meta_expressions/
        ├── train/
        │   └── meta_expressions.json (text annotations)
        └── valid/
            └── meta_expressions.json (text annotations)
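The layout above can be traversed with a few lines of Python. The following is a minimal sketch, not the repository's data loader: the function names are hypothetical, and it assumes that each mask shares its video folder name and file name with the corresponding frame, which the layout suggests but does not state. The schema of meta_expressions.json is not documented here, so the sketch only loads the file and returns the parsed JSON.

```python
import json
from pathlib import Path


def pair_frames_with_masks(split_dir):
    """Yield (frame_path, mask_path) pairs for one split (train/ or valid/).

    Assumption: a mask has the same video folder name and file name as its
    frame. Frames without a matching mask file are skipped.
    """
    split = Path(split_dir)
    frames_root = split / "JPEGImages"
    masks_root = split / "Annotations"
    for frame in sorted(frames_root.glob("*/*.png")):
        mask = masks_root / frame.parent.name / frame.name
        if mask.is_file():
            yield frame, mask


def load_expressions(dataset_root, split):
    """Load meta_expressions.json for a split without assuming its schema."""
    path = Path(dataset_root) / "meta_expressions" / split / "meta_expressions.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `list(pair_frames_with_masks("RSVIS/EndoVis-RS18/train"))` would collect the training pairs, and `load_expressions("RSVIS/EndoVis-RS18", "train")` would return the parsed text annotations for the same split.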
We build our RSVIS dataset based on previous works. We acknowledge with gratitude the organizers of the previous two challenge competitions. To access the raw surgical video data, please see EndoVis2017 and EndoVis2018. If you utilize these data, please remember to cite their respective papers.
If you find our work useful or relevant to your research, please consider citing:
@article{wang2024video,
title={Video-instrument synergistic network for referring video instrument segmentation in robotic surgery},
author={Wang, Hongqiu and Yang, Guang and Zhang, Shichen and Qin, Jing and Guo, Yike and Xu, Bo and Jin, Yueming and Zhu, Lei},
journal={IEEE Transactions on Medical Imaging},
year={2024},
publisher={IEEE}
}