Fig. 1. Sample consecutive video frames from the model output, using pseudo-ground-truth videos $V_R$ as input (columns 1-4) and real videos $V_A$ as input (columns 5-8).
CycleSTTN: A Learning-Based Temporal Model for Specular Augmentation in Endoscopy
Rema Daher, O. León Barbed, Ana C. Murillo, Francisco Vasconcelos, and Danail Stoyanov
The 26th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2023.
If our paper or repository is helpful to your work, please cite:
@inproceedings{daher2023cyclesttn,
title={CycleSTTN: A Learning-Based Temporal Model for Specular Augmentation in Endoscopy},
author={Daher, Rema and Barbed, O Le{\'o}n and Murillo, Ana C and Vasconcelos, Francisco and Stoyanov, Danail},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
pages={570--580},
year={2023},
organization={Springer}
}
Since this code is based on STTN and Endo-STTN, please also cite:
@inproceedings{yan2020sttn,
author = {Zeng, Yanhong and Fu, Jianlong and Chao, Hongyang},
title = {Learning Joint Spatial-Temporal Transformations for Video Inpainting},
booktitle = {The Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020}
}
@article{daher2023temporal,
title={A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its Effect on Image Correspondence},
author={Daher, Rema and Vasconcelos, Francisco and Stoyanov, Danail},
journal={Medical Image Analysis},
year={2023}
}
- We propose the CycleSTTN training pipeline as an extension of STTN to a cyclic structure.
- We use CycleSTTN to train a model that synthetically generates temporally consistent and realistic specularities in endoscopy videos, and we compare our results against CycleGAN.
- We demonstrate that CycleSTTN, used as a data augmentation technique, improves the performance of the SuperPoint feature detector on endoscopy videos.
Fig. 2. CycleSTTN training pipeline with 3 main steps: $\fbox{1}$ Paired Dataset Generation, $\fbox{2}$ $STTN_A$ Pre-training, and $\fbox{3}$ $STTN_R, STTN_A$ Joint Training.
git clone https://github.com/RemaDaher/CycleSTTN.git
cd CycleSTTN/
conda create --name sttn python=3.8.5
conda activate sttn
pip install -r requirements.txt
To install PyTorch, please refer to the official PyTorch installation instructions. In our experiments, we use the following installation for CUDA 11.1:
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
For details on dataset preparation, see ./dataset_prep/README.md.
For your reference, we provide our pretrained models:
- $STTN_{R0}, D_{R0}$, named STTN_removal_model (from Endo-STTN).
- $STTN_{A0}, D_{A0}$, named STTN_addition_model (tested with test.py).
- $STTN_{A1}, D_{A1}$ and $STTN_{R1}, D_{R1}$, named CycleSTTN_model (tested with test-cycle.py).
Download and unzip them into ./release_model/
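After unzipping, a quick check that the checkpoint folders are in place can save a confusing failure later. The sketch below assumes each archive unpacks into a folder named after the model, which matches the -c paths used in the testing examples; the helper name missing_models is hypothetical and not part of the repository.

```python
from pathlib import Path

# Folder names of the released checkpoints listed above.
# Assumption: each archive unzips into a folder with exactly this name.
EXPECTED_MODELS = ("STTN_removal_model", "STTN_addition_model", "CycleSTTN_model")


def missing_models(release_dir="release_model"):
    """Return the expected model folders that are absent from release_dir."""
    root = Path(release_dir)
    return [name for name in EXPECTED_MODELS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing pretrained models:", ", ".join(missing))
    else:
        print("All pretrained models found.")
```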
Arguments that can be set with test.py and test-cycle.py:
- --overlaid: overlays the original frame pixels outside the mask region onto your output.
- --shifted: inpaints using a shifted mask.
- --framelimit: sets the maximum number of frames per video (default: 927).
- --Dil: sets the size of the structuring element used for mask dilation (default: 8). If set to 0, no dilation is applied.
- --nomask: takes only the image (without the mask) as input.
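To illustrate what --Dil and --overlaid do, here is a small NumPy sketch of mask dilation and overlaying. It assumes a square dil x dil structuring element; the repository may use a different kernel shape or library (e.g. OpenCV), so treat dilate_mask and overlay as hypothetical helpers rather than the scripts' actual code.

```python
import numpy as np


def dilate_mask(mask, dil=8):
    """Binary dilation of a 2-D mask with a dil x dil square structuring element.

    Assumption: a square kernel; the repository may use another kernel shape.
    dil <= 0 returns the mask unchanged, mirroring --Dil 0 (no dilation).
    """
    mask = np.asarray(mask, dtype=bool)
    if dil <= 0:
        return mask
    h, w = mask.shape
    before = (dil - 1) // 2  # kernel anchor offset (near the centre for even sizes)
    after = dil - 1 - before
    padded = np.pad(mask, ((before, after), (before, after)), constant_values=False)
    out = np.zeros_like(mask)
    # A pixel is set if any pixel under the kernel window anchored on it is set.
    for dy in range(dil):
        for dx in range(dil):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def overlay(output, frame, mask):
    """--overlaid: keep the model output inside the mask and original pixels outside."""
    inside = np.asarray(mask, dtype=bool)[..., None]  # broadcast over colour channels
    return np.where(inside, output, frame)
```

Dilating the mask before inpainting or overlaying gives the model a margin around each specularity, so that halo pixels at its border are also replaced.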
To run test.py or test-cycle.py on all the test videos in your dataset (as listed in your test.json):
python <<test.py or test-cycle.py>> --gpu <<INSERT GPU INDEX>> --nomask \
--output <<INSERT OUTPUT DIR>> \
--frame <<INSERT FRAMES DIR>> \
--mask <<INSERT ANNOTATIONS DIR>> \
-c <<INSERT PRETRAINED PARENT DIR>> \
-cn <<INSERT PRETRAINED MODEL NUMBER>> \
--zip
- For example, using our pretrained models:
python test.py --gpu 1 --nomask \
--output results/STTN_addition_model/ \
--frame datasets/EndoSTTN_dataset/JPEGImages/ \
--mask datasets/EndoSTTN_dataset/Annotations/ \
-c release_model/STTN_addition_model/ \
-cn 3 \
--zip
python test-cycle.py --gpu 1 --nomask \
--output results/CycleSTTN_model/ \
--frame datasets/EndoSTTN_dataset/JPEGImages/ \
--mask datasets/EndoSTTN_dataset/Annotations/ \
-c release_model/CycleSTTN_model/ \
-cn 2 \
--zip
NOTE: When running this script, the loaded frames and masks are saved as .npy files in datasets/EndoSTTN_dataset/files/ so that they load faster if you rerun the script. To load these .npy files, use the --readfiles argument. This is useful when experimenting with a large dataset.
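The caching idea behind --readfiles can be sketched as a "load once, save as .npy, reuse" helper. This is a hypothetical illustration: the function name load_cached, the one-file-per-video layout, and the loader callback are assumptions, and the repository may organize its cached frames and masks differently.

```python
from pathlib import Path

import numpy as np


def load_cached(video_name, loader, cache_dir="datasets/EndoSTTN_dataset/files",
                read_files=False):
    """Return loader(video_name), caching the array as <cache_dir>/<video_name>.npy.

    Hypothetical helper: read_files=True mimics --readfiles by reusing an
    existing .npy file instead of decoding the frames again.
    """
    path = Path(cache_dir) / f"{video_name}.npy"
    if read_files and path.exists():
        return np.load(path)  # fast path: reuse the saved array
    arr = np.asarray(loader(video_name))  # slow path: decode frames or masks
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, arr)  # save for the next run
    return arr
```

Decoding thousands of JPEG frames dominates start-up time on large datasets, whereas np.load reads a single contiguous array.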
To test on a single video:
python <<test.py or test-cycle.py>> --gpu <<INSERT GPU INDEX>> --nomask \
--output <<INSERT VIDEO OUTPUT DIR>> \
--frame <<INSERT VIDEO FRAMES DIR>> \
--mask <<INSERT VIDEO ANNOTATIONS DIR>> \
-c <<INSERT PRETRAINED PARENT DIR>> \
-cn <<INSERT PRETRAINED MODEL NUMBER>>
For example, for a folder "ExampleVideo1_Frames" containing the video frames, using our pretrained models:
python test.py --gpu 1 --nomask \
--output results/STTN_addition_model/ \
--frame datasets/ExampleVideo1_Frames/ \
--mask datasets/ExampleVideo1_Annotations/ \
-c release_model/STTN_addition_model/ \
-cn 3
python test-cycle.py --gpu 1 --nomask \
--output results/CycleSTTN_model/ \
--frame datasets/ExampleVideo1_Frames/ \
--mask datasets/ExampleVideo1_Annotations/ \
-c release_model/CycleSTTN_model/ \
-cn 2
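If you have several video folders following the same naming pattern as the example (<name>_Frames with a matching <name>_Annotations), the single-video command can be scripted. This is a minimal sketch assuming that naming convention; build_test_cmd and video_pairs are hypothetical helpers, and the flags are exactly those shown in the commands above.

```python
import subprocess
from pathlib import Path


def build_test_cmd(frames_dir, masks_dir, output_dir, ckpt_dir, ckpt_num,
                   script="test.py", gpu=1, nomask=True, zip_output=False):
    """Assemble the argv list for test.py / test-cycle.py on one video folder."""
    cmd = ["python", script, "--gpu", str(gpu)]
    if nomask:
        cmd.append("--nomask")
    cmd += ["--output", str(output_dir), "--frame", str(frames_dir),
            "--mask", str(masks_dir), "-c", str(ckpt_dir), "-cn", str(ckpt_num)]
    if zip_output:
        cmd.append("--zip")
    return cmd


def video_pairs(root):
    """Yield (frames_dir, masks_dir) for each <name>_Frames folder with a matching <name>_Annotations folder."""
    root = Path(root)
    for frames in sorted(root.glob("*_Frames")):
        masks = root / (frames.name[: -len("_Frames")] + "_Annotations")
        if frames.is_dir() and masks.is_dir():
            yield frames, masks


if __name__ == "__main__":
    for frames, masks in video_pairs("datasets"):
        cmd = build_test_cmd(f"{frames}/", f"{masks}/", "results/STTN_addition_model/",
                             "release_model/STTN_addition_model/", 3)
        subprocess.run(cmd, check=True)  # stop at the first failing video
```

Passing an argument list to subprocess.run (rather than a shell string) avoids quoting problems with paths that contain spaces.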
Single frame testing:
To test one frame at a time, thus removing the temporal component, follow the same steps above but use test-singleframe.py instead of test.py, and test-cycle-singleframe.py instead of test-cycle.py.
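To make the difference between temporal and single-frame testing concrete, the sketch below lists which frame indices would be fed to the model together. It is modeled on the window scheme of the original STTN test script (local neighbours around an anchor frame every neighbor_stride frames, plus sparse reference frames every ref_length frames); the defaults of 5 and 10 come from that script and are an assumption for this repository's test.py. Both function names are hypothetical.

```python
def temporal_groups(video_length, neighbor_stride=5, ref_length=10):
    """Frame-index groups in the style of the STTN reference test script.

    For each anchor frame f (every neighbor_stride frames), the model sees the
    local neighbours of f plus sparse reference frames sampled every
    ref_length frames from the whole video (excluding the neighbours).
    """
    groups = []
    for f in range(0, video_length, neighbor_stride):
        neighbors = list(range(max(0, f - neighbor_stride),
                               min(video_length, f + neighbor_stride + 1)))
        refs = [i for i in range(0, video_length, ref_length) if i not in neighbors]
        groups.append((neighbors, refs))
    return groups


def single_frame_groups(video_length):
    """Single-frame testing: each frame is processed alone, with no temporal context."""
    return [([i], []) for i in range(video_length)]


if __name__ == "__main__":
    print(temporal_groups(12)[0])   # ([0, 1, 2, 3, 4, 5], [10])
    print(single_frame_groups(2))   # [([0], []), ([1], [])]
```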
Once the dataset is ready, new models can be trained:
- Prepare the configuration file (e.g., STTN_addition_model.json, CycleSTTN_model.json):
  - "gpu": <INSERT GPU INDICES, E.G., "1,2">
  - "data_root": <INSERT DATASET ROOT>
  - "name": <INSERT NAME OF DATASET FOLDER>
  - "frame_limit": sets the maximum number of frames per video (default: 927).
  - "Dil": sets the size of the structuring element used for mask dilation (default: 8). If set to 0, no dilation is applied.
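Instead of editing the JSON by hand, the listed keys can be set programmatically. This sketch does not assume where the keys sit in the provided config files (top level or inside a nested section): it replaces each key wherever it occurs and reports any key it could not find. The helper names are hypothetical.

```python
import json


def set_config_values(cfg, **overrides):
    """Recursively set every key in overrides wherever it occurs in cfg.

    Returns the set of override keys that were found (and replaced).
    """
    found = set()
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            if key in overrides:
                cfg[key] = overrides[key]  # replacing a value is safe during iteration
                found.add(key)
            else:
                found |= set_config_values(value, **overrides)
    elif isinstance(cfg, list):
        for item in cfg:
            found |= set_config_values(item, **overrides)
    return found


def update_config_file(src, dst, **overrides):
    """Copy the config at src to dst with the given keys replaced."""
    with open(src) as f:
        cfg = json.load(f)
    missing = set(overrides) - set_config_values(cfg, **overrides)
    if missing:
        raise KeyError(f"keys not found in {src}: {sorted(missing)}")
    with open(dst, "w") as f:
        json.dump(cfg, f, indent=4)
```

For example, update_config_file("configs/STTN_addition_model.json", "configs/my_addition.json", gpu="1,2", data_root="datasets/", name="EndoSTTN_dataset", frame_limit=927, Dil=8), where the output filename and the data_root/name values are illustrative assumptions based on the dataset paths used elsewhere in this README.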
python <<train.py or train-cycle.py>> --model sttn \
--config <<INSERT CONFIG FILE DIR>> \
-c <<INSERT INITIALIZATION MODEL PARENT DIR>> \
-cn <<INSERT INITIALIZATION MODEL NUMBER>>
For train-cycle.py, in addition to the arguments (-c, -cn), we added (-cRem, -cnRem) for the removal model and (-cAdd, -cnAdd) for the addition model, so that the removal and addition models can be initialized from separate models.
- For example:
python train.py --model sttn \
--config configs/STTN_addition_model.json
python train-cycle.py --model sttn \
--config configs/CycleSTTN_model.json \
-cRem release_model/STTN_removal_model/ \
-cnRem 9 \
-cAdd release_model/STTN_addition_model/ \
-cnAdd 3
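The relationship between the shared (-c, -cn) and model-specific (-cRem, -cnRem, -cAdd, -cnAdd) initialization flags can be sketched with argparse. The flag names come from this README, but the fallback rule (a model-specific flag that is not given defaults to the shared one) is an assumption about how train-cycle.py combines them, and parse_args here is a hypothetical parser, not the script's own.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser for the train-cycle.py checkpoint flags.

    Assumption: a model-specific flag that is not given falls back to the
    shared -c / -cn initialization.
    """
    p = argparse.ArgumentParser(description="CycleSTTN training (sketch)")
    p.add_argument("--model", default="sttn")
    p.add_argument("--config", required=True)
    p.add_argument("-c", dest="ckpt", default=None, help="shared init model parent dir")
    p.add_argument("-cn", dest="ckpt_num", type=int, default=None, help="shared init model number")
    p.add_argument("-cRem", default=None, help="removal init model parent dir")
    p.add_argument("-cnRem", type=int, default=None, help="removal init model number")
    p.add_argument("-cAdd", default=None, help="addition init model parent dir")
    p.add_argument("-cnAdd", type=int, default=None, help="addition init model number")
    args = p.parse_args(argv)
    # Fall back to the shared checkpoint wherever a model-specific one is missing.
    if args.cRem is None:
        args.cRem = args.ckpt
    if args.cnRem is None:
        args.cnRem = args.ckpt_num
    if args.cAdd is None:
        args.cAdd = args.ckpt
    if args.cnAdd is None:
        args.cnAdd = args.ckpt_num
    return args
```

argparse matches exact option strings before trying prefixes, so -c, -cn, -cRem and -cnRem can coexist without ambiguity.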
To quantitatively evaluate results using the pseudo-ground truth:
- Test all videos using the testing scripts above (test.py or test-cycle.py), with the specularity-removed frames as input (the JPEGImagesNS folder instead of JPEGImages).
- Use quantifyResultsAddingSpecs.ipynb to generate CSV files containing the quantitative results.
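As an illustration of this kind of per-frame evaluation, the sketch below computes PSNR between each output frame and its reference frame and writes the scores to a CSV file. PSNR is only an example metric: which metrics quantifyResultsAddingSpecs.ipynb actually computes is not specified here, and treating the original JPEGImages frame as the reference is an assumption. The function names are hypothetical.

```python
import csv

import numpy as np


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def write_psnr_csv(pairs, csv_path):
    """pairs: iterable of (frame_name, pred, target); writes one row per frame.

    Returns the mean PSNR over all frames (nan if there are none).
    """
    scores = []
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "psnr_db"])
        for name, pred, target in pairs:
            score = psnr(pred, target)
            scores.append(score)
            writer.writerow([name, f"{score:.4f}"])
    return float(np.mean(scores)) if scores else float("nan")
```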
Training losses can be monitored with TensorBoard by running:
tensorboard --logdir release_model
If you have any questions or suggestions about this paper, feel free to contact me (remadaher711@gmail.com).