
Primary language: Python. License: MIT.

SwinMLP-TranCAP: End-to-End Window-Based MLP Transformer Using Patches

This repository contains the reference code for the paper "Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches" (MICCAI 2022).

If you find this repo useful, please cite our paper.

[Figure: SwinMLP-TranCAP, a window-based model using patches]

[Figure: results]
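The core idea of a window-based design is that patch tokens only interact inside local, non-overlapping windows. The snippet below is a minimal NumPy sketch of that idea, loosely modeled on Swin-MLP, where a shared linear layer over the tokens of each window takes the place of window self-attention. It is not this repository's implementation: the function names are hypothetical, and shifted windows, multiple heads, normalization and the channel MLP are all omitted.

```python
import numpy as np


def window_partition(x: np.ndarray, window_size: int) -> np.ndarray:
    """Split an (H, W, C) patch-feature map into non-overlapping windows.

    Returns shape (num_windows, window_size * window_size, C): one row of
    tokens per window, windows ordered row-major over the grid.
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0, "H and W must be divisible by window_size"
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, nW, ws, ws, C)
    return x.reshape(-1, window_size * window_size, c)


def window_reverse(windows: np.ndarray, window_size: int, h: int, w: int) -> np.ndarray:
    """Inverse of window_partition: (num_windows, ws*ws, C) -> (H, W, C)."""
    c = windows.shape[-1]
    x = windows.reshape(h // window_size, w // window_size, window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, ws, nW, ws, C)
    return x.reshape(h, w, c)


def window_token_mixing(x: np.ndarray, window_size: int,
                        weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Mix tokens inside each window with one shared linear layer.

    weight has shape (T, T) and bias shape (T,), with T = window_size ** 2.
    Channels are not mixed here; tokens never see other windows.
    """
    h, w, _ = x.shape
    windows = window_partition(x, window_size)  # (nWin, T, C)
    # mixed[n, s, c] = sum_t weight[s, t] * windows[n, t, c] + bias[s]
    mixed = np.einsum("st,ntc->nsc", weight, windows) + bias[None, :, None]
    return window_reverse(mixed, window_size, h, w)
```

With an identity weight and a zero bias the mixing is a no-op, which is a quick consistency check for the partition and reverse steps; a uniform weight of 1/T replaces every token with its window mean.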

Environment setup

  • Python 3
  • PyTorch 1.3+ (along with torchvision)
  • cider (already added as a submodule)
  • coco-caption (already added as a submodule; remember to follow the initialization steps in coco-caption/README.md)
  • yacs
  • lmdbdict
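A quick way to check the pip-installable requirements above is sketched below. It assumes Python 3.8+ (for importlib.metadata) and that the packages are installed under the distribution names torch, torchvision, yacs and lmdbdict; the cider and coco-caption submodules are not checked. The script and its function names are illustrative, not part of the repository.

```python
import re
from importlib import metadata

# Assumed pip distribution names for the pip-installable requirements.
# cider and coco-caption are git submodules and are not checked here.
REQUIRED = ["torch", "torchvision", "yacs", "lmdbdict"]


def meets_min_version(version: str, minimum=(1, 3)) -> bool:
    """Return True if a version string such as '1.13.1+cu117' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def check_environment() -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for name in REQUIRED:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if name == "torch" and not meets_min_version(version):
            problems.append(f"torch {version} found, but 1.3+ is required")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```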

If you have difficulty running the training scripts in the tools folder, you can try installing this repo as a Python package:

python -m pip install -e .

Data preparation

  • DAISI Dataset

Since we are not allowed to release the dataset, please request access from the creators of the DAISI dataset (AI-Medic: an artificial intelligent mentor for trauma surgery). Note that we use the cleaned version of the DAISI dataset from the following work: Surgical Instruction Generation with Transformers.

  • EndoVis18 Dataset

Please download the images from endovissub2018-roboticscenesegmentation and the caption annotations from CIDACaptioning.

Data preprocessing

Please follow ImageCaptioning/data/README to preprocess the data.
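The actual preprocessing is defined by ImageCaptioning/data/README. As background, caption preprocessing in captioning codebases of this kind generally builds a word vocabulary from the training captions, maps rare words to an UNK token, and encodes each caption as a fixed-length sequence of word indices. The sketch below illustrates only that general idea; the count threshold, maximum length, whitespace tokenization and the use of index 0 for padding are assumptions, not the settings used by this repository.

```python
from collections import Counter

UNK = "UNK"  # placeholder token for rare / unseen words (assumed name)


def build_vocab(captions, count_threshold=5):
    """Map each word seen more than count_threshold times to an index >= 1.

    Rarer words are dropped from the vocabulary and later encoded as UNK.
    Index 0 is left free for padding.
    """
    counts = Counter(word for caption in captions for word in caption.lower().split())
    vocab = sorted(word for word, n in counts.items() if n > count_threshold)
    vocab.append(UNK)
    return {word: i + 1 for i, word in enumerate(vocab)}


def encode(caption, word_to_ix, max_length=16):
    """Encode one caption as a list of max_length ints, truncated or zero-padded."""
    words = caption.lower().split()[:max_length]
    ids = [word_to_ix.get(w, word_to_ix[UNK]) for w in words]
    return ids + [0] * (max_length - len(ids))
```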

Training procedure

Our code is built on top of ImageCaptioning. We add our models (Swin_TranCAP, SwinMLP_TranCAP, Video_Swin_TranCAP, and Video_SwinMLP_TranCAP) to its captioning/models/ folder, and also add the related dataloader file.

Our training config files can be found in the configs folder.

For example, to train SwinMLP-TranCAP on the DAISI dataset, run:

$ python tools/train_vision_transformer.py --cfg configs/daisi/transformer/SwinMLP_TranCAP_L.yml --id daisi_SwinMLP_TranCAP

Similarly, you can train the other models using the provided config files.
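If you want to train several models in a row, the training command can be scripted. The sketch below assumes that every .yml file in a config directory is meant for tools/train_vision_transformer.py and that it is launched from the repository root; the "prefix + file stem" run-id scheme is hypothetical (the example command uses daisi_SwinMLP_TranCAP for SwinMLP_TranCAP_L.yml), so adjust it as needed. By default it only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

TRAIN_SCRIPT = "tools/train_vision_transformer.py"


def build_train_command(cfg_path, run_id):
    """Build the training command line as an argument list (no shell quoting needed)."""
    return [sys.executable, TRAIN_SCRIPT, "--cfg", str(cfg_path), "--id", run_id]


def train_all(config_dir="configs/daisi/transformer", prefix="daisi_", dry_run=True):
    """Build (and, unless dry_run, run) one training command per .yml config.

    Runs are sequential; check=True stops at the first failing run.
    The run id is prefix + config file stem (hypothetical naming scheme).
    """
    commands = [build_train_command(cfg, prefix + cfg.stem)
                for cfg in sorted(Path(config_dir).glob("*.yml"))]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Calling train_all() first in dry-run mode lets you review the generated commands before passing dry_run=False.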

Acknowledgements

We thank the following repositories for providing helpful components/functions used in our work: neuraltalk2, ImageCaptioning.