This work addresses the online surgical phase recognition task. The full code is not yet available because of the ongoing patent and review process, but the extracted features and the trained spatial feature extractor weights for Cholec80 and AutoLaparo are available now.
The core code for computing the transition map is also available.
The trained temporally-rich spatial feature extractor weights and the extracted features for Cholec80 and AutoLaparo are available at: OneDrive Link.
import pickle
import torch

def extracted_spatial_feature(video, TIMM):
    '''
    video shape: B, C, len, h, w  # Note that the features are extracted separately
    '''
    video_gpu = video.cuda()  # assumption: a CUDA device is available
    feats = TIMM(video_gpu)  # B, 768, len, 1, 1
    return feats

# Requires the `models` folder that defines the model class; timm version: 0.4.12
TIMM = torch.load('Trained_VIT_Cholec80.pth')

# Load the pre-extracted features of one video; video_indx is that video's index
with open(f"DATA/Cholec80/{video_indx}.pkl", 'rb') as f:
    feature = torch.tensor(pickle.load(f))  # 768, len, 1, 1
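The feature-loading snippet above can be wrapped in a small helper that checks the stored layout before the features are passed on. The following is only a sketch, not part of the released code: `load_video_features` is a hypothetical name, NumPy is used in place of PyTorch to keep the example dependency-light, and it assumes each pickle holds an array-like object of shape (768, len, 1, 1) as noted in the comment above. The demo writes synthetic zeros to a temporary file so it runs without the real data.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_video_features(pkl_path):
    """Load one video's pickled features and return them as (len, 768)."""
    with open(pkl_path, "rb") as f:
        feats = np.asarray(pickle.load(f), dtype=np.float32)
    # Assumed stored layout: (768, len, 1, 1); fail loudly if it differs.
    if feats.ndim != 4 or feats.shape[0] != 768 or feats.shape[2:] != (1, 1):
        raise ValueError(f"unexpected feature shape: {feats.shape}")
    # Drop the two singleton spatial axes, then put the time axis first.
    return feats.reshape(768, -1).T


# Demo with synthetic data so the sketch runs without the real files.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump(np.zeros((768, 25, 1, 1), dtype=np.float32), f)
    demo = load_video_features(demo_path)

print(demo.shape)  # (25, 768)
```

With the real files, the path would presumably follow the `DATA/Cholec80/{video_indx}.pkl` pattern shown above, and the result can be converted with `torch.from_numpy` if a tensor is needed.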
@article{liu2023lovit,
title={LoViT: Long Video Transformer for Surgical Phase Recognition},
author={Liu, Yang and Boels, Maxence and Garcia-Peraza-Herrera, Luis C and Vercauteren, Tom and Dasgupta, Prokar and Granados, Alejandro and Ourselin, Sebastien},
journal={arXiv preprint arXiv:2305.08989},
year={2023}
}