Testing Endo-FM trained model on a new dataset [Classification]

Hi all, thanks for this fantastic work!

I wanted to ask: what would be the best way to apply the pretrained model to a new classification dataset?
https://github.com/med-air/Endo-FM/blob/main/scripts/test_finetune_polypdiag.sh

I see the script here, but I am not sure whether it would work on a single video or a single image.

Is it possible to do this in Python rather than via the command line?

Thanks in advance :)

Hi, this model only outputs metrics and does not store any predicted images in an output directory, so how would you check the results on a video? If possible, please share the code for it; it would be helpful for us too.

Hi, @ShreyasFadnavis
Thanks for your interest!

I assume you want to fine-tune the pretrained Endo-FM model on your own dataset? If so, you can refer to this script for fine-tuning: https://github.com/med-air/Endo-FM/blob/main/scripts/eval_finetune_polypdiag.sh.
You need to do the following steps:

1. Specify the data path in DATA_PATH and put the videos under ${DATA_PATH}/videos.
2. Split the data into training and testing sets and store the split files, train.txt and val.txt, in ${DATA_PATH}/splits.
3. Change num_labels to the number of classes for your task.
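The split-file step could be scripted roughly as below. This is a minimal sketch, not part of the repo: `write_splits` is a hypothetical helper, it assumes the `path,label` line format mentioned later in this thread, assumes paths are given relative to ${DATA_PATH}/videos, and assumes integer class labels. It returns the number of distinct labels as a hint for num_labels.

```python
import os
import random
from typing import List, Tuple


def write_splits(data_path: str, samples: List[Tuple[str, int]],
                 val_fraction: float = 0.2, seed: int = 0) -> int:
    """Write train.txt and val.txt under {data_path}/splits.

    Hypothetical helper: each line is written as 'path,label' (assumed
    format). Returns the number of distinct labels, a candidate value
    for num_labels.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_val = max(1, int(len(shuffled) * val_fraction))
    val, train = shuffled[:n_val], shuffled[n_val:]

    split_dir = os.path.join(data_path, "splits")
    os.makedirs(split_dir, exist_ok=True)
    for name, rows in (("train.txt", train), ("val.txt", val)):
        with open(os.path.join(split_dir, name), "w") as f:
            for path, label in rows:
                f.write(f"{path},{label}\n")
    return len({label for _, label in samples})
```

A plain random split is used here for simplicity; if several clips come from the same patient, splitting by patient would avoid leakage between train and val.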

Moreover, if you want to apply Endo-FM to image tasks, you can unsqueeze the image input to treat it as a 1-frame video.
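A minimal PyTorch sketch of the unsqueeze trick, assuming the video model takes batches laid out as (B, C, T, H, W); that layout and the 224x224 size are assumptions here, so check the repo's data loader for the actual input shape before relying on this.

```python
import torch

# Hypothetical input: one RGB image as a (C, H, W) tensor.
image = torch.rand(3, 224, 224)

# Insert a time axis of length 1 so the image looks like a 1-frame clip.
# Assumed per-clip layout: (C, T, H, W).
clip = image.unsqueeze(1)    # (3, 1, 224, 224)

# Add the batch axis: assumed model input layout (B, C, T, H, W).
batch = clip.unsqueeze(0)    # (1, 3, 1, 224, 224)
```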

Inference on a single video or image is unfortunately not supported directly at the moment. The easiest workaround is probably to use a sample list containing only one sample and run the code via https://github.com/med-air/Endo-FM/blob/main/scripts/test_finetune_polypdiag.sh

Alternatively, to run inference on a single video or image, you can modify the dataset loader so that it loads the specified video or image as the only sample in the dataset. Hope this helps~
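The "only one sample in the dataset" idea might look like the sketch below. `SingleSampleDataset` is hypothetical and not the repo's loader: it assumes the clip has already been preprocessed exactly as the repo's own dataset would (frame sampling, resizing, normalization), and the repo's loader may return more fields than `(clip, label)`, so the return value would need to match what the test script expects. The 8-frame clip shape is also an assumption.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SingleSampleDataset(Dataset):
    """Hypothetical dataset that yields exactly one preprocessed clip."""

    def __init__(self, clip: torch.Tensor, label: int = 0):
        self.clip = clip      # assumed layout (C, T, H, W), already preprocessed
        self.label = label    # dummy label if the ground truth is unknown

    def __len__(self) -> int:
        return 1

    def __getitem__(self, idx: int):
        if idx != 0:
            raise IndexError(idx)
        return self.clip, self.label


# Hypothetical 8-frame clip; the loader yields a single batch of size 1.
loader = DataLoader(SingleSampleDataset(torch.rand(3, 8, 224, 224)), batch_size=1)
```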

Hi @Kyfafyd, this is very helpful! Lastly, is there a way to get frame-level embeddings out of Endo-FM? Something like the following, since you build on top of DINO:

Image -> EndoFM -> Embedding

Let me know if this is not clear.

Thanks in advance😊

Hi @ShreyasFadnavis
Yes, that is clear. You can obtain frame-level embeddings from Endo-FM after this line:

Endo-FM/models/timesformer.py, line 339 (commit c0979d2):

x = blk(x, B, T, W)

by adding the following code:

x = rearrange(x, '(b t) n m -> b t n m', b=B, t=T)
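To see what that rearrange does, here is a standalone PyTorch sketch using a random stand-in tensor. It assumes `x` has shape (B*T, n, m) at that point with the clip index varying slowest, as the `'(b t) n m'` pattern implies; in that case the rearrange is equivalent to a plain `reshape`. The sizes are hypothetical, and mean-pooling over tokens is just one assumed way to get one vector per frame, not the repo's documented choice.

```python
import torch

# Hypothetical sizes: B clips, T frames, n tokens per frame, m channels.
B, T, n, m = 2, 8, 197, 768

# Stand-in for the block output, assumed to be (B*T, n, m).
x = torch.rand(B * T, n, m)

# Equivalent to rearrange(x, '(b t) n m -> b t n m', b=B, t=T).
frames = x.reshape(B, T, n, m)

# One possible frame-level embedding: average over the token axis.
# Picking a single token (e.g. a class token, if present) is another option.
frame_emb = frames.mean(dim=2)   # (B, T, m)
```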

Thanks @Kyfafyd ! Closing this issue for now :)

@Kyfafyd Quick question: could you provide sample .txt files in the format the model expects for train, val and test? I am unsure where the ground-truth label for each video should be specified.

Thanks!

Hi, you can refer to this txt file:
https://mycuhk-my.sharepoint.com/:t:/g/personal/1155167044_link_cuhk_edu_hk/EXvfI1xbf2FAguh3t7pXr5IBKw98D3L9ZBMmFvKQ5A4x2w?e=abcEqp
Each entry uses the format path,label.
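Before training, a split file in that format could be sanity-checked with something like the sketch below. `read_split` is a hypothetical helper, not the repo's parser; it assumes one `path,label` entry per line with integer labels, and splits on the last comma so paths containing commas survive.

```python
from typing import List, Tuple


def read_split(txt_path: str) -> List[Tuple[str, int]]:
    """Parse a split file with one 'path,label' entry per line (assumed)."""
    samples = []
    with open(txt_path) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last comma so commas inside the path are kept.
            path, sep, label = line.rpartition(",")
            if not sep or not path:
                raise ValueError(f"{txt_path}:{line_no}: expected 'path,label', got {line!r}")
            samples.append((path, int(label)))
    return samples
```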

Thanks @Kyfafyd !