CSCI 544 - NLP Project for Video Description and Semantic Retrieval via Key Frame Stitching
- Python 3.12
- PyTorch 2.5.1
- NumPy 1.26.4
- Transformers 4.46.3
- OpenCV 4.10.0
- Katna 0.9.2
- Supervision
- Pytubefix
- The VideoXum dataset can be downloaded from Hugging Face.
- The VideoXum train and test files are located under ./Dataset
- Video captioning pipeline: dense_video_description_pipeline.ipynb
- Video captioning baseline: baseline.ipynb