
NLPVideoDescription

CSCI 544 - NLP Project for Video Description and Semantic Retrieval via Key Frame Stitching
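The title refers to key frame stitching. One plausible reading, assumed here and not confirmed by this README, is that selected key frames are tiled into a single composite image so that an image-captioning model can describe several moments of a video at once. The minimal NumPy sketch below shows such a grid stitch. The helper name `stitch_keyframes` is hypothetical, and the frames are assumed to be already extracted (for example with Katna or OpenCV) and resized to a common shape.

```python
import numpy as np


def stitch_keyframes(frames: list[np.ndarray], cols: int) -> np.ndarray:
    """Tile equally sized H x W x C frames into one grid image, row-major.

    Hypothetical helper for illustration. Cells left over in the last row
    are filled with zeros (black).
    """
    if not frames:
        raise ValueError("need at least one frame")
    if cols < 1:
        raise ValueError("cols must be >= 1")
    h, w, c = frames[0].shape
    if any(f.shape != (h, w, c) for f in frames):
        raise ValueError("all frames must share the same shape")

    rows = -(-len(frames) // cols)  # ceiling division
    grid = np.zeros((rows * h, cols * w, c), dtype=frames[0].dtype)
    for i, frame in enumerate(frames):
        r, col = divmod(i, cols)  # grid cell for the i-th frame
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = frame
    return grid


if __name__ == "__main__":
    # Five tiny 2x3 RGB "key frames", each filled with its own index.
    demo = [np.full((2, 3, 3), i, dtype=np.uint8) for i in range(5)]
    print(stitch_keyframes(demo, cols=3).shape)
```

With five 2x3 frames and three columns the grid has two rows of cells, so the composite is 4 x 9 pixels; the sixth cell stays black.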

Requirements:

  • Python 3.12
  • PyTorch 2.5.1
  • NumPy 1.26.4
  • Transformers 4.46.3
  • OpenCV 4.10.0
  • Katna 0.9.2
  • Supervision
  • Pytubefix
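A quick way to confirm an environment against the pins above is to compare installed distribution versions with `importlib.metadata`. This is a sketch, not a script shipped with the repository: the pip distribution names (notably `opencv-python` for OpenCV and `katna` for Katna) are assumptions, and a pin is treated as matching when the installed version starts with it, so `4.10.0.84` satisfies `4.10.0`.

```python
import sys
from importlib import metadata

# Pins from the Requirements list, keyed by pip distribution name.
# Names such as "opencv-python" and "katna" are assumptions; substitute
# the distributions actually installed.
PINNED = {
    "torch": "2.5.1",
    "numpy": "1.26.4",
    "transformers": "4.46.3",
    "opencv-python": "4.10.0",
    "katna": "0.9.2",
}
# Listed without versions, so only their presence is checked.
UNPINNED = ["supervision", "pytubefix"]


def version_matches(installed: str, pinned: str) -> bool:
    """Accept exact matches plus extra trailing segments or local tags.

    "4.10.0.84" matches "4.10.0" and "2.5.1+cu121" matches "2.5.1",
    but "2.5.10" does not match "2.5.1".
    """
    public = installed.split("+", 1)[0]  # drop local tag like "+cu121"
    want = pinned.split(".")
    return public.split(".")[: len(want)] == want


def check_environment(lookup=metadata.version, python=sys.version_info[:2]) -> list[str]:
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    if tuple(python) != (3, 12):
        problems.append(f"Python 3.12 expected, found {python[0]}.{python[1]}")
    for name, pinned in PINNED.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} not installed (expected {pinned})")
            continue
        if not version_matches(installed, pinned):
            problems.append(f"{name} {installed} installed, expected {pinned}")
    for name in UNPINNED:
        try:
            lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All requirements satisfied."]:
        print(line)
```

The `lookup` and `python` parameters exist only so the check can be exercised without the real packages installed.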

Dataset

  • The VideoXum dataset can be downloaded from Hugging Face.
  • The VideoXum train and test split files are located under ./Dataset
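This README does not describe the file names or format inside ./Dataset. The sketch below assumes, purely for illustration, that split files carry "train" or "test" in their names and are JSON files holding a top-level list of records; both helpers (`find_split_files`, `load_json_split`) are hypothetical and should be adapted to the real layout.

```python
import json
from pathlib import Path


def find_split_files(root: str = "./Dataset") -> dict[str, list[Path]]:
    """Group files under `root` by split, judged by "train"/"test" in the name.

    Filename-based matching is an assumption about the dataset layout.
    """
    splits: dict[str, list[Path]] = {"train": [], "test": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        for split in splits:
            if split in name:
                splits[split].append(path)
    return splits


def load_json_split(paths: list[Path]) -> list:
    """Concatenate records from JSON files (assumed to hold a top-level list)."""
    records = []
    for path in paths:
        if path.suffix.lower() != ".json":
            continue  # skip non-JSON files in this sketch
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        records.extend(data if isinstance(data, list) else [data])
    return records


if __name__ == "__main__":
    splits = find_split_files()
    for split, paths in splits.items():
        print(split, len(load_json_split(paths)), "records")
```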

Code

  • Video captioning pipeline: dense_video_description_pipeline.ipynb
  • Video captioning baseline: baseline.ipynb
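The project title also mentions semantic retrieval. A common approach, assumed here rather than taken from the notebooks, is to embed each generated video description and a text query with the same sentence encoder and rank descriptions by cosine similarity. The sketch uses plain NumPy on precomputed embeddings; the encoder is left unspecified, and the helper name `rank_by_cosine` is hypothetical.

```python
import numpy as np


def rank_by_cosine(query_vec: np.ndarray, caption_vecs: np.ndarray,
                   top_k: int = 3) -> list[tuple[int, float]]:
    """Return (caption index, cosine similarity) for the top_k best matches.

    query_vec has shape (d,), caption_vecs has shape (n, d); all vectors
    must be non-zero. Embeddings are assumed to come from one text encoder.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = caption_vecs / np.linalg.norm(caption_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per caption
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real encoder outputs.
    captions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(rank_by_cosine(np.array([1.0, 0.1]), captions, top_k=2))
```

Normalizing both sides first turns the similarity computation into a single matrix-vector product, which scales to many stored descriptions.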