Simple! I will break this into four steps:
- We use video summarization to extract a short summary that is descriptive of the video.
- We then extract keyframes using histogram analysis.
- We then generate an image caption for each keyframe.
- We then remove the stop words to get the final keywords that describe the video and can be used for indexing purposes!
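The keyframe-extraction step is model-free; a minimal sketch of histogram-based keyframe selection, assuming a simple per-channel histogram and an L1 distance threshold (the bin count and threshold here are illustrative, and the repo's actual implementation may differ):

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Per-channel intensity histogram, normalized to sum to 1."""
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(frame.shape[-1])
    ]).astype(float)
    return hist / hist.sum()

def keyframe_indices(frames, threshold=0.4):
    """Keep frames whose histogram differs enough from the last kept keyframe."""
    keep = [0]
    last = color_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        h = color_histogram(frame)
        # L1 distance between normalized histograms lies in [0, 2]
        if np.abs(h - last).sum() > threshold:
            keep.append(i)
            last = h
    return keep
```

Comparing each frame against the last kept keyframe (rather than the immediately preceding frame) avoids selecting many near-duplicate frames during slow transitions.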
- Video summarization using DSNet: Zhu, Wencheng, et al. "DSNet: A flexible detect-to-summarize network for video summarization." IEEE Transactions on Image Processing 30 (2020): 948-962.
- Image captioning using ExpansionNetV2: Hu, Jia Cheng, Roberto Cavicchioli, and Alessandro Capotondi. "ExpansionNet v2: Block static expansion in fast end to end training for image captioning." arXiv preprint arXiv:2208.06551 (2022). (Can also be swapped for ClipCap: Mokady, Ron, Amir Hertz, and Amit H. Bermano. "ClipCap: CLIP prefix for image captioning." arXiv preprint arXiv:2111.09734 (2021).)
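The final stop-word filtering step can be sketched in a few lines; the stop-word set below is a tiny illustrative sample (the project presumably uses a fuller list, e.g. NLTK's), and the punctuation handling is an assumption:

```python
# Illustrative subset; a real list (e.g. NLTK's English stop words) is larger.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "on", "and", "with", "to"}

def caption_keywords(captions):
    """Lowercase, tokenize on whitespace, drop stop words; keep first-seen order."""
    seen, keywords = set(), []
    for caption in captions:
        for word in caption.lower().split():
            token = word.strip(".,!?")  # strip trailing punctuation
            if token and token not in STOP_WORDS and token not in seen:
                seen.add(token)
                keywords.append(token)
    return keywords
```

For example, `caption_keywords(["A dog runs in the park."])` yields `["dog", "runs", "park"]`.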
- `\nodeserver` contains all the backend code.
- `\nodeserver\pythonscripts\DSNet` contains the DSNet model.
- `\nodeserver\pythonscripts\ExpansionNet` contains the ExpansionNetV2 model.
- `\nodeserver\pythonscripts\imagecaption` contains the ClipCap model.

To switch between ExpansionNetV2 and ClipCap for image captioning, modify this line.
- Download `rf_model.pth` from here and place it in `nodeserver\pythonscripts\ExpansionNet`.
- Download `model_weights.pt` from here and place it in `nodeserver\pythonscripts\imagecaption\model`.
- `npm install` to install the required Node modules
- `pip install -r requirements.txt` to install the required Python modules
- `npm start` to start the Electron app!
The keywords for the video will be written to `nodeserver\pythonscripts\DSNet\outputs\captions.txt`.
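A downstream indexer can then read that file; a small sketch, assuming one keyword per line (the actual file layout may differ):

```python
from pathlib import Path

def load_keywords(path=r"nodeserver\pythonscripts\DSNet\outputs\captions.txt"):
    """Read keywords from the output file, one per line, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```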
For detailed evaluations, please refer to `comparisons.ipynb`. Evaluations include:
- Runtime in seconds
- BLEU score
- ROUGE-1 precision
- ROUGE-1 recall
- ROUGE-L precision
- ROUGE-L recall
- Search engine recall score
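For reference, ROUGE-1 precision and recall reduce to clipped unigram overlap between a candidate and a reference text; a minimal sketch (the notebook may well use a library implementation instead):

```python
def rouge1(candidate, reference):
    """Return (precision, recall) of unigram overlap, clipped by reference counts."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    ref_counts = {}
    for w in ref:
        ref_counts[w] = ref_counts.get(w, 0) + 1
    overlap = 0
    for w in cand:
        if ref_counts.get(w, 0) > 0:  # clip: each reference token matches once
            ref_counts[w] -= 1
            overlap += 1
    return overlap / len(cand), overlap / len(ref)
```

For example, `rouge1("a dog runs", "a dog sleeps")` gives precision and recall of 2/3 each, since two of the three unigrams overlap.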