Program to generate captions for the keyframes of a video, given a video file as input.
Usage steps for Ubuntu 16.04:
Downloads
- Download/clone this repository
- Download the file "model_checkpint.pth.tar" from https://drive.google.com/file/d/1OMnMuMuxEtKVmws2zNAlB3nhCVnwUTS4/view?usp=sharing
- Place the file "model_checkpint.pth.tar" in the repo directory
- model checkpoint source: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning
Ubuntu python setup
- sudo apt-get update
- sudo apt-get install -y build-essential tk-dev libncurses5-dev libncursesw5-dev libreadline6-dev libdb5.3-dev libgdbm-dev libsqlite3-dev libssl-dev libbz2-dev libexpat1-dev liblzma-dev zlib1g-dev libffi-dev tar wget vim
- cd /opt
- sudo wget https://www.python.org/ftp/python/3.8.5/Python-3.8.5.tgz
- sudo tar xzf Python-3.8.5.tgz
- cd Python-3.8.5
- sudo ./configure --enable-optimizations
- sudo make -j 4
- sudo make altinstall
- cd /opt
- sudo rm -f Python-3.8.5.tgz
Ubuntu setup
- Change directory to vidcaption_aavan
- python3.8 -m pip install virtualenv
- virtualenv -p python3.8 venv-vidcap
- source venv-vidcap/bin/activate
Installing requirements
- pip install -r requirements.txt
OR
- pip install pandas
- pip install torch
- pip install opencv-python
- pip install click
- pip install torchvision
- pip install matplotlib
- pip install scikit-image
- pip install PyTube (for video download)
- pip install Flask (for API)
- pip install pycocotools (for evaluation)
- pip install pycocoevalcap (for evaluation)
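After installing, a quick check that every package can be found may save a failed run later. The following is a minimal sketch, not part of this repository; the import names (cv2 for opencv-python, skimage for scikit-image, and so on) are the usual ones for these packages, and the helper name missing_packages is hypothetical.

```python
import importlib.util

# pip name -> import name. Some packages are imported under a different
# name than the one used with pip (opencv-python -> cv2, scikit-image -> skimage).
REQUIRED_MODULES = {
    "pandas": "pandas",
    "torch": "torch",
    "opencv-python": "cv2",
    "click": "click",
    "torchvision": "torchvision",
    "matplotlib": "matplotlib",
    "scikit-image": "skimage",
    "PyTube": "pytube",
    "Flask": "flask",
    "pycocotools": "pycocotools",
    "pycocoevalcap": "pycocoevalcap",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Run it inside the activated venv so that the check looks at the same environment the captioning scripts will use.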
Videos can be downloaded using the PyTube-based video downloading script. Videos downloaded this way are automatically saved as mp4 in the "video_uploads" folder.
- Run from terminal using $ python download_video.py <youtube_URL> <name_to_save_as_without_extension>
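- Example: python download_video.py "https://www.youtube.com/watch?v=DocxmW2bOdc&t=80s" singapore_dorm_cases
- Quote the URL: an unquoted "&" makes the shell run the command in the background and drop the rest of the line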
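For readers curious what a PyTube download of this kind can look like, here is a rough sketch. It is not the contents of download_video.py; the function names download and target_filename are hypothetical, and picking the highest-resolution progressive mp4 stream is an assumption about what a reasonable downloader would do.

```python
from pathlib import Path


def target_filename(name_without_extension):
    """Build the mp4 file name from the name given on the command line."""
    return f"{name_without_extension}.mp4"


def download(url, name_without_extension, output_dir="video_uploads"):
    """Download the best progressive mp4 stream into output_dir."""
    # Imported here so target_filename() works even without pytube installed.
    from pytube import YouTube

    Path(output_dir).mkdir(exist_ok=True)
    stream = (
        YouTube(url)
        .streams.filter(progressive=True, file_extension="mp4")
        .order_by("resolution")
        .desc()
        .first()
    )
    if stream is None:
        raise RuntimeError("no progressive mp4 stream found for this URL")
    # Recent pytube releases use the file name as given; older releases
    # may append the extension themselves.
    return stream.download(
        output_path=output_dir,
        filename=target_filename(name_without_extension),
    )
```

Progressive streams carry audio and video in one file, which avoids a separate merge step before keyframe extraction.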
Video keyframe extraction supports most video formats including: mp4, ts, MOV, avi, y4m, mkv, flv, wmv.
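To check a file name against the formats listed above before starting a long captioning run, something like the following could be used. It is a small sketch, not part of the repository; is_listed_format is a hypothetical helper, and since the list is described as covering "most" formats, a False result only means the extension is not one of those named here.

```python
from pathlib import Path

# Extensions named in this README, lower-cased for comparison.
LISTED_EXTENSIONS = {".mp4", ".ts", ".mov", ".avi", ".y4m", ".mkv", ".flv", ".wmv"}


def is_listed_format(filename):
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in LISTED_EXTENSIONS
```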
Running without API:
- Activate venv: $ source venv-vidcap/bin/activate
- Put videos to caption in "video_uploads" folder
- Run from terminal using $ python caption_video.py <videofile_name>
- Example: python caption_video.py elsa.mp4
- To keep the video frames, run from terminal using $ python caption_video.py <videofile_name> keepframes
- Example: python caption_video.py elsa.mp4 keepframes
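To caption every video in "video_uploads" in one go, the command above can be wrapped in a short loop. This is a sketch under the assumption that caption_video.py takes the file name (not the full path) as its first argument, as in the examples; build_command and caption_all are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path


def build_command(video_name, keep_frames=False):
    """Build the caption_video.py command line for one video."""
    cmd = [sys.executable, "caption_video.py", video_name]
    if keep_frames:
        cmd.append("keepframes")
    return cmd


def caption_all(upload_dir="video_uploads", keep_frames=False):
    """Run caption_video.py once per file in upload_dir, in name order."""
    for path in sorted(Path(upload_dir).iterdir()):
        if path.is_file():
            # check=True stops the loop at the first failing video.
            subprocess.run(build_command(path.name, keep_frames), check=True)


if __name__ == "__main__":
    if Path("video_uploads").is_dir():
        caption_all()
```

Run it from the repo directory with the venv activated, so that sys.executable points at the venv's Python.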
Running with API:
- Activate venv: $ source venv-vidcap/bin/activate
- Run from terminal using $ python api_start.py
- Go to http://127.0.0.1:5001/captionvideo in a browser
- Browse disk for video file
- Click upload
- Uploaded videos will be saved to video_uploads directory
If you are not running evaluation, the eval directory can be removed.
To caption and evaluate against custom dataset in COCO format:
- pip install pycocoevalcap (if not already installed)
- Move model checkpoint file and wordmap into "eval" directory (default: model_checkpint.pth.tar and wordmap.json)
- Put folder with images to caption into "eval" directory
- Change directory to "eval"
- Ensure the caption file in COCO format (for the images to be captioned) is in the "eval" directory, e.g. "DATASET_coco_captions.json"
- Run from terminal using $ python caption_and_eval.py <directory_name>
- Example: python caption_and_eval.py DATASET
- To keep output file of captioning, run from terminal using $ python caption_and_eval.py <directory_name> keepoutput
- Example: python caption_and_eval.py DATASET keepoutput
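If your reference captions are not yet in COCO format, the sketch below shows one way to build such a file. COCO caption files contain an "images" list and an "annotations" list linked by image id; whether caption_and_eval.py needs further keys (such as "info" or "licenses") is not stated here, so treat the exact key set as an assumption. The function names are hypothetical.

```python
import json
from pathlib import Path


def build_coco_captions(captions_by_file):
    """Build a COCO-style captions dict.

    captions_by_file maps an image file name to a list of reference captions.
    """
    images, annotations = [], []
    for image_id, (file_name, captions) in enumerate(
        sorted(captions_by_file.items()), start=1
    ):
        images.append({"id": image_id, "file_name": file_name})
        for caption in captions:
            annotations.append(
                {
                    "id": len(annotations) + 1,  # unique annotation id
                    "image_id": image_id,        # links caption to image
                    "caption": caption,
                }
            )
    return {"images": images, "annotations": annotations}


def write_coco_captions(captions_by_file, dataset_name, eval_dir="."):
    """Write <dataset_name>_coco_captions.json into eval_dir."""
    path = Path(eval_dir) / f"{dataset_name}_coco_captions.json"
    with open(path, "w") as f:
        json.dump(build_coco_captions(captions_by_file), f, indent=2)
    return path
```

For example, write_coco_captions({"001.jpg": ["a dog on a beach"]}, "DATASET", "eval") would produce eval/DATASET_coco_captions.json, matching the file name used in the steps above.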
To evaluate against video captions in COCO format, using the video captioning output (the output of caption_video.py):
- Move the output file, e.g. covid.mp4-OUTPUT.json, to the "eval" directory
- Change directory to "eval"
- Ensure the caption file in COCO format is in the "eval" directory, e.g. "covid.mp4-coco_captions.json"
- Run from terminal using $ python eval_video_output.py <videofile_name>
- Example: python eval_video_output.py covid.mp4
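Before running the evaluation, it can help to confirm that both input files are in place. The sketch below infers the naming pattern (<videofile_name>-OUTPUT.json and <videofile_name>-coco_captions.json) from the examples above; that pattern and the helper names are assumptions, not something taken from eval_video_output.py.

```python
from pathlib import Path


def expected_eval_files(video_name, eval_dir="eval"):
    """Return the two files the evaluation step is expected to read."""
    base = Path(eval_dir)
    return [
        base / f"{video_name}-OUTPUT.json",         # output of caption_video.py
        base / f"{video_name}-coco_captions.json",  # reference captions
    ]


def missing_eval_files(video_name, eval_dir="eval"):
    """Return the expected files that are not present in eval_dir."""
    return [p for p in expected_eval_files(video_name, eval_dir) if not p.is_file()]
```

Calling missing_eval_files("covid.mp4") from the repo directory returns an empty list when both files are in "eval".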