# Esperlight
This repo contains simple scripts to get up and running on a new video dataset. The idea is to provide a minimal set of functionality that depends only on our basic Esper-ecosystem tools (vgrid, rekall, etc.) and to exchange data between these tools in simple formats like JSON.
## Generating video metadata
The first step in getting a video dataset ready for manipulation is to produce a list of the videos in the collection. This list contains metadata about the videos that is needed for visualization purposes. The Esperlight script `create_video_metadata.py` generates this list from a directory of videos. For example, if you have a directory of `.mp4` videos named `VIDEO_SOURCE_DIR`, the following command generates a JSON file that contains basic information about all the videos (height, width, fps, duration, etc.):

```
python create_video_metadata.py VIDEO_SOURCE_DIR myvideolist.json --suffix mp4
```
At the moment, `create_video_metadata.py` does not support recursive search of the source directory tree. This would be a reasonable addition in the future, since I expect that many datasets we download from the internet will have subdirectory structure.
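For datasets that do have nested directories, the recursive listing itself is straightforward. A minimal sketch of the idea (not part of `create_video_metadata.py` today):

```python
from pathlib import Path

# Recursively collect all video files with the given suffix under the
# source directory. This mirrors what a future recursive option might
# do; today the script only scans the top level.
def list_videos(source_dir, suffix="mp4"):
    return sorted(Path(source_dir).rglob(f"*.{suffix}"))

for path in list_videos("VIDEO_SOURCE_DIR"):
    print(path)
```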
As an example, running `create_video_metadata.py` on the directory of videos containing the kayvon10 dataset yields the video metadata file you see here: https://olimar.stanford.edu/hdd/kayvon10/kayvon10.json
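Because the output is plain JSON, it is easy to load and inspect. A minimal sketch, assuming the file holds a JSON list of per-video records (see the linked kayvon10.json for the exact schema):

```python
import json

# Load the metadata list produced by create_video_metadata.py.
with open("myvideolist.json") as f:
    videos = json.load(f)

print(len(videos), "videos")
# Print the first entry to see the available fields
# (height, width, fps, duration, etc.).
print(videos[0])
```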
## Running mask-rcnn on the videos
// TODO
## Generating bbox embeddings from mask-rcnn output
The next step is to use the mask-rcnn dump to generate the bounding box and embedding information Esperlight expects. `create_embeddings.py` takes in a video, the mask-rcnn output, and thresholding parameters for the bounding boxes to be considered. The thresholding parameters you can control are bounding box width (`--minx`), height (`--miny`), and score (`--score`). The script depends on TensorFlow's `slim` package and the pretrained `ResNet` model. Please download them and set the appropriate paths in `tensorflow_config.ini`. You can also pass a GPU id to `create_embeddings.py` to run on a specific GPU (defaults to 0):

```
python create_embeddings.py video maskrcnn_dump -x 8 -y 8 -s 0.2 -g 2
```
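The thresholding amounts to a simple filter over the detections. A minimal sketch of the idea (the detection tuple layout here is illustrative; the real logic lives in `create_embeddings.py`):

```python
# Keep a detection only if its box is at least min_w x min_h pixels
# and its confidence score clears the threshold. The defaults below
# mirror the example command above (-x 8 -y 8 -s 0.2).
def keep_box(x1, y1, x2, y2, score, min_w=8, min_h=8, min_score=0.2):
    return (x2 - x1) >= min_w and (y2 - y1) >= min_h and score >= min_score

detections = [
    (10, 20, 120, 200, 0.90),  # passes all three thresholds
    (5, 5, 10, 9, 0.95),       # too small
    (0, 0, 50, 50, 0.10),      # score too low
]
kept = [d for d in detections if keep_box(*d)]
print(kept)
```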
## Starting a knn server
The embeddings generated in the previous step are useful for finding similar bounding boxes. With these embeddings generated, you can run a knn search to find the boxes most similar to a query bounding box.
To achieve this, we start up a server and run queries against it. `start_knn_server.py` provides this functionality. It depends on the Flask framework and is backed by Faiss for efficient similarity search. Please set the path of the directory containing the videos and embeddings, along with other parameters, in `knn_server_config.ini`.
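At its core, Faiss-backed knn over the box embeddings looks roughly like this (a sketch only; the embedding dimensionality and index type below are assumptions, not the server's actual configuration):

```python
import numpy as np
import faiss

d = 128  # embedding dimensionality; assumed for illustration
# Suppose `embeddings` holds one float32 row per bounding box.
embeddings = np.random.rand(1000, d).astype("float32")

index = faiss.IndexFlatL2(d)  # exact L2 nearest-neighbor search
index.add(embeddings)

# Find the 5 boxes closest to box 0 (the query is its own nearest neighbor).
distances, neighbors = index.search(embeddings[:1], 5)
print(neighbors[0])
```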
Say you have the server running at `ip_address:port`; the server exposes two endpoints:

- Requesting `ip_address:port/status` gets you the status of the server along with the list of videos and parameters it is currently using.
- Requesting `ip_address:port/knn` with a video name (`video_name`), bounding box id (`bbox_id`), and number of nearest neighbors (`k`) returns a list of `k` (`video_name`, `bbox_id`) tuples closest to the query. A sample client is available at `examples/run_knn_client.py`.
You can start the server with the following command:

```
export FLASK_APP=start_knn_server.py
flask run --host=0.0.0.0 --port=8321
```
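For a sense of how queries look from the client side, here is a minimal sketch using `requests` (passing the arguments as query-string parameters is an assumption; see `examples/run_knn_client.py` for the actual request format):

```python
import requests

SERVER = "http://localhost:8321"  # host/port from the flask run command above

# Ask the server for its status, video list, and current parameters.
print(requests.get(f"{SERVER}/status").json())

# Request the 5 nearest neighbors of bounding box 17 in "myvideo.mp4".
# NOTE: the parameter names and transport (query string vs. JSON body)
# are assumptions; consult examples/run_knn_client.py for the real format.
resp = requests.get(
    f"{SERVER}/knn",
    params={"video_name": "myvideo.mp4", "bbox_id": 17, "k": 5},
)
for video_name, bbox_id in resp.json():
    print(video_name, bbox_id)
```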
## Visualizing a collection of videos
Now let's take a look at all the videos. See the notebook `view_collection.ipynb` for a demonstration of how to read the video metadata file and use the Vgrid widget to display the videos.
Note that running this notebook depends on rekall, vgrid, and vgrid_jupyter.
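The core of the notebook looks roughly like the following sketch, following the standard vgrid tutorial pattern (the metadata field names and the video endpoint URL are assumptions; the notebook is the authoritative version):

```python
import json
from vgrid import VGridSpec, VideoMetadata, VideoBlockFormat
from vgrid_jupyter import VGridWidget

# Read the metadata file generated earlier. The field names used here
# (path, id, fps, num_frames, width, height) are assumptions; adapt
# them to the actual schema of your metadata file.
with open("myvideolist.json") as f:
    video_json = json.load(f)

video_meta = [
    VideoMetadata(
        v["path"], v["id"], v["fps"], v["num_frames"], v["width"], v["height"]
    )
    for v in video_json
]

# Display one block per video, with no interval annotations yet.
VGridWidget(
    vgrid_spec=VGridSpec(
        video_meta=video_meta,
        vis_format=VideoBlockFormat(imaps=None, video_meta=video_meta),
        video_endpoint="https://olimar.stanford.edu/hdd/kayvon10/",  # assumed
    ).to_json_compressed()
)
```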