These are the results of the experiments with the Cityscapes dataset.
The experiments were conducted with the Frankfurt part of the Cityscapes dataset.
- The Frankfurt dataset contains around 80,000 frames, consisting of 3 stages (which we will refer to as long stages). This part of the dataset was used mainly for compression experiments, as ground truth is not available for all the frames.
- A subset of the Frankfurt dataset, 50 sequences of 30 frames each, was used for the majority of the experiments. The 20th frame of each 30-frame sequence has ground-truth pixel-level object segmentation information (both object type and class information).
Sample (frame):
Sample ground truth:
- Compression vs distortion experiment
- Impact of denoising
- Optical Flow experiments
- Object Detection/Segmentation experiment
The raw results are in the logs folder, while all the scripts are in the main folder.
- logs: contains logs
- create_videos.sh: Script for creating videos from consecutive frames
- retrieve_frames.sh: Retrieve the 20th frame (and 19th frame) for experiments and comparison with ground truth
- run_obj_detect.sh: Run the object detection experiment, which compares the result with the ground truth (GT). We use the LRR (Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation) algorithm, one of the best among the benchmarks on the Cityscapes dataset.
- run_opt_flow.sh: Run the Farneback optical flow algorithm for the codec and CRF values specified in the config.ini file
- run_opt_flow_deepFlow2.sh: Run the DeepFlow2 optical flow algorithm
These results are for 30-frame snippets (averaged over 50 snippets). The original dataset is in the form of PNG images and has an average size of 70MB per snippet.
CRF | x264 | x265 | vp9 |
---|---|---|---|
crf0 | 44M | 53M | 48M |
crf2 | 31M | 20M | 20M |
crf4 | 24M | 16M | 12M |
crf6 | 19M | 12M | 8.5M |
crf8 | 16M | 9M | 6.2M |
crf16 | 5M | 2.5M | 3.1M |
crf24 | 1.5M | 0.8M | 2.1M |
The speeds of compression are as follows:
Speed | x264 | x265 | vp9 |
---|---|---|---|
in fps | 5.5 | 1.3 | 0.2 |
These are the commands which can be used to create compressed videos. (I used slightly different commands, as videos were being created from individual frames, but very similar)
- Lossless Compression: Lossless compression commands (lossless in the YUV space) are slightly different for different codecs
## Lossless compression
INPUT=/path/to/input
OUTPUT=/path/to/output
# x264
ffmpeg -i "$INPUT" -c:v libx264 -crf 0 -pix_fmt yuv444p "$OUTPUT"
# x265
ffmpeg -i "$INPUT" -c:v libx265 -x265-params lossless=1 -pix_fmt yuv444p "$OUTPUT"
# vp9
ffmpeg -i "$INPUT" -c:v libvpx-vp9 -qmin 0 -qmax 0 -lossless 1 -pix_fmt yuv444p "$OUTPUT"
- Lossy Compression: The CRF level based compression commands are almost the same for all the codecs:
## Lossy compression
INPUT=/path/to/input
OUTPUT=/path/to/output
CRF=4 # this is an example
# x264
ffmpeg -i "$INPUT" -c:v libx264 -crf "$CRF" -pix_fmt yuv444p "$OUTPUT"
# x265
ffmpeg -i "$INPUT" -c:v libx265 -crf "$CRF" -pix_fmt yuv444p "$OUTPUT"
# vp9
ffmpeg -i "$INPUT" -c:v libvpx-vp9 -crf "$CRF" -pix_fmt yuv444p "$OUTPUT"
These experiments are for 5-minute video blocks (5100 frames at 17fps).
CRF | x264 |
---|---|
crf0 | 7.2GB |
crf8 | 2.35GB |
crf16 | 560MB |
crf24 | 150MB |
(Per frame, this is 20-25% smaller than the 30-frame snippet compression.)
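
The per-frame comparison can be reproduced from the two x264 tables with a few lines of arithmetic. The sketch below is only illustrative: it assumes 1GB = 1000MB, uses the rounded sizes from the tables, and the function name `per_frame_saving` is made up for this example. Because the table values are rounded, the result is approximate, and it differs from one CRF value to the next.

```python
# Sizes copied from the x264 columns of the two tables; units are megabytes.
# Assumption: 1 GB is treated as 1000 MB.
SNIPPET_FRAMES = 30    # frames per short snippet
LONG_FRAMES = 5100     # frames per 5-minute block

snippet_mb = {0: 44.0, 8: 16.0, 16: 5.0, 24: 1.5}
long_mb = {0: 7200.0, 8: 2350.0, 16: 560.0, 24: 150.0}


def per_frame_saving(crf: int) -> float:
    """Fractional per-frame size reduction of the long video vs. the snippet."""
    short_per_frame = snippet_mb[crf] / SNIPPET_FRAMES
    long_per_frame = long_mb[crf] / LONG_FRAMES
    return 1.0 - long_per_frame / short_per_frame


if __name__ == "__main__":
    for crf in sorted(snippet_mb):
        print(f"crf{crf}: {per_frame_saving(crf):.1%} smaller per frame")
```
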
I used the veryslow preset to check the best compression achievable with x264. I ran this on both long and short videos.
CRF | x264 | x264(veryslow preset) |
---|---|---|
crf0 | 44M | 43.6M |
crf2 | 31M | 28.8M |
crf4 | 24M | 23.1M |
crf6 | 19M | 17.7M |
crf8 | 16M | 13.2M |
crf16 | 5M | 3.36M |
crf24 | 1.5M | 1M |
For 5-minute videos (5100 frames), a similar trend is observed.
CRF | x264 | x264 (veryslow preset) |
---|---|---|
crf0 | 7.2GB | 7.1GB |
crf8 | 2.35GB | 2.1GB |
crf16 | 560MB | 341MB |
crf24 | 150MB | 95MB |
The command used for this can be obtained by applying an appropriate preset. An example for x264 is given below:
## compression with presets
INPUT=/path/to/input
OUTPUT=/path/to/output
CRF=4 # this is an example
# x264
ffmpeg -i "$INPUT" -c:v libx264 -preset veryslow -crf "$CRF" -pix_fmt yuv444p "$OUTPUT"
During the last discussion, we noticed that the x265 CRF 0 results were different from those of x264. I looked into this, and the difference comes from how the x265 codec is used. For almost all other codecs (x264, vp9, vp8, ffv1, etc.), "-crf 0" or "-qp 0" denotes perfectly lossless coding over the YUV space. With "-crf 0", x265 still performs transform coding and is lossless only after that step. I switched off the transform coding and ran the experiments again, and the results are now the same as for x264 and vp9. To be sure, I also used framehash to compare the hash values of each and every frame for all three codecs, and the hash values are exactly the same. The log files for the hash values can be accessed here (these are for the first video only):
- default_hash: The hash for the original png frames converted to YUV444 format
- x264_hash: x264 hash
- x265_hash: x265 hash
- vp9_hash: vp9 hash
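
A check like the one described above can be scripted. The sketch below assumes the logs are in the text format written by ffmpeg's `framehash` muxer (comment lines starting with `#`, then one comma-separated line per frame with the hash as the last field). The function names are made up for this example and are not part of the repository scripts.

```python
from pathlib import Path
from typing import List


def read_frame_hashes(path: str) -> List[str]:
    """Return the per-frame hashes from a framehash-style log file."""
    hashes = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        # The hash is the last comma-separated field of each frame line.
        hashes.append(line.rsplit(",", 1)[-1].strip())
    return hashes


def first_mismatch(reference: List[str], other: List[str]) -> int:
    """Index of the first differing frame, or -1 if the lists are identical."""
    for i, (a, b) in enumerate(zip(reference, other)):
        if a != b:
            return i
    if len(reference) != len(other):
        return min(len(reference), len(other))  # one log has extra frames
    return -1
```

For example, `first_mismatch(read_frame_hashes("default_hash"), read_frame_hashes("x265_hash"))` returns -1 when every frame matches, with the file paths adjusted to wherever the logs are stored.
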
Note that although the outputs are identical after conversion to YUV444, there is still some loss in the conversion from RGB (in which the original frames are provided) to YUV444. I also checked how much loss this incurs: the RMSE is about 0.2%.
For a fair comparison, we only consider dense optical flow algorithms (it is unclear how feature-based optical flow algorithms should be compared). I attempted the following optical flow algorithms; results are reported for those that ran successfully.
- Farneback's Algorithm: The results are for this algorithm (OpenCV implementation)
CRF | x264 | x265 | vp9 |
---|---|---|---|
crf0 | 0.000 | 0.000 | 0.000 |
crf2 | 0.022 | 0.034 | 0.035 |
crf4 | 0.027 | 0.042 | 0.047 |
crf6 | 0.032 | 0.052 | 0.057 |
crf8 | 0.038 | 0.064 | 0.079 |
crf16 | 0.082 | 0.122 | 0.122 |
crf24 | 0.310 | 0.210 | 0.147 |
The relevant scripts are in: scripts/opt_flow_scripts/
cd scripts/opt_flow_scripts
## Extract frames from the videos
# Extracts the 18th and 19th frames for every video (these will need modification for other settings)
./get_frames_for_optflow.sh
## Compare optical flows for various videos:
# This script uses the Farneback Optical flow algorithm implemented in opt_flow_ked_2.py
# & stores output in the mentioned log files
./run_opt_flow.sh
- DeepFlow2: The results are shown below:
CRF | x264 | x265 | vp9 |
---|---|---|---|
crf0 | 0.000 | 0.000 | 0.000 |
crf2 | 0.013 | 0.024 | 0.019 |
crf4 | 0.017 | 0.028 | 0.030 |
crf6 | 0.022 | 0.035 | 0.037 |
crf8 | 0.027 | 0.040 | 0.053 |
crf16 | 0.056 | 0.071 | 0.752 |
crf24 | 0.12 | 0.132 | 0.090 |
cd scripts/opt_flow_scripts
## Extract frames from the videos
# Extracts the 18th and 19th frames for every video (these will need modification for other settings)
./get_frames_for_optflow.sh
## Compare optical flows for various videos:
# This script uses the Deepflow algorithm implemented in opt_flow_ked_3.py
# It uses the compare_flow.py script to compare the flows with respect to the lossless version
./run_opt_flow_DeepFlow2.sh
- EpicFlow: TBD
- SimpleFlow: For some frames (even lossless ones) it gives incorrect flow (NaN), so it is probably unreliable
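
The source does not state which distance the flow comparison uses. As a minimal sketch, the code below assumes the average endpoint error, a common choice for comparing two dense flow fields. It also rejects fields that contain NaN or infinite values, the failure seen with SimpleFlow. The flow fields are plain lists of `(u, v)` vectors here, and all names are illustrative rather than taken from `compare_flow.py`.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Tuple[float, float]  # (horizontal, vertical) displacement of one pixel


def has_invalid(flow: Iterable[Vector]) -> bool:
    """True if any flow vector contains NaN or an infinite component."""
    return any(not (math.isfinite(u) and math.isfinite(v)) for u, v in flow)


def average_endpoint_error(reference: Sequence[Vector], test: Sequence[Vector]) -> float:
    """Mean Euclidean distance between corresponding flow vectors."""
    if len(reference) != len(test) or not reference:
        raise ValueError("flow fields must be non-empty and of equal size")
    if has_invalid(reference) or has_invalid(test):
        raise ValueError("flow field contains NaN or infinite values")
    total = sum(
        math.hypot(ur - ut, vr - vt)
        for (ur, vr), (ut, vt) in zip(reference, test)
    )
    return total / len(reference)
```

With this definition, comparing the lossless flow against itself gives 0, which matches the crf0 rows in the tables.
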
Algorithms used:
- LRR: the Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation algorithm, one of the best among the benchmarks on the Cityscapes dataset.
#####x264
CRF | meanIU | pixelAcc | meanAcc |
---|---|---|---|
crf0 | 71.27 | 95.04 | 82.07 |
crf2 | 71.24 | 95.03 | 82.09 |
crf4 | 71.27 | 95.02 | 82.09 |
crf6 | 71.14 | 95.01 | 82.05 |
crf8 | 70.86 | 94.97 | 81.70 |
crf16 | 70.32 | 94.69 | 81.27 |
crf24 | 62.17 | 90.66 | 77.79 |
#####x265
CRF | meanIU | pixelAcc | meanAcc |
---|---|---|---|
crf0 | 71.27 | 95.04 | 82.07 |
crf2 | 71.23 | 95.04 | 81.09 |
crf4 | 71.16 | 95.00 | 81.76 |
crf6 | 71.18 | 94.96 | 81.76 |
crf8 | 70.58 | 94.87 | 81.26 |
crf16 | 69.38 | 94.29 | 80.36 |
crf24 | 60.79 | 90.64 | 74.39 |
#####VP9
CRF | meanIU | pixelAcc | meanAcc |
---|---|---|---|
crf0 | 71.27 | 95.04 | 82.07 |
crf2 | 71.10 | 94.97 | 81.96 |
crf4 | 70.30 | 94.71 | 81.32 |
crf6 | 69.58 | 94.45 | 80.93 |
crf8 | 67.34 | 93.21 | 80.09 |
crf16 | 65.67 | 93.21 | 80.09 |
crf24 | 63.95 | 90.73 | 78.63 |
The relevant parameters are:
- meanIU: mean intersection-over-union (IoU), averaged over the classes
- pixelAcc: pixel-level accuracy (fraction of correctly labelled pixels)
- meanAcc: per-class accuracy, averaged over the classes
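
For reference, these three metrics are usually computed from a class confusion matrix. The sketch below follows the standard definitions and is not taken from the LRR MATLAB code. It assumes that `conf[i][j]` counts pixels of true class `i` predicted as class `j`, and that classes absent from the ground truth are skipped when averaging. The function name is made up for this example.

```python
from typing import List, Tuple


def segmentation_metrics(conf: List[List[int]]) -> Tuple[float, float, float]:
    """Return (meanIU, pixelAcc, meanAcc) in percent from a confusion matrix."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = [conf[i][i] for i in range(n)]
    true_count = [sum(conf[i]) for i in range(n)]                       # row sums
    pred_count = [sum(conf[i][j] for i in range(n)) for j in range(n)]  # column sums

    ious, accs = [], []
    for i in range(n):
        if true_count[i] == 0:
            continue  # class does not occur in the ground truth
        union = true_count[i] + pred_count[i] - correct[i]
        ious.append(correct[i] / union)
        accs.append(correct[i] / true_count[i])

    pixel_acc = sum(correct) / total
    mean_iu = sum(ious) / len(ious)
    mean_acc = sum(accs) / len(accs)
    return 100.0 * mean_iu, 100.0 * pixel_acc, 100.0 * mean_acc
```
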
The relevant scripts are in: scripts/obj_detect_scripts/
cd scripts/obj_detect_scripts
## Extract frames from the videos
# Extracts the 19th frame for every video (these will need modification for other settings)
./retrieve_frames.sh
## Run object detection on the various videos:
# This script runs the MATLAB code for LRR object detection
# & stores output in the mentioned log files
./run_obj_detect.sh
Denoising saves approximately 20-25% in file size, with very little impact on the algorithms.
CRF | x264 | x264_denoise |
---|---|---|
crf0 | 44M | 32M |
crf8 | 16M | 12M |
crf16 | 5M | 3.8M |
crf24 | 1.5M | 1.3M |
It is observed that the hqdn3d denoiser does not result in significant gains (compared with the experiments on the Ford videos). A different denoiser might work better. Also, the noise in these videos seems very low.
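
The exact denoising command is not recorded here. As a rough sketch, denoising can be added to the x264 command by inserting ffmpeg's `hqdn3d` video filter before encoding. The default filter strength used below is an assumption, not necessarily the setting used for these experiments, and the function name is made up for this example.

```python
from typing import List


def x264_denoise_command(src: str, dst: str, crf: int) -> List[str]:
    """Build an ffmpeg argument list that denoises with hqdn3d, then encodes with x264."""
    return [
        "ffmpeg",
        "-i", src,
        "-vf", "hqdn3d",        # assumption: default hqdn3d parameters
        "-c:v", "libx264",
        "-crf", str(crf),
        "-pix_fmt", "yuv444p",
        dst,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list avoids shell quoting problems with paths that contain spaces.
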
CRF | x264 | denoise_x264 |
---|---|---|
crf0 | 0 | 0 |
crf8 | 0.038 | 0.040 |
crf16 | 0.081 | 0.080 |
crf24 | 0.185 | 0.16 |
crf32 | 0.31 | 0.28 |
CRF | x264 | denoise_x264 |
---|---|---|
crf0 | 0 | 0 |
crf8 | 0.027 | - |
crf16 | 0.056 | - |
crf24 | 0.123 | - |
crf32 | 0.28 | - |
- CRF0 obj detection check
- multipass ffmpeg compression