Cityscape Experiments

These are the results of the experiments with the cityscape dataset.

The experiments were conducted with the frankfurt part of the cityscape dataset.

Dataset

The Frankfurt dataset contains around 80,000 frames, consisting of 3 stages (which we will refer to as long stages) This part of the dataset was used mainly for compression experiments, as ground truth is not available for ll the frames.
A subset of the Frankfurt dataset: 50 sequences of 30 frames each, was used for majority of the experiments. Every 20th frame in the 30 frame have ground truth pixel-level object segmentation information (both object type and class information).

Sample (frame):

Sample ground truth:

Experiments

Compression vs distortion experient
Impact of denoising
Optical Flow experiments
Object Detection/Segmentation experiment

The raw results are in the logs folder. While all the scripts are in the main folder.

logs: contains logs
create_videos.sh: Script for creating videos from consecutive frames
retrieve_frames.sh: Retrieve the 20th frame (and 19th frame) for experiments and comparison with ground truth
run_obj_detect.sh: Run object detection experiment which compares the result with the GT. We use the the LRR (Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation) algorithm, one of the best amongst the benchmarks on the Cityscape datasets.
run_opt_flow.sh: Run the Farneback optical flow algorithm on a specific codec and crf values mentioned in the config.ini file
run_opt_flow_deepFlow2.sh: Run the DeepFLow2 optical flow algorithm

Compression vs distortion experiment (Total memory for 50 videos together)

These results are for 30 frame snippets (average over 50 snippets) The original dataset is in the form of png images and has average size of 70MB.

CRF	x264	x265	vp9
crf0	44M	53M	48M
crf2	31M	20M	20M
crf4	24M	16M	12M
crf6	19M	12M	8.5M
crf8	16M	9M	6.2M
crf16	5M	2.5M	3.1M
crf24	1.5M	0.8M	2.1M

The speeds of compression are as follows:

Speed	x264	x265	vp9
in fps	5.5	1.3	0.2

These are the commands which can be used to create compressed videos. (I used slightly different commands, as videos were being created from individual frames, but very similar)

Lossless Compression: Lossless compression commands (lossless in the YUV space) are slightly different for different codecs

 ## Lossless compression
 INPUT=/path/to/input
 OUTPUT=/path/to/output
 
 #x264
 ffmpeg -i $INPUT -c:v libx264 -crf 0 -pix_fmt yuv444p $OUTPUT
 
 #x265
 ffmpeg -i $INPUT -c:v libx265 -x265-params lossless=1  -pix_fmt yuv444p $OUTPUT
 
 #vp9
 ffmpeg -i $INPUT -c:v libvpx-vp9 -qmin 0 -qmax 0 -lossless 1 -pix_fmt yuv444p $OUTPUT

Lossy Compression: The CRF level based compression commands are almost the same for all the codecs:

 ## Lossy compression
 INPUT=/path/to/input
 OUTPUT=/path/to/output
 CRF=4 #this is an example
 
 #x264
 ffmpeg -i $INPUT -c:v libx264 -crf $CRF -pix_fmt yuv444p $OUTPUT
 
 #x265
 ffmpeg -i $INPUT -c:v libx265 -crf $CRF  -pix_fmt yuv444p $OUTPUT
 
 #vp9
 ffmpeg -i $INPUT -c:v libvpx-vp9 -crf $CRF -pix_fmt yuv444p $OUTPUT

Compression experiment for longer duration videos

These experiments are for 5min video blocks (5100 frames at 17fps)

CRF	x264
crf0	7.2GB
crf8	2.35GB
crf16	560MB
crf24	150MB

(per frame wise: This is better than 30 frame compression by: 20-25%)

Compression with presets:

I used a veryslow preset to check what is the best quality achievable on x264. I ran this both on long as well as short videos.

CRF	x264	x264(veryslow preset)
crf0	44M	43.6M
crf2	31M	28.8M
crf4	24M	23.1M
crf6	19M	17.7M
crf8	16M	13.2M
crf16	5M	3.36M
crf24	1.5M	1M

For 5min videos, similar trend is observed (5100 frames).

CRF	x264	x264 (veryslow preset)
crf0	7.2GB	7.1GB
crf8	2.35GB	2.1GB
crf16	560MB	341MB
crf24	150MB	95MB

The command used for this can be obtained by applying an appropriate preset. An example for x264 is given below:

 ## compression with presets
 INPUT=/path/to/input
 OUTPUT=/path/to/output
 CRF=4 #this is an example
 
 #x264
 ffmpeg -i $INPUT -c:v libx264 -preset veryslow -crf $CRF -pix_fmt yuv444p $OUTPUT

CRF0 Difference Analysis

During the last discussion, we noticed that x265 CRF0 results were different that x264. I checked into this issue, and seems the difference is with respect to x265 codec usage. For almost all other codecs: x264,vp9,vp8,ffv1 etc. "-crf 0 or -qp 0" denotes perfectly lossless over the YUV space. For x265, it still performs transform coding, and then is lossless after that for "-crf 0". I switched off the transform coding and performed the experiments again, and the results are the same as x264, vp9. To be sue, I also used framehash, to compare hash values for each and every frame fro all the three codecs, and the hash values are exactly the same. The log files for the hash values can be accessed here: (these are for the first video only)

default_hash: The hash for the original png frames converted to YUV444 format
x264_hash: x264 hash
x265_hash: x265 hash
vp9_hash: vp9 hash

Note that although these are perfectly the same after conversion to YUV444, there is still some loss in conversion from rGB (in which the original frames are provided) to YUV444. I also checked for the amount of loss this incurs, and this is about 0.2% RMSE error.

Optical Flow experiments

For fair comparison, we only consider dense optical flow algorithms (as it is unclear how should we compare feature-based optical flow algorithms). Attempted the following Optical Flow algorithms. However, was able to successfully conduct the Farneback's algorithm.

Farneback's Algorithm: The results are for this algorithm (openCV implementation)

CRF	x264	x265	vp9
crf0	0.000	0.000	0.000
crf2	0.022	0.034	0.035
crf4	0.027	0.042	0.047
crf6	0.032	0.052	0.057
crf8	0.038	0.064	0.079
crf16	0.082	0.122	0.122
crf24	0.310	0.210	0.147

The relevant scripts are in: scripts/opt_flow_scripts/

cd scripts/opt_flow_scripts

## Extract frames from the videos
# Extracts the 18th and 19th frames for every video (these will need modification for other settings)
./get_frames_for_optflow.sh 

## Compare optical flows for various videos:
# This script uses the Farneback Optical flow algorithm implemented in opt_flow_ked_2.py 
# & stores output in the mentioned log files
./run_opt_flow.sh

DeepFLow2: The results are shown below:

CRF	x264	x265	vp9
crf0	0.000	0.000	0.000
crf2	0.013	0.024	0.019
crf4	0.017	0.028	0.030
crf6	0.022	0.035	0.037
crf8	0.027	0.040	0.053
crf16	0.056	0.071	0.752
crf24	0.12	0.132	0.090

cd scripts/opt_flow_scripts

## Extract frames from the videos
# Extracts the 18th and 19th frames for every video (these will need modification for other settings)
./get_frames_for_optflow.sh 

## Compare optical flows for various videos:
# This script uses the Deepflow algorithm implemented in opt_flow_ked_3.py 
# It uses compare_flow.py script to compare the flows wrt to the lossless
./run_opt_flow_DeepFlow2.sh

EpicFLow: TBD
SimpleFlow: For some frames (even lossless, gives incorrect flow (nan), and is probably unreliable

Object Detection/Segmentation experiment

Algorithms used:

LRR: (Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation) algorithm, one of the best amongst the benchmarks on the Cityscape datasets.

x264

CRF	meanIU	pixelAcc	meanAcc
crf0	71.27	95.04	82.07
crf2	71.24	95.03	82.09
crf4	71.27	95.02	82.09
crf6	71.14	95.01	82.05
crf8	70.86	94.97	81.70
crf16	70.32	94.69	81.27
crf24	62.17	90.66	77.79

#####x265

CRF	meanIU	pixelAcc	meanAcc
crf0	71.27	95.04	82.07
crf2	71.23	95.04	81.09
crf4	71.16	95.00	81.76
crf6	71.18	94.96	81.76
crf8	70.58	94.87	81.26
crf16	69.38	94.29	80.36
crf24	60.79	90.64	74.39

#####VP9

CRF	meanIU	pixelAcc	meanAcc
crf0	71.27	95.04	82.07
crf2	71.10	94.97	81.96
crf4	70.30	94.71	81.32
crf6	69.58	94.45	80.93
crf8	67.34	93.21	80.09
crf16	65.67	93.21	80.09
crf24	63.95	90.73	78.63

The relevant parameters are:

meanIU: mean intersection-over-union metric IoU
pixelAcc: Pixel Level Accuracy
meanAcc: Average Accuracy

The relevant scripts are in: scripts/obj_detect_scripts/

cd scripts/obj_detect_scripts

## Extract frames from the videos
# Extracts the 19th frames for every video (these will need modification for other settings)
./retrieve_frames.sh 

## Compare optical flows for various videos:
# This script runs the MAtlab code for LRr object detection
# & stores output in the mentioned log files
./run_obj_detect.sh

Impact of denoising

There is approximately 20-25% saving on denoising, with very less impact on algorithms.

FileSizes

CRF	x264	x264_denoise
crf0	44M	32M
crf8	16M	12M
crf16	5M	3.8M
crf24	1.5M	1.3M

Its observed that the specific hqdn3d denoiser does not result in significant gains (as compared with the experiments with Ford videos). Probably different denoiser might work. Also, the noise seems really low in the videos

Farneback Optical Flow

CRF	x264	denoise_x264
crf0	0	0
crf8	0.038	0.040
crf16	0.081	0.080
crf24	0.185	0.16
crf32	0.31	0.28

DeepFlow2 Optical Flow

CRF	x264	denoise_x264
crf0	0	0
crf8	0.027	-
crf16	0.056	-
crf24	0.123	-
crf32	0.28	-

TODO

CRF0 obj detection check
multipass ffmpeg compression

kedartatwawadi/cityscape_experiments