Given: data/drive.mp4, with 8616 frames in data/IMG; each frame is 640(w) x 840(h) x 3 (RGB). Ground-truth data is given in drive.json as [time, speed] for each of the 8616 frames.
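
A minimal sketch of reading the ground truth, assuming drive.json is a JSON array of [time, speed] pairs in frame order. The exact file layout is an assumption, and `parse_ground_truth` / `load_ground_truth` are hypothetical helpers, not functions from this repo:

```python
import json


def parse_ground_truth(text):
    """Parse JSON text into a list of (time, speed) tuples, one per frame."""
    rows = json.loads(text)
    # Assumption: every row is a two-element list [time, speed].
    return [(float(t), float(speed)) for t, speed in rows]


def load_ground_truth(path):
    """Read a ground-truth file such as data/drive.json from disk."""
    with open(path) as f:
        return parse_ground_truth(f.read())
```

The i-th tuple would then line up with the i-th frame in data/IMG, which is what a training generator needs to pair each image with its speed label.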

Method 2: trained for 15 epochs (weights = model-weights-Vtest2.h5). MSE: ~5.6

Watch Video Here

Mean Squared Error for v2(15 epochs)

Check out the medium article

TRAIN:

  • VideoToDataset.ipynb (I used this to write the ground truth data to a dataframe and store the images separately, which helped with testing)
  • NvidiaModel-OpticalFlowDense_kerasnew.ipynb (this is where I trained the model and measured the MSE; I also rendered the dataset into a video, shown inline as HTML, and left notes on how I did certain things)

TEST: (also found in test_suite.zip)

  • test.py
  • model.py
  • opticalHelpers.py
  • model-weights-Vtest.h5 (trained on 10 epochs, MSE ~ 10)
  • model-weights-Vtest2.h5 (trained on 15 epochs, MSE ~ 5.6) (preloaded)
  • setupstuff.sh

To test the model:

  1. run ./setupstuff.sh - this will create the necessary file and folders (driving_test.csv, test_IMG, test_predict)
  2. set the paths to your own data.json and movie.mp4 files on lines 21 and 22 of test.py
  3. python test.py - this will log the MSE for a given sample size (you pick the sample size on line 14; the weights are already specified on line 13)
  4. python makeVideo.py - this will create a video with the predicted values overlaid on top of each image. Feel free to delete the ./data/predict folder after step 4
  • Requires moviepy
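
For reference, the MSE that step 3 reports can be sketched as below. This is a plain-Python illustration of the metric, not the code in test.py; `mean_squared_error` is a hypothetical helper name:

```python
def mean_squared_error(predicted, actual):
    """Average of squared differences between predicted and true speeds."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one sample")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)
```

Because the errors are squared, an MSE of ~5.6 corresponds to a typical error of roughly the square root of that, in the same units as the speed labels.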

Dense Optical Flow network feeding.

Strategies:

Dense optical flow network feeding explanation:

  • Method 1: append an angle layer and a magnitude layer along the image's 3rd dimension. In NvidiaModel-OpticalFlowDense I changed my generator to yield (66, 220, 5) images, i.e. (Height, Width, [R, G, B, Ang, Mag]). The angles and magnitudes come from computing the dense optical flow with the Farneback method. This did not help: my MSE was still ~20 and I did not observe any special results.

  • Method 2: Treat the optical flow angle and magnitude as HSV channels, convert HSV to RGB, and pass the result into the network as (66, 220, 3) RGB values.

  • Hyperparameter selection: I trained the model with 400 samples per epoch and a batch size of 32. In total I sent ~16,000 images into the generator, resulting in ~8k optical flow differentials (each one is computed from a pair of images). I also used the Adam optimizer and ELU activation functions, because they lead to faster convergence.
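
The Method 2 encoding can be sketched for a single flow vector as follows. This is a standard-library illustration, not the repo's code: it assumes hue encodes the flow direction, value encodes the magnitude, and saturation is fixed at 1.0 (the notebook may choose saturation differently). `flow_vector_to_rgb` and `max_magnitude` are hypothetical names:

```python
import colorsys
import math


def flow_vector_to_rgb(dx, dy, max_magnitude):
    """Map one optical flow vector (dx, dy) to an 8-bit RGB triple."""
    # Direction in [0, 2*pi) becomes hue in [0, 1).
    angle = math.atan2(dy, dx) % (2 * math.pi)
    hue = angle / (2 * math.pi)
    # Magnitude, normalised and clipped to [0, 1], becomes brightness.
    magnitude = math.hypot(dx, dy)
    value = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    # Assumption: full saturation for every pixel.
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return (round(r * 255), round(g * 255), round(b * 255))
```

Applied to every pixel of a dense flow field, this yields a 3-channel image: static regions come out black, and the colour of a moving region indicates its direction of motion.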

Method 2 was the winner. I guess there was just too much noise in a simple image_1 (RGB) - image_2 (RGB) difference. The network model held up because I converted the optical flow parameters to an RGB image, as you can see in the video above.

Other approaches:

  1. Nvidia Model: PilotNet-based implementation that takes the difference between the two images, sends it through a network, and performs regression on the image differences
  2. DeepVO: AlexNet-like implementation that performs parallel convolutions on two images and then merges them later in the pipeline to extract special features between them
  3. DeepFlow: Large displacement optical flow with deep matching
  • I considered using DeepFlow

Implement dense optical flow analysis to get the optical flow at each pixel, as seen in this example.

Architecture Design:

architecture design

Tools used