DeepRacer 2019 Sandbox

This is my full collection of tools, notebooks, and scraps from participating in the 2019 AWS DeepRacer Virtual League.

What you'll find in this repo:

Local training assets: container Dockerfiles, launch scripts mostly in bash, monitoring scripts
AWS cloud-based training scripts: pre-dating my local training setup, but most useful for cloud-based evaluations of local training
Analysis Notebook
Models/Experiments: each training session's hyperparameters, reward function, and action space
RoboMaker simapp: scripts to build the bundle, source files to add or replace files within the bundle
Twitch streaming assets: UI (flask-based), ffmpeg tools to stream from simulation
Airflow automation DAGs

NB: I am not an expert in ML/RL and participation in DeepRacer was a way to educate myself. Forgive me any naive or wrong approaches taken. Feel free to send me any observations, suggestions for different approaches, related papers or projects, or just to drop me a line.

Race Results

Race	Standing
August 2019 Virtual Race Shanghai Sudu	102 of 1375
September 2019 Virtual Race Cumulo Carrera	132 of 1338
October 2019 Virtual Race Toronto Turnpike	60 of 1983
November 2019 Virtual Race Championship Cup Warm-up	8 of 904
AI Driving Olympics at NeurIPS	Phase I (Perception challenge): Top 10; Phase II (Simulation to Reality challenge): did not place

The code and scripts are shared here unfiltered. Some items may be broken or hacky. The goal was to educate myself about reinforcement learning and train competitive models, sometimes at the expense of good coding practices. I'll be starting a new repo for any work I do on the 2020 DeepRacer races and won't be adding any more changes to this code.

I'll follow here with some select items that I hope may be of interest to those looking to compete in the 2020 DeepRacer League.

RoboMaker Bundle Management

The official SimApp bundle for DeepRacer is publicly readable and located at https://s3.amazonaws.com/deepracer-managed-resources/deepracer-github-simapp.tar.gz

robomaker/deepracer-simapp.tar.gz.md5 - MD5 of the bundle to verify we're using the correct base for file patches
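A minimal sketch of that kind of check, in Python rather than the bash used by most of the repo's scripts. The function names are hypothetical, and it assumes the .md5 file starts with the hex digest, optionally followed by a file name (the format md5sum writes).

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_matches(bundle_path, md5_path):
    """Compare a bundle against a stored digest.

    Assumption: the .md5 file begins with the hex digest, optionally
    followed by a file name.
    """
    expected = Path(md5_path).read_text().split()[0].lower()
    return md5_of_file(bundle_path) == expected
```

Reading in chunks keeps memory use flat, which matters for a bundle that is likely to be large.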

airflow/monitor_deepracer_simapp.py - Script to monitor the hosted simapp bundle for changes. Currently uses a date-based validation comparing official bundle to a copy stored in an S3 bucket I own
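One way a date-based comparison like this could look, as a standard-library sketch. It is not the code in monitor_deepracer_simapp.py: the function names are hypothetical, and it assumes both copies can be reached over plain HTTP, which may not hold for a private bucket.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP date header such as Last-Modified into a datetime."""
    return parsedate_to_datetime(value)


def fetch_last_modified(url, timeout=10):
    """Return the Last-Modified time of a URL using an HTTP HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_http_date(response.headers["Last-Modified"])


def official_is_newer(official_time, mirror_time):
    """True when the official bundle changed after the mirrored copy."""
    return official_time > mirror_time
```

A HEAD request avoids downloading the whole bundle just to learn whether it changed.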

patch/* - overlay files to add or replace files within the bundle. These are mostly local edits to markov package, additional gazebo assets, added parameters to launch files.

scripts/bundle.sh - Create a bundle using the base simapp and overlaying files from patch/.
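The overlay idea can be sketched in Python as follows. This is an illustration, not a port of bundle.sh: the function name is hypothetical, and it treats the bundle as a single flat gzip tarball, which may not match the real simapp layout.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def build_patched_bundle(base_bundle, patch_dir, output_bundle):
    """Unpack base_bundle, copy patch_dir over it, and re-pack the result."""
    with tempfile.TemporaryDirectory() as workdir:
        staging = Path(workdir) / "bundle"
        staging.mkdir()
        # Only unpack archives you trust; extraction is not sandboxed here.
        with tarfile.open(base_bundle, "r:gz") as archive:
            archive.extractall(staging)
        # Files in patch_dir replace same-named files and add new ones.
        shutil.copytree(patch_dir, staging, dirs_exist_ok=True)
        with tarfile.open(output_bundle, "w:gz") as archive:
            for entry in sorted(staging.iterdir()):
                archive.add(entry, arcname=entry.name)
```

Because the patch directory mirrors the bundle's directory structure, adding or replacing a file is just a matter of placing it at the same relative path.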

scripts/publish.sh - Upload the patched bundle to an S3 bucket owned by me, consumable by RoboMaker for running patched simulations in the cloud

Local Training

This grew out of necessity rather than convenience, so it is customized to my preferences and does not use the well-known DeepRacer Community training stack on GitHub.

Goals for my local stack were:

Full access to the simapp bundle code to edit or add files
Fast iteration on code changes to the simapp bundle using Docker volumes to patch containers
Unified logging for later analysis
Replication of all training artifacts to S3, effectively making local storage a "cache" that can be cleared

Components:

dr-training - SageMaker/TensorFlow training
dr-simulation - RoboMaker/ROS/Gazebo simulation
dr-redis - pub/sub between dr-simulation and dr-training
dr-logger - "sidecar" logger to aggregate all container logs and write them to JSON files
dr-uploader - background synchronization of training assets and logs to S3 bucket
minio - S3 replacement to store training checkpoints locally
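To give a feel for what the "sidecar" logging step involves, here is a small sketch that wraps raw log lines in JSON records and appends them to a JSON Lines file. The field names and function names are illustrative assumptions, not the ones dr-logger actually uses.

```python
import json
from datetime import datetime, timezone


def to_record(container, line, now=None):
    """Wrap one raw log line in a JSON-serializable dict.

    The field names are illustrative, not taken from dr-logger.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "container": container,
        "message": line.rstrip("\n"),
    }


def append_records(path, container, lines):
    """Append one JSON object per log line to path (JSON Lines format)."""
    with open(path, "a", encoding="utf-8") as handle:
        for line in lines:
            handle.write(json.dumps(to_record(container, line)) + "\n")
```

Tagging every line with its source container is what makes a single merged file usable for later analysis.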

Interesting bits:

container/Dockerfile.* - Dockerfiles for the local training setup

scripts/launch_local.sh - Entrypoint for local training kickoff

models/* - Inputs for local training, a unique folder for each training session with hyperparameters, action space, reward function

docker-compose.yml - container configuration

Twitch Streaming

I streamed training at https://www.twitch.tv/deepstig later in the season. I used OBS to host a browser-based UI with a VLC stream overlay, sending frames out of my local training simulation via ffmpeg over udp.

twitch/app.py - Flask app to show a UI with some near-real-time metrics

container/streamer.sh - In-container script that listens for RGB image messages from the ROS camera node and pipes them directly to ffmpeg's stdin to generate an MPEG-TS stream over UDP
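The piping step might look roughly like this in Python. It is a sketch under assumptions, not the contents of streamer.sh: the ROS subscription is left out, frames are assumed to arrive as raw RGB byte buffers, and the frame size, frame rate, destination, and function names are placeholders.

```python
import subprocess


def ffmpeg_command(width, height, fps, destination):
    """Build an ffmpeg argv that reads raw RGB frames from stdin and
    writes an MPEG-TS stream to a UDP destination (host:port)."""
    return [
        "ffmpeg",
        "-f", "rawvideo",       # input is headerless frame data
        "-pix_fmt", "rgb24",    # 3 bytes per pixel, RGB order
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",              # read frames from stdin
        "-f", "mpegts",
        f"udp://{destination}",
    ]


def stream_frames(frames, width, height, fps, destination):
    """Pipe an iterable of raw RGB frame buffers into ffmpeg."""
    expected = width * height * 3
    process = subprocess.Popen(
        ffmpeg_command(width, height, fps, destination), stdin=subprocess.PIPE
    )
    try:
        for frame in frames:
            if len(frame) != expected:
                raise ValueError("frame size does not match width*height*3")
            process.stdin.write(frame)
    finally:
        process.stdin.close()
        process.wait()
```

Raw video has no header, so ffmpeg has to be told the pixel format, size, and rate up front; a frame of the wrong length would shift every frame after it.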

scripts/monitor_video.sh - Script to launch streamer.sh within the container, passing in the udp stream destination

Log Analysis

Based on the AWS DeepRacer Workshop Jupyter notebook but heavily modified. Any time I had a question about training progress or simulation behavior I would add some new features to this. It's really overgrown now but gives me a complete picture of training as I run it.

For brevity, I'll pull out a few interesting sections but you can click to the [full notebook](log_analysis/DeepRacer Log Analysis.ipynb) to see the code.

Description
Training progress, loss. I would watch this to discover points at which I would need to stop training or adjust hyperparameters.
Action space usage. This helped me to know if there were unused actions that could be culled out.
Car performance during training. Mostly scatterplots of episodic metrics, with the mean for the iteration overlaid in orange. The most interesting is the fourth graph, which shows progress per lap and also the ratio of completed laps. If the completion ratio was between the 20% (red) and 40% (green) lines, I would submit the model for racing. If the completion ratio was more than 40%, I would push the speed a little further and retrain.
Correlate high rewards to high speed. If they don't correlate then there is most likely a problem in the reward function.
Heatmap showing rewards for each step. A good indicator of the line that is rewarded traversing the track.
Exit points plot. Clumped exit points may indicate an action space can be modified to have a better turn angle, or that reward function might be rewarding a wrong action.
Actions mapping. Only really useful for an action space with one speed per steering angle. Correlates actions with track waypoints.
Single episode summary. Shows: step location, heading angle (black), steering angle (red), episode pace
Speed. Blue line is the actual speed, measured as incremental distance between steps. Yellow is throttle and cyan is steering. This helps to easily see the effect of steering and throttle position on speed.
Correlate steering with heading change.
Reward and Progress. This graph verifies higher rewards for higher progress per step.
Try to detect slippage. The car can wipe out on turns if speed is too high. This graph shows when heading and direction of movement over ground don't correlate.
Run inference on an image to find its action probabilities. This can indicate the health of the model.
GradCAM. Finds the aspects of the image that the network is focusing on.
Convolutional layer activations. This is mostly for making the convolutional layers more interpretable by seeing the features they activate on.
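Two of the checks in the list above are simple enough to sketch outside the notebook: the 20%/40% completion-ratio rule of thumb and the slippage check. This is not the notebook's code. The function names, the tuple layout of a step, and the "keep training" outcome below 20% are my assumptions here; the list only describes what happens at or above the red line.

```python
import math


def completion_ratio(progress_per_episode):
    """Fraction of episodes that reached 100% progress (a full lap)."""
    if not progress_per_episode:
        return 0.0
    done = sum(1 for p in progress_per_episode if p >= 100.0)
    return done / len(progress_per_episode)


def next_step(ratio, low=0.20, high=0.40):
    """Apply the 20%/40% rule of thumb.

    "keep training" below the low line is an assumption; the text only
    covers the other two cases.
    """
    if ratio > high:
        return "raise speed and retrain"
    if ratio >= low:
        return "submit"
    return "keep training"


def slip_angles(steps):
    """Degrees between heading and direction of travel for each move.

    steps: list of (x, y, heading_degrees) tuples in track coordinates.
    The heading at the start of each move is used (an assumption).
    """
    angles = []
    for (x0, y0, heading), (x1, y1, _) in zip(steps, steps[1:]):
        travel = math.degrees(math.atan2(y1 - y0, x1 - x0))
        # Wrap the difference into [-180, 180) before taking its size.
        diff = (travel - heading + 180.0) % 360.0 - 180.0
        angles.append(abs(diff))
    return angles
```

Large slip angles on consecutive steps would suggest the car is sliding rather than following its nose; what counts as "large" depends on the track and action space, so no threshold is hard-coded.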

Interesting items:

log_analysis/DeepRacer Log Analysis.ipynb - The notebook

log_analysis/images/* - Still image captures of a variety of tracks to use in analysis, such as running it through the model to get action probabilities

Airflow Automation

I had aspired to use Airflow to work through a queue of training and evaluation jobs but ultimately didn't spend the time automating to that level. The primary use of Airflow was to submit the model to the virtual league every ~30 minutes.

It was unfortunate, but the winners were so close that luck and brute force played a large part in reaching the top positions. The submission DAG used Selenium and ChromeDriver to submit the model and to handle any authentication needed as part of that workflow.

airflow/deepracer_submit_dag.py - Submit a model for evaluation every 30 minutes

Resources

Official AWS Resources

AWS DeepRacer Documentation https://docs.aws.amazon.com/deepracer/index.html#lang/en_us
AWS DeepRacer League https://aws.amazon.com/deepracer/league/
AWS Cost Management https://console.aws.amazon.com/cost-reports/home?region=us-east-1#/dashboard
AWS SageMaker Python SDK https://github.com/aws/sagemaker-python-sdk

Components of DeepRacer

AWS SageMaker
AWS RoboMaker
AWS Kinesis
AWS CloudWatch Logs
AWS S3
AWS Lambda

Useful Tools

rviz: rviz is a 3d visualization tool for ROS applications https://docs.aws.amazon.com/robomaker/latest/dg/simulation-tools-rviz.html
rqt hosts a number of different plugins for visualizing ROS information https://docs.aws.amazon.com/robomaker/latest/dg/simulation-tools-rqt.html
Gazebo lets you build 3D worlds with robots, terrain, and other objects https://docs.aws.amazon.com/robomaker/latest/dg/simulation-tools-gazebo.html
ROS: Robot Operating System which AWS RoboMaker is based on https://www.ros.org/
TensorFlow: ML framework used for DeepRacer training in SageMaker https://www.tensorflow.org/
Pandas: Python Data Analysis Library https://pandas.pydata.org/
Actual RoboMaker simulation environment for DeepRacer https://s3.amazonaws.com/deepracer-managed-resources/deepracer-github-simapp.tar.gz
Coach is the implementation of RL algorithms (PPO, CPPO, TRPO, etc) used in the RLEstimator that aggregates training data back into the model in SageMaker https://github.com/NervanaSystems/coach
OpenVINO is used to execute the RL model on the car hardware or ROS simulator https://01.org/openvinotoolkit
PyTorch: ML framework supported by SageMaker; also usable on Google Cloud and Azure https://pytorch.org/

cdthompson/deepracer-training-2019
