Technologies:
- Raspberry Pi
- OpenCV
- TensorFlow
Complete documentation coming soon ...
The TensorFlow machine learning algorithm/model needs data to learn from. The data consists of two parts.
- Video data: A video is nothing more than a series of photos. I use OpenCV to represent each frame as a matrix of numbers, where each number represents a pixel. For black-and-white video, OpenCV gives you a 2-dimensional matrix whose dimensions are the frame's height and width. For color video, OpenCV gives you a 3-dimensional matrix with a third dimension for the color channels: red, green, and blue. The machine learning model uses the pixel values in the matrix as predictors.
- Command data: Commands tell the motors what to do and represent the target variable. The model will try to learn which commands to send to the motor based on the video that it sees. I went for simplicity, so I defined only three types of commands: left, right, and forward. This makes driving a multinomial classification problem. The downside of having just three commands is that the instructions can be rather crude. There is no distinction between a wide and a sharp turn.
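The two kinds of data described above can be pictured with a small NumPy sketch. It is only an illustration: the frame size is taken from the 320x240 stream used later in this document, while the command ordering and the `one_hot` helper are assumptions and not part of the project's code.

```python
import numpy as np

# Frame size matches the 320x240 stream used later in this document.
HEIGHT, WIDTH = 240, 320

# A black-and-white frame: one number per pixel.
gray_frame = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)

# A color frame: three numbers per pixel (one per color channel).
color_frame = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)

# The three driving commands, in an arbitrary but fixed order (an assumption).
COMMANDS = ["left", "right", "forward"]


def one_hot(command):
    """Encode a command as a one-hot vector, e.g. 'right' -> [0, 1, 0]."""
    label = np.zeros(len(COMMANDS), dtype=np.uint8)
    label[COMMANDS.index(command)] = 1
    return label


print(gray_frame.shape)   # (240, 320)
print(color_frame.shape)  # (240, 320, 3)
print(color_frame.size)   # 230400 predictors per color frame
print(one_hot("right"))   # [0 1 0]
```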
This part of the project is very unintuitive and could probably be designed much better. There are multiple components.
- Raspberry Pi video streaming using a complicated Linux video utility
- A tangled mess for viewing and saving the streamed video data
- A RESTful API webserver that runs on the Pi and takes commands from a web browser like Google Chrome running on your laptop
- Something for moving API command files from the Pi to your laptop where the video data is saved
- A data cleaning tool that matches your target and predictor data by timestamp and outputs a final clean numpy file that TensorFlow can ingest
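The timestamp matching in the last component could look roughly like the sketch below. It assumes each frame should be labeled with the most recent command issued at or before it. That rule, the function name `label_frames`, and the sample timestamps are all assumptions for illustration and may differ from what the project's cleaning tool actually does.

```python
from bisect import bisect_right


def label_frames(frame_times, commands):
    """Pair each frame with the most recent command issued at or before it.

    frame_times: sorted list of frame timestamps (seconds).
    commands: list of (timestamp, command) tuples sorted by timestamp.
    Frames recorded before the first command are skipped.
    """
    command_times = [t for t, _ in commands]
    labeled = []
    for frame_time in frame_times:
        idx = bisect_right(command_times, frame_time) - 1
        if idx >= 0:
            labeled.append((frame_time, commands[idx][1]))
    return labeled


# Made-up timestamps for illustration only.
frames = [0.0, 0.2, 0.4, 0.6, 0.8]
commands = [(0.1, "forward"), (0.5, "left")]
print(label_frames(frames, commands))
# [(0.2, 'forward'), (0.4, 'forward'), (0.6, 'left'), (0.8, 'left')]
```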
First, you'll need to make sure that the timezone on your Raspberry Pi matches that on your laptop. The code won't be able to match the timestamps on the Pi (the driving commands) with those of the video frames on the laptop if the timezones don't match. Enter the command below into the Pi to update its timezone:
sudo dpkg-reconfigure tzdata
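One quick way to confirm the setting is to print the UTC offset on both machines and compare. This is a small sketch using only Python's standard library; `utc_offset` is a hypothetical helper name.

```python
import time


def utc_offset():
    """Return this machine's current UTC offset as a string such as '-0800'."""
    return time.strftime("%z")


# Run this on both the Pi and the laptop; the two lines should match.
print(time.tzname, utc_offset())
```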
Turn on video streaming from the Pi. Log into the Raspberry Pi if you haven't already and enter the following commands:
# Go to wherever you installed ffmpeg
cd /usr/src/ffmpeg
# Run ffmpeg. I have no idea how this command works since I copy-and-pasted it from some website off of Google
sudo ffserver -f /etc/ff.conf_original & ffmpeg -v quiet -r 5 -s 320x240 -f video4linux2 -i /dev/video0 http://localhost/webcam.ffm
At this point the streaming video should be available at the URL below. You won't be able to view the raw video from your browser though; your browser will endlessly try to download the streaming file. Note that the IP will probably be different for you.
http://192.168.0.35/webcam.mjpeg
Start the API webserver. Log into the Raspberry Pi in another tab. Clone this repo on the Pi and move into the folder. Then run this command.
sudo python3 drive_api.py
On my Pi the drive API script fails if I call it with Python 2 or if I don't call it with root, but this all depends on how you set everything up and might differ based on how you did your installation.
Next run the script that displays and saves the incoming video data. Enter the command below using the IP address of your Raspberry Pi.
python save_streaming_video_data.py --ip 192.168.1.82
Finally, open up a web browser and point it to the URL below (IP address will likely be different for you).
http://192.168.0.35:81/drive
Click on the page and use the arrow keys (left, right, up, down) to drive the car. The page you just clicked on has some hacky javascript I wrote that fires an API call to the webserver running on the Pi each time you hit one of the arrow keys.
When you get to the end of your driving session, change the URL in the browser to:
http://192.168.0.35:81/StoreLogEntries
Then hit enter. This runs an important data cleaning step on all of the commands that the webserver received during the driving session. Once the webpage says "Finished", navigate to the Terminal/Putty tab running the server, and hit control+c to kill the process. You should now see two files.
- session.txt: contains valid and invalid accidental commands
- clean_session.txt: contains only valid commands
Now kill the save_streaming_video_data.py script. This script should have generated two files.
- video_timestamps.txt: contains timestamps for each of the saved video frames
- output.mov: contains video data
So, in total, there are four files for each driving session. I usually create a new folder for each session. Note that two of the files are on the Pi and two are on your laptop. However, all four files need to be in the same place for processing, so I usually copy the Pi files over to my laptop. You'll need to generate lots of driving data, and so copying the files from the Pi to your laptop can become tedious. I created scp_car_data.sh to make this easier.
Once all of the files are in the same place, it's time to clean up all of your data and create files that TensorFlow will be able to digest for model training. All of this happens in the save_all_runs_as_numpy_files.py script. This script assigns a label (left, right, straight) to each image and performs basic data cleaning. It saves each driving session separately as a .npz file.
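For readers who have not used the .npz format, here is a minimal sketch of writing and reading one session with NumPy. The array names `frames` and `labels`, the file name, and the fake data are assumptions; the real script may organize its output differently.

```python
import os
import tempfile

import numpy as np

# Three fake frames and their one-hot labels, standing in for a real session.
frames = np.zeros((3, 240, 320, 3), dtype=np.uint8)
labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)

# Write one session to a compressed .npz archive.
session_path = os.path.join(tempfile.mkdtemp(), "session_1.npz")
np.savez_compressed(session_path, frames=frames, labels=labels)

# Read it back; each array is loaded by the key it was saved under.
with np.load(session_path) as data:
    loaded_frames = data["frames"]
    loaded_labels = data["labels"]

print(loaded_frames.shape, loaded_labels.shape)  # (3, 240, 320, 3) (3, 3)
```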
I highly recommend backing up your data somewhere like AWS's S3. See command-line examples below.
Note that without the --delete flag, the aws s3 sync command won't delete data from S3 but will add it if it doesn't exist. This is helpful so that you don't accidentally obliterate your entire backup. The sync command is recursive, so it can copy files within nested folders. You can find the official AWS docs on this command here.
# Specify your own locations
LOCAL_FOLDER='/Users/ryanzotti/Documents/repos/Self_Driving_RC_Car/data'
S3_FOLDER='s3://self-driving-car/data'
# To back up to AWS
aws s3 sync ${LOCAL_FOLDER} ${S3_FOLDER}
# To restore backup from AWS
aws s3 sync ${S3_FOLDER} ${LOCAL_FOLDER}
# You can also delete unwanted files from the AWS backup
aws s3 sync ${LOCAL_FOLDER} ${S3_FOLDER} --delete
The commands above can take an extremely long time depending on your internet connection speed. At one point I had a basic, cheap AT&T internet plan with only 250 kbps upload speed (advertised at 5 Mbps), and it took me 5-8 hours to upload about an hour's worth of driving data.
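As a rough sanity check on those numbers, the sketch below converts an upload speed and duration into a data volume. It assumes the link was fully used the whole time and that 1 kilobit is 1,000 bits; `megabytes_uploaded` is a hypothetical helper.

```python
def megabytes_uploaded(speed_kbps, hours):
    """Data moved at a given upload speed (kilobits/s) over some hours, in MB."""
    bytes_per_second = speed_kbps * 1000 / 8
    return bytes_per_second * hours * 3600 / 1e6


# At 250 kbps, a 5 to 8 hour upload moves roughly 560 to 900 MB.
print(megabytes_uploaded(250, 5))  # 562.5
print(megabytes_uploaded(250, 8))  # 900.0
```

Under those assumptions, an hour of driving data at that time would have been somewhere between about 0.5 and 0.9 GB.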
EDIT: I've since ditched AT&T's 5 Mbps $40/month package and replaced it with San Francisco's Google Fiber (via Webpass) package at $42/month for 1,000 Mbps (1 Gbps). Actual upload speed ranges between 400-900 Mbps. Now uploading 4-5 hours of driving data takes just 1-2 minutes. Google Fiber is amazing. I love it.
To run all of these AWS commands locally, you need to tell AWS that you have access. AWS does this with the aws_secret_access_key and aws_access_key_id. When you spin up an AWS instance (e.g., a GPU), you can assign an AWS IAM Role to the instance and the instance will inherit these credentials. However, AWS can't assign an IAM Role to your laptop, so you'll need to update ~/.aws/credentials so that it looks something like the contents below. These are obviously fake values, but the real values look just as much like long gibberish strings. You can get the actual values associated with your account through the AWS IAM console. You should never expose your real values to the public: thieves could take control of your entire AWS account and, for example, run up a massive bill, among other things.
[default]
aws_access_key_id = ASDFSDFSDFSDFSDFKKJSDFEUSXN
aws_secret_access_key = SKJE8ss3jsefa3sjKSDWdease3kjsdvna21
region = us-east-1
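If the AWS commands complain about missing credentials, a small check like the one below can confirm that the file has the expected keys without printing the secret values. It is a sketch using Python's standard configparser module; `missing_keys` is a hypothetical helper, and the example text uses fake values.

```python
import configparser

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_keys(credentials_text, profile="default"):
    """Return the required keys that are absent from the given profile."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.has_option(profile, key)]


# Fake contents for illustration; in practice read ~/.aws/credentials instead.
example = """
[default]
aws_access_key_id = FAKEACCESSKEYID
aws_secret_access_key = FAKESECRETACCESSKEY
region = us-east-1
"""
print(missing_keys(example))  # []
```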
The video data takes up a lot of space on your local machine. I periodically check how much storage I have used by running the following command.
DATA_DIR='/Users/ryanzotti/Documents/repos/Self_Driving_RC_Car/data'
du -sh ${DATA_DIR}
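The same check can be done from Python, which is handy inside a training or cleanup script. This sketch sums file sizes, so the result can differ slightly from du, which reports disk blocks used. `directory_size_bytes` is a hypothetical helper.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

Calling `directory_size_bytes(DATA_DIR) / 1e9` with your own data folder gives the size in GB.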
At the start of my project I relied on dataprep.py to aggregate all of my sessions' image and label data into a single file for model training. As my dataset grew, my 16 GB memory laptop started having memory issues when processing all of the files simultaneously. My limit seemed to be 44,000 240x320x3 images.
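A quick calculation shows why that limit is plausible. The source does not say which data type the images were held in, so the sketch below shows two common cases as assumptions: 8-bit integers and 32-bit floats.

```python
IMAGES = 44_000
HEIGHT, WIDTH, CHANNELS = 240, 320, 3

values = IMAGES * HEIGHT * WIDTH * CHANNELS

# One byte per value if pixels are stored as 8-bit integers.
print(values / 1e9)      # 10.1376 GB as uint8
# Four bytes per value if pixels are converted to 32-bit floats.
print(values * 4 / 1e9)  # 40.5504 GB as float32
```

Even in the 8-bit case the raw pixels alone take about 10 GB, which leaves little headroom on a 16 GB machine once any copies are made during processing.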
Since I don't want to spend money on a GPU Apache Spark cluster, I decided to sample my data using the Dataset.py script and Dataset class. Dataset assumes that you have already run the save_all_runs_as_numpy_files.py script. The Dataset class has to be instantiated in each model training script, since it now takes care of creating batches as well.
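The general idea of sampling instead of loading everything can be sketched as follows. This is not the project's Dataset class. It is a hypothetical generator, `sample_batches`, that picks random frames across session files so that only the sampled frames would need to be in memory at once. The file names and frame counts are made up.

```python
import random


def sample_batches(session_sizes, batch_size, num_batches, seed=0):
    """Yield batches of (session_name, frame_index) pairs drawn at random.

    session_sizes: dict mapping a session file name to its frame count.
    Only the sampled frames would need to be loaded into memory.
    """
    rng = random.Random(seed)
    names = list(session_sizes)
    weights = [session_sizes[name] for name in names]
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            # Pick a session in proportion to its size, then a frame within it.
            name = rng.choices(names, weights=weights, k=1)[0]
            batch.append((name, rng.randrange(session_sizes[name])))
        yield batch


# Hypothetical session files and frame counts.
sizes = {"session_1.npz": 1200, "session_2.npz": 800}
for batch in sample_batches(sizes, batch_size=4, num_batches=2):
    print(batch)
```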
Training on the GPU is so much faster than training on the CPU that I now only train on the GPU except when debugging. I get about a 14x speedup when running on one of AWS's Tesla K80 GPUs (p2.xlarge) compared to my Mac's CPU. Macs don't have a TensorFlow-supported built-in GPU, so I rely on AWS to do my GPU training. Check out this link for details on GPU training (how to build your own AWS GPU AMI, etc.). As of now, Amazon Web Services, Google Compute Engine, and Microsoft Azure all provide the same Nvidia K80 GPU. AWS charges $0.90 per hour, and Google charges $0.70 per hour. Microsoft doesn't make it easy to compare to AWS, so I have no idea what they charge. Ultimately I plan to try all three services and go with the cheapest.
I've written scripts for training many different types of models. To avoid confusion I've standardized the command-line interface inputs across all scripts by leveraging the same Trainer class in Trainer.py. All scripts automatically sync/archive data with AWS's S3. This means that the model will always train on the latest batch of training data. It also means that you need to be prepared to download ALL of the training data, which as of now is about 50 GB. Make sure your laptop or GPU has enough space before attempting.
Each script syncs with S3 before training so that it's possible to train multiple models in parallel without the backups overwriting each other. The Trainer class writes backups to S3 after each epoch.
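One way a training script could trigger such a sync is to call the AWS CLI through Python's subprocess module. The sketch below is an assumption about how this might be wired up, not the Trainer class's actual code. `build_sync_command` and `sync` are hypothetical helpers, and the paths are placeholders.

```python
import subprocess


def build_sync_command(source, destination, delete=False, dryrun=False):
    """Build the argument list for `aws s3 sync`."""
    command = ["aws", "s3", "sync", source, destination]
    if delete:
        command.append("--delete")
    if dryrun:
        command.append("--dryrun")
    return command


def sync(source, destination, **options):
    """Run the sync and raise CalledProcessError if the AWS CLI fails."""
    subprocess.run(build_sync_command(source, destination, **options), check=True)


# Pull the latest data before training (paths here are placeholders).
print(build_sync_command("s3://self-driving-car/data", "/root/data", dryrun=True))
```

Passing `dryrun=True` adds the CLI's `--dryrun` flag, which lists what would be copied without transferring anything.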
Training a new model is simple. See the example below. The nohup and & tell the model to train in the background so that you can close your computer (assuming your code is running in the cloud and not locally).
S3_BUCKET=self-driving-car # Specify your own S3 bucket
SCRIPT=train_conv_net.py
# All scripts follow the same command-line interface
nohup python3 ${SCRIPT} --datapath /root/data \
--epochs 100 \
--s3_bucket ${S3_BUCKET} &
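A shared command-line interface like this can be built with Python's argparse module. The sketch below uses the three flags from the command above. The types, defaults, and help text are assumptions and may not match what Trainer.py actually defines.

```python
import argparse


def build_parser():
    """Parser for the flags shown in the training command above."""
    parser = argparse.ArgumentParser(description="Train a model")
    parser.add_argument("--datapath", required=True,
                        help="Folder holding the training data")
    parser.add_argument("--epochs", type=int, default=100,
                        help="Number of training epochs")
    parser.add_argument("--s3_bucket", required=True,
                        help="S3 bucket used for data and backups")
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(
    ["--datapath", "/root/data", "--epochs", "100", "--s3_bucket", "self-driving-car"]
)
print(args.datapath, args.epochs, args.s3_bucket)
```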
Training still takes a long time (e.g., 10+ hours) even when training on a GPU. To make recovery from unexpected failures easier, I use TensorFlow's checkpoint feature to be able to start and stop my models. The checkpoints are included in the model backups sent to the cloud. TensorFlow model checkpointing also makes it possible to rely on AWS Spot Instances, which I haven't tried yet.
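The stop-and-resume idea is independent of any particular framework. The sketch below illustrates it with a plain JSON file that records the last finished epoch. It deliberately does not use TensorFlow's checkpoint API, and all of the function names are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved training state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0}


def save_checkpoint(path, state):
    """Write to a temp file first so a crash never leaves a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def train(path, total_epochs, stop_after=None):
    """Run epochs starting from the last checkpoint; optionally stop early."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            break
        # ... one epoch of real training would happen here ...
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state["epoch"]


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
print(train(checkpoint, total_epochs=5, stop_after=2))  # 2 (simulated interruption)
print(train(checkpoint, total_epochs=5))                # 5 (resumed from epoch 2)
```

A real checkpoint would also store the model weights and optimizer state, which is what TensorFlow's checkpoint files take care of.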
I created a script called resume_training.py that is agnostic to the model whose training is being restarted. It reads in a TensorFlow checkpoint file that you specify and reconstructs the model in memory before it resumes training. You can call it like this:
# Your paths will differ
DATA_PATH='/Users/ryanzotti/Documents/repos/Self_Driving_RC_Car/data'
EPOCHS=100
MODEL_DIR='/Users/ryanzotti/Documents/repos/Self_Driving_RC_Car/data/tf_visual_data/runs/1'
S3_BUCKET=self-driving-car
# Run the script
python resume_training.py \
--datapath $DATA_PATH \
--epochs $EPOCHS \
--model_dir $MODEL_DIR \
--s3_bucket ${S3_BUCKET}
Q: How do I log into the Pi?
A: By default the id is pi and the password is raspberry. I usually ssh into mine using the Terminal application on my Mac. On Windows you could probably use Putty. See example command below. Note: your IP will probably be different.
ssh pi@192.168.0.35
Q: How do I find the IP address for my Pi?
A: There are probably multiple ways to do this, but whenever I connect my Raspberry Pi to a new wifi network I always have to plug in my keyboard, mouse, and HDMI cable so that I can view the Pi on a monitor or TV. Then open up the Pi console and type the command below, which will print the IP to your console.
hostname -I
- TensorFlow Timeline example: Show execution time for each node in your TensorFlow graph.