! Work in progress !
(you have been warned)
Pilotguru is a project aiming to make it possible, with just a smartphone and zero hardware work, to gather all the necessary data to train your own computer vision based car autopilot/ADAS:
- Use a smartphone as a dashcam to record a ride (video + GPS data + accelerometer).
- With pilotguru, annotate frames of the recording with angular velocity (for steering) and forward speed (for acceleration and braking).
- Repeat the first two steps until you have enough data.
- Train a machine learning model to control a car based on your gathered data. Pick an existing good one or build your own.
- If you are really feeling adventurous, integrate your model with openpilot. You will need a compatible car and some extra hardware to get it to actually control the car. Make sure to comply with your local laws; you assume all responsibility for the consequences; this software comes with no guarantees, yada yada yada.
- Self-driving cars solved! Enjoy ;)
- Steering angle data from just the video - available.
- Forward velocity from video/GPS/accelerometer - under development, nothing to show yet.
Pretty easy actually, see below.
Linux and Docker. Everything runs in a Docker image; the necessary dependencies will be pulled when you build the image.
- Clone this repo.

  ```
  git clone https://github.com/waiwnf/pilotguru.git
  cd pilotguru
  ```
- Build the docker image (this will take a while).

  ```
  ./build-docker.sh
  ```
- Download extra data for the SLAM (simultaneous localization and mapping) library. We rely on ORB_SLAM2 by Raul Mur-Artal.

  ```
  ./fetch-vocabulary.sh
  ```
- Build everything (inside a docker container).

  ```
  ./run-docker.sh
  mkdir build
  cd build
  cmake ..
  make -j
  ```
Everything below should be run from inside the docker container (i.e. after executing `./run-docker.sh` from the pilotguru root directory).
Every camera has some lens distortion. Calibration is the process of estimating the distortion parameters, so that we can warp the captured image and remove most of the distortion.
- Print out a calibration board. Examples: A4, US Letter, A3. Larger is better. Make sure to disable "fit to page" and other autoscaling options when you print.
- Stick the printout somewhere so that it is flat.
- Take a video of the printout with the same camera settings as you will use in the dashcam mode in your car. Do not try to go for the maximum resolution the phone can handle - the SLAM system typically struggles with the noise from high-res videos anyway. Something like 720x480 resolution should work fine. Make sure to capture the calibration pattern at different angles, distances, and in different parts of the frame.
- Grab the video from the phone to, say, `pilotguru/data/calibration-video.mp4`.
- Run the calibration code (substitute your board parameters for `X`, `Y` and `Z`). Do not forget that the board side width and height must be the number of inner corners on that side of your calibration board, which is one fewer than the number of squares.

  ```
  ./calibrate \
      --board_side_width X \
      --board_side_height Y \
      --square_size Z \
      --input /opt/pilotguru/data/calibration-video.mp4 \
      --skip_frames 10 \
      --out_file /opt/pilotguru/data/calibration.yml
  ```
Press `g` once the video shows up on the screen. The number in the lower right corner is the count of frames where the calibration pattern was successfully detected. Make sure there are enough of those. If not, take a longer video and repeat.
- Done! Don't lose `calibration.yml`, you will need it later.
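As an aside, here is a minimal Python sketch of how calibration parameters of this kind are typically used to undistort a frame with OpenCV. The YAML key names below are assumptions for illustration only; inspect your `calibration.yml` for the actual field names.

```python
# Minimal sketch: undistorting a frame using calibration parameters.
# NOTE: the YAML key names below are assumptions for illustration;
# check your calibration.yml for the actual field names.
import cv2

fs = cv2.FileStorage("data/calibration.yml", cv2.FILE_STORAGE_READ)
camera_matrix = fs.getNode("camera_matrix").mat()          # 3x3 intrinsics (assumed key)
dist_coeffs = fs.getNode("distortion_coefficients").mat()  # k1, k2, p1, p2, ... (assumed key)
fs.release()

frame = cv2.imread("data/frame.png")
undistorted = cv2.undistort(frame, camera_matrix, dist_coeffs)
cv2.imwrite("data/frame-undistorted.png", undistorted)
```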
- Get a windshield/dash phone mount for your car. We will use the phone motion to capture the motion of the car, so the less flex and wobble in the mount, the better. Long-armed mounts like this are probably not very good for our use. Something like this would be better. Make sure the phone camera has a good view out of the windshield.
- Go for a ride! Set the phone to record a video. Remember to use the same settings as you were using for camera calibration.
- Back at your computer, grab the video from the phone and call it `pilotguru/data/ride.mp4`.
- Process the video to extract the camera motion:

  ```
  mkdir /opt/pilotguru/data/ride-trajectories
  ./optical_trajectories \
      --vocabulary_file /opt/pilotguru/data/orb-vocabulary/ORBvoc.txt \
      --camera_settings /opt/pilotguru/data/calibration.yml \
      --out_dir /opt/pilotguru/data/ride-trajectories \
      --in_video /opt/pilotguru/data/ride.mp4
  ```
This will run the optical SLAM logic on the video, extract the camera motion and write it as JSON files to `/opt/pilotguru/data/ride-trajectories`. The SLAM system may need some time to pick up localization, and may also lose tracking in the middle of the video, so there may be multiple non-overlapping tracked segments, with a separate JSON file written for each. Those JSON files should be treated as completely independent (i.e. there is no relation between the world coordinate systems in different JSON files).

If the SLAM system fails to start tracking at all, it may be that your camera calibration results are off. Try to re-calibrate a few times to see how much variance there is. Too high a resolution may also be a problem.
The SLAM system estimates the full 3D trajectory of the camera (both translation and rotation), but the translation information tends to be less reliable, so we will ignore it for now.
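For example, the independent segments can be loaded for downstream processing like this (the `trajectory-*.json` naming pattern is an assumption based on the file names used in the commands below):

```python
# Sketch: load every tracked segment as an independent trajectory.
# The trajectory-*.json naming pattern is an assumption for illustration;
# check the actual file names written to the output directory.
import glob
import json

segment_files = sorted(glob.glob(
    "/opt/pilotguru/data/ride-trajectories/trajectory-*.json"))
segments = []
for path in segment_files:
    with open(path) as f:
        segments.append(json.load(f))

# Coordinate systems are unrelated across files, so never mix frames
# from different segments in one computation.
print(f"Loaded {len(segments)} independently tracked segment(s).")
```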
- Raw SLAM trajectory estimates tend to be quite noisy. Apply Gaussian smoothing across frames to the rotation values to remove the high-frequency noise:

  ```
  ./smooth_heading_directions \
      --trajectory_in_file /opt/pilotguru/data/ride-trajectories/trajectory-0.json \
      --trajectory_out_file /opt/pilotguru/data/ride-trajectories/trajectory-0-smoothed.json \
      --sigma 5
  ```
Larger sigma leads to more smoothing. Try a couple of values to find the smallest one that still removes most of the jitter.
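To build intuition for the `--sigma` value, here is a small sketch of the same kind of Gaussian smoothing applied to a scalar signal (the actual binary smooths the 3D rotations, not a scalar series):

```python
# Sketch: effect of Gaussian smoothing on a noisy 1D signal.
# smooth_heading_directions smooths 3D rotations; this is the scalar
# analogue, just to illustrate how sigma trades noise against lag.
import numpy as np
from scipy.ndimage import gaussian_filter1d

np.random.seed(0)
true_signal = np.sin(np.linspace(0, 4 * np.pi, 500))  # slow "steering" signal
noisy = true_signal + 0.2 * np.random.randn(500)      # add high-frequency noise

for sigma in (2, 5, 10):
    smoothed = gaussian_filter1d(noisy, sigma=sigma)
    rms = np.sqrt(np.mean((smoothed - true_signal) ** 2))
    print(f"sigma={sigma:2d}  residual RMS vs. true signal: {rms:.3f}")
```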
- Render a video visualizing the inferred steering:

  ```
  ./render_turning \
      --in_video /opt/pilotguru/data/ride.mp4 \
      --trajectory_json /opt/pilotguru/data/ride-trajectories/trajectory-0-smoothed.json \
      --steering_wheel /opt/pilotguru/img/steering_wheel_image.jpg \
      --out_video /opt/pilotguru/data/ride-0-smoothed.mp4
  ```
This will produce a video like this with a rotating steering wheel tiled next to the original ride recording. The steering wheel rotation should generally match the car steering.
Caveat: the steering wheel rotation in the rendering is somewhat misleading: it is proportional to the angular velocity of the car, so for turns at slow speed the rendered steering wheel will rotate less than the real one in the car, and vice versa for high speed.
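If you also have the forward speed for each frame, the angular velocity can be converted into an approximate steering angle with the standard kinematic bicycle model. A sketch, where the wheelbase and steering ratio are made-up placeholder values:

```python
# Sketch: angular velocity -> approximate steering wheel angle via the
# kinematic bicycle model. The wheelbase and steering ratio below are
# placeholder values; use the ones for your actual car.
import math

WHEELBASE_M = 2.7        # distance between axles (assumed)
STEERING_RATIO = 15.0    # steering wheel angle / road wheel angle (assumed)

def steering_wheel_angle(yaw_rate_rad_s: float, speed_m_s: float) -> float:
    """Approximate steering wheel angle (radians) for a given yaw rate and speed."""
    if speed_m_s < 1.0:
        return 0.0  # the model degenerates at very low speeds
    road_wheel_angle = math.atan(WHEELBASE_M * yaw_rate_rad_s / speed_m_s)
    return road_wheel_angle * STEERING_RATIO

# Example: 0.1 rad/s yaw rate at 10 m/s -> roughly 23 degrees of wheel angle.
print(math.degrees(steering_wheel_angle(yaw_rate_rad_s=0.1, speed_m_s=10.0)))
```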
Below are the steps involved in extracting the steering angle from video, and the output JSON data format.
- The SLAM (simultaneous localization and mapping) library infers camera motion from the video, annotating every successfully tracked frame with a 6-degrees-of-freedom camera pose (translation + rotation). In the output JSON file, this data is stored in the `pose` field for every frame, along with the frame number in the original video (`frame_id`) and the time offset from the start of the video in microseconds (`time_usec`):

  ```
  "frame_id": 1172,
  "time_usec": 39062900,
  "pose": {
    "rotation": { "w": 0.895, "x": 0.0067, "y": -0.437, "z": 0.086 },
    "translation": [ -0.102, 0.009, 0.050 ]
  },
  ```
Rotations are represented as unit quaternions. The reference coordinate frame is chosen arbitrarily by the SLAM library, so there is no inherent notion of what the horizontal (road) plane is, or which turn directions are left and right. This is why we need the next steps.
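A small sketch of how such a quaternion can be interpreted in Python with scipy (the camera-axis direction and the camera-to-world convention here are assumptions for illustration):

```python
# Sketch: interpreting the "rotation" quaternion from a frame's "pose" field.
# scipy expects quaternions in (x, y, z, w) order, while the JSON stores
# named components, so reorder explicitly.
from scipy.spatial.transform import Rotation

rotation_json = {"w": 0.895, "x": 0.0067, "y": -0.437, "z": 0.086}  # example values from above
r = Rotation.from_quat([rotation_json["x"], rotation_json["y"],
                        rotation_json["z"], rotation_json["w"]])
# Direction the camera axis points in the (arbitrary) SLAM world frame,
# assuming the camera looks along +z in its own frame and the pose is
# camera-to-world; both are assumptions for this illustration.
print(r.apply([0.0, 0.0, 1.0]))
```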
- Infer (approximately) the horizontal (i.e. road surface) plane using principal component analysis. As long as the car does not drive only in a straight line and does not go up a mountain switchback road, the distance traveled in the horizontal directions will be much larger than in the vertical direction, so we simply take the top two PCA components as the basis vectors of the horizontal plane (these geometric steps are sketched in code below). The basis vectors are stored in every JSON file as the `plane` field:

  ```
  "plane": [
    [ 0.914, 0.097, 0.393 ],
    [ -0.404, 0.168, 0.899 ]
  ],
  ```
- Project the 3D rotation onto the inferred 2D horizontal plane. The result is stored for every frame as a unit `(x, y)` vector in the reference frame of the plane basis vectors:

  ```
  "planar_direction": [ 0.324955405433993, 0.93074165275327 ],
  ```
- Finally, from the horizontal directions we can compute the relative horizontal rotation between two consecutive frames, using the direction of the cross product to distinguish left from right turns:

  ```
  v = cross(prev_frame_flat_direction, current_frame_flat_direction)
  alpha = acos(dot(prev_frame_flat_direction, current_frame_flat_direction)) * sign(v_z)
  ```
The results are stored in the `turn_angle` field for every frame (in radians):

```
"turn_angle": -0.0054
```
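For concreteness, here is a numpy sketch of the three geometric steps above (plane basis from PCA, projection to planar directions, signed turn angles). It assumes the per-frame camera translations and 3D heading directions have already been extracted from the trajectory JSON; it mirrors the described logic rather than the exact implementation:

```python
# Sketch of the geometry described above, assuming the per-frame camera
# translations (N x 3) and 3D heading directions (N x 3) have already been
# pulled out of the trajectory JSON.
import numpy as np

def horizontal_plane_basis(translations: np.ndarray) -> np.ndarray:
    """Top two PCA components of the camera positions: a 2 x 3 plane basis."""
    centered = translations - translations.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:2]

def planar_directions(headings_3d: np.ndarray, plane_basis: np.ndarray) -> np.ndarray:
    """Project 3D headings onto the plane basis and normalize to unit (x, y) vectors."""
    xy = headings_3d @ plane_basis.T
    return xy / np.linalg.norm(xy, axis=1, keepdims=True)

def turn_angles(flat_dirs: np.ndarray) -> np.ndarray:
    """Signed per-frame rotation (radians); the sign distinguishes turn direction."""
    prev_d, cur_d = flat_dirs[:-1], flat_dirs[1:]
    dots = np.clip(np.sum(prev_d * cur_d, axis=1), -1.0, 1.0)
    cross_z = prev_d[:, 0] * cur_d[:, 1] - prev_d[:, 1] * cur_d[:, 0]
    return np.arccos(dots) * np.sign(cross_z)

# Toy usage with random data, just to show the shapes involved.
rng = np.random.default_rng(0)
translations = np.cumsum(rng.normal(size=(100, 3)) * [1.0, 1.0, 0.05], axis=0)
headings = rng.normal(size=(100, 3))
basis = horizontal_plane_basis(translations)
angles = turn_angles(planar_directions(headings, basis))
print(angles[:5])
```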
- All of the above steps are prone to high-frequency noise, which shows up in the SLAM camera trajectory and propagates to the rotation magnitudes. To reduce the noise, the `smooth_heading_directions` binary applies Gaussian smoothing to the 3D camera rotations and propagates the results downstream.
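Conceptually, that smoothing is a Gaussian-weighted average over a window of neighboring per-frame rotations. A rough sketch of the idea for unit quaternions (an approximation of the behavior, not the actual implementation):

```python
# Sketch: Gaussian smoothing of a sequence of unit quaternions (w, x, y, z).
# Component-wise averaging of nearby quaternions followed by renormalization
# is a reasonable approximation when consecutive rotations are close, which
# is the case for adjacent video frames.
import numpy as np

def smooth_quaternions(quats: np.ndarray, sigma: float) -> np.ndarray:
    """quats: N x 4 array of unit quaternions. Returns a smoothed N x 4 array."""
    n = len(quats)
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    smoothed = np.empty_like(quats)
    for i in range(n):
        acc = np.zeros(4)
        for off, w in zip(offsets, weights):
            j = min(max(i + off, 0), n - 1)   # clamp the window at the sequence ends
            q = quats[j]
            if np.dot(q, quats[i]) < 0:       # q and -q encode the same rotation
                q = -q
            acc += w * q
        smoothed[i] = acc / np.linalg.norm(acc)
    return smoothed
```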