Level 2-3 Autonomous Driving System - Using Carla

This repo is now archived, was a fun research project and I learnt a lot, but all good things must come to an end...

orbslam.py - toy implementation of monocular SLAM (ORBSlam) - INCOMPLETE
model.py - Neural Network Model - architecture needs improvement, is currently just a basic ResNet
renderer.py - Radar point cloud visualization, very slow lol

TODO:

Machine Learning
- Research on different ANN Architectures
  - Implement recurrence, temporal dynamics of scene
  - Semantic Segmentation
- Look more into automatic labelling of moving vs. non-moving objects in scene (have my own little idea wanna implement 😄).
- Set up data trigger infrastructure - data collection for edge cases
SLAM
- Read more of Multiple View Geometry
- Figure out what to do after getting Camera Matrix (projection matrix?)
- Get at least a toy implementation of ORBSlam up and running
Misc.
- Leverage Cloud GPU for training and stuff (Google Colab)
- git gud

My final goal is to get a Level 2-3 Self-Driving System up and running, whilst also learning about relevant concepts

A bold-face symbol such as x always represents a column vector, and its transpose is the row vector x^T.
a point in the plane will be represented by the column vector (x,y)^T, rather than its transpose, the row vector (x,y).
a line may naturally be represented by the vector (a, b, c)^T
The set of equivalence classes of vectors in R³ − (0, 0, 0)T forms the projective space P².
A conic is a curve described by a second-degree equation in the plane. In Euclidean geometry conics are of three main types: hyperbola, ellipse, and parabola
The equation of a conic in inhomogeneous coordinates is: ax² + bxy + cy² + dx + ey + f = 0
i.e. a polynomial of degree 2. “Homogenizing” this by the replacements: x → x1/x3, y → x2/x3 gives ax₁² + bx₁x₂ + cx₂² + dx₁x³ + ex₂x₃ + fx₃² = 0 (2.1) or in matrix form: x^TCx = 0

Get two images points, first image being x and next one being x'
Extract features(keypoints) from each image u,v, along with descriptors of every point using ORB.
Use KNN with point descriptors to match points from each image
Filter out points using ratio test (distance from points)
Extract the Fundamental matrix F, using RANSAC to filter matches and the 8-point algorithm.
Perform SVD on F, and find focal length values from s₁ and s₁ of matrix S (need to explore different method, currently unsure)
Set the Intrinsic Matrix K using focal length and center point of pinhole (need to investigate further on using better methods, extracting focal length and getting intrinsic params.).
Repeat steps from 1-4, then normalize homogenous image points (x,y) -> (x,y,z) x and x' into x^^ and x^^':

x^^ = K^-1x

x^^' = K^-1x'
Find the Essential Matrix *E *using (x,y)_ from normalized coords
Perform Singular Value Decomposition on E
Extract Rotation and translation as demonstrated here: Determining R and t from E
And finally, extract 3D points as shown here: 3D points from image points