YOLO-KITTI Detection, Depth Estimation and SLAM Integration

Primary language: Jupyter Notebook

License: GNU General Public License v3.0 (GPL-3.0)

AutoPercept

AutoPercept is a project focused on perception for autonomous vehicles. This repository serves as a starting point for implementing object detection and depth estimation in autonomous systems using the YOLO architecture and Vision Transformers.

Overview

AutoPercept uses YOLO (You Only Look Once) for real-time object detection and tracking, and MiDaS, a vision transformer model, for monocular depth estimation. The detection model is trained on the KITTI dataset, enabling accurate detection and depth estimation of objects in various driving scenarios. Training started from the open-source YOLOv8 weights from Ultralytics. Depth estimation is zero-shot: MiDaS is used as released, without further fine-tuning, since the model has already been trained on the KITTI dataset. You can learn more about MiDaS at https://github.com/isl-org/MiDaS.
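To illustrate how the two outputs can be combined, here is a minimal sketch (not the repository's code) that looks up the median depth value inside one detected bounding box. The function name `median_depth_in_box`, the `(x1, y1, x2, y2)` box format and the nested-list depth map are assumptions made for this example. MiDaS predicts relative inverse depth, so larger values indicate closer objects rather than a metric distance.

```python
from statistics import median


def median_depth_in_box(depth_map, box):
    """Return the median value of depth_map inside box = (x1, y1, x2, y2).

    depth_map is a list of rows (H x W). The box is given in pixel
    coordinates with x2 and y2 exclusive, and is clamped to the map bounds.
    """
    height = len(depth_map)
    width = len(depth_map[0]) if height else 0
    x1, y1, x2, y2 = box
    x1, x2 = max(0, int(x1)), min(width, int(x2))
    y1, y2 = max(0, int(y1)), min(height, int(y2))
    values = [depth_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    if not values:
        raise ValueError("box does not overlap the depth map")
    return median(values)


# Hypothetical 4x4 relative inverse-depth map (illustrative values only)
depth = [
    [0.1, 0.1, 0.2, 0.2],
    [0.1, 0.8, 0.9, 0.2],
    [0.1, 0.7, 0.9, 0.2],
    [0.1, 0.1, 0.2, 0.2],
]
box_depth = median_depth_in_box(depth, (1, 1, 3, 3))
print(box_depth)
```

The median is used here instead of the mean so that background pixels at the edge of a box have less influence on the reported value.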

Key Features

  • YOLO Object Detection: Real-time detection of objects using YOLO architecture.
  • Real-Time Counting: Counts and displays the number of detections for each class.
  • Pythonic UI: AutoPercept can also be used as a wrapper or interface to run inference on videos using custom model weights.
  • Depth Estimation: Monocular depth estimation using MiDaS, a Vision Transformer.
  • Saveable Results: Saves the video with the detections to a given directory under a given name.
  • KITTI Dataset: Trained on the KITTI dataset, which includes various object categories commonly encountered in autonomous driving scenarios.
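The per-class counting feature can be sketched as follows. This is an illustration only, assuming the detector returns one class name per detection in a frame; `count_detections` and `format_counts` are hypothetical helper names, not functions from this repository.

```python
from collections import Counter


def count_detections(class_names):
    """Count how many detections of each class appear in one frame."""
    return Counter(class_names)


def format_counts(counts):
    """Build a one-line overlay string, sorted by class name."""
    return ", ".join(f"{name}: {n}" for name, n in sorted(counts.items()))


# Class names for a single frame (illustrative KITTI-style labels)
frame_labels = ["Car", "Car", "Pedestrian", "Cyclist", "Car"]
counts = count_detections(frame_labels)
print(format_counts(counts))  # Car: 3, Cyclist: 1, Pedestrian: 1
```

The resulting string could then be drawn onto each frame before it is displayed or written to the output video.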

Getting Started

  1. Clone the Repository:

    git clone https://github.com/yourusername/AutoPercept.git
    cd AutoPercept
  2. Setup Environment:

    # Install required dependencies
    pip install -r requirements.txt
  3. Running the Gooey App:

    python AutoPercept.py

    or

    python ViT.py
  4. Specifying Pre-inference parameters [image]

    • NOTE: Trained weights can be found in the "Model_Weights" directory.
  5. Inference [image]

    • After specifying the pre-inference parameters, click the "Start" button to start the inference process.
    • A new window will open showing the detections being made for each frame of the video.
    • After inference is completed, you will find the video saved in the 'output_path' directory.
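Gooey builds its form from an argparse-style parser, so the pre-inference parameters can be pictured as a set of command-line arguments. The sketch below is an assumption about what such a parser might look like: apart from `output_path` and the "Model_Weights" directory, every argument name, default value and file name is hypothetical and may differ from the actual scripts.

```python
import argparse


def build_parser():
    """Build a parser with illustrative pre-inference parameters."""
    parser = argparse.ArgumentParser(
        description="Run detection on a video (illustrative sketch)"
    )
    # Path to trained weights, e.g. a file inside the Model_Weights directory
    parser.add_argument("--weights", required=True)
    # Input video to run inference on
    parser.add_argument("--source", required=True)
    # Directory where the annotated video is saved
    parser.add_argument("--output_path", default="output")
    # File name of the saved video (hypothetical parameter)
    parser.add_argument("--output_name", default="result.mp4")
    # Minimum confidence for a detection to be kept (hypothetical parameter)
    parser.add_argument("--conf", type=float, default=0.25)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration;
# "best.pt" and "drive.mp4" are placeholder file names.
args = build_parser().parse_args(
    ["--weights", "Model_Weights/best.pt", "--source", "drive.mp4"]
)
print(args.output_path, args.output_name, args.conf)
```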

Performance Metrics

  1. Precision, Recall, Mean Average Precision@IoU=0.5 and Mean Average Precision@IoU=0.5-0.95 for each class over the validation dataset

image

  2. Model Training Performance

results
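For readers unfamiliar with these metrics, the sketch below shows how precision and recall at a single IoU threshold can be computed for one class in one image. It is a simplified illustration, not the evaluation code used to produce the figures above: the helper names and the greedy matching strategy are assumptions. A full mAP computation additionally integrates precision over recall per class; mAP@0.5 does this at an IoU threshold of 0.5, while mAP@0.5-0.95 averages the result over thresholds from 0.5 to 0.95.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy matching: predictions are visited in order of decreasing
    confidence and each may claim at most one unmatched ground-truth box."""
    matched = set()
    true_pos = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_pos += 1
    precision = true_pos / len(predictions) if predictions else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Illustrative boxes: one good match, one false positive, one missed object
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 9), 0.9), ((50, 50, 60, 60), 0.8)]
precision, recall = precision_recall(preds, gt)
print(precision, recall)  # 0.5 0.5
```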

Working Examples

README_EX.mp4

readme_ex

49d37214-86df-4ed1-99c1-8cf15eaa3364.mp4

Future Scope

  • Monocular Depth Estimation: I aim to add simultaneous depth estimation and object detection very soon. (This functionality has been added.)
  • FPS Optimization: The model currently runs at a mediocre FPS. I will be looking into this soon. (This was caused by a bug that prevented the program from using the GPU. It has now been fixed.)
  • Quantization: Currently, the MiDaS depth estimation model runs at 1-10 FPS depending on the resolution of the image. To make inference faster, I am going to add 4-bit quantization to the model. This will involve converting the model parameters (weights) from their 32-bit floating-point representation to a 4-bit one.
  • Simultaneous Localization and Mapping (SLAM): Because the KITTI dataset also contains LiDAR and 3D point cloud data, it will be possible to add functionality to visualize SLAM processes in real time.
  • Object Tracking and Projection: I am currently working on methods that use a Kalman filter to track the movements of detected objects and visualize them in real time.
  • YOLOv10: YOLOv10 has been released since this project began. Ultralytics state that this new model outperforms previous state-of-the-art models on object detection benchmarks. I will soon add functionality for YOLOv10 inference.
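The arithmetic behind the planned 4-bit quantization can be sketched as below. This is a toy illustration of uniform affine quantization on a plain list of floats, assuming a single scale and offset for the whole list; it is not how the quantization will necessarily be implemented for MiDaS, and the function names are hypothetical. Four bits give 16 levels, so every weight is mapped to an integer code from 0 to 15.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in 0..15 with one scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_4bit(codes, scale, lo):
    """Recover approximate float values from the 4-bit codes."""
    return [c * scale + lo for c in codes]


# Illustrative weight values
weights = [-0.75, -0.2, 0.0, 0.31, 0.75]
codes, scale, offset = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale, offset)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

The rounding error per weight is at most half of `scale`, which is the accuracy given up in exchange for storing each weight in 4 bits instead of 32.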