Semantic Segmentation for Autonomous Driving (SSAD)

Thesis Project @ Amirkabir University of Technology

Table of Contents

About The Project
Installation
Usage
- Training
- Testing
Data
- Datasets
Models
- U-NET
- GAN
- Diffusion
Contributing
License
Contact
Acknowledgments

About The Project

This project benchmarks lightweight models tailored to semantic segmentation for autonomous vehicles. To be useful in this setting, a model must balance precision, computational efficiency, and real-time responsiveness, all of which are crucial for safe and effective autonomous navigation.

Installation

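Create a virtual environment and install dependencies (the activation path below is for Windows; on Linux or macOS, run source venv/bin/activate instead):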

>> python -m venv venv
>> ./venv/Scripts/activate
>> pip install -r requirements.txt
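
If it is unclear whether the environment was activated before installing, a check along these lines can help. It is only a sketch: the helper name in_virtual_env is hypothetical and not part of this repository, and it relies on the standard behavior that sys.prefix differs from sys.base_prefix inside a venv.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_env():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate venv before installing.")
```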

Usage

Training

Testing

To test on CamVid, download the raw videos from here and move them under data/datasets/CamVid/videos. The file names are as follows:

01TP_extract.avi
0005VD.mxf
0006R0.mxf
0016E5.MXF (zipped)
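
Before extracting frames, it may be worth confirming that all four files are in place. The following is a minimal sketch, not part of the repository: the helper missing_videos is hypothetical, and it assumes the zipped file has already been unpacked to 0016E5.MXF.

```python
from pathlib import Path

# Directory and file names are taken from the instructions above.
VIDEO_DIR = Path("data/datasets/CamVid/videos")
EXPECTED_VIDEOS = ["01TP_extract.avi", "0005VD.mxf", "0006R0.mxf", "0016E5.MXF"]


def missing_videos(video_dir: Path = VIDEO_DIR, expected=EXPECTED_VIDEOS) -> list[str]:
    """Return the expected file names that are not present in video_dir."""
    if not video_dir.is_dir():
        return list(expected)
    # Compare case-insensitively, since the extensions mix .mxf and .MXF.
    present = {p.name.lower() for p in video_dir.iterdir() if p.is_file()}
    return [name for name in expected if name.lower() not in present]


if __name__ == "__main__":
    missing = missing_videos()
    if missing:
        print("Missing videos:", ", ".join(missing))
    else:
        print("All CamVid videos found.")
```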

Next, run the following script to extract the frames:

python ./data/tools/camvid_video_process.py

Caution

Large Files: Downloading the videos requires 8 GB of storage, and the extracted frames take up to 25 GB of space.
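
Given these sizes, a quick free-space check before downloading can save a failed run. This is a sketch only: has_enough_space is a hypothetical helper, and summing the two figures (8 GB plus 25 GB) assumes the videos are kept on the same disk after extraction.

```python
import shutil

GIB = 1024 ** 3
# Assumption: videos (about 8 GB) and extracted frames (up to 25 GB)
# live on the same filesystem, so the requirements add up.
REQUIRED_BYTES = (8 + 25) * GIB


def has_enough_space(path: str = ".", required: int = REQUIRED_BYTES) -> bool:
    """Return True when the filesystem holding `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    free_gib = shutil.disk_usage(".").free / GIB
    status = "enough" if has_enough_space(".") else "NOT enough"
    print(f"{free_gib:.1f} GiB free: {status} space for CamVid videos and frames.")
```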

Data

Datasets

There are many Semantic Segmentation datasets available for the task of Autonomous Driving. The following datasets are used in this project:

• Cityscapes (Kaggle)

The Cityscapes Dataset focuses on semantic understanding of urban street scenes. It provides 5000 finely annotated and 20000 coarsely annotated images covering 30 semantic classes.

• CamVid (Kaggle)

The Cambridge-driving Labeled Video Database (CamVid), released in late 2007, was one of the first semantically segmented datasets in the self-driving space. Its authors used their own image annotation software to annotate 700 images from a 10-minute video sequence. The camera was mounted on the dashboard of a car, with a field of view similar to the driver's. The dataset has 32 semantic classes.

• KITTI (Kaggle)

KITTI consists of 200 semantically annotated training images and 200 test images corresponding to the KITTI Stereo and Flow Benchmark 2015. The data format and metrics conform to those of the Cityscapes Dataset.

• DUS

The Daimler Urban Segmentation Dataset (DUS) consists of video sequences recorded in urban traffic, comprising 5000 rectified stereo image pairs at a resolution of 1024x440. 500 frames (every 10th frame of the sequence) come with pixel-level semantic annotations in 5 classes: ground, building, vehicle, pedestrian, and sky.

• Mapillary (Kaggle)

The Mapillary Vistas Dataset is a diverse street-level imagery dataset with pixel-accurate and instance-specific human annotations for understanding street scenes around the world. It contains 25000 high-resolution images and 124 semantic object categories, collected on 6 continents across a variety of weather conditions, seasons, times of day, cameras, and viewpoints.
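
For benchmarking scripts, the figures quoted above could be collected in a small registry like the one sketched here. The DatasetInfo class and DATASETS mapping are hypothetical and not part of the repository. The numbers are copied from the descriptions in this section, and the KITTI class count is left unset because it is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetInfo:
    name: str
    annotated_images: int        # images with pixel-level labels, as quoted above
    num_classes: Optional[int]   # None when this README does not state it


DATASETS = {
    "cityscapes": DatasetInfo("Cityscapes", 5000 + 20000, 30),  # fine + coarse
    "camvid": DatasetInfo("CamVid", 700, 32),
    "kitti": DatasetInfo("KITTI", 200, None),  # annotated training images only
    "dus": DatasetInfo("DUS", 500, 5),         # every 10th of 5000 frames
    "mapillary": DatasetInfo("Mapillary Vistas", 25000, 124),
}


def smallest_first() -> list[DatasetInfo]:
    """Return the datasets ordered by number of annotated images."""
    return sorted(DATASETS.values(), key=lambda d: d.annotated_images)


if __name__ == "__main__":
    for info in smallest_first():
        classes = info.num_classes if info.num_classes is not None else "n/a"
        print(f"{info.name:<18} {info.annotated_images:>6} images  {classes} classes")
```

Ordering by size is just one convenient view: the smaller datasets are the cheapest place to smoke-test a new model before running it on the larger ones.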

Models

...

Contributing

Thank you for considering contributing to this project! Contributions are welcome and encouraged.

Please ensure that your pull request adheres to the following guidelines:

- Describe the problem or feature in detail.
- Make sure your code follows the project's coding style and conventions.
- Include relevant tests and ensure they pass.
- Update the documentation to reflect your changes if necessary.

By contributing to this project, you agree to abide by the Code of Conduct. Thank you for your contributions to making this project better!

License

This project is licensed under the MIT License - see the LICENSE file for details. The MIT License is a permissive open-source license that allows you to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software. It is a simple and flexible license that is widely used in the open-source community.

By contributing to this project, you agree that your contributions will be licensed under the MIT License unless explicitly stated otherwise.

Contact

If you have any questions, suggestions, or feedback, don't hesitate to get in touch!

Email · LinkedIn · GitHub · Stack Overflow

keivanipchihagh/SSAD