
ML Dataset Governance Policy for Autonomous Vehicle Datasets

Primary language: Python | License: MIT

Dataset Governance Policy (DGP)


To ensure traceability, reproducibility, and standardization of all ML datasets and models generated and consumed within Toyota Research Institute (TRI), we developed the Dataset Governance Policy (DGP), which codifies the schema and maintenance of all of TRI's Autonomous Vehicle (AV) datasets.


Components

  • Schema: Protobuf-based schemas for raw data, annotations and dataset management.
  • DataLoaders: A universal PyTorch dataset class that loads all DGP-compliant datasets.
  • CLI: Main CLI for handling DGP datasets and the entry point for visualization tools.
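For readers new to PyTorch datasets: a map-style dataset only has to implement `__len__` and `__getitem__`, which is what allows one dataset class to be handed to `torch.utils.data.DataLoader`. The toy class below is a hypothetical, in-memory sketch of that protocol; `ToySceneDataset` and its sample layout (a list of per-sensor dicts) are illustrative assumptions, not DGP's implementation or schema.

```python
from typing import Any, Dict, List, Sequence


class ToySceneDataset:
    """Hypothetical in-memory stand-in for a multi-sensor scene dataset.

    Not DGP's implementation: it only illustrates the map-style dataset
    protocol (__len__ / __getitem__) that PyTorch's DataLoader consumes.
    """

    def __init__(
        self,
        frames: List[Dict[str, Dict[str, Any]]],
        datum_names: Sequence[str],
        requested_annotations: Sequence[str] = (),
    ) -> None:
        # Each frame maps a datum (sensor) name to a dict of raw data and annotations.
        self.frames = frames
        self.datum_names = tuple(datum_names)
        self.requested_annotations = tuple(requested_annotations)

    def __len__(self) -> int:
        return len(self.frames)

    def __getitem__(self, index: int) -> List[Dict[str, Any]]:
        frame = self.frames[index]
        sample = []
        for name in self.datum_names:
            datum = frame[name]
            # Keep the raw payload plus only the requested annotation types;
            # a missing annotation type becomes an empty list.
            item = {"datum_name": name, "data": datum["data"]}
            for ann in self.requested_annotations:
                item[ann] = datum.get(ann, [])
            sample.append(item)
        return sample


# Usage: one frame with a camera and a lidar datum (placeholder payloads).
frames = [
    {
        "camera_01": {"data": "img_0", "bounding_box_2d": [(0, 0, 10, 10)]},
        "lidar": {"data": "pts_0", "bounding_box_3d": []},
    }
]
ds = ToySceneDataset(frames, datum_names=("camera_01", "lidar"),
                     requested_annotations=("bounding_box_2d",))
print(len(ds))  # 1
print(ds[0][0]["bounding_box_2d"])  # [(0, 0, 10, 10)]
```

Selecting sensors and annotation types at construction time, rather than per access, keeps `__getitem__` simple and lets every sample in a split share the same structure.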

Getting Started

Please see Getting Started for environment setup.

Getting started is as simple as initializing a dataset class with the relevant dataset JSON, raw data sensor names, annotation types, and split information. Below, we show an example of initializing a PyTorch dataset for multi-modal learning with 2D and 3D bounding boxes.

from dgp.datasets import SynchronizedSceneDataset

# Load synchronized pairs of camera and lidar frames, with 2d and 3d
# bounding box annotations.
dataset = SynchronizedSceneDataset('<dataset_name>_v0.0.json',
    datum_names=('camera_01', 'lidar'),
    requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
    split='train')
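Frames in a detection dataset usually carry different numbers of boxes, so PyTorch's default collation, which stacks equally shaped tensors, may not batch them directly. One common workaround is a collate function that gathers each field into a Python list. The sketch below assumes each sample is a list of per-datum dicts carrying a `"datum_name"` key; the function name and that layout are hypothetical, so check the structure the DGP dataset actually returns before adapting it.

```python
from collections import defaultdict
from typing import Any, Dict, List


def collate_keep_lists(batch: List[List[Dict[str, Any]]]) -> Dict[str, Dict[str, List[Any]]]:
    """Hypothetical collate_fn: group a batch by datum name, then by field.

    Variable-length fields (e.g. lists of boxes) are kept as Python lists
    instead of being stacked into a single tensor.
    """
    out: Dict[str, Dict[str, List[Any]]] = defaultdict(lambda: defaultdict(list))
    for sample in batch:
        for datum in sample:
            name = datum["datum_name"]
            for key, value in datum.items():
                if key != "datum_name":
                    out[name][key].append(value)
    # Convert the nested defaultdicts to plain dicts so missing keys raise KeyError.
    return {name: dict(fields) for name, fields in out.items()}


# Two frames with different numbers of 2D boxes on the same camera.
batch = [
    [{"datum_name": "camera_01", "bounding_box_2d": [(0, 0, 10, 10)]}],
    [{"datum_name": "camera_01", "bounding_box_2d": []}],
]
collated = collate_keep_lists(batch)
print(collated["camera_01"]["bounding_box_2d"])  # [[(0, 0, 10, 10)], []]
```

With PyTorch installed, a function like this would be passed as `DataLoader(dataset, batch_size=4, collate_fn=collate_keep_lists)`; tensors of fixed shape (such as images) could still be stacked inside it if desired.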

Examples

Starter scripts are provided in the examples directory.

  • examples/load_dataset.py: Simple example script to load a multi-modal dataset based on the Getting Started section above.

Build and run tests

You can build the base Docker image and run the tests within a Docker container with:

make docker-build
make docker-run-tests

Contributing

We appreciate all contributions to DGP! To learn more about making a contribution to DGP, please see Contribution Guidelines.

CI Ecosystem

  • docker-build: Docker build and push to container registry
  • pre-merge: Pre-merge testing
  • doc-gen: GitHub Pages doc generation
  • coverage: Code coverage metrics and badge generation

💬 Where to file bug reports

  • 🚨 Bug reports: GitHub Issue Tracker
  • 🎁 Feature requests: GitHub Issue Tracker

👩‍💻 The Team 👨‍💻

DGP is developed and currently maintained by Quincy Chen, Arjun Bhargava, Chao Fang, Chris Ochoa and Kuan-Hui Lee from the ML-Engineering team at Toyota Research Institute (TRI), with contributions from the ML-Research team at TRI, Woven Planet, and Parallel Domain.