Computer Vision Datasets

Autonomous Driving/ADAS

Argoverse 2 (four datasets):

  • Argoverse 2 Sensor Dataset: 1,000 3D-annotated scenarios with lidar, stereo imagery, and ring camera imagery. Improves upon the Argoverse 1 3D Tracking dataset.
  • Argoverse 2 Motion Forecasting Dataset: 250,000 scenarios with trajectory data for many object types. Improves upon the Argoverse 1 Motion Forecasting Dataset (see the loading sketch below).
  • Argoverse 2 Lidar Dataset: 20,000 unannotated lidar sequences.
  • Argoverse 2 Map Change Dataset: 1,000 scenarios, 200 of which depict real-world HD map changes.
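Each Motion Forecasting scenario ships as a single parquet file, so it can be inspected with plain pandas. A minimal sketch; the path is hypothetical, and the column names (`track_id`, `object_type`, `position_x`, `position_y`) are assumptions about the schema, so verify them against the official av2 devkit:

```python
import pandas as pd

# Hypothetical path to one Argoverse 2 Motion Forecasting scenario.
scenario_path = "val/scenario_0000/scenario_0000.parquet"

# Assumed schema: one row per (track, timestep) observation.
tracks = pd.read_parquet(scenario_path)

for track_id, traj in tracks.groupby("track_id"):
    xy = traj[["position_x", "position_y"]].to_numpy()
    print(track_id, traj["object_type"].iloc[0], f"{len(xy)} timesteps")
```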

Cityscapes

  • 5,000 images with high-quality annotations
  • 20,000 images with coarse annotations
  • 50 different cities

All-In-One Drive, A Large-Scale Comprehensive Perception Dataset with High-Density Long-Range Point Clouds

  • Full sensor suite (3x LiDAR, 1x SPAD-LiDAR, 4x Radar, 5x RGB, 5x depth camera, IMU, GPS)
  • 100 sequences with 1000 frames (100s) each
  • 500,000 annotated images for 5 camera viewpoints
  • 100,000 annotated frames for each LiDAR/Radar sensor
  • 26M 2D/3D bounding boxes precisely annotated for 4 object classes (car, cyclist, motorcycle, pedestrian)
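The headline counts above are consistent with the sequence layout of 100 sequences × 1,000 frames. A quick sanity check of that bookkeeping (the 10 Hz rate is inferred from 1,000 frames per 100 s):

```python
# Sanity-check the AIODrive counts listed above.
sequences = 100
frames_per_seq = 1000        # 100 s per sequence -> 10 Hz
rgb_cameras = 5

frames_per_sensor = sequences * frames_per_seq
annotated_images = frames_per_sensor * rgb_cameras

assert frames_per_sensor == 100_000   # annotated frames per LiDAR/Radar sensor
assert annotated_images == 500_000    # annotated images across 5 camera viewpoints
```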

DAIR-V2X, The World's First Vehicle-Infrastructure Cooperative Autonomous Driving Dataset

  • 71,254 LiDAR frames and 71,254 camera images in total:
    • DAIR-V2X Cooperative Dataset (DAIR-V2X-C): 38,845 LiDAR frames, 38,845 camera images
    • DAIR-V2X Infrastructure Dataset (DAIR-V2X-I): 10,084 LiDAR frames, 10,084 camera images
    • DAIR-V2X Vehicle Dataset (DAIR-V2X-V): 22,325 LiDAR frames, 22,325 camera images
  • The training and validation scenes are 5 or 10 seconds long and consist of 50 or 100 samples, each with a corresponding Luminar-H2 point cloud and six camera images, including intrinsic and extrinsic calibration.
  • The training set contains 150 scenes with a total of 12,650 individual samples (75,900 RGB images); the validation set contains 50 scenes with a total of 3,950 samples (23,700 RGB images).
  • Train + validation download size: 257 GB
  • The dataset features 2D semantic segmentation, 3D point clouds, 3D bounding boxes, and vehicle bus data
  • Sensor setup (summarised in the config sketch below):
    • Five LiDAR sensors: up to 100 m range, ±3 cm accuracy, 16 channels, 10 Hz rotation rate, 360° horizontal field of view, ±15° vertical field of view
    • Front centre camera: 1920 × 1208 resolution, 60° horizontal field of view, 38° vertical field of view, 30 fps
    • Surround cameras (5x): 1920 × 1208 resolution, 120° horizontal field of view, 73° vertical field of view, 30 fps
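A rig like the one above mixes per-sensor specs in several units, so it can help to pin them down in a typed config. A minimal sketch; the class and field names are invented for illustration, and only the numbers come from the list above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CameraSpec:
    width_px: int
    height_px: int
    h_fov_deg: float
    v_fov_deg: float
    fps: int

@dataclass(frozen=True)
class LidarSpec:
    max_range_m: float   # up to 100 m
    accuracy_m: float    # +/- 3 cm
    channels: int
    rotation_hz: float
    h_fov_deg: float     # 360 degrees
    v_fov_deg: float     # +/- 15 degrees -> 30 degrees total

# Transcribed from the sensor-setup list above.
LIDAR = LidarSpec(max_range_m=100.0, accuracy_m=0.03, channels=16,
                  rotation_hz=10.0, h_fov_deg=360.0, v_fov_deg=30.0)
FRONT_CENTRE_CAMERA = CameraSpec(1920, 1208, h_fov_deg=60.0, v_fov_deg=38.0, fps=30)
SURROUND_CAMERA = CameraSpec(1920, 1208, h_fov_deg=120.0, v_fov_deg=73.0, fps=30)
```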

The ONCE dataset is a large-scale autonomous driving dataset with 2D and 3D object annotations.

  • 1 Million LiDAR frames, 7 Million camera images
  • 200 km² driving regions, 144 driving hours
  • 15k fully annotated scenes with 5 classes (Car, Bus, Truck, Pedestrian, Cyclist)
  • Diverse environments (day/night, sunny/rainy, urban/suburban areas)

Thermal Imaging

  • A total of 26,442 fully annotated frames with 520,000 bounding box annotations across 15 different object categories
  • 9,711 thermal and 9,233 RGB training/validation images with a suggested training/validation split. Includes 16-bit pre-AGC frames
  • 7,498 total video frames recorded at 24 Hz, with a 1:1 match between thermal and visible frames. Includes 16-bit pre-AGC frames (see the AGC sketch below)
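Because the 16-bit frames are provided before any automatic gain control, they typically need a contrast stretch before they can be viewed or fed to models expecting 8-bit input. Below is a minimal percentile-stretch sketch with OpenCV and NumPy (a stand-in, not the sensor's own AGC algorithm); the file path is hypothetical:

```python
import cv2
import numpy as np

# Hypothetical path to a 16-bit pre-AGC thermal frame.
raw = cv2.imread("thermal_16bit/frame_000001.tiff", cv2.IMREAD_UNCHANGED)
assert raw is not None and raw.dtype == np.uint16

# Clip to the 1st..99th percentile, then rescale to the 8-bit range.
lo, hi = np.percentile(raw, (1, 99))
norm = np.clip((raw.astype(np.float32) - lo) / max(hi - lo, 1.0), 0.0, 1.0)
cv2.imwrite("frame_000001_agc.png", (norm * 255).astype(np.uint8))
```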

Motion Planning

Synthetic Dataset

  • SYNTHIA consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes: misc, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist, lane-marking.
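Since the annotations are per-pixel class ids, a small id→name lookup makes the label masks easier to work with. A minimal sketch; the integer ids here simply follow the order listed above, which may not match the ids used by the actual SYNTHIA release:

```python
import numpy as np

# Class names as listed above; the id ordering is an assumption for illustration.
SYNTHIA_CLASSES = [
    "misc", "sky", "building", "road", "sidewalk", "fence", "vegetation",
    "pole", "car", "sign", "pedestrian", "cyclist", "lane-marking",
]
ID_TO_NAME = dict(enumerate(SYNTHIA_CLASSES))

def class_histogram(label_mask: np.ndarray) -> dict:
    """Count pixels per class in a single-channel label mask."""
    ids, counts = np.unique(label_mask, return_counts=True)
    return {ID_TO_NAME.get(int(i), f"unknown_{i}"): int(c)
            for i, c in zip(ids, counts)}

# Toy 2x3 mask containing road (3), car (8), and sky (1) pixels.
print(class_histogram(np.array([[3, 3, 8], [1, 8, 8]])))
```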

Face

Human

  • CrowdHuman

Video Object Segmentation

  1. Occluded Video Instance Segmentation (OVIS)
  • Highlights: occluded video instances
  • OVIS consists of 296k high-quality instance masks spanning 25 semantic categories, in scenes where object occlusions commonly occur (a rough occlusion measure is sketched after this list).
  2. VIPSeg Dataset: A large-scale VIdeo Panoptic Segmentation dataset
  • CVPR 2022: Large-scale Video Panoptic Segmentation in the Wild: A Benchmark
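To make "occluded video instances" concrete, one rough per-frame measure is how much of each instance's bounding box is covered by the union of the other instances' boxes. This is inspired by, but not identical to, the occlusion statistics reported by the OVIS authors; a hedged NumPy sketch:

```python
import numpy as np

def box_occlusion_rates(boxes: np.ndarray, height: int, width: int) -> np.ndarray:
    """boxes: (N, 4) integer [x1, y1, x2, y2] instance boxes in one frame.
    Returns the fraction of each box covered by the union of the other boxes,
    a rough per-frame occlusion proxy (not the official OVIS metric)."""
    n = len(boxes)
    canvas = np.zeros((n, height, width), dtype=bool)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        canvas[i, y1:y2, x1:x2] = True
    rates = np.zeros(n)
    for i in range(n):
        others = np.delete(canvas, i, axis=0).any(axis=0)
        rates[i] = (canvas[i] & others).sum() / max(canvas[i].sum(), 1)
    return rates

# Two overlapping instances and one unoccluded instance -> [0.25, 0.25, 0.0]
boxes = np.array([[0, 0, 50, 50], [25, 25, 75, 75], [100, 100, 120, 120]])
print(box_occlusion_rates(boxes, 200, 200))
```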

Dataset Search Engine

  1. BIFROST