/dtc-KITTI-For-Beginners

Brief explanations about KITTI dataset

Primary LanguagePythonMIT LicenseMIT

dtc-KITTI-For-Beginners

License: MIT   Codacy Badge

Some explanations about KITTI dataset that might be helpful for beginners.

KITTI official website: http://www.cvlibs.net/datasets/kitti/

Raw Data

KITTI include many different types of data, e.g., RGB cameras for images, Velodyne Laserscanner for point clouds, sensor data (e.g., GPS, acceleration). Details are given here.

For most of computer-vision works, the data of interest are

  • 2D RGB images
  • Labels
  • 3D point clouds
  • Calibration files for synchronizing images and point clouds

Illustration

Let's have a quick look at the 2D images and 3D point clouds. The download page is 3D Object Detection Evaluation 2017

  • Left color images of object dataset (12 GB)
  • Velodyne point clouds (29 GB)

After uncompressing, we'll have two folders data_object_image_2 and data_object_velodyne.

data_object_image_2/training/image_2/000000.png is shown below

We can use the pptk tool to show point clouds.

pip install pptk

A python script to show the point cloud (python/point_cloud_viewer.py)

import numpy as np
import pptk


path_to_point_cloud = 'KITTI/data_object_velodyne/training/velodyne/000000.bin'

point_cloud_data = np.fromfile(path_to_point_cloud, '<f4')  # little-endian float32
point_cloud_data = np.reshape(point_cloud_data, (-1, 4))    # x, y, z, r
pptk.viewer(point_cloud_data[:, :3])

data_object_velodyne/training/velodyne/000000.bin is shown below (Can you see the person walking on street?)

Note: if you are using Ubuntu 18.04 and your program hangs without showing the viewer window, try fix it following heremaps/pptk#3. Something like below if you are using Anaconda

cd $CONDA_PREFIX/lib/python3.7/site-packages/pptk/libs
mv libz.so.1 libz.so.1.old
sudo ln -s /lib/x86_64-linux-gnu/libz.so.1

Cropping Point Clouds

The point clouds are scanned in 360 degrees while the RGB cameras are not (they have a much narrower field of view). In addition, KITTI only provides labels for objects that are within the images. Therefore, we usually need to remove points outside the image coordinates.

Download data from here

  • Left color images of object data (12 GB)
  • Velodyne point clouds (29 GB)
  • Camera calibration matrices of object data set (16 MB): synchronizing different data sources, e.g., camera, point clouds.

python/crop_point_cloud.py is a script from this github that can be used for this purpose. You only need to specify the paths to the training folders of images, point clouds, camera calibration, and the path you want to save the cropped point cloud.

# path to data_object_image_2/training/image_2
IMG_ROOT = '/home/dtc/Data/KITTI/data_object_image_2/training/image_2/'
# path to data_object_velodyne/training/velodyne
PC_ROOT = '/home/dtc/Data/KITTI/data_object_velodyne/training/velodyne/'
# path to data_object_calib/training/calib
CALIB_ROOT = '/home/dtc/Data/KITTI/data_object_calib/training/calib/'
# path to the folder for saving cropped point clouds
SAVE_ROOT = '/home/dtc/Data/KITTI/save'

The cropped 000000.bin point cloud is shown below, which only includes points within the image coordinates. In this example, the number of points is reduced from 115384 to 19030.

Labels

KITTI provides labels (location of objects, etc.) for both 2D images and 3D point clouds. Download labels from here

  • training labels of object data set (5 MB)

An example data_object_label_2/training/label_2/000000.txt, meaning that there is a pedestrian in this data frame

Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01

The formats (could also refer here): 15 columns.

  1. (column 0). There are 9 types of objects. The corresponding counts are tabulated below. See python/label_info.py
Car DontCare Pedestrian Van Cyclist Truck Misc Tram Person_sitting
28742 11295 4487 2914 1627 1094 973 511 222
  1. (column 1). Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving image boundaries
  2. (column 2). Integer (0,1,2,3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown
  3. (column 3). Observation angle of object, ranging [-pi..pi]
  4. (column 4-7). 2D bounding box of object in the image (0-based index): contains left, top, right, bottom pixel coordinates
  5. (column 8-10). 3D object dimensions: height, width, length (in meters)
  6. (column 11-13). The center location x, y, z of the 3D object in camera coordinates (in meters)
  7. (column 14). Rotation ry around Y-axis in camera coordinates [-pi..pi]

Labeled Objects in 2D Images

Let's add boxes around the objects in the images. The label information we need is column 0 (type of object), and column 4-7 (left, top, right, and bottom pixel coordinates in the image). show_object_in_image() in python/object_viewer.py can be used to draw objects in 2D images.

Need data if you haven't downloaded from here.

  • Left color images of object data (12 GB)
  • Training labels of object data set (5 MB)

Showcase: 000000.png only has one object of interest, which is boxed.

Labeled Objects in 3D Point Cloud

To label 3D point clouds, we need label information of column 0 (type of object), columns 8-10 (height, width, and length), columns 11-13 (center location of x, y, z), and column 14 (rotation around axis y). However, these location information are given in RGB camera coordinate, and thus we need to project them into the point cloud coordinate.

We need data

  • Velodyne point clouds (29 GB)
  • Camera calibration matrices of object data set (16 MB)
  • Training labels of object data set (5 MB)

show_object_in_point_cloud() in python/object_view.py shows the labeled object in point clouds.

Showcase: 000000.bin

Point Cloud Data Format

KITTI point cloud is a (x, y, z, r) point cloud, where (x, y, z) is the 3D coordinates and r is the reflectance value. Each value is in 4-byte float. The raw data is in the form of [x0 y0 z0 r0 x1 y1 z1 r1 ...]. Below are the codes to read point cloud in python, C/C++, and matlab.

Python code

import numpy as np

pc_data = np.fromfile(point_cloud_filename, '<f4')
pc_data = np.reshape(pc_data, (-1, 4))

C++ code

std::FILE *pFile = fopen(point_cloud_filename, "rb");
fseek(pFile, 0, SEEK_END);    // file pointer goes to the end of the file
long fileSize = ftell(pFile); // file size
rewind(pFile);                // rewind file pointer to the beginning
float *rawData = new float[fileSize];
fread(rawData, sizeof(float), fileSize/sizeof(float), pFile);
long number_of_points = fileSize / 4 / sizeof(float);
float **pc_data = new float*[number_of_points];
for (int i = 0; i < number_of_points; i++) {
  pc_data[i] = new float[4];
  for (int j = 0; j < 4; j++) {
    pc_data[i][j] = rawData[4*i+j];
  }
}

matlab code

fileId = fopen(point_cloud_filename, 'r');
pc_data = fread(fileId, 'float');
pc_data = reshape(pc_data, length(pc_data)/4, 4);