###########################################################################
#   THE KITTI VISION BENCHMARK SUITE: VISUAL ODOMETRY / SLAM BENCHMARK    #
#              Andreas Geiger    Philip Lenz    Raquel Urtasun            #
#                    Karlsruhe Institute of Technology                    #
#                Toyota Technological Institute at Chicago                #
#                             www.cvlibs.net                              #
###########################################################################

This file describes the KITTI visual odometry / SLAM benchmark package.
Accurate ground truth (<10cm) is provided by a GPS/IMU system with RTK
float/integer corrections enabled. In order to enable a fair comparison of
all methods, only ground truth for the sequences 00-10 is made publicly
available. The remaining sequences (11-21) serve as evaluation sequences.

NOTE: WHEN SUBMITTING RESULTS, PLEASE STORE THEM IN THE SAME DATA FORMAT IN
WHICH THE GROUND TRUTH DATA IS PROVIDED (SEE 'POSES' BELOW), USING THE
FILE NAMES 11.txt TO 21.txt. CREATE A ZIP ARCHIVE OF THEM AND STORE YOUR
RESULTS IN ITS ROOT FOLDER.

File description:
=================

Folder 'sequences':

Each folder within the folder 'sequences' contains a single sequence, where
the left and right images are stored in the sub-folders image_0 and
image_1, respectively. The images are provided as greyscale PNG images and
can be loaded with MATLAB or libpng++. All images have been undistorted and
rectified. Sequences 0-10 can be used for training, while results must be
provided for the test sequences 11-21.

Additionally we provide the velodyne point clouds for point-cloud-based
methods. To save space, all scans have been stored as Nx4 float matrix into
a binary file using the following code:

  stream = fopen (dst_file.c_str(),"wb");
  fwrite(data,sizeof(float),4*num,stream);
  fclose(stream);

Here, data contains 4*num values, where the first 3 values correspond to
x,y and z, and the last value is the reflectance information. All scans
are stored row-aligned, meaning that the first 4 values correspond to the
first measurement. Since each scan might potentially have a different
number of points, this must be determined from the file size when reading
the file, where 1e6 is a good enough upper bound on the number of values:

  // allocate 4 MB buffer (only ~130*4*4 KB are needed)
  int32_t num = 1000000;
  float *data = (float*)malloc(num*sizeof(float));

  // pointers
  float *px = data+0;
  float *py = data+1;
  float *pz = data+2;
  float *pr = data+3;

  // load point cloud
  FILE *stream;
  stream = fopen (currFilenameBinary.c_str(),"rb");
  num = fread(data,sizeof(float),num,stream)/4;
  for (int32_t i=0; i<num; i++) {
    point_cloud.points.push_back(tPoint(*px,*py,*pz,*pr));
    px+=4; py+=4; pz+=4; pr+=4;
  }
  fclose(stream);

x,y and y are stored in metric (m) Velodyne coordinates.

IMPORTANT NOTE: Note that the velodyne scanner takes depth measurements
continuously while rotating around its vertical axis (in contrast to the cameras,
which are triggered at a certain point in time). This effect has been
eliminated from this postprocessed data by compensating for the egomotion!!
Note that this is in contrast to the raw data.

The relationship between the camera triggers and the velodyne is the following:
We trigger the cameras when the velodyne is looking exactly forward (into the
direction of the cameras). After compensation, the point cloud data should
correspond to the camera data for all static elements of the scene. Dynamic
ones are still slightly distorted, of course. If you want the raw velodyne
scans, please have a look at the section 'mapping to raw data' below.

The base directory of each folder additionally contains:

calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection
matrices after rectification. Here P0 denotes the left and P1 denotes the
right camera. Tr transforms a point from velodyne coordinates into the
left rectified camera coordinate system. In order to map a point X from the
velodyne scanner to a point x in the i'th image plane, you thus have to
transform it like:

  x = Pi * Tr * X

times.txt: Timestamps for each of the synchronized image pairs in seconds,
in case your method reasons about the dynamics of the vehicle.

Folder 'poses':

The folder 'poses' contains the ground truth poses (trajectory) for the
first 11 sequences. This information can be used for training/tuning your
method. Each file xx.txt contains a N x 12 table, where N is the number of
frames of this sequence. Row i represents the i'th pose of the left camera
coordinate system (i.e., z pointing forwards) via a 3x4 transformation
matrix. The matrices are stored in row aligned order (the first entries
correspond to the first row), and take a point in the i'th coordinate
system and project it into the first (=0th) coordinate system. Hence, the
translational part (3x1 vector of column 4) corresponds to the pose of the
left camera coordinate system in the i'th frame with respect to the first
(=0th) frame. Your submission results must be provided using the same data
format.

Mapping to Raw Data
===================

Note that this section is additional to the benchmark, and not required for
solving the object detection task.

In order to allow the usage of the laser point clouds, gps data, the right
camera image and the grayscale images for the TRAINING data as well, we
provide the mapping of the training set to the raw data of the KITTI dataset.
The following table lists the name, start and end frame of each sequence that
has been used to extract the visual odometry / SLAM training set:

Nr.     Sequence name     Start   End
---------------------------------------
00: 2011_10_03_drive_0027 000000 004540
01: 2011_10_03_drive_0042 000000 001100
02: 2011_10_03_drive_0034 000000 004660
03: 2011_09_26_drive_0067 000000 000800
04: 2011_09_30_drive_0016 000000 000270
05: 2011_09_30_drive_0018 000000 002760
06: 2011_09_30_drive_0020 000000 001100
07: 2011_09_30_drive_0027 000000 001100
08: 2011_09_30_drive_0028 001100 005170
09: 2011_09_30_drive_0033 000000 001590
10: 2011_09_30_drive_0034 000000 001200

The raw sequences can be downloaded from
http://www.cvlibs.net/datasets/kitti/raw_data.php
in the respective category (mostly: Residential).

Evaluation Code:
================

For transparency we have included the KITTI evaluation code in the
subfolder 'cpp' of this development kit. It can be compiled via:

g++ -O3 -DNDEBUG -o evaluate_odometry evaluate_odometry.cpp matrix.cpp