Fork of https://github.com/ergysr/DeepCC

DeepCC

Features for Multi-Target Multi-Camera Tracking and Re-Identification. CVPR 2018

Ergys Ristani, Carlo Tomasi

[Paper] [Spotlight] [PhD Thesis] [PhD Slides] [DukeMTMC Project] [BibTeX]


Multi-Target Multi-Camera Tracking (MTMCT) is the problem of determining who is where at all times given a set of video streams as input. The output is a set of person trajectories. Person re-identification (ReID) is a closely related problem. Given a query image of a person, the goal is to retrieve from a database of images taken by different cameras the images where the same person appears.

In this repository, we provide MATLAB code to run and evaluate our tracker, as well as TensorFlow code to learn appearance features with our weighted triplet loss. This code has been written over the past several years as part of my PhD research, initially for multi-target tracking by correlation clustering (BIPCC), and later extended to use deep features in multi-camera settings (DeepCC). We additionally provide tools to download and interact with the DukeMTMC dataset.


Downloading the data

DukeMTMC

After cloning this repository you need to download the DukeMTMC dataset. Specify a folder of your choice in src/duke/downloadDukeMTMC.m and run the relevant parts of the script, omitting the cells tagged as optional. To run the tracker you only need the videos, the OpenPose detections, and the precomputed detection features.
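
A minimal sketch of that edit (the variable name here is hypothetical; check the script for the actual one):

% In src/duke/downloadDukeMTMC.m: destination for the ~160 GB download
dataset_folder = 'F:/DukeMTMC/';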

Please be patient as you are downloading ~160 GB of data. [md5sum]


Running the tracker

As a first step, set the dataset root directory by editing the following line in get_opts.m:

opts.dataset_path = 'F:/DukeMTMC/';

Dependencies

Clone mexopencv into src/external/ and follow its installation instructions. This interface is used to read frames directly from the Duke videos.
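
As a quick sanity check that mexopencv is set up, you can read one frame from a Duke video (a sketch; the exact file layout under videos/ may differ on your machine):

cap = cv.VideoCapture('F:/DukeMTMC/videos/camera1/00000.MTS');  % open a Duke video
img = cap.read();                                               % grab a single frame
imshow(img);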

Pre-computed features

Download the pre-computed features into experiments/demo/L0-features/.

Compiling

Run compile to build the MEX files for the solvers and helper functions.

Training an appearance model

To train and evaluate our appearance model, which employs the weighted triplet loss, first download resnet_v1_50.ckpt into src/triplet-reid/. Then install imgaug. Adjust the image root in get_opts.m to point at your DukeMTMC-reID folder, e.g. net.image_root = 'F:/DukeMTMC/DukeMTMC-reID';. Finally, run:

mkdir('src/triplet-reid/experiments/')  % output folder for checkpoints and logs
opts = get_opts();
train_duke(opts);                       % train for 25,000 iterations
embed(opts);                            % embed DukeMTMC-reID query/gallery images
evaluation_res_duke_fast(opts);         % print mAP and rank-1 score

The code will run 25,000 training iterations, compute embeddings for the query and gallery images of the DukeMTMC-reID benchmark, and finally print the mAP and rank-1 scores. The above functions are MATLAB interfaces to the TensorFlow/Python 3 code of Beyer et al., which has been extended to include our weighted triplet loss.

Alternatively, you can run train_duke_hnm to train with hard negative mining.

Once you train a model, you can analyze the distribution of distances between features to obtain a separation threshold:

view_distance_distribution(opts);

Optionally, you can use features = embed_detections(opts, detections); to compute features for a set of detections given in the format [camera, frame, left, top, width, height]. A usage example can be found in compute_L0_features.m.
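
For instance, with made-up detection values (hypothetical numbers; the real usage is in compute_L0_features.m):

% Two detections in camera 2: [camera, frame, left, top, width, height]
detections = [2, 125000, 100, 80, 60, 150;
              2, 125010, 110, 82, 60, 150];
features = embed_detections(opts, detections);  % one feature vector per detection (assumed layout)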

Running DeepCC

Run demo and you will see output logs while the tracker is running. When the tracker completes, you will see the quantitative evaluation results for the sequence trainval-mini.
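
A minimal session from the repository root, assuming compilation and setup above are done:

demo  % logs progress while tracking, then prints results for trainval-mini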

Note on solvers

The graph solver is set in opts.optimization. By default Correlation Clustering by a Binary Integer Program (BIPCC) is used. It solves every graph instance optimally by relying on the Gurobi solver, for which an academic license may be obtained for free.

opts.optimization = 'BIPCC'; 
opts.gurobi_path = 'C:/gurobi800/win64/matlab';

If you don't want to use Gurobi, we also provide two approximate solvers: Adaptive Label Iterated Conditional Modes (AL-ICM) and Kernighan-Lin (KL). In our experience, the best trade-off between accuracy and speed is achieved with the option 'KL'.
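
For example, to select the Kernighan-Lin solver instead:

opts = get_opts();
opts.optimization = 'KL';  % approximate solver; no Gurobi license required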

Understanding errors

To gain qualitative insight into why the tracker fails, you can run render_results(opts).

This will generate movies with the rendered trajectories validated against ground truth using the ID measures. Color-coded tails with IDTP, IDFP and IDFN give an intuition for the tracker's failures. The movies will be placed under experiments/demo/video-results.

Visualization

To visualize the detections you can run the demo show_detections.

You can run render_trajectories_top or render_trajectories_side to generate a video animation similar to the gif playing at the top of this page.

To generate ID Precision/Recall plots like those in the State of the art section, see render_state_of_the_art. Make sure to update the files in src/visualization/data/duke_*_scores.txt with the latest MOTChallenge submissions; the provided scores are supplied only as a reference.
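
The calls might look as follows (the (opts) signatures are an assumption; check each function before running):

show_detections;                % browse the detections (demo script)
render_trajectories_top(opts);  % top-down trajectory animation (assumed signature)
render_state_of_the_art(opts);  % ID Precision/Recall plots (assumed signature)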

State of the art

The state of the art for DukeMTMC is available on MOTChallenge. Submission instructions can be found on this page.

The original submission file duke.txt can be downloaded here. Results from the released tracker may differ from those at submission time due to changes in code and settings. Once you are happy with the performance of your extensions to DeepCC, run prepareMOTChallengeSubmission(opts) to obtain a submission file duke.txt for MOTChallenge.

Remarks

The MTMCT and ReID problems differ subtly but fundamentally. In MTMCT the decisions made by the tracker are hard: two person images either have the same identity or they do not. In ReID the decisions are soft: the gallery images are ranked without making hard decisions. MTMCT training therefore requires a loss that correctly classifies all pairs of observations, whereas ReID only requires a loss that correctly ranks images by their similarity to the query. Below I describe two ideal feature spaces, one for ReID and one for MTMCT, and argue that the MTMCT classification condition is the stronger of the two.

In MTMCT the ideal feature space should satisfy the classification condition globally, meaning that the largest class variance among all identities should be smaller than the smallest separation margin between any pair of identities. When this condition holds, a threshold (the maximum class variance) can be found to correctly classify any pair of features as co-identical or not. The classification condition also implies correct ranking in ReID for any given query.

For correct ranking in ReID it is sufficient that, for every query, the positive examples are ranked higher than all the negative examples. A feature space can satisfy this ranking condition for every query and yet admit no single threshold that correctly classifies all pairs. Therefore the ReID ranking condition is strictly weaker than, and implied by, the MTMCT classification condition.
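
In symbols, a sketch of the two conditions, where $d$ is the distance in the learned feature space and $C_i$ collects the features of identity $i$:

$$\max_{i}\,\max_{x,y \in C_i} d(x,y) \;<\; \min_{i \neq j}\;\min_{x \in C_i,\, y \in C_j} d(x,y) \qquad \text{(MTMCT classification)}$$

$$\max_{p \in C_i \setminus \{q\}} d(q,p) \;<\; \min_{n \notin C_i} d(q,n) \quad \text{for every query } q \in C_i \qquad \text{(ReID ranking)}$$

The classification inequality is global: any value between its two sides serves as a single threshold for all pairs. The ranking inequality only has to hold per query, which is why it can be satisfied even when no global threshold exists.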

Citing

If this code helps your research, please cite the following works that made it possible.

@phdthesis{ristani2018thesis,
  author = {Ergys Ristani},
  title  = {People Tracking and Re-Identification from Multiple Cameras},
  school = {Duke University},
  year   = {2018}
}

@inproceedings{ristani2018features,
  author    = {Ristani, Ergys and Tomasi, Carlo},
  title     = {Features for Multi-Target Multi-Camera Tracking and Re-Identification},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2018}
}

@inproceedings{ristani2016MTMC,
  author    = {Ristani, Ergys and Solera, Francesco and Zou, Roger and Cucchiara, Rita and Tomasi, Carlo},
  title     = {Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking},
  booktitle = {European Conference on Computer Vision Workshop on Benchmarking Multi-Target Tracking},
  year      = {2016}
}

@inproceedings{ristani2014tracking,
  author       = {Ristani, Ergys and Tomasi, Carlo},
  title        = {Tracking Multiple People Online and in Real Time},
  booktitle    = {Asian Conference on Computer Vision},
  pages        = {444--459},
  year         = {2014},
  organization = {Springer}
}