VisualComputingInstitute/TrackR-CNN

Forwarding and tuning of TrackR-CNN

jong372 opened this issue · 6 comments

I have trained the model on my own dataset. Next, I would like to run the forwarding and tuning steps. However, I have some doubts about this. @ahnonay @pvoigtlaender

  1. What's the purpose of the forwarding process?
  2. How do I tune the model trained on my training dataset by using a validation set (images + instances)? The ReadMe does not explain how to use a separate validation set (e.g. with the path KITTI_MOTS/data/testing/(instances+images)/(0000,0001,0002)) to tune the parameters of the forwarded output (created on the training dataset).
  3. If tuning can only be performed on the training dataset, how can the MOTS metrics (MOTSA, MOTSP, sMOTSA) be calculated on the validation set?
  4. After tuning, are the weights of the forwarding and tracking somehow updated in order to visualize new tracks?
  5. How can I visualize the predicted masks and tracks on the validation dataset by using the output parameters of the tuned model?
  6. Which variables in your code correspond to γ, β, δ, and α in your paper? I would like to adapt the ranges of these parameters within segtrack_tune_experiment.py to get fewer ID switches. Are these parameters only used for the forwarding and tracking step, or also for the training step?
  7. Which of the above variables most strongly affect the ID switches? Currently I obtain MOTSA values between -15 and -80 when tracking apples (which cover a smaller area than cars/pedestrians), annotated at 25 FPS:
Scores(sMOTSA_car=-95.2, sMOTSA_ped=-inf, MOTSA_car=-85.9, MOTSA_ped=-inf, MOTSP_car=78.9, MOTSP_ped=inf, IDS_car=723, IDS_ped=0)

[('0006', 101), ('0007', 103)] False

Scores(sMOTSA_car=-69.9, sMOTSA_ped=-inf, MOTSA_car=-61.8, MOTSA_ped=-inf, MOTSP_car=79.6, MOTSP_ped=inf, IDS_car=985, IDS_ped=0)

[('0006', 101), ('0007', 103)] False

Thanks in advance!

Hi,

  1. Forwarding means evaluating the network on the given video sequences and creating the tracking output
  2. Can't you just give the tuning script the path to the validation set instead of the training set?
  3. No, tuning is not restricted to the training set; it can also be done on the validation set. You just need to invoke the tuning script with different parameters. You can calculate the MOTS metrics using the mots_tools: https://github.com/VisualComputingInstitute/mots_tools
  4. The tuning process will determine the best set of hyperparameters, and these are then used to evaluate on the final dataset split (test set)
  5. First run forwarding on the validation set to create the output, then use mots tools to create a visualization
  6. All the parameters are listed here: https://github.com/VisualComputingInstitute/TrackR-CNN/blob/master/scripts/segtrack_tune_experiment.py#L90 (unfortunately, I don't have the exact correspondence at hand right now)
  7. reid_weight_car and mask_iou_weight_car should be quite important for ID switches. In general, one of the most important hyperparameters is the detection confidence threshold (detection_confidence_threshold_car), although it relates less to ID switches.
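
Conceptually, the tuning stage can be sketched in a few lines of Python. This is only an illustration of the random-search idea, not the repository's actual code; the ranges and the run_tracker hook are placeholder assumptions:

```python
import random

def sample_settings():
    # Illustrative ranges only; the real ranges live in tune_model in
    # scripts/segtrack_tune_experiment.py.
    return {
        "detection_confidence_threshold_car": random.uniform(0.5, 0.95),
        "association_threshold_car": random.uniform(0.0, 0.5),
        "keep_alive_car": random.randint(0, 10),
    }

def tune(run_tracker, n_iterations=100):
    # run_tracker(settings) is assumed to re-run the tracking step on the
    # already-forwarded detections and return the resulting sMOTSA.
    best_score, best_settings = float("-inf"), None
    for _ in range(n_iterations):
        settings = sample_settings()
        score = run_tracker(settings)
        if score > best_score:
            best_score, best_settings = score, settings
    return best_score, best_settings
```

The design assumption here is that forwarding is the expensive step, so only the cheap tracking step needs to be repeated for each randomly sampled setting.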

Thanks a lot @pvoigtlaender! I have some follow-ups to this:

  1. I guess it's possible to add both my training and validation datasets to the instances folder and to change train.seqmap and val.seqmap to point to the corresponding folders, e.g. 0006, 0007 for training and 0008 for validation. Is it true that the output below first describes the best parameters from tuning on the training set (0006, 0007) and then uses these parameters to generate the MOTS metrics on the validation set (0008)?
Best CAR sMOTSA train -49.1 Settings: {'tracker': 'hungarian', 'reid_comp': 'euclidean', 'detection_confidence_threshold_car': 0.8496067731735859, 'reid_weight_car': 0.0, 'mask_iou_weight_car': 0.0, 'bbox_center_weight_car': 1.0, 'bbox_iou_weight_car': 0.0, 'association_threshold_car': 0.04012652020683272, 'keep_alive_car': 0, 'reid_euclidean_offset_car': 5.0, 'reid_euclidean_scale_car': 1.0, 'new_reid_threshold_car': 2.0, 'box_offset': 54.438720812209155, 'box_scale': 0.011915189785745486, 'new_reid': False}
[('0008', 322)] False
Scores(sMOTSA_car=-98.6, sMOTSA_ped=-inf, MOTSA_car=-98.5, MOTSA_ped=-inf, MOTSP_car=57.4, MOTSP_ped=inf, IDS_car=2, IDS_ped=0)
Val scores sMOTSA -98.6 MOTSA -98.5 MOTSP 57.4 IDS 2
---
Best PED sMOTSA train -inf Settings: {'tracker': 'hungarian', 'reid_comp': 'euclidean', 'detection_confidence_threshold_pedestrian': 0.8184689690082664, 'reid_weight_pedestrian': 0.0, 'mask_iou_weight_pedestrian': 0.0, 'bbox_center_weight_pedestrian': 1.0, 'bbox_iou_weight_pedestrian': 0.0, 'association_threshold_pedestrian': 0.12639446927003167, 'keep_alive_pedestrian': 0, 'reid_euclidean_offset_pedestrian': 5.0, 'reid_euclidean_scale_pedestrian': 1.0, 'new_reid_threshold_pedestrian': 2.0, 'box_offset': 52.70180414820998, 'box_scale': 0.017683827806868228, 'new_reid': False}
[('0008', 322)] False
Scores(sMOTSA_car=-112.1, sMOTSA_ped=-inf, MOTSA_car=-111.9, MOTSA_ped=-inf, MOTSP_car=57.7, MOTSP_ped=inf, IDS_car=2, IDS_ped=0)
Val scores sMOTSA -inf MOTSA -inf MOTSP inf IDS 0
  2. Do the tuned parameters (see output above) then first need to be manually added to the forwarding script?
  3. Okay, then I will just make the ranges bigger. If you find the exact correspondence between the paper and the code, please let me know; it could also be valuable for others in future research.

Hey Stefan,

  1. The segtrack_tune_experiment script takes two parameters: seq_map_train and seq_map_val. The tuning is done on seq_map_train. Then, the best parameters on seq_map_train are used to evaluate on seq_map_val. So for your case, you should put your validation set into seq_map_train and your test set into seq_map_val.
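
For concreteness, a seqmap is a plain-text file with one line per sequence. Assuming the KITTI-style format read by the mots_tools evaluation code (sequence id, the word "empty", first frame index, last frame index), the split above might look as follows; the frame ranges are inferred from the [('0006', 101), ('0007', 103)] and [('0008', 322)] lines in your output, and the # lines are annotations for this example, not part of the files:

```
# seq_map_train (tuning split)
0006 empty 000000 000100
0007 empty 000000 000102

# seq_map_val (held-out split)
0008 empty 000000 000321
```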

Is it true that the output below first describes the best parameters from tuning on the training set (0006, 0007) and then uses these parameters to generate the MOTS metrics on the validation set (0008)?

Yes, that is correct.

  2. You can take the output from the tuning script and put it into your config. For example, configs/conv3d_sep2 contains our tuned values for KITTI MOTS. In your case, you want to first tune on your validation set, take the output, put it into your config file, and then run forwarding + tracking on your test set. This way, it will use your updated values from the config for tracking on the test set.
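
As a hypothetical sketch of that transfer (the key/value pairs below are copied verbatim from the tuning output earlier in this thread; whether your config file parses as plain JSON may depend on the repo version, so treat this as illustrative):

```python
import json

# Best settings as printed by segtrack_tune_experiment.py (values copied
# from the tuning output above; only the car-specific keys are shown).
best_settings = {
    "tracker": "hungarian",
    "reid_comp": "euclidean",
    "detection_confidence_threshold_car": 0.8496067731735859,
    "association_threshold_car": 0.04012652020683272,
    "keep_alive_car": 0,
    "reid_euclidean_offset_car": 5.0,
    "reid_euclidean_scale_car": 1.0,
}

# Merge the tuned values into a copy of the config used for
# forwarding + tracking on the test set.
with open("configs/conv3d_sep2") as f:
    config = json.load(f)  # assuming the config is JSON-parseable
config.update(best_settings)

with open("configs/conv3d_sep2_tuned", "w") as f:
    json.dump(config, f, indent=2)
```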

  3. alpha is used during training; it is the margin for the triplet loss (see "reid_loss_margin" in FasterRCNN.py). beta, gamma, and delta are parameters of the tracking algorithm, so they are *not* used during training. Their ranges are defined in tune_model in segtrack_tune_experiment.py. Regarding the correspondences:
    gamma - detection_confidence_threshold
    delta - association_threshold
    beta - keep_alive
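
Collected in one place, with alpha from the training code and the rest from the list above, the correspondence could be written down as:

```python
# Paper symbol -> code name, per this thread. alpha lives in the training
# code ("reid_loss_margin" in FasterRCNN.py); the other three are tracking
# hyperparameters (with per-class suffixes such as _car / _pedestrian)
# whose ranges are defined in tune_model in segtrack_tune_experiment.py.
PAPER_TO_CODE = {
    "alpha": "reid_loss_margin",                # triplet-loss margin (training)
    "beta": "keep_alive",                       # tracking
    "gamma": "detection_confidence_threshold",  # tracking
    "delta": "association_threshold",           # tracking
}
```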

Btw, earlier you asked why we use a smaller image size for training than for inference. The reason is, as Paul already mentioned, that training needs more GPU memory than inference, so we could not input huge images during training. On the other hand, we wanted to have "best possible" results during inference, so we tried to avoid downscaling the input images there.

@ahnonay Thanks a lot for your extensive answers!
There are also some variables that are not explained in the paper but that are tuned. What do these variables mean, and how did you determine their ranges?
It concerns the following variables in the tuning script:

  1. reid/mask/bbox_weight
  2. reid/box_offset
  3. reid/box_scale

Hey Stefan,

the _weight parameters concern an experiment that didn't make it into the paper (basically, you can compute association scores as a linear combination of mask propagation scores, association vector similarities, and bounding box distance scores) -> not important
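
Spelled out, such a linear combination might look like the sketch below; the weight names mirror the tuned *_weight_car parameters from the output above, while the individual cue scores are placeholder inputs:

```python
def association_score(scores, weights):
    # Hypothetical weighted sum of association cues; with the tuned values
    # shown earlier (all weights 0 except bbox_center_weight_car = 1.0),
    # only the bounding-box-center cue would contribute.
    return (weights["reid_weight_car"] * scores["reid_similarity"]
            + weights["mask_iou_weight_car"] * scores["mask_iou"]
            + weights["bbox_center_weight_car"] * scores["bbox_center"]
            + weights["bbox_iou_weight_car"] * scores["bbox_iou"])

weights = {"reid_weight_car": 0.0, "mask_iou_weight_car": 0.0,
           "bbox_center_weight_car": 1.0, "bbox_iou_weight_car": 0.0}
scores = {"reid_similarity": 0.7, "mask_iou": 0.5,
          "bbox_center": 0.9, "bbox_iou": 0.4}
print(association_score(scores, weights))  # -> 0.9
```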

the box_offset/box_scale parameters are only necessary when you track using distances of bounding boxes (then the euclidean distances between bounding boxes are computed and rescaled using these parameters) -> this was only used for a minor comparison experiment in the paper, I think

The reid_offset and reid_scale parameters are actually important; sorry, I forgot about them before. When the distances between association vectors are computed, we rescale them according to these parameters:
rescaled_distances = scale * (offset - raw_distances)
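
As a runnable sketch of that rescaling (the default offset and scale match the reid_euclidean_offset_car / reid_euclidean_scale_car values in the tuning output above):

```python
import numpy as np

def rescale_reid_distances(raw_distances, offset=5.0, scale=1.0):
    # Convert raw euclidean distances between association vectors into
    # association scores: rescaled = scale * (offset - raw). Small distances
    # (similar vectors) yield high scores; large distances go negative.
    return scale * (offset - np.asarray(raw_distances))

print(rescale_reid_distances([0.5, 4.0, 9.0]))  # -> [ 4.5  1.  -4. ]
```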

The ranges for all these tuning parameters were determined empirically (basically, you can also set a larger range and tune for more iterations).

Many thanks! This helps me a lot with implementing it correctly (and making well-founded decisions).