hobbitsyfeet/PeopleTracker

Evaluation TODO:

  • Export Tracked Predictions into a CSV (Done, December 6 839816c)

  • Apply evaluation to MRCNN predictions (Tracked Predictions) (Done, December 6 839816c)

  • Export evaluation results in a more interpretable format so we can extract which errors happen and when, for explainability (Completed Dec 7 d53d16a)

  • Track elapsed time in the datalogger for both the user's labelling and the prediction step, and record the hardware device used for BOTH; this will make an impact on the comparison. (For Mask RCNN, the user's duration is already recorded through the datalogger as the time between the first and last interaction. c36832c) See the timing sketch after this list.

  • Create a procedure document on the steps needed to follow while evaluating.
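
The timing item above could be handled along these lines. This is only a sketch: `summarise_session`, the CSV layout, and the timestamp handling are assumptions for illustration, not the project's existing datalogger API.

```python
# Sketch: derive elapsed labelling time from the first/last interaction
# timestamps and record which hardware was used, appended to a CSV.
import csv
import platform
from datetime import datetime

def summarise_session(interaction_times, out_csv="session_times.csv"):
    """interaction_times: list of datetime objects, one per user interaction."""
    elapsed = (max(interaction_times) - min(interaction_times)).total_seconds()
    with open(out_csv, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([elapsed,
                         platform.node(),       # machine name
                         platform.processor(),  # CPU string
                         platform.platform()])  # OS / version
    return elapsed

# Example with fabricated timestamps:
times = [datetime(2021, 12, 6, 10, 0, 0), datetime(2021, 12, 6, 10, 12, 30)]
print(summarise_session(times))  # -> 750.0 seconds
```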


For occlusion we will track occluding objects (Artwork, Tree, Bush categories). For static objects in static videos, track a single frame and propagate that region across the full video length, and use this for the occlusion metric. Problem: these are loaded separately as an occluding-objects CSV, and not all objects are created equal; occluding objects are compared against PeopleTracker output, not the ground truth. A single region/tracker describes the entire occluding object, representing a region where errors are more likely to occur (include its area). A propagation sketch follows the list below.

  • Open/on ground (rivers/field)
  • Bush terrain
  • Tree terrain (not included)
  • Number of occluding objects (video complexity?)
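
A minimal sketch of the propagation idea, assuming (x, y, w, h) boxes; `flag_occluded`, the data layout, and the IoU cutoff are illustrative assumptions rather than existing PeopleTracker code.

```python
# A static occluding object (artwork, tree, bush) is annotated once, the same
# box is propagated to every frame, and any prediction overlapping it is flagged.
def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def flag_occluded(predictions, occluders, min_iou=0.1):
    """predictions: {frame: [boxes]}, occluders: [boxes] annotated on one frame."""
    flags = {}
    for frame, boxes in predictions.items():
        # propagate the static occluder boxes to this frame unchanged
        flags[frame] = [any(iou(box, occ) >= min_iou for occ in occluders)
                        for box in boxes]
    return flags
```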

When there is occlusion between two ground truths it only inhibits MO and MT errors, so we want to record when this happens on a per-frame basis.
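
A rough sketch of recording, per frame, when two ground-truth boxes overlap; the (x, y, w, h) box format and the `ground_truth` layout are assumptions.

```python
# Log every frame in which two ground-truth people overlap (the situations
# expected to produce MO/MT errors).
from itertools import combinations

def boxes_overlap(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def gt_occlusion_frames(ground_truth):
    """ground_truth: {frame: {person_id: box}} -> list of (frame, id_a, id_b)."""
    events = []
    for frame, people in ground_truth.items():
        for (id_a, box_a), (id_b, box_b) in combinations(people.items(), 2):
            if boxes_overlap(box_a, box_b):
                events.append((frame, id_a, id_b))
    return events
```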


Train Mask RCNN on the database type.
Cross-validation split by gallery (roughly half): 3 galleries for training, 2 for testing,
or 4 for training and 1 for testing (leave one out).
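
Both split options sketched with placeholder gallery names; nothing here touches the real database, it just enumerates the folds.

```python
# Gallery-level splits: splitting happens per gallery, never per frame,
# so no video leaks between training and testing.
from itertools import combinations

galleries = ["gallery_1", "gallery_2", "gallery_3", "gallery_4", "gallery_5"]

# Option A: leave-one-gallery-out (4 training, 1 testing) -> 5 folds
leave_one_out = [([g for g in galleries if g != test], [test]) for test in galleries]

# Option B: 3 training / 2 testing -> every 2-gallery test set, 10 folds
three_two = [([g for g in galleries if g not in test], list(test))
             for test in combinations(galleries, 2)]

for train, test in leave_one_out:
    print("train:", train, "| test:", test)
```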

Use the best model, i.e. the one with the fewest errors, for Mask RCNN prediction assistance in user labelling.
Rank models by the sum of False Positive + False Negative errors. We do not look at FIO and FIT errors because that metric depends on the centroid-tracker portion, which does not directly reflect the quality of the Mask RCNN predictions.
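
The selection rule as a sketch; the per-model error dictionary is a stand-in for whatever the evaluation export actually contains.

```python
# Pick the trained model with the smallest FP + FN total; FIO/FIT are
# deliberately excluded from the ranking.
def best_model(error_counts):
    """error_counts: {model_name: {"FP": int, "FN": int, ...}}"""
    return min(error_counts, key=lambda m: error_counts[m]["FP"] + error_counts[m]["FN"])

errors = {
    "fold_1": {"FP": 12, "FN": 30, "FIO": 4, "FIT": 2},
    "fold_2": {"FP": 9,  "FN": 25, "FIO": 7, "FIT": 1},
}
print(best_model(errors))  # -> "fold_2"
```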

MAYBE:
Evaluate on a range of threshold percentages. For each Threshold_tc compute the F-Measure = 2PR / (P + R), where P = Precision and R = Recall.
The higher the F-Measure, the better the score. If we lower the score threshold by 10% we can see how inaccurate the scores are overall: if False Positives and False Negatives drop with only a slight change, then our measurements are not that far off.
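
A sketch of the threshold sweep. The false-negative count here only covers matched detections lost by raising the threshold (people never detected at all would be added from the ground truth), and the scores are made up.

```python
# Sweep the confidence threshold and recompute precision, recall and
# F-Measure = 2PR / (P + R) at each step.
def f_measure(p, r):
    return 2 * p * r / (p + r) if (p + r) else 0.0

def sweep(scores_and_labels, thresholds):
    """scores_and_labels: list of (confidence, is_true_positive) pairs."""
    results = []
    for t in thresholds:
        kept = [(s, ok) for s, ok in scores_and_labels if s >= t]
        tp = sum(ok for _, ok in kept)
        fp = len(kept) - tp
        fn = sum(ok for s, ok in scores_and_labels if s < t)  # simplified FN
        p = tp / (tp + fp) if kept else 0.0
        r = tp / (tp + fn) if (tp + fn) else 0.0
        results.append((t, p, r, f_measure(p, r)))
    return results

for row in sweep([(0.9, True), (0.8, True), (0.55, False), (0.4, True)],
                 [0.5, 0.6, 0.7]):
    print(row)
```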

Run a couple of videos through the procedure to get a sense of what the analysis is saying, or what it needs to say:

  • Error summary
  • Look at tracking errors and where they happen in the video to try and explain them with the video characteristics.
  • Where each tracking method excels (where does it cause issues, where does it actually help).
  • Look at the sum of interaction types, duration, etc. (a grouping sketch follows this list).
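
A possible starting point for the error summary, assuming an exported CSV with `video` and `error_type` columns (the column names are guesses).

```python
# Group exported per-frame errors by video and error type so they can be lined
# up against video characteristics (occluders, terrain, duration).
import csv
from collections import Counter

def summarise(csv_path):
    per_video = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            per_video.setdefault(row["video"], Counter())[row["error_type"]] += 1
    return per_video

# for video, counts in summarise("evaluation_errors.csv").items():
#     print(video, dict(counts))
```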