Goal: Get ground truth bounding boxes given a stereo video + camera calibration parameters.
On the bottom-left of the interface you can see the ids of the current image pair (Im:) and bounding box (Id:). You can change the keys according to your preference by editing the config.yaml file. The input directory, video and calibration parameters are also set in config.yaml.
You will be labelling the bounding boxes by clicking on the same target in both images. You want to guarantee that the centre of the bounding box is always targeting the same tissue region.
For the SurgT challenge, the annotators are never allowed to use interpolation, automation or external intervention to aid their labelling!
For each target bounding box:
- Set is_visible_in_both_stereo = False if the target's centre point is (1) fully out-of-view or (2) fully occluded in the left or in the right image. In other words, when the target's centre is completely NOT visible in at least one of the stereo images. Otherwise, is_visible_in_both_stereo = True.
- Set is_difficult = True if (1) annotating the target's centre is too difficult, (2) there are conflicting opinions between annotators, or (3) there is fast motion of the camera or the tissue. The frames that are marked as difficult (is_difficult = True) do not affect any of the metrics/scores. In practice, in our benchmarking tool, the difficult frames are simply ignored. The reason for marking frame-pairs as difficult during (3) fast motion of the camera or tissue is that the stereo camera's synchronization may not be perfect. The left and right images may therefore be captured at slightly different timestamps, which is a problem during fast camera motion since the target will no longer be imaged consistently between the stereo images. For example, the target may no longer be row-aligned, as is expected in rectified images. The faster the motion, the easier it is to notice synchronization errors. By labelling these image-pairs as is_difficult = True, they are correctly ignored by the benchmarking tool.
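The row-alignment expectation described above can be turned into a quick sanity check. The sketch below is not part of the labelling tool: the function name `looks_desynchronized` and the 2-pixel tolerance are assumptions chosen for illustration. It flags a frame pair whose left and right centre points differ vertically by more than the tolerance, which in rectified images may hint at a synchronization error.

```python
def looks_desynchronized(left_centre, right_centre, tol_px=2.0):
    """Return True if the two centres are not row-aligned within tol_px.

    left_centre and right_centre are (u, v) pixel coordinates in the
    rectified left and right images. tol_px is a hypothetical tolerance,
    not a value taken from the tool.
    """
    _, v_left = left_centre
    _, v_right = right_centre
    return abs(v_left - v_right) > tol_px


# Same row, different column: only a horizontal disparity, as expected.
aligned = looks_desynchronized((320.0, 240.0), (300.0, 240.5))
# Vertical offset of 6 px: a candidate for is_difficult = True.
shifted = looks_desynchronized((320.0, 240.0), (300.0, 246.0))
print(aligned, shifted)
```

A check like this can only suggest frame pairs worth a second look; whether a frame pair is difficult remains the annotator's decision.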
The usage idea is the following:
1. The annotator should watch the entire video and decide which keypoint will be labelled next. We recommend that the annotator chooses a keypoint that is easy to label throughout the video;
2. Then, the annotator should classify is_visible_in_both_stereo for all the frame-pairs of the video. This can be done by pressing v over an image, or v over a range of images. A big red X will be drawn over the images with is_visible_in_both_stereo = False;
3. Then the annotator should label the keypoint in all the remaining images, where is_visible_in_both_stereo = True. All the keypoints should respect:
   - The annotator should ensure that the keypoint is mapped accurately and corresponds to the same target in both stereo images;
   - The annotator should also look back at the previous frame in the video sequence to ensure temporal consistency in the labelling;
   - If the keypoint is difficult to label, according to the definition above, then the annotator should set is_difficult = True. This can be done by pressing m (standing for mark) to mark the bounding box as a difficult one. You will notice a big red \ drawn in the image when is_difficult = True.
4. Steps 1. to 3. should be repeated if you want to label multiple keypoints per video. Once you are satisfied with the labelling of that keypoint in the entire stereo video, you can go back to img 0, press w to select the next keypoint id, and go back to step 1. to start labelling the next keypoint.
5. The annotations should be reviewed by another annotator.
6. Finally, once the labelling is reviewed, press g (standing for ground truth) to generate the ground truth.
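Before generating the ground truth, it can help to confirm that every frame pair still marked as visible has actually been labelled. The sketch below illustrates that check only: the `FrameLabel` record and the `frames_missing_labels` helper are hypothetical names, and the tool's real storage format may differ. Only the two flags `is_visible_in_both_stereo` and `is_difficult` come from the text. It also assumes that difficult frames still carry a label.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FrameLabel:
    """Hypothetical per-frame record for one keypoint id."""
    is_visible_in_both_stereo: bool = True
    is_difficult: bool = False
    left_centre: Optional[Tuple[float, float]] = None
    right_centre: Optional[Tuple[float, float]] = None


def frames_missing_labels(labels: List[FrameLabel]) -> List[int]:
    """Indices of frame pairs marked visible but lacking a centre in either image."""
    return [
        i
        for i, lab in enumerate(labels)
        if lab.is_visible_in_both_stereo
        and (lab.left_centre is None or lab.right_centre is None)
    ]


labels = [
    FrameLabel(left_centre=(10.0, 20.0), right_centre=(5.0, 20.0)),  # labelled
    FrameLabel(is_visible_in_both_stereo=False),                     # not visible, no label needed
    FrameLabel(),                                                    # visible but not labelled yet
]
missing = frames_missing_labels(labels)
print(missing)
```

An empty result would mean that every visible frame pair has a centre in both images for that keypoint.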
First select the bounding box that you want to delete. By default the selected bounding box is shown in red. Then press e (standing for eliminate).
If you want to set is_visible_in_both_stereo = False for a range of pictures, you can use r (standing for range). After pressing r, use a and d to select the desired range of images. Then press v to toggle the visibility. The same logic applies vice versa.
You can also use the range to eliminate the bounding boxes in multiple images (e) or to mark them as is_difficult (m).
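The range behaviour can be pictured as flipping a flag over an inclusive span of frame indices. The sketch below is a minimal illustration and not the tool's code: `toggle_visibility` is a hypothetical helper, and it assumes that each frame in the range is flipped individually.

```python
def toggle_visibility(visible, start, end):
    """Flip the visibility flag for frames start..end (inclusive).

    visible is a list of booleans, one per frame pair. The range may be
    given in either order. A new list is returned; the input is unchanged.
    """
    lo, hi = sorted((start, end))
    return [(not v) if lo <= i <= hi else v for i, v in enumerate(visible)]


flags = [True] * 5
hidden = toggle_visibility(flags, 3, 1)      # hide frames 1..3
restored = toggle_visibility(hidden, 1, 3)   # toggling again restores them
print(hidden)
print(restored)
```

Because the operation is a toggle, applying it twice to the same range returns the flags to their previous state.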
The middle mouse button can be used to zoom in and out of the images; however, it is more practical to use the zoom mode. The zoom mode allows you to label faster by focusing on the area around the keypoints. Label a pair of keypoints and you will notice a blue rectangle around them. If you press z (standing for zoom), you will zoom in or out of that blue rectangle. In zoom mode you can also re-adjust the bounding boxes by clicking again. Give it a try!
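One way to think about the zoom region is as a window centred on the keypoint and clipped to the image borders. The sketch below is an assumption-laden illustration: `zoom_window`, the `half_size` of 100 pixels and the clipping rule are all hypothetical, and the tool may compute its blue rectangle differently.

```python
def zoom_window(centre, image_size, half_size=100):
    """Return (x0, y0, x1, y1) of a window around centre, clipped to the image.

    centre is the (u, v) keypoint position in pixels, image_size is
    (width, height). half_size is a hypothetical default.
    """
    u, v = centre
    width, height = image_size
    x0 = max(0, int(u) - half_size)
    y0 = max(0, int(v) - half_size)
    x1 = min(width, int(u) + half_size)
    y1 = min(height, int(v) + half_size)
    return x0, y0, x1, y1


# Keypoint near the left and bottom borders of a 640x480 image.
window = zoom_window((50, 400), (640, 480))
print(window)
```

Clipping keeps the window inside the image, so a keypoint near a border produces a smaller window rather than an invalid one.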
I recommend creating a Python virtual environment:
python3.9 -m pip install --user virtualenv
python3.9 -m virtualenv venv
Then you can activate that environment and install the requirements using:
source venv/bin/activate
pip install -r requirements.txt
Now, with the venv activated, you can run the code using:
python main.py