pieterblok/boxal

what is the best way to resume a training?


Dear Pieter, First of all, thank you very much for sharing this repository with me.

I am currently using it, and it works fine with my data. However, I have encountered a minor issue. Since it takes me some time to prepare new annotations for the next iteration, there is a possibility that the program is stopped during this period. What do you recommend as the best way to resume the process in such cases?

Based on my current understanding, I have thought of a potential solution. My plan is to place the new annotations in the initial_train directory alongside the original initial data. Then, I would rerun the algorithm while setting the pretrained_weights to the weights of the last run that was stopped.

Please let me know if this approach seems reasonable or if you have any other suggestions to handle this situation more effectively. I greatly appreciate your guidance.

Hello @pkhateri

Thank you for using BoxAL, and happy to hear that it works for your data.

Your suggestion sounds reasonable. I haven't tried it myself, but based on the options in the YAML file, this can be achieved.
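For reference, the relevant boxal.yaml entry could look something like this (untested; the weight-file path is just an example, point it at whatever your stopped run produced):

pretrained_weights: weights/exp1/uncertainty/model_final.pth  # example path: the weights of the stopped run

Place the new annotations in the initial_train directory as you described, and the run should continue from those weights.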

Let me know if it works (or not).

Good luck!

PS. From now on the BoxAL repository is public 💻🚀

Thanks very much for the quick reply!

I tried it and it seems to work. I have only one more related question: How can I see the inference results after each loop? The inference results are generated in the output directory only if we continue to the last iteration. But in my case, the last iteration never happens because the algorithm is stopped.

PS: That's great news! Congratulations :)

@pkhateri As far as I remember, after each training the performance is evaluated on the independent test set, resulting in the inference.json file in the output folder. Then the sampling starts, and afterwards the labelling. If you stop the procedure during the first sampling iteration (just Ctrl+C in the terminal), then inference.json represents the last inference. It is simply overwritten after each active-learning iteration.

Besides inference.json, a results file is generated in your results folder, giving the overall mAP and the mAP for the individual classes. That might be enough for your application (I used these values for my scientific research output).
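If you want to keep the inference results of every loop, a simple workaround (just a sketch, not part of BoxAL, and the path is an assumption about your setup) is a small watcher script that copies inference.json whenever it is overwritten:

import shutil
import time
from pathlib import Path

# assumed location of the file that is overwritten after each active-learning iteration
src = Path("output/inference.json")
last_mtime = 0.0

while True:
    if src.exists() and src.stat().st_mtime > last_mtime:
        last_mtime = src.stat().st_mtime
        # keep one copy per iteration, tagged with the file's modification time
        dst = src.with_name(f"inference_{int(last_mtime)}.json")
        shutil.copy2(src, dst)
        print(f"archived {dst}")
    time.sleep(30)  # poll every 30 seconds

Run it in a second terminal alongside the active-learning run.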

Hi @pieterblok
I am really sorry for the very late response!
After training, four text files are generated in the results and weights folders, containing metrics or inference results. No output folder is generated. Here are the files:

  1. weights/exp1/uncertainty/metrics.json
    This is similar to the original metrics.json file which is generated after running Detectron2 without your AL extension. This would be useful for my above-mentioned purpose.
  2. weights/exp1/uncertainty/inference/coco_instances_results.json
    This file includes some instances, but it is not really clear to me which images exactly they refer to. See part of the file below:
[{"image_id": 1, "category_id": 1, "bbox": [314.2623596191406, 95.52009582519531, 341.5835876464844, 76.62232971191406], "score": 0.9237105250358582},
{"image_id": 5, "category_id": 1, "bbox": [146.166259765625, 143.88946533203125, 850.9454956054688, 76.61729431152344], "score": 0.9729232788085938},
{"image_id": 5, "category_id": 1, "bbox": [172.25726318359375, 141.29135131835938, 465.15924072265625, 73.83609008789062], "score": 0.23908992111682892},
{"image_id": 5, "category_id": 1, "bbox": [600.7523193359375, 146.4454803466797, 285.7037353515625, 77.92771911621094], "score": 0.07600747048854828},
{"image_id": 7, "category_id": 1, "bbox": [327.11590576171875, 86.79924011230469, 404.923095703125, 96.70126342773438], "score": 0.207632377743721}, ...
  3. results/exp1/uncertainty/coco_instances_results.json
    This file is very similar to the previous one, but with different instances.
  4. results/exp1/uncertainty/uncertainty.csv
    The content of this file is:
train_size,val_size,test_size,mAP,mAP-damaged
16,31,29,12.8,12.8
25,31,29,18.2,18.2
33,31,29,17.0,17.0
45,31,29,17.0,17.0
56,31,29,23.2,23.2
60,31,29,20.7,20.7

and again it is not really clear to me what each entry in this file means exactly. If you could provide some explanation of the files in items 2, 3 and 4, I would really appreciate it.

1.) metrics.json is the standard output from Detectron2; I haven't changed anything about this output file (see also the reading sketch at the end of this comment). Refer to: https://stackoverflow.com/questions/61421951/getting-the-output-metrics-from-detectron2s-cocoevaluator

2.) and 3.) coco_instances_results.json is also a standard output from Detectron2, created by https://detectron2.readthedocs.io/en/latest/_modules/detectron2/evaluation/coco_evaluation.html

4.) a csv file with the outputs generated by my active learning script:

boxal/boxal.py, line 533 in ed4b891:

write_values = [len(dataset_dicts_train), len(dataset_dicts_val), len(dataset_dicts_test), round(eval_results['bbox']['AP'], 1)] + bbox_values

I have used this specific output for my scientific research.

In your file you can see that the training size increases from 16 images, to 25, and so on up to 60 images. Apparently you have one class called "damaged"; therefore the overall mAP and the class-specific mAP are the same. I did my experiments with 5 classes, and I was particularly interested in 4 of them because they were minority classes.
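The columns are therefore the number of training, validation and test images at that iteration, the overall mAP, and one mAP column per class. A sketch of how the csv could be turned into a learning curve, for example to plot mAP against training-set size (file path as in your output above):

import csv
import matplotlib.pyplot as plt

train_sizes, maps = [], []
with open("results/exp1/uncertainty/uncertainty.csv") as f:
    for row in csv.DictReader(f):
        train_sizes.append(int(row["train_size"]))
        maps.append(float(row["mAP"]))

# mAP as a function of the number of annotated training images
plt.plot(train_sizes, maps, marker="o")
plt.xlabel("number of training images")
plt.ylabel("mAP")
plt.show()

PS: regarding 1.), metrics.json is a JSON-lines file (one JSON object per line), so it can be read along these lines (the "bbox/AP" key is how Detectron2 flattens the evaluation results into that file, as far as I know):

import json

with open("weights/exp1/uncertainty/metrics.json") as f:
    records = [json.loads(line) for line in f if line.strip()]

# keep only the entries that contain evaluation results
ap_records = [r for r in records if "bbox/AP" in r]
if ap_records:
    print("last bbox AP:", ap_records[-1]["bbox/AP"])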

Thank you!
Sorry, there was a typo in my message. I meant files 2,3 and 4 (edited now).

2 & 3. What is the difference between these two files?

  4. Yes, I have only one class called "damaged". The numbers of images in my test and val files are 45 and 46, respectively. And my settings in the boxal.yaml file are as follows:
initial_datasize: 20
pool_size: 20
loops: 5

I assume that it only counts the images which have (at least) one box on them? Some of my images do not include a damaged area, i.e. there is no box on them.

I don't know the difference between 2 & 3; I think they are the same files, but maybe you did some regular training that caused a file instance to be created in another folder.
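If you want to check which images each file actually refers to, you could map the image_id values back to file names via the COCO-style annotation file of the evaluated split (a sketch; both paths are assumptions about your setup):

import json

# the ground-truth annotations of the evaluated split, in COCO format
with open("annotations/test.json") as f:
    coco_gt = json.load(f)
id_to_file = {img["id"]: img["file_name"] for img in coco_gt["images"]}

with open("weights/exp1/uncertainty/inference/coco_instances_results.json") as f:
    detections = json.load(f)

# print the first few detections with their image file names
for det in detections[:5]:
    print(id_to_file[det["image_id"]], det["category_id"], round(det["score"], 3))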

Yes, it counts the number of images with annotations in them. "Empty images" are skipped by Detectron2.

In Detectron2, empty images are skipped by default, but one can include them in training as below. Right?

cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS = False

I added this line to the train method in the boxal.py file in order to be able to configure it with boxal too:

cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS = config['filter_empty_annotation']
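The corresponding entry I would add to the boxal.yaml file could then look like this (my own addition, not yet part of BoxAL):

filter_empty_annotation: False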

Oh nice! I didn't know about that. Thanks for the information 🙏

Actually, it does not work with the boxal implementation :(
I will open a new issue for this topic. I am going to add a few other parameters to the boxal.yaml file, including filter_empty_annotations.
You may close this issue.