exiawsh/StreamPETR

training errors and settings

Terencedu opened this issue · 2 comments

Hi,

Thanks for your open source work. I am really interested in it. Please allow me to ask some questions:

  1. Model size: I trained the R50_nui model on the mini dataset. Why is my trained model 452M while yours is 150M, and how can I reduce the size?
  2. Training error: Why is there no "data_time" error when epoch=6/8/12, but there is one when epoch=10? So should I set epoch=6x?
  3. Settings: I will use four 4090s (4×24GB) to train the R50_nui model on the full dataset. How should I set the batch size and learning rate: num_gpus=4, bs=2, lr=2e-4 or num_gpus=4, bs=4, lr=4e-4?
  4. Focal head: If I set use_hybrid_tokens = True in focal_head during training, it speeds up training (fewer training features), but the test FPS stays the same (since focal_head is removed at test time) while the test accuracy drops a bit?
  5. Settings: I want to use R50_nui to detect nearby objects while the ego vehicle is not moving. Are there any tips for setting memory_len, topk_proposals, num_query, and num_propagated? For example, topk_proposals=300 and memory_len=600 to fuse 2 frames?
  6. How to calculate "resize_lim" based on "final_dim"?

That's a lot of questions; thanks for your patience.

Sorry for the late response:

  1. Your checkpoint contains parameters for both the model and the optimizer, while the checkpoint I uploaded contains only the model parameters. You can load your checkpoint and print its keys() to see the details.
  2. Sorry, I haven't observed this before; I will try to reproduce it when I have time. lol
  3. Both settings are valid, but I think num_gpus=4, bs=4, lr=4e-4 will give a better result.
  4. You are right, so I recommend setting use_hybrid_tokens = False.
  5. Yes
  6. First calculate a base resize ratio, e.g. 704/1600 = 0.44, then widen it by 10%–20% in each direction, e.g. 0.44 × 0.8 = 0.352 and 0.44 × 1.2 = 0.528. But you need to ensure that 900 (the source image height) × the lower bound (0.352 in this example) > 256 (the final_dim height), because PIL will crop the images.
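The rule in point 6 can be sketched as a small helper. The function name and defaults below are illustrative (not from the repo); they assume nuScenes-sized source images of 1600×900 and a final_dim of (256, 704):

```python
def resize_lim(src_w=1600, src_h=900, final_h=256, final_w=704, margin=0.2):
    """Derive a (lower, upper) resize range from the target final_dim."""
    base = final_w / src_w                    # 704 / 1600 = 0.44
    lo, hi = base * (1 - margin), base * (1 + margin)
    # The resized image must still be tall enough for the height crop,
    # otherwise the crop window would fall outside the image.
    assert src_h * lo > final_h, "lower bound too small for the height crop"
    return (round(lo, 3), round(hi, 3))

print(resize_lim())  # -> (0.352, 0.528)
```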

Thank you for your detailed reply. I learned a lot!

  1. I saved only the 'state_dict' and the model size is now 154.8M.
  2. OK, it is not a big problem; I will train for 60 epochs.
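For anyone hitting the same size gap: the slimming step amounts to dropping everything except 'state_dict' from the checkpoint dict. A minimal sketch (the helper name is mine, and the 'optimizer'/'meta' keys are the usual mmdetection-style layout, so check your own keys() first):

```python
def strip_optimizer(ckpt):
    # Keep only the model weights. The 'optimizer' entry stores per-parameter
    # buffers (e.g. AdamW keeps two extra tensors per weight), which is why a
    # training checkpoint is roughly 3x the size of the weights alone.
    return {"state_dict": ckpt["state_dict"]}

# Typical usage with PyTorch (paths are illustrative):
#   ckpt = torch.load("latest.pth", map_location="cpu")
#   torch.save(strip_optimizer(ckpt), "latest_slim.pth")
```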