mgrankin/fast_tabnet

LR finder graph: isn't it supposed to go up?

Optimox opened this issue · 4 comments

Hello,

Happy to create the first issue! :)

I just have a question: what do you think of the learning rate finder graph? From what I understand, we are supposed to see the loss going up if the LR is too high, but that doesn't seem to be the case here. Any idea why?

Also, it seems that batch norm is 10? (Or maybe I'm wrong.) I did not run an extensive benchmark, but the paper's authors use very large batches, so maybe the LR finder on such small batches is not a great choice.
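
For context, here is roughly how such an LR finder graph can be produced. This is a minimal sketch using fastai's built-in `tabular_learner` on the ADULT_SAMPLE dataset as a stand-in; the actual notebook uses the TabNet model and a different dataset, so the columns and batch size below are illustrative only:

```python
from fastai.tabular.all import *

# Illustrative dataset; the real notebook uses a different one.
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

dls = TabularDataLoaders.from_df(
    df, path=path, y_names='salary',
    cat_names=['workclass', 'education', 'marital-status',
               'occupation', 'relationship', 'race'],
    cont_names=['age', 'fnlwgt', 'education-num'],
    procs=[Categorify, FillMissing, Normalize],
    bs=1024)  # illustrative batch size; the paper's authors use much larger batches

learn = tabular_learner(dls, metrics=accuracy)

# Runs a mock training pass with an exponentially increasing LR
# and plots loss vs. learning rate on a log scale.
learn.lr_find()
```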

Thanks!

Hello, it's nice to see you here!

I believe the loss is very high at the beginning. Because of that, even a pretty high LR still works and the loss keeps decreasing. If you increase the LR even more, you will see the loss increase. If you started from a trained model, the graph would look very different.

I didn't get the next paragraph about batch norm (batch size?).

Yes, sorry, I meant batch size.

So how would you pick a learning rate with the current graph?

You choose the region of the graph where the slope (the rate at which the loss decreases) is steepest. On this particular graph that region is wide, somewhere between 2e-3 and 1e-1. I've tried several values in that range and 3e-2 works best.
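
In code terms, continuing the illustrative fastai sketch from above (the 3e-2 value is the one mentioned here; the epoch count is just an example):

```python
# Pick a value from the steepest part of the lr_find curve
# (roughly between 2e-3 and 1e-1 on this graph) and train with it.
learn.fit_one_cycle(5, lr_max=3e-2)  # 5 epochs is only an example
```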

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.