Bug in Linear Initialization Training Mode

Question

Closed this issue 2 years ago · 4 comments

I believe there is a bug in the linear_init training mode that crept in when I was cleaning up the code for release. I'm testing a fix now, and I'll post an update by the end of the ~~day~~ weekend. In the meantime, do not rely on the numbers produced by the code in linear_init mode.

Answer 1 · 2022-02-27T21:14:36.000Z

Pushed a fix!

4c1519b

What was the code doing for `linear_init` before?

For each hyperparameter set:

1. Set a particular learning rate and batch size.
2. Use those hyperparameters to train a linear classifier.
3. Use the same hyperparameters to fine-tune.

Why was that wrong?

The hyperparameters that are best for linear classifier training may not be best for fine-tuning. We need to optimize hyperparameters separately for the two phases. This is how we produced the numbers in the paper.
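To make the difference concrete, here is a toy sketch (not the repository's code) comparing a tied search, where both phases share one hyperparameter set, with a separate search. The grid, the "best" settings, and both scoring functions are made-up assumptions chosen only to illustrate the structure of the two searches.

```python
from itertools import product

# Hypothetical grid of (learning rate, batch size) candidates.
GRID = list(product([0.001, 0.01, 0.1], [64, 256]))

# Made-up optima: the two phases prefer different settings.
BEST_LINEAR = (0.1, 256)
BEST_FINETUNE = (0.001, 64)


def toy_linear_score(hp):
    """Stand-in for the accuracy of the linear classifier alone."""
    return float(hp == BEST_LINEAR)


def toy_final_score(linear_hp, finetune_hp):
    """Stand-in for the accuracy after linear training plus fine-tuning."""
    return float(linear_hp == BEST_LINEAR) + float(finetune_hp == BEST_FINETUNE)


# Tied search (old behavior): one hyperparameter set is reused for both phases.
tied_best = max(toy_final_score(hp, hp) for hp in GRID)

# Separate search: pick the linear hyperparameters first, then search fine-tuning.
linear_hp = max(GRID, key=toy_linear_score)
finetune_hp = max(GRID, key=lambda hp: toy_final_score(linear_hp, hp))
separate_best = toy_final_score(linear_hp, finetune_hp)
```

In this toy setup the tied search can never satisfy both phases at once, so `tied_best` is lower than `separate_best`. Real results depend on the loss and dataset.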

So what is the code doing for `linear_init` now?

1. Set hyperparameters for the linear phase using `linear_init_lr` and `linear_init_bsize` here.
2. Use those hyperparameters to train a linear classifier on fixed features.
3. Starting from this model, run a hyperparameter search for the fine-tuning phase.

To find the values for `linear_init_lr` and `linear_init_bsize`, run the code in `linear_fixed_features` mode for the desired loss/dataset. The optimal hyperparameters for the linear model will be reported at the end of training.
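The procedure above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `run_linear_init`, `train_linear`, `fine_tune`, and the toy stand-ins are hypothetical names, and only `linear_init_lr` and `linear_init_bsize` come from the thread.

```python
from typing import Callable, Iterable, Tuple

Hyperparams = Tuple[float, int]  # (learning rate, batch size)


def run_linear_init(
    linear_init_lr: float,
    linear_init_bsize: int,
    finetune_grid: Iterable[Hyperparams],
    train_linear: Callable[[float, int], dict],
    fine_tune: Callable[[dict, float, int], float],
) -> Tuple[Hyperparams, float]:
    """Train the linear phase once, then search fine-tuning hyperparameters."""
    candidates = list(finetune_grid)
    if not candidates:
        raise ValueError("finetune_grid must contain at least one candidate")

    # Phase 1: linear classifier on fixed features, hyperparameters set in advance.
    linear_model = train_linear(linear_init_lr, linear_init_bsize)

    # Phase 2: every fine-tuning run starts from the same linear model.
    best_hp, best_score = candidates[0], float("-inf")
    for lr, bsize in candidates:
        score = fine_tune(linear_model, lr, bsize)
        if score > best_score:
            best_hp, best_score = (lr, bsize), score
    return best_hp, best_score


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_train_linear(lr: float, bsize: int) -> dict:
    return {"linear_lr": lr, "linear_bsize": bsize}


def toy_fine_tune(model: dict, lr: float, bsize: int) -> float:
    # Made-up score that prefers a small fine-tuning learning rate.
    return -abs(lr - 0.001)


best_hp, best_score = run_linear_init(
    0.1, 256, [(0.01, 64), (0.001, 64)], toy_train_linear, toy_fine_tune
)
```

The linear-phase values passed in would be the ones reported at the end of a `linear_fixed_features` run. The fine-tuning grid here is arbitrary.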

This is noted in the updated README as well.

Answer 2 · 2022-03-15T22:50:34.000Z

For convenience, the linear_init mode now automatically sets the best hyperparameters for the linear phase.

See 4170040 for details.

Answer 3 · 2022-07-31T15:10:56.000Z

Thanks for the details, but could the author directly provide the optimal fine-tuning hyperparameters for each dataset when training in "+linear_init" mode? Thanks a lot! Also, from my reading of the code, there is no exact implementation of "+LinearInit". Do I need to change the code to further fine-tune the model starting from the best linearly initialized network?

Answer 4 · 2022-08-02T22:39:13.000Z

Thanks for the question!

I am "out of the office" for the CV4Ecology Summer School until the end of August, and I can't provide a timeline for the fine-tuning hyperparameters before then. However, running the code will provide those parameters.

I'm not sure I understand your second question about the "exact implementation" - the procedure for running "+LinearInit" experiments is here. Please let me know if that does not answer your question, and I'll be happy to clarify!
