elijahcole/single-positive-multi-label

Bug in Linear Initialization Training Mode

Closed this issue · 4 comments

I believe there is a bug in the linear_init training mode that crept in when I was cleaning up the code for release. I'm testing a fix now, and I'll post an update by the end of the weekend. In the meantime, do not rely on the numbers produced by the code in linear_init mode.

Pushed a fix!

4c1519b

What was the code doing for linear_init before?

For each hyperparameter set:

  1. Set a particular learning rate and batch size.
  2. Use those hyperparameters to train a linear classifier.
  3. Use the same hyperparameters to fine-tune.

Why was that wrong?

The hyperparameters that are best for linear classifier training may not be best for fine-tuning. We need to optimize hyperparameters separately for the two phases, which is how we produced the numbers in the paper.
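To make the difference concrete, here is a toy sketch (not the repository's code) that contrasts reusing one hyperparameter set for both phases with choosing them separately. The grid values and the two scoring functions are made up for illustration; the only assumption that matters is that the two phases prefer different settings.

```python
import math
from itertools import product

# Hypothetical search grid; not the values used in the repository or paper.
LRS = [1e-2, 1e-3, 1e-4, 1e-5]
BSIZES = [8, 16]
GRID = list(product(LRS, BSIZES))


def linear_quality(lr, bsize):
    """Toy stand-in for the linear phase: best at lr=1e-3, bsize=16."""
    return -abs(math.log10(lr) + 3) - (0.0 if bsize == 16 else 0.1)


def finetune_quality(lr, bsize):
    """Toy stand-in for the fine-tuning phase: best at lr=1e-5, bsize=8."""
    return -abs(math.log10(lr) + 5) - (0.0 if bsize == 8 else 0.1)


def pipeline_score(linear_hp, finetune_hp):
    """Toy final score: both phases contribute to the result."""
    return linear_quality(*linear_hp) + finetune_quality(*finetune_hp)


def coupled_search():
    """Old behavior: one hyperparameter set is reused for both phases."""
    best = max(GRID, key=lambda hp: pipeline_score(hp, hp))
    return best, best


def decoupled_search():
    """Fixed behavior: pick linear-phase settings first, then search fine-tuning."""
    linear_hp = max(GRID, key=lambda hp: linear_quality(*hp))
    finetune_hp = max(GRID, key=lambda hp: pipeline_score(linear_hp, hp))
    return linear_hp, finetune_hp


old = coupled_search()
new = decoupled_search()
print("coupled:  ", old, pipeline_score(*old))
print("decoupled:", new, pipeline_score(*new))
```

In this toy setting the coupled search can only pick a compromise that suits neither phase well, while the decoupled search reaches the best setting for each phase.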

So what is the code doing for linear_init now?

  1. Set hyperparameters for the linear phase using linear_init_lr and linear_init_bsize here.
  2. Use those hyperparameters to train a linear classifier on fixed features.
  3. Starting from this model, run a hyperparameter search for the fine-tuning phase.

To find the values for linear_init_lr and linear_init_bsize, run the code in linear_fixed_features mode for the desired loss/dataset. The optimal hyperparameters for the linear model will be reported at the end of training.
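As a rough sketch of that manual step: once linear_fixed_features mode has reported the best linear hyperparameters, every fine-tuning run shares the same linear_init_lr and linear_init_bsize while the fine-tuning settings vary. The helper name and the `train_mode`, `lr`, and `bsize` keys below are assumptions for illustration and may not match the repository's config exactly; the numeric values are placeholders, not recommended settings.

```python
from itertools import product


def build_linear_init_configs(best_linear_lr, best_linear_bsize, ft_lrs, ft_bsizes):
    """Return one config per fine-tuning hyperparameter set.

    Hypothetical helper: all configs share the linear-phase settings
    reported by a linear_fixed_features run.
    """
    configs = []
    for lr, bsize in product(ft_lrs, ft_bsizes):
        configs.append({
            "train_mode": "linear_init",             # assumed key name
            "linear_init_lr": best_linear_lr,        # fixed for every run
            "linear_init_bsize": best_linear_bsize,  # fixed for every run
            "lr": lr,                                # assumed key: fine-tuning learning rate
            "bsize": bsize,                          # assumed key: fine-tuning batch size
        })
    return configs


# Placeholder values only; substitute what linear_fixed_features reports.
configs = build_linear_init_configs(
    best_linear_lr=1e-3,
    best_linear_bsize=16,
    ft_lrs=[1e-4, 1e-5],
    ft_bsizes=[8, 16],
)
for cfg in configs:
    print(cfg)
```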

This is noted in the updated README as well.

For convenience, the linear_init mode now automatically sets the best hyperparameters for the linear phase.

See 4170040 for details.

wxr99 commented

Thanks for the details. Could the author directly provide the optimal fine-tuning hyperparameters for each dataset when training in "+linear_init" mode? Thanks a lot! Also, from my reading of the code, there is no exact implementation of "+LinearInit". Do I need to change the code to further fine-tune the model starting from the best linear-initialized network?

Thanks for the question!

I am "out of the office" for the CV4Ecology Summer School until the end of August, and I can't provide a timeline for the fine-tuning hyperparameters before then. However, running the code will provide those parameters.

I'm not sure I understand your second question about the "exact implementation". The procedure for running "+LinearInit" experiments is here. Please let me know if that does not answer your question, and I'll be happy to clarify!