Should I keep decreasing the learning rate?
txemaheredia opened this issue
Hi,
I ran CellBender on my dataset using these params:
cellbender remove-background \
  --cuda \
  --input ${sample}/raw_feature_bc_matrix.h5 \
  --output ${sample}/cellbender_output.h5 \
  --expected-cells 10000 \
  --total-droplets-included 30000 \
  --exclude-feature-types "Antibody Capture" \
  --fpr 0.01 \
  --epochs 150
The HTML report showed this ELBO plot:
And it issued this warning:
Automated assessment --------
- WARNING: The training ELBO deviates quite a bit from the max value during the second half of training.
- We typically expect to see the training ELBO increase almost monotonically. This curve seems to have a concerted period of motion in the wrong direction near epoch 56. If this is early in training, this is probably okay.
- We hope to see the test ELBO follow the training ELBO, increasing almost monotonically (though there will be deviations, and that is expected). There may be a large gap, and that is okay. However, this curve ends with a low test ELBO compared to the max test ELBO value during training. The output could be suboptimal.
Summary:
This is unusual behavior, and a reduced --learning-rate is indicated. Re-run with half the current learning rate and compare the results.
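Concretely, CellBender's default --learning-rate is 0.0001, so half of that is 0.00005. The re-run command then looks like this (a sketch assuming all other parameters stay the same as my first run):

cellbender remove-background \
  --cuda \
  --input ${sample}/raw_feature_bc_matrix.h5 \
  --output ${sample}/cellbender_output.h5 \
  --expected-cells 10000 \
  --total-droplets-included 30000 \
  --exclude-feature-types "Antibody Capture" \
  --fpr 0.01 \
  --epochs 150 \
  --learning-rate 0.00005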
I followed the suggestion and re-ran with the halved learning rate, --learning-rate 0.00005. The resulting ELBO plot looked like this, and the report gave this assessment:
Automated assessment --------
- The training ELBO deviates quite a bit from the max value at the last epoch.
- We typically expect to see the training ELBO increase almost monotonically. This curve seems to have a concerted period of motion in the wrong direction near epoch 76. If this is early in training, this is probably okay.
- We hope to see the test ELBO follow the training ELBO, increasing almost monotonically (though there will be deviations, and that is expected). There may be a large gap, and that is okay. However, this curve ends with a low test ELBO compared to the max test ELBO value during training. The output could be suboptimal.
Summary:
This is slightly unusual behavior, and a reduced --learning-rate might be indicated. Consider re-running with half the current learning rate to compare the results.
Should I keep halving the learning rate? Or would I be better off keeping the default learning rate and increasing the number of epochs?
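For concreteness, the two alternatives I am weighing would look something like this (a sketch, not tested: the shared-flags array is just my shorthand, the output filenames are hypothetical, and 300 epochs is an arbitrary example of 'more epochs'):

# Flags shared with the first run (a bash array so the quoted
# feature type survives word splitting)
common=( --cuda
         --input "${sample}/raw_feature_bc_matrix.h5"
         --expected-cells 10000
         --total-droplets-included 30000
         --exclude-feature-types "Antibody Capture"
         --fpr 0.01 )

# Option A: halve the learning rate again (half of my last run's 0.00005)
cellbender remove-background "${common[@]}" \
  --output "${sample}/cellbender_lr_2.5e-5.h5" \
  --epochs 150 \
  --learning-rate 0.000025

# Option B: default learning rate (0.0001), longer training
# (300 epochs is an arbitrary example, not a tested recommendation)
cellbender remove-background "${common[@]}" \
  --output "${sample}/cellbender_300_epochs.h5" \
  --epochs 300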