ValueError: Tried to step 7921 times. The specified number of total steps is 7920
Hello team,
Thank you for the great package. I keep failing to train the ATAC topic model with the gradient-based tuner; it gives me this error:
ValueError: Could not train the Gradient-based tuner.
• This can happen if the maximum learning rate was initially set way too high. Please ensure you have run the learning rate range test and set reasonable learning rate boundaries.
• This could also happen because there are outliers in the dataset in terms of the number of reads in a cell. Make sure to remove outlier cells from the dataset, especially those with too many counts.
• For accessibility (ATAC-seq) models, this can occur when modeling too many features (>150K). Removing extremely rarely-accessible peaks to reduce the feature space will help.
If none of the above work, the standard Bayesian tuning approach is not affected by numerical stability issues like the gradient-based estimator, so try that next.
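For reference, I read the outlier-cell suggestion as a standard count-depth filter along these lines (my own sketch with an arbitrary cutoff, not from the MIRA docs):

```python
import numpy as np

# Total counts per cell; assumes adata_atac.X is a scipy sparse matrix.
total_counts = np.asarray(adata_atac.X.sum(axis = 1)).ravel()

# Drop cells above an (arbitrary) 99th-percentile count cutoff.
keep = total_counts <= np.quantile(total_counts, 0.99)
adata_atac = adata_atac[keep].copy()
```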
So I've also tried the Bayesian tuning approach, but it raises an error too:
ValueError: Tried to step 7921 times. The specified number of total steps is 7920
Would you have an idea of what might be causing the problem? I will be happy to share the data if necessary!
Thanks!
Hi Chloe,
I'll need a bit more information to solve this issue. Was this error thrown at the very end of training?
AL
If you can share your MIRA version, Python version, dataset size (number of cells, number of peaks), and any non-default parameters you used, that would also help!
Thank you, Allen, for getting back to me. Here are the details:
mira: 2.1.0
python: 3.11.3
dataset size: the original data is 42,343 cells × 2,461,758 peaks, but since you recommend < 150K peaks, I preprocessed the data with `sc.pp.filter_genes(adata_atac, min_cells = 1000)`, which gave me 42,343 × 62,833 as the input for training.
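In code, that preprocessing step was:

```python
import scanpy as sc

# Keep only peaks detected in at least 1,000 cells:
# this reduced 2,461,758 peaks to 62,833.
sc.pp.filter_genes(adata_atac, min_cells = 1000)
```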
For the gradient-based tuner, I followed the tutorial:

```python
import numpy as np
import mira

np.random.seed(1234)

# Mark each peak as endogenous with probability min(1e5 / n_peaks, 1),
# i.e. subsample to roughly 100K peaks for the encoder network.
adata_atac.var['endogenous_peaks'] = np.random.rand(adata_atac.shape[1]) <= min(1e5/adata_atac.shape[1], 1)

atac_model = mira.topics.make_model(
    *adata_atac.shape,
    feature_type = 'accessibility',
    endogenous_key = 'endogenous_peaks' # which peaks are used by the encoder network
)

atac_model.set_learning_rates(1e-5, 1e-4)

topic_contributions = mira.topics.gradient_tune(atac_model, adata_atac)
```
I've tried lowering the learning rates, but it didn't help...
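In case it helps, my understanding is that those bounds are supposed to come from the range test in the tutorial, roughly like this (method names as I found them in the MIRA docs; they may differ between versions):

```python
# Learning rate range test: sweep learning rates while recording the loss,
# then pick the bounds from the resulting plot.
atac_model.get_learning_rate_bounds(adata_atac)
atac_model.plot_learning_rate_bounds()
atac_model.set_learning_rates(1e-5, 1e-4)  # bounds chosen from the plot
```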
For the Bayesian tuner, I also followed the tutorial:

```python
import numpy as np
import mira

np.random.seed(1234)
adata_atac.var['endogenous_peaks'] = np.random.rand(adata_atac.shape[1]) <= min(1e5/adata_atac.shape[1], 1)

atac_model = mira.topics.make_model(
    *adata_atac.shape,
    feature_type = 'accessibility',
    endogenous_key = 'endogenous_peaks' # which peaks are used by the encoder network
)

NUM_TOPICS = 20
atac_model.set_params(num_topics = NUM_TOPICS).fit(adata_atac)

tuner = mira.topics.BayesianTuner(
    model = atac_model,
    min_topics = NUM_TOPICS - 5,
    max_topics = NUM_TOPICS + 5,
    save_name = 'bayesiantuner/',
    n_jobs = 5, # highly suggested - if you have the GPU memory!
    # storage = mira.topics.Redis(), # if you are running a redis server backend
)
```
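I then started the search as in the tutorial (assuming `tuner.fit` is still the entry point in 2.1.0):

```python
tuner.fit(adata_atac)
```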
The tuning seems to fail about halfway through - I've attached the error messages to this post. Thank you!
slurm-703280_1_gradient-based.txt
slurm-703281_1_baysian-tuner.txt
Hm, your initial dataset had 2,461,758 peaks - are you using genome bins? One possibility is that one of the peaks has an extremely large value, which can sometimes happen due to alignment errors. Are there any outliers?
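A quick way to check would be something along these lines (a sketch; it assumes `adata_atac.X` is a scipy sparse cells × peaks matrix):

```python
import numpy as np

# Maximum count observed for each peak.
peak_max = np.asarray(adata_atac.X.max(axis = 0).todense()).ravel()

print('largest per-peak count:', peak_max.max())
print('peaks with max count > 100:', int((peak_max > 100).sum()))
```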