Questions about experimental code
Closed this issue · 5 comments
Hello, I'm very interested in your model. Following your guidance for the genomic benchmark, I hit two problems: the dataloader length is 0, and the loss is infinite. Is this normal? Could you help me figure out the cause?
Q1:
RUN:
python -m train \
experiment=hg38/genomic_benchmark \
callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000 \
dataset.dataset_name="dummy_mouse_enhancers_ensembl" \
dataset.train_val_split_seed=1 \
dataset.batch_size=128 \
dataset.rc_aug=false \
+dataset.conjoin_train=false \
+dataset.conjoin_test=false \
loader.num_workers=2 \
model=caduceus \
model.name=dna_embedding_caduceus \
+model.config_path="" \
+model.conjoin_test=false \
+decoder.conjoin_train=true \
+decoder.conjoin_test=false \
optimizer.lr="1e-3" \
trainer.max_epochs=10 \
train.pretrained_model_path="<path to .ckpt file>" \
wandb=null
ERROR: (screenshot; the dataloader length is reported as 0)
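For context, a dataloader of length 0 means the dataset produced no samples, and averaging any metric over zero batches then divides by zero. A minimal, hypothetical sketch of that failure mode (not the repository's code):

import torch
from torch.utils.data import DataLoader, TensorDataset

# A dataset with zero samples yields a dataloader with zero batches.
empty_dataset = TensorDataset(torch.empty(0, 4))
loader = DataLoader(empty_dataset, batch_size=128)
print(len(loader))  # 0

# Averaging a running loss over the number of batches then fails.
try:
    mean_loss = sum(0.0 for _ in loader) / len(loader)
except ZeroDivisionError as err:
    print(err)  # float division by zero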
Q2:
RUN:
python -m train \
experiment=hg38/hg38 \
callbacks.model_checkpoint_every_n_steps.every_n_train_steps=500 \
dataset.max_length=1024 \
dataset.batch_size=1024 \
dataset.mlm=true \
dataset.mlm_probability=0.15 \
dataset.rc_aug=false \
model=caduceus \
model.config.d_model=128 \
model.config.n_layer=4 \
model.config.bidirectional=true \
model.config.bidirectional_strategy=add \
model.config.bidirectional_weight_tie=true \
model.config.rcps=true \
optimizer.lr="8e-3" \
train.global_batch_size=8 \
trainer.max_steps=10000 \
+trainer.val_check_interval=10000 \
wandb=null
ERROR: (screenshot; the training loss becomes infinite)
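For reference, dataset.mlm=true with dataset.mlm_probability=0.15 means roughly 15% of input tokens are hidden behind a mask token and the model is trained to reconstruct them. A simplified sketch of that masking step (an illustration only, not the repository's implementation; the mask token id is hypothetical):

import torch

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, mlm_probability: float = 0.15):
    """Randomly mask tokens for masked-language-model training (simplified)."""
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < mlm_probability  # choose ~15% of positions
    labels[~masked] = -100                 # unmasked positions are ignored by the loss
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id      # replace chosen positions with the mask token
    return corrupted, labels

toy_ids = torch.randint(0, 12, (2, 16))    # toy token ids
inputs, targets = mask_tokens(toy_ids, mask_token_id=11)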
Regarding Q1, this is an error I haven't hit before. Can you provide a bit more of the console output? Also, it looks like these two fields are empty in the command you used to launch; they need to be filled in with arguments that correspond to a pre-trained model:
+model.config_path=""
train.pretrained_model_path="<path to .ckpt file>"
Regarding Q2, can you post the LR and training-loss graphs from wandb? Did the model ever hit a NaN loss during training?
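For anyone debugging the same symptom, a quick way to catch this is to test each training loss for non-finite values and stop immediately; a minimal sketch, not part of this repository:

import math
import torch

def check_loss(loss: torch.Tensor, step: int) -> None:
    """Abort as soon as the training loss becomes NaN or infinite."""
    value = float(loss)
    if not math.isfinite(value):
        raise RuntimeError(
            f"Non-finite loss {value} at step {step}; "
            "consider lowering optimizer.lr or enabling gradient clipping."
        )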
Q1: Sorry, that was my fault; the command I posted had issues. Here are more error screenshots.
RUN:
python -m train \
experiment=hg38/genomic_benchmark \
callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000 \
dataset.dataset_name="human_nontata_promoters" \
dataset.train_val_split_seed=2 \
dataset.batch_size=128 \
dataset.rc_aug=false \
+dataset.conjoin_train=false \
+dataset.conjoin_test=false \
loader.num_workers=2 \
model=caduceus \
model.name=dna_embedding_caduceus \
+model.config_path="/home/gyc/caduceus-main/outputs/2024-03-11/20-21-19-995417/model_config.json" \
+model.conjoin_test=false \
+decoder.conjoin_train=true \
+decoder.conjoin_test=false \
optimizer.lr="1e-3" \
trainer.max_epochs=10 \
train.pretrained_model_path="/home/gyc/caduceus-main/outputs/2024-03-11/20-21-19-995417/checkpoints/last.ckpt" \
wandb=null
ERROR: (screenshot; division-by-zero error)
I just tried running this and did not hit the division-by-zero error. Can you confirm that the data was properly downloaded to ./data/genomic_benchmark/human_nontata_promoters/ by the genomic-benchmarks library? The directory should look like this:
data/genomic_benchmark/human_nontata_promoters/
├── test
│   ├── negative
│   └── positive
└── train
    ├── negative
    └── positive
These directories should contain .txt files with sequences.
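A quick way to verify that layout from Python (a stdlib-only sketch, assuming the default data root shown above):

from pathlib import Path

root = Path("data/genomic_benchmark/human_nontata_promoters")
for split in ("train", "test"):
    for label in ("negative", "positive"):
        n_files = len(list((root / split / label).glob("*.txt")))
        print(f"{split}/{label}: {n_files} .txt files")
# Every count should be non-zero; empty directories would explain a zero-length dataloader.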
Thanks for the reminder, I've successfully run your code and it works great!
Glad to hear it!