kuleshov-group/caduceus

Questions about experimental code

Closed this issue · 5 comments

Hello, I'm very interested in your model. Following your guidance for the genome benchmark, I ran into two problems: the dataloader length is 0, and the loss is infinite. I don't know if this is normal. Can you help me figure out the cause?
Q1:
RUN:

python -m train \
  experiment=hg38/genomic_benchmark \
  callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000 \
  dataset.dataset_name="dummy_mouse_enhancers_ensembl" \
  dataset.train_val_split_seed=1 \
  dataset.batch_size=128 \
  dataset.rc_aug=false \
  +dataset.conjoin_train=false \
  +dataset.conjoin_test=false \
  loader.num_workers=2 \
  model=caduceus \
  model.name=dna_embedding_caduceus \
  +model.config_path="" \
  +model.conjoin_test=false \
  +decoder.conjoin_train=true \
  +decoder.conjoin_test=false \
  optimizer.lr="1e-3" \
  trainer.max_epochs=10 \
  train.pretrained_model_path="<path to .ckpt file>" \
  wandb=null
ERROR:
(error screenshot not reproduced)

Q2:
RUN:

python -m train \
  experiment=hg38/hg38 \
  callbacks.model_checkpoint_every_n_steps.every_n_train_steps=500 \
  dataset.max_length=1024 \
  dataset.batch_size=1024 \
  dataset.mlm=true \
  dataset.mlm_probability=0.15 \
  dataset.rc_aug=false \
  model=caduceus \
  model.config.d_model=128 \
  model.config.n_layer=4 \
  model.config.bidirectional=true \
  model.config.bidirectional_strategy=add \
  model.config.bidirectional_weight_tie=true \
  model.config.rcps=true \
  optimizer.lr="8e-3" \
  train.global_batch_size=8 \
  trainer.max_steps=10000 \
  +trainer.val_check_interval=10000 \
  wandb=null
ERROR:
(error screenshot not reproduced)

Regarding Q1, this is an error I haven't hit before. Can you provide a bit more of the console output? Also, it looks like these two fields were left unfilled in the command you used to launch; they need to be filled with arguments that correspond to a pre-trained model:

+model.config_path=""
train.pretrained_model_path="<path to .ckpt file>"
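As a quick sanity check before launching, something like the following hypothetical helper (not part of the repo; names are illustrative) would fail fast if either field is empty or points at a nonexistent file:

```python
from pathlib import Path

def preflight(config_path, ckpt_path):
    """Fail fast if either pre-trained-model argument is empty or missing."""
    for name, p in (("model.config_path", config_path),
                    ("train.pretrained_model_path", ckpt_path)):
        if not p or not Path(p).is_file():
            raise FileNotFoundError(
                f"{name} must point to an existing file, got {p!r}")
```

An empty `+model.config_path=""` would be caught here before Hydra even builds the model.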

Regarding Q2, can you post the LR and training-loss curves from wandb? Did the model ever hit a NaN loss during training?
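One simple way to answer the NaN question from logged values (a minimal sketch, not part of the caduceus code) is to scan the loss history for the first non-finite entry:

```python
import math

def first_bad_step(losses):
    """Return the first step index whose loss is NaN or infinite, else None."""
    for step, loss in enumerate(losses):
        if math.isnan(loss) or math.isinf(loss):
            return step
    return None

# Example: the loss blows up at step 3.
print(first_bad_step([2.1, 1.8, 1.7, float("inf"), float("nan")]))  # -> 3
```

If the loss was finite up to some step and then diverged, the LR curve around that step is usually the place to look.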

Q1: Sorry, that was my fault; the code I uploaded had issues. Here are more error screenshots.
RUN:
python -m train \
  experiment=hg38/genomic_benchmark \
  callbacks.model_checkpoint_every_n_steps.every_n_train_steps=5000 \
  dataset.dataset_name="human_nontata_promoters" \
  dataset.train_val_split_seed=2 \
  dataset.batch_size=128 \
  dataset.rc_aug=false \
  +dataset.conjoin_train=false \
  +dataset.conjoin_test=false \
  loader.num_workers=2 \
  model=caduceus \
  model.name=dna_embedding_caduceus \
  +model.config_path="/home/gyc/caduceus-main/outputs/2024-03-11/20-21-19-995417/model_config.json" \
  +model.conjoin_test=false \
  +decoder.conjoin_train=true \
  +decoder.conjoin_test=false \
  optimizer.lr="1e-3" \
  trainer.max_epochs=10 \
  train.pretrained_model_path="/home/gyc/caduceus-main/outputs/2024-03-11/20-21-19-995417/checkpoints/last.ckpt" \
  wandb=null
ERROR:
(error screenshots not reproduced)

I just tried running this and did not hit the division-by-zero error. Can you confirm that the data was properly downloaded to ./data/genomic_benchmark/human_nontata_promoters/ by the genomic_benchmarks library?

That directory should look like this:

data/genomic_benchmark/human_nontata_promoters/
├── test
│   ├── negative
│   └── positive
└── train
    ├── negative
    └── positive

These directories should contain .txt files with the sequences.
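A quick way to confirm the layout (a hypothetical helper, not part of the repo) is to count the .txt files under each of the four expected directories; an empty directory would explain a zero-length dataloader and the resulting division by zero:

```python
from pathlib import Path

def check_dataset(root):
    """Verify the train/test x negative/positive layout; count .txt files in each."""
    root = Path(root)
    counts = {}
    for split in ("train", "test"):
        for label in ("negative", "positive"):
            d = root / split / label
            if not d.is_dir():
                return None  # missing directory: dataset was not downloaded
            counts[f"{split}/{label}"] = sum(1 for _ in d.glob("*.txt"))
    return counts

# e.g. check_dataset("data/genomic_benchmark/human_nontata_promoters")
```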

Thanks for the reminder. I've successfully run your code and it works great!

Glad to hear it!