Pre-training Code
Thank you for the impressive work! I was wondering if it would be possible for you to release the pre-training code? Alongside the paper, it would be extremely helpful for understanding the underlying mechanisms.
Thanks for the interest! We will eventually release the pre-training code along with an expanded appendix detailing how pre-training was done; we'll share a timeline for this soon. If you have any specific questions about the pre-training in the meantime, please let us know, and we'd be happy to clarify any steps.
Hi there, very cool work! Since you offered, would you mind sharing some basic training/model details, i.e. how many steps was Orthrus pretrained for before stopping, and how many Mamba blocks does it have? Am I also correct in thinking you use single-nucleotide tokens?
Many Thanks
Hey Marcell,
Thanks for your questions. Most of these can actually be found in the currently released codebase, as the model architecture and weights are public. The specifics are in the released configs and the model weights.
You'll notice in the model weights section that the models are trained for 20,000 steps, and we use a batch size of 150 for shorter transcripts and 50 for longer ones (a technique to minimize the memory overhead from padding).
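As a rough illustration of that length-dependent batching (this is only a sketch, not our actual data pipeline; the length threshold and function name below are placeholders):

```python
# Illustrative length-aware batching: long transcripts get a smaller batch size so
# the padded (batch_size x max_length) tensor stays within memory.
# NOTE: the length threshold and function name are placeholders, not the exact
# Orthrus training pipeline.
from typing import Iterator, List, Sequence

LENGTH_THRESHOLD = 6_000  # hypothetical cutoff between "short" and "long" transcripts
SHORT_BATCH_SIZE = 150    # batch size used for shorter transcripts
LONG_BATCH_SIZE = 50      # batch size used for longer transcripts


def length_bucketed_batches(transcripts: Sequence[str]) -> Iterator[List[str]]:
    """Yield batches of transcripts, using a smaller batch size for long sequences."""
    short_seqs = [t for t in transcripts if len(t) <= LENGTH_THRESHOLD]
    long_seqs = [t for t in transcripts if len(t) > LENGTH_THRESHOLD]

    for bucket, batch_size in ((short_seqs, SHORT_BATCH_SIZE), (long_seqs, LONG_BATCH_SIZE)):
        # Sorting within a bucket keeps sequences of similar length together,
        # which further reduces padding inside each batch.
        bucket = sorted(bucket, key=len)
        for i in range(0, len(bucket), batch_size):
            yield bucket[i : i + batch_size]
```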
You will find other training details listed in the configs, but please let us know if you have any other questions.
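If you'd like to verify architecture details (e.g. the number of Mamba blocks) directly from the public weights, something along these lines should work for any PyTorch checkpoint; note that the file name and parameter naming scheme here are assumptions, not necessarily how our release is laid out:

```python
# Rough sketch of inspecting a PyTorch checkpoint to count blocks and check the
# embedding size (a vocabulary of only a handful of entries would be consistent
# with single-nucleotide tokens).
# NOTE: the checkpoint filename and the "layers.<i>." naming pattern are assumptions.
import re
import torch

ckpt = torch.load("orthrus_checkpoint.pt", map_location="cpu")  # hypothetical filename
state_dict = ckpt.get("state_dict", ckpt)

# Count distinct block indices, assuming parameters are named like "layers.<i>.<...>".
block_ids = {
    int(m.group(1))
    for name in state_dict
    if (m := re.search(r"layers\.(\d+)\.", name))
}
print(f"Number of blocks: {len(block_ids)}")

# Print embedding shapes to see the vocabulary size implied by the tokenization.
for name, tensor in state_dict.items():
    if "embedding" in name.lower():
        print(name, tuple(tensor.shape))
```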