Pre-training Code
Thank you for the impressive work! I was wondering if it would be possible for you to release the pre-training code? Alongside the paper, it would be extremely helpful for understanding the underlying mechanisms.
Thanks for the interest! We will eventually release the pre-training code along with an expanded appendix detailing how pre-training was done; we'll share a timeline for this soon. If you have any specific questions about the pre-training in the meantime, please let us know, and we'd be happy to clarify any steps.
Hi there, very cool work! Since you offered, would you mind sharing some basic training/model details, i.e. how many steps was Orthrus pretrained for before stopping, and how many Mamba blocks does it have? Am I also correct in thinking you use single-nucleotide tokens?
Many Thanks
Hey Marcell,
Thanks for your questions. Most of these can actually be found in the currently released codebase, as the model architecture and weights are public. The specifics are in the released configs and the model weights.
You'll notice in the model weights section that the models are trained for 20,000 steps, and we use a batch size of 150 for shorter transcripts and 50 for longer ones (a technique to minimize the memory overhead from padding).
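As a rough illustration of that length-dependent batching (this is only a sketch, not our actual data pipeline; the length threshold and function name below are placeholders):

```python
# Illustrative length-aware batching: long transcripts get a smaller batch size so
# the padded (batch_size x max_length) tensor stays within memory.
# NOTE: the length threshold and function name are placeholders, not the exact
# Orthrus training pipeline.
from typing import Iterator, List, Sequence

LENGTH_THRESHOLD = 6_000  # hypothetical cutoff between "short" and "long" transcripts
SHORT_BATCH_SIZE = 150    # batch size used for shorter transcripts
LONG_BATCH_SIZE = 50      # batch size used for longer transcripts


def length_bucketed_batches(transcripts: Sequence[str]) -> Iterator[List[str]]:
    """Yield batches of transcripts, using a smaller batch size for long sequences."""
    short_seqs = [t for t in transcripts if len(t) <= LENGTH_THRESHOLD]
    long_seqs = [t for t in transcripts if len(t) > LENGTH_THRESHOLD]

    for bucket, batch_size in ((short_seqs, SHORT_BATCH_SIZE), (long_seqs, LONG_BATCH_SIZE)):
        # Sorting within a bucket keeps sequences of similar length together,
        # which further reduces padding inside each batch.
        bucket = sorted(bucket, key=len)
        for i in range(0, len(bucket), batch_size):
            yield bucket[i : i + batch_size]
```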
You will find other training details listed in the configs, but please let us know if you have any other questions.
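If you'd like to verify architecture details (e.g. the number of Mamba blocks) directly from the public weights, something along these lines should work for any PyTorch checkpoint; note that the file name and parameter naming scheme here are assumptions, not necessarily how our release is laid out:

```python
# Rough sketch of inspecting a PyTorch checkpoint to count blocks and check the
# embedding size (a vocabulary of only a handful of entries would be consistent
# with single-nucleotide tokens).
# NOTE: the checkpoint filename and the "layers.<i>." naming pattern are assumptions.
import re
import torch

ckpt = torch.load("orthrus_checkpoint.pt", map_location="cpu")  # hypothetical filename
state_dict = ckpt.get("state_dict", ckpt)

# Count distinct block indices, assuming parameters are named like "layers.<i>.<...>".
block_ids = {
    int(m.group(1))
    for name in state_dict
    if (m := re.search(r"layers\.(\d+)\.", name))
}
print(f"Number of blocks: {len(block_ids)}")

# Print embedding shapes to see the vocabulary size implied by the tokenization.
for name, tensor in state_dict.items():
    if "embedding" in name.lower():
        print(name, tuple(tensor.shape))
```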