About the pretrain data

Question

wyhsleep opened this issue 6 months ago · 1 comment

Thank you for your outstanding work. I have a question regarding the datasets used for pre-training. Specifically, you mentioned using HUMAN AND MULTI-SPECIES GENOME data to pre-train the model. Could you please clarify the source of the multi-species genome dataset?

Answer 1 · 2024-07-25T21:24:04.000Z

You can find more information about it in the Appendix, and you can download it here: https://drive.google.com/file/d/1dSXJfwGpDSJ59ry9KAp8SugQLK35V83f/view