About the pretrain data
wyhsleep opened this issue · 1 comment
wyhsleep commented
Thank you for your outstanding work. I have a question about the datasets used for pre-training. You mentioned using HUMAN AND MULTI-SPECIES GENOME data to pre-train the model. Could you please clarify the source of the multi-species genome dataset?
Zhihan1996 commented
You can find more information about it in the Appendix and download it here: https://drive.google.com/file/d/1dSXJfwGpDSJ59ry9KAp8SugQLK35V83f/view