MAGICS-LAB/DNABERT_2

About the pretrain data

wyhsleep opened this issue · 1 comment

Thank you for your outstanding work. I have a question about the datasets used for pre-training. You mention using human and multi-species genome data to pre-train the model. Could you please clarify the source of the multi-species genome dataset?

You can find more information about it in the Appendix, and you can download it here: https://drive.google.com/file/d/1dSXJfwGpDSJ59ry9KAp8SugQLK35V83f/view