evo-design/evo

Release of Pre-training Data Preprocessing Scripts


Hi,
If the data release for OpenGenome is still ongoing, would it be possible to release the preprocessing scripts for the data? They don't need to be exactly reproducible.

cx0 commented

@KatarinaYuan @brianhie

I have tried to reproduce the OpenGenome dataset here by following the instructions in the paper. You can generate a functionally equivalent dataset for your own training while waiting for the authors to release the exact filtering steps they used.