This is a script that extracts the datasets from https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks as FASTA files for DNA string classification. Every entry in each FASTA file has the following format:
> <class>
<DNA string>
where each value in angle brackets (<something>) depends on the original data.
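
As a rough illustration of how files in this format could be read back for classification, here is a minimal Python sketch. The function name `parse_labeled_fasta` and the sample labels `0` and `1` are hypothetical and not taken from the datasets. The sketch also assumes a sequence may wrap over several lines, which may not happen in these files.

```python
from typing import Iterable, Iterator, Tuple


def parse_labeled_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (class_label, dna_string) pairs from FASTA-formatted lines."""
    label = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if label is not None:
                yield label, "".join(chunks)
            label = line[1:].strip()  # text after ">" is the class label
            chunks = []
        elif label is not None:
            # Sequence line; joined later in case the sequence is wrapped.
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


# Made-up example data in the "> <class>" / "<DNA string>" layout.
example = """> 0
ACGTACGT
> 1
TTGACA
""".splitlines()

records = list(parse_labeled_fasta(example))
print(records)  # [('0', 'ACGTACGT'), ('1', 'TTGACA')]
```

To read a real file, pass an open file handle instead of the example list, since a text file handle is also an iterable of lines.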
If you find these datasets useful, please cite the original authors; see https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks for citation details.