genomic_benchmark_datasets

This is a script that extracts the datasets from https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks as FASTA files for DNA sequence classification. Each entry in every FASTA file has the following format:

> <class>
<DNA string>

where the values in angle brackets (<class> and <DNA string>) depend on the original data.
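A minimal sketch of writing and reading files in this format follows. It assumes each header line is `>` followed by the class label (with or without a space) and that everything up to the next `>` is the DNA string. The function names `write_labeled_fasta` and `read_labeled_fasta` and the labels used below are hypothetical, not part of this script or of genomic_benchmarks.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def write_labeled_fasta(records: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write (class, DNA string) pairs: one '>' header line, then one sequence line."""
    for label, seq in records:
        out.write(f">{label}\n{seq}\n")


def read_labeled_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (class, DNA string) pairs from a labeled FASTA stream.

    Assumption: the label is whatever follows '>' on the header line
    (surrounding whitespace stripped). Sequence lines are concatenated,
    so a record that happens to span several lines is still read whole.
    """
    label = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)  # emit the final record


# Round trip with hypothetical labels and sequences.
buf = io.StringIO()
write_labeled_fasta([("class_a", "ACGT"), ("class_b", "GGCC")], buf)
buf.seek(0)
print(list(read_labeled_fasta(buf)))  # [('class_a', 'ACGT'), ('class_b', 'GGCC')]
```

The reader tolerates both `>label` and `> label` headers, so it should also handle files written with a space after `>`, as in the format shown above.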

If you find these datasets useful, please cite the original work; see https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks for citation details.

kchu25/genomic_benchmark_datasets