vanheeringen-lab/genomepy

bad filing on Ensembl

siebrenf opened this issue · 1 comment

Baker's yeast is one of those model organisms you would expect to be used a lot.
Unfortunately, the filing system for this genome, and for at least several other fungi, is inconsistent.

Provider: Ensembl
example genome: ASM280432v1

expected: ftp://ftp.ensemblgenomes.org/pub/fungi/release-48/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.dna_sm.toplevel.fa.gz

genomepy: ftp://ftp.ensemblgenomes.org/pub/fungi/release-48/fasta/saccharomyces_cerevisiae_gca_002804325/dna/Saccharomyces_cerevisiae_gca_002804325.ASM280432v1.dna_sm.toplevel.fa.gz

real: ftp://ftp.ensemblgenomes.org/pub/fungi/release-48/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz

The error lies in url_name, which is given by Ensembl and does not match the actual directory and file names on the FTP server. We could change genomepy to look for a partial match instead of an exact match.
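A rough sketch of what partial matching could look like, using the paths from this issue. The function names (match_directory, match_fasta) are hypothetical and not part of genomepy, and the directory and file listings are assumed to have been fetched from the FTP server elsewhere. It also assumes that the real directory name is a prefix of url_name, which holds for this genome but may not hold in general.

```python
def match_directory(url_name, directories):
    """Return directories equal to url_name or a '_'-delimited prefix of it.

    Longest (most specific) candidates come first.
    Hypothetical helper, not genomepy's API.
    """
    hits = [d for d in directories if url_name == d or url_name.startswith(d + "_")]
    return sorted(hits, key=len, reverse=True)


def match_fasta(directory, filenames, suffix=".dna_sm.toplevel.fa.gz"):
    """Return files named '<Directory>.<anything><suffix>'.

    The assembly part in the middle (e.g. 'R64-1-1') is not checked,
    which is the partial match. Hypothetical helper, not genomepy's API.
    """
    prefix = directory.capitalize() + "."
    return [f for f in filenames if f.startswith(prefix) and f.endswith(suffix)]


# Listings as they might be returned by the FTP server (assumed input).
directories = ["saccharomyces_cerevisiae", "schizosaccharomyces_pombe"]
filenames = [
    "Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz",
    "Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz",
    "README",
]

dirs = match_directory("saccharomyces_cerevisiae_gca_002804325", directories)
files = match_fasta(dirs[0], filenames)
```

If more than one directory or file matches, the caller would still need to decide which one to use, or fall back to an error.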

Add this to a FAQ / known issues. Don't spend a lot of time on fixing this. Genomepy also won't work well for bacteria.