huangyh09/brie

BRIE does not support use of the Ensembl toplevel genome sequence

jenni-westoby opened this issue · 2 comments

I get the following error when executing brie-event-filter using the Ensembl toplevel genome sequence (ftp://ftp.ensembl.org/pub/release-82/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.toplevel.fa.gz) as the reference genome sequence:

$ brie-event-filter -a AS_events/SE.gff3 --anno_ref=Mus_musculus.GRCm38.82.chr.gtf --reference=Mus_musculus.GRCm38.dna.toplevel.fa
9908 Skipped Exon events are input for quality check.
Traceback (most recent call last):
  File "venv/bin/brie-event-filter", line 9, in <module>
    load_entry_point('brie==0.1.2', 'console_scripts', 'brie-event-filter')()
  File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 369, in main
    no_splice_site)
  File "venv/local/lib/python2.7/site-packages/brie/events/event_filter.py", line 118, in as_exon_check
    up_ss3 = fastaFile.get_seq(chrom, _exon_loc[1]+1, _exon_loc[1]+2)
  File "venv/local/lib/python2.7/site-packages/brie/utils/fasta_utils.py", line 19, in get_seq
    return self.f.fetch(qref, start-1, stop)
  File "pysam/libcfaidx.pyx", line 278, in pysam.libcfaidx.FastaFile.fetch (pysam/libcfaidx.c:5011)
KeyError: "sequence 'chrX' not present"

This error does not occur when I use the GENCODE genome sequence (ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M12/GRCm38.p5.genome.fa.gz). I suspect it is caused by the different chromosome naming in the two files ("chrX" vs "X").
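To confirm a naming mismatch like this, it helps to list the sequence IDs actually present in each reference FASTA. The following is a minimal sketch using only the Python standard library. It assumes that the ID is the first whitespace-delimited token after '>', which is the name that pysam/samtools faidx match in fetch(). The function names are illustrative and are not part of BRIE.

```python
import gzip


def fasta_seq_names(path):
    """Return the sequence IDs in a FASTA file, in file order.

    The ID is the first whitespace-delimited token after '>'.
    Paths ending in '.gz' are decompressed with the gzip module.
    """
    opener = gzip.open if path.endswith(".gz") else open
    names = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip malformed, empty headers
                    names.append(fields[0])
    return names


def uses_chr_prefix(names):
    """True if any ID uses the UCSC/GENCODE-style 'chr' prefix."""
    return any(name.startswith("chr") for name in names)
```

For example, `fasta_seq_names("Mus_musculus.GRCm38.dna.toplevel.fa")` would be expected to return IDs such as "1" and "X" with no "chr" prefix. Scanning a multi-gigabyte FASTA this way is slow. If a samtools `.fai` index already exists, its first column lists the same IDs.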

You are right. The issue comes from the default value "--add_chrom=chrX", which does not match your FASTA file, where the corresponding chromosome ID is "X".

Could you add "--add_chrom=X" to your command line? I will fix this bug in a later release.

Cheers,
Yuanhua

Thank you, that works.
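The permanent fix discussed above amounts to making the chromosome name that BRIE looks up agree with the IDs in the reference FASTA. Below is a minimal sketch of one way to do that, assuming the only difference between conventions is the "chr" prefix. `match_chrom` is a hypothetical helper and is not part of BRIE. Its result could be passed to pysam's `FastaFile.fetch()` in place of the raw name.

```python
def match_chrom(name, available):
    """Translate a chromosome name to the convention used by a reference.

    Hypothetical helper: tries the name as given, then with the 'chr'
    prefix stripped or added (e.g. 'chrX' <-> 'X'). Names that differ by
    more than the prefix, such as 'chrM' vs Ensembl's 'MT', are not mapped.
    """
    available = set(available)
    if name in available:
        return name
    # Toggle the 'chr' prefix and try again.
    candidate = name[3:] if name.startswith("chr") else "chr" + name
    if candidate in available:
        return candidate
    raise KeyError("sequence %r not present (also tried %r)" % (name, candidate))
```

With the Ensembl toplevel FASTA, `match_chrom("chrX", ["1", "X"])` returns "X". With the GENCODE FASTA, the name "chrX" is returned unchanged.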