huangyh09/brie

Cannot fetch reads in region

yang91 opened this issue · 5 comments

Hi,

I tried brie to my own data, but it reported "Cannot fetch reads in regions: xxx" all the way, and finished with error showed below:

File "/home/yangxx/.local/bin/brie", line 9, in <module>
   load_entry_point('brie==0.1.3', 'console_scripts', 'brie')()
 File "build/bdist.linux-x86_64/egg/brie/brie.py", line 256, in main
 File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
   raise self._value
UnboundLocalError: local variable 'reads' referenced before assignment

My aligner is STAR 2.5.2b, and I already sort bam files during aligner and index with samtools. I also tried to resort output bam files and indexed. These all reported the same error. This is my command line:
brie -a $path/annotation/SE.gold.gtf -f $path/annotation/human_factors.SE.gold.csv -p 10 -s $path/STAR/SRR1033783.Aligned.sortedByCoord.out.bam -o $path/BRIE/SRR1033783. Is there something wrong about my input or parameter?

Hi,

That error looks odd, but I can't tell the reason so far. Your command line seems no problem. Could you simply try using one core, i.e., by setting -p 1? It will be slower, but may help me check whether the problem comes from the multiprocessing package.

Thanks and let me know.
Yuanhua

Hi Yuanhua,

I've tried with -p 1 parameter, error seems somewhere different:

[Brie] loading annotation file... Done.
[Brie] loading reads for 12586 genes with 1 cores...
[Brie] [--------------------] 0.1% done in 1.2 sec.Cannot fetch reads in region: 20:38902668-38907054
Traceback (most recent call last):
  File "/usr/local/bin/brie", line 9, in <module>
    load_entry_point('brie==0.1.3', 'console_scripts', 'brie')()
  File "build/bdist.linux-x86_64/egg/brie/brie.py", line 240, in main
  File "build/bdist.linux-x86_64/egg/brie/utils/run_utils.py", line 18, in set_info
  File "build/bdist.linux-x86_64/egg/brie/utils/tran_utils.py", line 270, in set_reads
  File "build/bdist.linux-x86_64/egg/brie/utils/sam_utils.py", line 74, in fetch_reads
UnboundLocalError: local variable 'reads' referenced before assignment

Then I tried with -p 2, the read processing could go to 100% rather than exit with 0.1% when -p 1, but it still reported same error with -p 10 parameter.

While something puzzles me that when during my brie installation, it got stucked by pylab module. You've offer help to forbiddden this function and installation went well ( see in #6 ).But I don't know if it matter. Do you think this error could be lead by this reason?

Hi, Thanks for the feedbacks. I think the error is clear now, by looking at the report with -p 1 (this doesn't use the multiple processors, so can directly report the error). The error happens because
No reads can be fetched from region 20:38902668-38907054
20 here means the chr20, while in your reference genome sequence, the chr20 is written as 20.

I would guess there is some incompatibility in your bam file, though I see you already sort and index it. Is it possible to check whether the reads on other genes can be fetched? When you say by using -p 2, all genes can processed, do you mean you can get the output results file? If so, maybe only this gene is incompatible to the reference genome.

Anyway, I have updated the sam_utils.py, to give reads=[] if reads cannot fetched from sam file. You could have a try and shouldn't make things worse. Also, I don't turning pylab off makes any differences here as it is only for plot.

Best
Yuanhua

Oh thank you for your reminding, I've used the human annotaion from brie to run with the mouse aligned bam files, such a fool! Very sorry for bothering you, and thank again for your kindly help!

Never mind, enjoy your analysis.