biocompibens/ALFA

Error when bedgraph files are not in the output directory


Hi

I get an error when I run ALFA with bedgraph input in combination with the -o option.

You can reproduce the error with the following steps (I have alfa and bedtools installed and on my PATH).

Clone the ALFA repository:
git clone https://github.com/biocompibens/ALFA

Create the genome coverage files with bedtools:

bedtools genomecov -bg -strand + -ibam ALFA/Quick_start/quick_start.bam > test.plus.bg
bedtools genomecov -bg -strand - -ibam ALFA/Quick_start/quick_start.bam > test.minus.bg

Generate the alfa index:

alfa -a ALFA/Quick_start/quick_start.gtf -g ALFA/Quick_start/quick_start --chr_len ALFA/Quick_start/quick_start.chr_len.txt

The first error occurs when I run:

alfa -g ALFA/Quick_start/quick_start --bedgraph test.plus.bg test.minus.bg test_label --strandness forward -o results

This fails because test.minus.bg is empty.

To avoid this, I simply copy the plus file over the minus file:

cp test.plus.bg test.minus.bg
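
The empty-file condition can also be caught before launching ALFA. Below is a minimal sketch, assuming the two bedgraph files from the steps above are in the current directory; the helper name `find_empty_files` is hypothetical and not part of ALFA.

```python
import os


def find_empty_files(paths):
    """Return the paths that exist as regular files but contain zero bytes."""
    return [p for p in paths if os.path.isfile(p) and os.path.getsize(p) == 0]


if __name__ == "__main__":
    # File names taken from the reproduction steps above.
    for path in find_empty_files(["test.plus.bg", "test.minus.bg"]):
        print(f"{path} is empty; ALFA may fail on it")
```

Missing files are deliberately skipped here, so the check only reports files that bedtools created without any coverage lines.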

Then the main error arises when I run the following (I also deleted the empty results directory):

alfa -g ALFA/Quick_start/quick_start --bedgraph test.plus.bg test.minus.bg test_label --strandness forward -o results

This is what I get in the log:

ALFA

The output directory doesn't exist yet, it is created.

Checking parameters

Intersecting index and BedGraph files

Intersecting BAM and genome N/A% | |0 of 2|Elapsed Time: 0:00:00
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.7/site-packages/alfa.py", line 544, in intersect_bedgraphs_and_index_to_count_categories_1_file
if os.stat(bedgraph_files + strand + bedgraph_extension).st_size == 0:
FileNotFoundError: [Errno 2] No such file or directory: 'results/test.plus.bg'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/bin/alfa", line 6, in <module>
alfa.main()
File "/usr/local/lib/python3.7/site-packages/alfa.py", line 1655, in main
cpt = intersect_bedgraphs_and_index_to_count_categories(labels, bedgraphs, options, bedgraph_extension, genome_index, prios, index_chrom_list, unknown_cat) # TODO: Write the counts to an output file
File "/usr/local/lib/python3.7/site-packages/alfa.py", line 645, in intersect_bedgraphs_and_index_to_count_categories
results = list(pbar(pool.imap_unordered(intersect_bedgraphs_and_index_to_count_categories_1_file, inputs)))
File "/usr/local/lib/python3.7/site-packages/progressbar/bar.py", line 453, in next
value = next(self._iterable)
File "/usr/local/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
FileNotFoundError: [Errno 2] No such file or directory: 'results/test.plus.bg'
Intersecting BAM and genome 100% |##################################################################|2 of 2|Elapsed Time: 0:00:00

It seems that ALFA looks for the bedgraph files in the output directory. I think the problem lies in these lines:
https://github.com/biocompibens/ALFA/blob/master/alfa.py#L1459-L1481

Could you please have a look?

Thank you in advance for your help

Best
Foivos

Hi Foivos,

Thanks for your message (the level of detail is very helpful and much appreciated!).

I don't have much time to deal with it right now (and my setup is less comfortable because I have to work from home).

About the empty minus bedgraph file, you're right. I guess this would never happen with a real dataset, so we didn't handle it, and we wanted the toy example to be as simple as possible.

Regarding the other one, I can't easily commit the fix properly from here, but I think you can change line 1482 to:
bedgraphs.append(re.sub("(.(plus|minus))?" + bedgraph_extension, "", options.bedgraph[sample_package_nb + sample_file]))
Just remove the "options.output_dir + " part; that should do the trick.
I'll fix it when we are back to the office.
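
To illustrate why removing the prefix should help, here is a standalone sketch of the path handling. It assumes that the current line prepends `options.output_dir` to the result of the `re.sub` call, that the output directory is stored as `results/`, and that the worker later appends `.plus` and the extension again (as the `bedgraph_files + strand + bedgraph_extension` expression in the traceback suggests). The function `bedgraph_basename` and its `output_dir` parameter are hypothetical stand-ins, not ALFA's actual code.

```python
import re


def bedgraph_basename(path, bedgraph_extension=".bg", output_dir=""):
    """Strip an optional .plus/.minus suffix and the extension from a path.

    output_dir mimics the prefix that the suggested fix removes (assumption).
    """
    stripped = re.sub("(.(plus|minus))?" + bedgraph_extension, "", path)
    return output_dir + stripped


# With the prefix, the rebuilt path points inside the output directory.
buggy = bedgraph_basename("test.plus.bg", output_dir="results/")
# Without it, the rebuilt path points at the file that was actually passed.
fixed = bedgraph_basename("test.plus.bg")

print(buggy + ".plus" + ".bg")  # results/test.plus.bg
print(fixed + ".plus" + ".bg")  # test.plus.bg
```

The first printed path matches the one in the FileNotFoundError above, which is consistent with the prefix being the cause.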

Let me know if this works for you.

Cheers,
Mathieu

Hi Mathieu

Thank you for your reply. I will wait for a permanent fix when you have more time. For the time being, I enter the output directory and run the command from there (not optimal, but it works). Just ping me when this is fixed.

Thanks again for your help

Best
Foivos