AnantharamanLab/vRhyme

Program trying to read from non-existing output file

minna-miha opened this issue · 6 comments

Hi there,

I ran vRhyme on several hundred samples, and all went smoothly except for one problem file, for which the program seems to write its output files but then tries to read from a non-existent intermediate file.

The err output file says:

Traceback (most recent call last):
  File "/users/mhauer1/miniconda3/envs/vRhyme2/bin/vRhyme", line 1017, in <module>
    final, binned_seqs, bins_count, best_score, binned_prots, binned_redundancy = score_stuff.final_bin(folder, best, scoring, mapper, uniques, uniques_info)
  File "/gpfs/home/mhauer1/miniconda3/envs/vRhyme2/bin/score_stuff.py", line 146, in final_bin
    with open(f'{folder}{final}.vRhyme-bins.tsv', 'r') as infile, open(f'{folder}vRhyme_best_bins.{final}.membership.tsv', 'w') as outfile, open(f'{folder}vRhyme_best_bins.{final}.summary.tsv', 'w') as summary:
FileNotFoundError: [Errno 2] No such file or directory: 'I16_1016/13.vRhyme-bins.tsv'

When I look at the contents of the I16_1016 output directory, vRhyme appears to have produced *.vRhyme-bins.tsv files for iterations 1 through 12 and then 18 and 19, but not 13-17.

When I open the log file, it shows that the program always ends at "2.67 Extracting binning summary statistics for each iteration".

Any idea what might be going on here? I am using vRhyme 1.1.0, and almost all of the other samples I ran worked without any issues. Thank you!

Hi @minna-miha

I am not the original vRhyme author, but I can try to help you. I do not have an immediate guess as to what could be happening, especially since this is only a problem for 1 of your inputs. It seems the immediate problem is that iteration 13 is selected as the best binning iteration, but the file no longer exists.

If possible, providing the inputs you used would be the easiest way to figure this out, since I could then reproduce your results locally. If that is not possible, can you provide the following information:

  • In your output directory, can you confirm that the file "vRhyme_alternate_bins/vRhyme_bin_scoring.tsv" exists and has iterations 0-19 present? Or are iterations 13-17 missing from that file as well? If so, then the problem is earlier in the code. There is a step a few lines above that should generate [0-19].vRhyme-bins.tsv.
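For anyone collecting that information, a small script can compare the iterations listed in the scoring table with the `<N>.vRhyme-bins.tsv` files actually present. This is a hypothetical helper, not part of vRhyme; it assumes the scoring table has a header row with the iteration number in the first column, and that the bins files live in the directory shown in the `FileNotFoundError` message (passed here as `bins_dir`).

```python
import re
from pathlib import Path

# Matches per-iteration files such as "13.vRhyme-bins.tsv" and captures the number.
BINS_FILE = re.compile(r"^(\d+)\.vRhyme-bins\.tsv$")


def listed_iterations(scoring_lines):
    """Iteration numbers in vRhyme_bin_scoring.tsv (first column; header skipped)."""
    rows = [line.split() for line in scoring_lines]
    rows = [r for r in rows if r]  # drop blank lines
    return sorted(int(r[0]) for r in rows[1:])


def iterations_on_disk(filenames):
    """Iteration numbers that have an <N>.vRhyme-bins.tsv among filenames."""
    found = set()
    for name in filenames:
        match = BINS_FILE.match(name)
        if match:
            found.add(int(match.group(1)))
    return found


def missing_bin_files(scoring_tsv, bins_dir):
    """Iterations listed in the scoring table that have no bins file in bins_dir."""
    with open(scoring_tsv) as fh:
        listed = listed_iterations(fh)
    present = iterations_on_disk(p.name for p in Path(bins_dir).iterdir())
    return [i for i in listed if i not in present]
```

Calling `missing_bin_files(scoring_path, bins_dir)` returns, in ascending order, the iterations that were scored but have no bins file.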

Hi

I have the same problem. It looks like vRhyme doesn't produce the 12.vRhyme-bins.tsv file: it produces the files [0-11].vRhyme-bins.tsv, then skips to 13.vRhyme-bins.tsv. It happens on every sample I test it on, so I guess it's a bug. Does anybody have an idea how to fix this problem?

Hi,

I'm not sure what is causing the issue. Could you provide sample data that reproduces the error? Thanks

Hi,

I ran into a similar error:
FileNotFoundError: [Errno 2] No such file or directory: 'outdir/1.vRhyme-bins.tsv'

Below is the content of vRhyme_bin_scoring.tsv. It seems that the *.vRhyme-bins.tsv files for iterations with score == -1 were not generated:

iteration sequences redundancy bins proteins score
1 0 0 0 0 -1
2 0 0 0 0 -1
3 0 0 0 0 -1
5 0 0 0 0 -1
6 0 0 0 0 -1
8 0 0 0 0 -1
10 0 0 0 0 -1
12 0 0 0 0 -1
13 0 0 0 0 -1
15 0 0 0 0 -1
16 0 0 0 0 -1
17 0 0 0 0 -1
18 0 0 0 0 -1
0 115 347 50 5327 -1.218
4 115 347 50 5327 -1.218
7 115 347 50 5327 -1.218
11 115 347 50 5327 -1.218
14 115 347 50 5327 -1.218
19 115 347 50 5327 -1.218
9 28 127 14 1359 -1.5339
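In the table above, rows are sorted by score, and the iteration named in the error message (1) is the top row: it has 0 bins and a score of exactly -1. If the best iteration is picked by highest score (an assumption inferred from this thread, not checked against vRhyme's source), that -1 placeholder beats every real, more negative score. The sketch below reproduces that selection on the posted table; `parse_scoring` and `best_by_max_score` are hypothetical names.

```python
# Scoring table from the comment above (vRhyme_bin_scoring.tsv), copied verbatim.
SCORING = """\
iteration sequences redundancy bins proteins score
1 0 0 0 0 -1
2 0 0 0 0 -1
3 0 0 0 0 -1
5 0 0 0 0 -1
6 0 0 0 0 -1
8 0 0 0 0 -1
10 0 0 0 0 -1
12 0 0 0 0 -1
13 0 0 0 0 -1
15 0 0 0 0 -1
16 0 0 0 0 -1
17 0 0 0 0 -1
18 0 0 0 0 -1
0 115 347 50 5327 -1.218
4 115 347 50 5327 -1.218
7 115 347 50 5327 -1.218
11 115 347 50 5327 -1.218
14 115 347 50 5327 -1.218
19 115 347 50 5327 -1.218
9 28 127 14 1359 -1.5339
"""


def parse_scoring(text):
    """Parse the whitespace-separated table into dicts (header row first)."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, body = lines[0], lines[1:]
    rows = []
    for fields in body:
        row = dict(zip(header, fields))
        rows.append({
            "iteration": int(row["iteration"]),
            "bins": int(row["bins"]),
            "score": float(row["score"]),
        })
    return rows


def best_by_max_score(rows):
    """ASSUMED selection rule: highest score wins, ties go to the earliest row."""
    return max(rows, key=lambda r: r["score"])


rows = parse_scoring(SCORING)
best = best_by_max_score(rows)
print(best)  # {'iteration': 1, 'bins': 0, 'score': -1.0}
```

Under that assumption the selected iteration is exactly the one whose bins file was never written, which matches the `FileNotFoundError`.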

How many viruses are you trying to bin? I am not sure what specifically deletes the iterations, but the results suggest that the optimal binning is actually no binning, which tends to happen when trying to bin a small number of viral contigs.
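If the top-scoring iteration is indeed an empty one (i.e. "no binning" wins), a clearer outcome than a `FileNotFoundError` would be to report that explicitly. The function below is a hypothetical guard, not vRhyme's code: `choose_iteration` and its inputs (parsed score rows plus the set of iterations that have a bins file) are made up for illustration, and it assumes selection by highest score.

```python
def choose_iteration(rows, iterations_with_file, fallback=False):
    """Hypothetical guard around picking the best binning iteration.

    rows: dicts with "iteration", "bins" and "score" keys, one per line of
        vRhyme_bin_scoring.tsv.
    iterations_with_file: set of iterations that have an <N>.vRhyme-bins.tsv.
    Returns the chosen row, or None when there is nothing usable to open.
    """

    def usable(row):
        # An iteration is usable only if it produced bins and its file exists.
        return row["bins"] > 0 and row["iteration"] in iterations_with_file

    best = max(rows, key=lambda r: r["score"])  # ASSUMED rule: highest score wins
    if usable(best):
        return best
    if fallback:
        candidates = [r for r in rows if usable(r)]
        if candidates:
            return max(candidates, key=lambda r: r["score"])
    # Nothing usable: report it to the caller instead of opening a missing file.
    return None


# A few rows from the scoring table posted earlier in this thread.
rows = [
    {"iteration": 1, "bins": 0, "score": -1.0},
    {"iteration": 0, "bins": 50, "score": -1.218},
    {"iteration": 9, "bins": 14, "score": -1.5339},
]
# Assumed: the iterations with a real (non -1) score in that table have files.
with_file = {0, 4, 7, 9, 11, 14, 19}

print(choose_iteration(rows, with_file))                 # None
print(choose_iteration(rows, with_file, fallback=True))  # {'iteration': 0, 'bins': 50, 'score': -1.218}
```

With `fallback=True` it returns the best non-empty iteration instead, which trades the "no binning" answer for an actual set of bins; whether that is appropriate is a judgement call.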

Hi,

I have the same issue:
51.7 Extracting binning summary statistics for each iteration
Traceback (most recent call last):
  File "/clusterfs/jgi/groups/science/homes/ccoclet/micromamba/envs/mvp/bin/vRhyme", line 1017, in <module>
    final, binned_seqs, bins_count, best_score, binned_prots, binned_redundancy = score_stuff.final_bin(folder, best, scoring, mapper, uniques, uniques_info)
  File "/clusterfs/jgi/groups/science/homes/ccoclet/micromamba/envs/mvp/bin/score_stuff.py", line 146, in final_bin
    with open(f'{folder}{final}.vRhyme-bins.tsv', 'r') as infile, open(f'{folder}vRhyme_best_bins.{final}.membership.tsv', 'w') as outfile, open(f'{folder}vRhyme_best_bins.{final}.summary.tsv', 'w') as summary:
FileNotFoundError: [Errno 2] No such file or directory: 'DEVAKI_YELLOWSTONE_PROJECT/07_BINNING/07A_vRHYME_OUTPUT/12.vRhyme-bins.tsv'

12.vRhyme-bins.tsv is not generated, although vRhyme_alternate_bins/vRhyme_bin_scoring.tsv does exist. I am trying to bin a large number of viral sequences (n = 8853). Here is the content of vRhyme_bin_scoring.tsv:

iteration sequences redundancy bins proteins score
12 0 0 0 0 -1
15 0 0 0 0 -1
16 0 0 0 0 -1
18 0 0 0 0 -1
0 783 624 297 8445 -1.1079
4 783 624 297 8445 -1.1079
7 783 624 297 8445 -1.1079
11 783 624 297 8445 -1.1079
14 783 624 297 8445 -1.1079
19 783 624 297 8445 -1.1079
9 554 439 221 6052 -1.1679
5 452 424 184 5153 -1.2735
1 229 274 95 2834 -1.436
6 229 274 95 2834 -1.436
10 229 274 95 2834 -1.436
13 229 274 95 2834 -1.436
17 229 274 95 2834 -1.436
8 112 155 48 1456 -1.5409
2 22 67 11 419 -1.9412
3 22 67 11 419 -1.9412

Thank you,
Clément