bcgsc/ntJoin

malformed bed entry?


Hello, and thank you for developing this program. I'm attempting to map a highly fragmented draft assembly to a reference using:

./ntJoin/ntJoin assemble target=Ldef.fa target_weight=1 references=Ldeca.fa reference_weights='2' n=1

Although bedtools is in the path, I'm getting this error:

Error: malformed BED entry at line 22. Start was greater than end. Exiting.

thanks!

Hi @kingcohn1,

Interesting - I haven't seen that error before. Did this happen at the stage where the log says that it is "Printing output scaffolds"? Could you paste the full log to confirm? What version of bedtools are you using?

If you also see the message "bedtools getfasta failed -- is bedtools on your PATH?" right after the error, the first place I would look is line 22 of the *unassigned.bed file. Is the start greater than the end in that entry?
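To check the entry the error points at, a small script can scan a BED file and report every line whose start is greater than its end. It also flags zero-length intervals (start equal to end), which older bedtools releases may reject. This is a minimal sketch and not part of ntJoin. The function name `find_bad_bed_lines` is hypothetical, and the code assumes a plain tab-separated BED file with no header or track lines.

```python
def find_bad_bed_lines(lines):
    """Return (line_number, reason) pairs for suspicious BED entries.

    Line numbers are 1-based, so they can be compared with the line
    number in the bedtools error message. Assumes tab-separated fields
    (chrom, start, end, ...) and no header or track lines.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        start, end = int(fields[1]), int(fields[2])
        if start > end:
            problems.append((number, "start greater than end"))
        elif start == end:
            problems.append((number, "zero-length interval"))
    return problems
```

An open file handle can be passed directly, e.g. `find_bad_bed_lines(open(path))`, where `path` points at the *unassigned.bed file.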

Thanks for the quick reply! I'm using bedtools v2.25.0, and it looks like the BED file didn't write properly (lines 22 and 23):

scaffold10001_size9971	0	0
scaffold100020_size2021	1	2020

And here's more of the output:

Total number of components in graph: 2592 

2020-05-22 02:05:20.960310 : Printing output scaffolds
Error: malformed BED entry at line 22. Start was greater than end. Exiting.
bedtools getfasta failed -- is bedtools on your PATH?
None
Traceback (most recent call last):
  File "/home/molecularecology/Documents/ZachCohen/Redundans_assemblies/All/ntJoin/bin/ntjoin_assemble.py", line 851, in <module>
    main()
  File "/home/molecularecology/Documents/ZachCohen/Redundans_assemblies/All/ntJoin/bin/ntjoin_assemble.py", line 848, in main
    Ntjoin().main()
  File "/home/molecularecology/Documents/ZachCohen/Redundans_assemblies/All/ntJoin/bin/ntjoin_assemble.py", line 838, in main
    self.print_scaffolds(paths)
  File "/home/molecularecology/Documents/ZachCohen/Redundans_assemblies/All/ntJoin/bin/ntjoin_assemble.py", line 693, in print_scaffolds
    raise subprocess.CalledProcessError(out_fasta.returncode, cmd_shlex)
subprocess.CalledProcessError: Command '['bedtools', 'getfasta', '-fi', 'L.defecta_Redundans_scaffolds.fa', '-bed', 'out.k32.w1000.n1.L.defecta_Redundans_scaffolds.fa.k32.w1000.tsv.unassigned.bed', '-fo', '-']' returned non-zero exit status 1
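The traceback shows `print_scaffolds` raising `subprocess.CalledProcessError` because `bedtools getfasta` exited with status 1. The general pattern is to run an external command, capture its output, and raise on a non-zero exit status. A minimal sketch of that pattern follows. It illustrates the idea only and is not ntJoin's actual code. The helper name `run_or_raise` is hypothetical.

```python
import subprocess
import sys


def run_or_raise(args):
    """Run a command given as an argument list and return its stdout.

    Raises subprocess.CalledProcessError if the command exits with a
    non-zero status, after echoing the command's own stderr.
    """
    result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    if result.returncode != 0:
        # Echo the tool's own message, e.g. bedtools' "malformed BED entry" line
        sys.stderr.write(result.stderr)
        raise subprocess.CalledProcessError(result.returncode, args,
                                            output=result.stdout, stderr=result.stderr)
    return result.stdout
```

For the failing step in this thread, `args` would be the list printed in the traceback (`['bedtools', 'getfasta', '-fi', ..., '-bed', ..., '-fo', '-']`). Passing a list instead of a single string avoids shell quoting issues with file names.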

thanks!

Can you try updating your bedtools? I tried bedtools getfasta with bedtools v2.29.2 on a BED file containing a zero-length entry. I get a warning (Feature (mx0:0-0) has length = 0, Skipping.) but no malformed BED entry error. However, I do get the same error you are seeing if I run the same command with bedtools v2.25.0. There is an issue about length-0 intervals reported to the bedtools developers here: arq5x/bedtools2#646
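If upgrading bedtools is not an option, one possible workaround is to drop zero-length entries from the BED file before calling bedtools getfasta. Newer releases skip these entries anyway, as the warning above shows. This could help, for example, when re-running the failing getfasta command from the traceback by hand. The sketch below is not something ntJoin does itself. The function name `drop_empty_intervals` is hypothetical, and the code also drops inverted intervals (end less than start).

```python
def drop_empty_intervals(lines):
    """Yield BED lines whose interval has positive length (end > start).

    Zero-length entries such as "scaffold10001_size9971\t0\t0" and
    inverted entries are dropped. Lines with fewer than three fields
    are passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and int(fields[2]) <= int(fields[1]):
            continue  # skip empty or inverted interval
        yield line
```

For example, with `src` open for reading and `dst` open for writing, `dst.writelines(drop_empty_intervals(src))` writes the filtered entries to a new BED file.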

I'm also interested in seeing whether an updated version of bedtools fixes the creation of that scaffold10001_size9971 0 0 entry. I could also reproduce the creation of a zero-length region when using bedtools complement with bedtools v2.25.0, but I didn't see it using my current v2.29.2.
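A complement step can produce an entry like scaffold10001_size9971 0 0 when an interval starts exactly at position 0. The gap before that interval is then empty. Emitting it as a region yields a zero-length entry. The sketch below shows a complement that omits empty gaps. It is not bedtools' implementation, and the function name `complement` is hypothetical. It assumes 0-based, half-open BED coordinates and intervals for a single contig that are already sorted and merged.

```python
def complement(intervals, contig_length):
    """Return the gaps between intervals on one contig, skipping empty gaps.

    Assumes 0-based, half-open [start, end) coordinates (as in BED) and
    that `intervals` is sorted by start and non-overlapping.
    """
    gaps = []
    cursor = 0
    for start, end in intervals:
        if start > cursor:  # only emit gaps with positive length
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if contig_length > cursor:
        gaps.append((cursor, contig_length))
    return gaps
```

For example, with hypothetical intervals (0, 1) and (2020, 2021) on a 2,021 bp contig, this returns [(1, 2020)] and no (0, 0) gap.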

Yes, that worked! Thank you 👍

Excellent! I'll add a note about the bedtools versions in the README.

Thanks for your interest in ntJoin!