natir/yacrd

Thread 'main' panicked at 'called `Option::unwrap() ERROR

bbalog87 opened this issue · 6 comments

Hello @natir ,

I wanted to test your tool on a set of contigs, to see whether it can detect "chimeric" contigs as well. But after just 2 min, yacrd crashed with this error message:

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', libcore/option.rs:345:21 note: Run with `RUST_BACKTRACE=1` for a backtrace.

The command I ran is:

yacrd -i sample.mecat2.racon.noN.self.olp.paf -o yacrd.out -f fasta -e fasta -s fasta
Any idea what might have gone wrong?

Best,
Julien

natir commented

Hi @bbalog87,

Thank you for your interest in my tool, and sorry that it took me a while to answer.

The error message is not very clear; I need to fix it.

The --split, --extract and --filter options each expect a path to the fasta or fastq file on which the operation should be performed.

In your case I guess you mapped your reads against your contigs to determine whether the contigs are chimeric, so I think you should give the path to your contigs file.

The Readme and the help message may not be clear enough on this point.
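
To make the point about paths concrete, here is a minimal sketch of a pre-flight check that could be run before launching yacrd. It is not part of yacrd: the helper name `find_bad_path_args` and the file name `overlaps.paf` are hypothetical, and the only thing taken from this thread is that -f/--filter, -e/--extract and -s/--split expect a path to a sequence file.

```python
from pathlib import Path

# Hypothetical helper, not part of yacrd. Per this thread, these options
# expect a path to a fasta or fastq file, not a format name.
PATH_FLAGS = ("-f", "--filter", "-e", "--extract", "-s", "--split")

def find_bad_path_args(argv):
    """Return (flag, value) pairs whose value is not an existing file."""
    bad = []
    for flag, value in zip(argv, argv[1:]):
        if flag in PATH_FLAGS and not Path(value).is_file():
            bad.append((flag, value))
    return bad

# Same shape as the command in the report: "fasta" is a format name, not a path.
cmd = ["yacrd", "-i", "overlaps.paf", "-o", "yacrd.out",
       "-f", "fasta", "-e", "fasta", "-s", "fasta"]
for flag, value in find_bad_path_args(cmd):
    print(f"{flag} expects a path to a fasta/fastq file, got {value!r}")
```

Unless a file literally named `fasta` exists in the working directory, the check reports all three options, which is the situation that led to the panic above.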

Thank you again for your interest; I hope this helps.

Hi Pierre,
Thank you for replying. Indeed, the problem was the missing paths. It ran successfully.
However, I am not satisfied with the results. Yacrd reported 60% chimeric contigs, which is unrealistic (too many false positives), since BUSCO reported 98% complete, single-copy genes on the same contig set. I think yacrd was not optimized for contigs. It is more suitable for raw PacBio reads.
I will try it on raw reads.
Thank you.
Julien

natir commented

Indeed yacrd was not designed for this.

Moreover, it seems difficult to me to compare the results of yacrd and BUSCO, since they do not do the same thing at all.

But the current version of yacrd is very dependent on the quality of the overlaps. Perhaps with another overlapper, or with different mapping parameters, the results could be much better.

OK, I will try these options as well.
By mentioning BUSCO, I was not trying to compare the two tools. Rather, I wanted to point out that BUSCO cannot predict 98% complete genes on a contig set with ~50% chimeric sequences. That seems illogical to me.

Best,
Julien

natir commented

If I'm not mistaken, BUSCO just checks that conserved genes are present in the assembly, so to me it is possible for all the conserved genes to be present in chimeric contigs.

But that's not the problem here. I ran a test with simulated PacBio data at 20x coverage, using this pipeline to get an assembly:

minimap2 -x ava-pb reads.fasta reads.fasta > overlap.paf
miniasm -f reads.fasta overlap.paf > assembly.gfa
sed -nr "s/S\t([^[:space:]]+)\t([^[:space:]]+)\t.*/>\1\n\2/pg" assembly.gfa > assembly.fasta # to get assembly in fasta format
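
The sed one-liner above relies on GNU sed behaviour (the -r flag and the \t and \n escapes) and only matches segment lines that carry at least one tag after the sequence. As a portable alternative, the same conversion can be sketched in Python; the function name `gfa_to_fasta` and the sample lines are made up for illustration.

```python
def gfa_to_fasta(lines):
    """Yield one FASTA record per segment (S) line of a GFA file."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # GFA segment lines look like: S <name> <sequence> [optional tags...]
        if len(fields) >= 3 and fields[0] == "S":
            yield f">{fields[1]}\n{fields[2]}\n"

# Tiny made-up input: one segment line and one non-segment line.
example = ["S\tutg1\tACGT\tLN:i:4\n", "a\tutg1\t0\tread1:1-4\t+\t4\n"]
print("".join(gfa_to_fasta(example)), end="")

# On real files (paths as in the pipeline above):
# with open("assembly.gfa") as gfa, open("assembly.fasta", "w") as out:
#     out.writelines(gfa_to_fasta(gfa))
```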

Then I ran the yacrd pipeline like this:

minimap2 -x map-pb reads.fasta assembly.fasta > tig2read.paf
minimap2 -x map-pb assembly.fasta reads.fasta > read2tig.paf
yacrd -i tig2read.paf
yacrd -i read2tig.paf

For tig2read.paf, yacrd reports only that some reads are chimeric, apparently reads that map at contig extremities.
For read2tig.paf, yacrd reports that two of the six contigs are chimeric.

Depending on how the mapping was done, the results are not consistent and potentially wrong (by mapping against the reference I checked that no contigs are chimeric).
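
To see why the result depends so much on the mapping, the following sketch lists, for every sequence in a PAF file, the internal regions that no alignment covers. It is a deliberately simplified stand-in and not yacrd's implementation: the coverage-gap criterion, the choice to count coverage on both the query and the target, and the name `internal_gaps` are all assumptions made for illustration.

```python
from collections import defaultdict

def internal_gaps(paf_lines):
    """Map each sequence name to its uncovered internal regions.

    A region is reported only if aligned bases lie on both sides of it.
    """
    spans = defaultdict(list)
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a complete PAF record
        # PAF columns 1-4: query name, length, start, end
        # PAF columns 6-9: target name, length, start, end
        spans[f[0]].append((int(f[2]), int(f[3])))
        spans[f[5]].append((int(f[7]), int(f[8])))

    gaps = {}
    for name, intervals in spans.items():
        intervals.sort()
        found = []
        covered_to = intervals[0][1]
        for start, end in intervals[1:]:
            if start > covered_to:
                found.append((covered_to, start))
            covered_to = max(covered_to, end)
        if found:
            gaps[name] = found
    return gaps
```

With a criterion like this, any alignment the mapper fails to report turns directly into a gap, so a change in which file is indexed and which is queried can change which sequences look suspicious.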

I have to study the question a little more, especially the impact of the order of the input files on the minimap2 mapping results (in particular on the choice of k-mers). So for the moment I strongly advise against using yacrd with hybrid mapping data (contigs against reads, or two different read sets).

But there is no problem using yacrd with self-mapping data.
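
One rough way to tell the two cases apart before running yacrd is to check whether any sequence name appears both as a query and as a target in the PAF file. This is only a heuristic sketch: it assumes reads and contigs have distinct names, and the function name `looks_like_self_mapping` is hypothetical.

```python
def looks_like_self_mapping(paf_lines):
    """Return True if at least one name occurs as both query and target."""
    queries, targets = set(), set()
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) >= 6:
            queries.add(f[0])  # PAF column 1: query name
            targets.add(f[5])  # PAF column 6: target name
    return bool(queries & targets)
```

In an all-against-all read overlap file the two sets normally share names, while in a read-against-contig file they should be disjoint.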

Interesting ... your small experiment shows that yacrd depends strongly on the mapping order (contigs<->reads). This substantiates the doubts about the accuracy of the results that I mentioned in my previous post. As you said, the mapping mechanism needs to be investigated.