Discrepancy in the number of fragments between *ccm.csv and *kmaout.frag
Hi!
I have noticed a discrepancy in the number of fragments (in this case, scaffolds) classified as a given species between the output files *_scaffolds.ccmetagen.ccm.csv and *_scaffolds.kmaout.frag.
In the CCMetagen file *_scaffolds.ccmetagen.ccm.csv, the sum of the Depth column over the rows classified as a given species (e.g. Taenia solium) is 5, but I could find only 4 fragments for it in the KMA output *_scaffolds.kmaout.frag.
I used the following key columns to match the two files:
[Closest_match] - *_scaffolds.ccmetagen.ccm.csv
[template_name] - *_scaffolds.kmaout.frag
Is there any explanation for this?
Looking forward to your reply.
Thanks.
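A minimal Python sketch of the comparison described above, which can help reproduce the mismatch for other species. It assumes the .frag file is tab-separated with no header and that the template name is in the 6th column; that column index (`FRAG_TEMPLATE_COL`) and the function names are hypothetical, so check them against your own output. Only `Closest_match` and `Depth` are taken from this thread.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: the KMA .frag file is tab-separated, has no header, and holds
# the template name in the 6th column (index 5). Verify on your own file.
FRAG_TEMPLATE_COL = 5


def count_frag_templates(frag_lines):
    """Count how many .frag lines point at each template name."""
    counts = Counter()
    for line in frag_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        counts[fields[FRAG_TEMPLATE_COL]] += 1
    return counts


def sum_ccm_depth(ccm_text):
    """Sum the Depth column of a CCMetagen .ccm.csv per Closest_match."""
    depths = Counter()
    for row in csv.DictReader(io.StringIO(ccm_text)):
        depths[row["Closest_match"]] += float(row["Depth"])
    return depths


def mismatches(frag_counts, ccm_depths):
    """Return {template: (frag_count, ccm_depth)} wherever the two disagree."""
    out = {}
    for name in set(frag_counts) | set(ccm_depths):
        frag_n = frag_counts.get(name, 0)
        depth = ccm_depths.get(name, 0.0)
        if frag_n != depth:
            out[name] = (frag_n, depth)
    return out
```

To run it on real files, pass an open file handle to `count_frag_templates` (for example `open(path)`, or `gzip.open(path, "rt")` if the .frag output is gzip-compressed) and the text of the .ccm.csv to `sum_ccm_depth`.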
Hi Ana,
Sorry for the slow response.
I think this might be due to finding more than one match for a given template, but let's investigate.
I assume you are using contigs (not paired-end reads)? Which flags did you use with KMA?
Vanessa
Hi!
Yes, we are using contigs.
I'm using the following command:
kma -i contigs.fasta -o outFasta -t_db databasePath -ca -1t1 -mem_mode -ef
Thanks for your reply.
Ana Carolina.
Hi!
Okay, a few more questions:
Could you also tell us the command you used with ccmetagen?
Which database you used, the NCBI nt?
I'm not sure whether this applies to your case, but the -1t1 flag can be tricky to use with contigs (unless your reference database consists of complete genomes): it tells KMA to find only one match for each scaffold, but there may be multiple genes (and therefore multiple equally good matches) in the database.
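One way to check the multiple-match explanation is to list the scaffolds that look ambiguous in the .frag output. This sketch assumes a tab-separated .frag layout in which column 2 holds the number of equally well mapping templates, column 6 the template name and column 7 the query (scaffold) name; those column indices and the function name `ambiguous_queries` are assumptions to verify against your own file.

```python
from collections import defaultdict

# ASSUMPTION: .frag columns (0-based) are: 1 = number of equally well
# mapping templates, 5 = template name, 6 = query (scaffold) name.
N_EQUAL_COL, TEMPLATE_COL, QUERY_COL = 1, 5, 6


def ambiguous_queries(frag_lines):
    """Return {query: sorted templates} for queries that look ambiguous.

    A query is flagged if its line reports more than one equally good
    template, or if it appears on lines pointing at different templates.
    """
    templates = defaultdict(set)
    flagged = set()
    for line in frag_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        query = fields[QUERY_COL]
        templates[query].add(fields[TEMPLATE_COL])
        if int(fields[N_EQUAL_COL]) > 1:
            flagged.add(query)
    # Also flag queries assigned to more than one distinct template.
    flagged |= {q for q, t in templates.items() if len(t) > 1}
    return {q: sorted(templates[q]) for q in sorted(flagged)}
```

Scaffolds returned here are candidates whose assignment may differ between a run with -1t1 and one without it.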
Hi!
ccmetagen command:
CCMetagen.py -i inputFileFasta -o outFasta --depth_unit fr --map inputMapFasta --depth 1 --query_identity 80 -ef y
Yes. NCBI nt.
I see your point. We are going to test without the -1t1 flag.
Thanks for your reply.
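For the planned test, a small Python sketch that builds the kma command used in this thread with and without -1t1, writing each run to its own output prefix so the two results can be compared afterwards. The paths are the placeholders from the command above; `kma_cmd`, `run_both` and the prefixes `out_1t1`/`out_no1t1` are hypothetical names. By default it only prints the commands; pass `dry_run=False` to actually call kma.

```python
import shlex
import subprocess


def kma_cmd(out_prefix, one_to_one):
    """Build the kma command from this thread, optionally with -1t1."""
    cmd = ["kma", "-i", "contigs.fasta", "-o", out_prefix,
           "-t_db", "databasePath", "-ca"]
    if one_to_one:
        cmd.append("-1t1")  # keep the flag order used in the original command
    cmd += ["-mem_mode", "-ef"]
    return cmd


def run_both(dry_run=True):
    """Print (and optionally run) kma with and without -1t1."""
    for prefix, one_to_one in (("out_1t1", True), ("out_no1t1", False)):
        cmd = kma_cmd(prefix, one_to_one)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if kma exits non-zero
```

Each output can then be passed to CCMetagen with the same options as before, so the only difference between the two runs is -1t1.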
Closing issue due to inactivity. Feel free to open it again if you need help.