vrmarcelino/CCMetagen

Discrepancy in the number of fragments between *ccm.csv and *kmaout.frag

Closed this issue · 5 comments

Hi!
I have noticed a discrepancy in the number of fragments (in this case, scaffolds) classified as a given species between the output files *_scaffolds.ccmetagen.ccm.csv and *_scaffolds.kmaout.frag.

In the CCMetagen file *_scaffolds.ccmetagen.ccm.csv, the Depth column sums to 5 over the rows classified as a given species (e.g. Taenia solium), but I could find only 4 fragments for it in the KMA output *_scaffolds.kmaout.frag.

I matched the two files on these key columns:
[Closest_match] - *_scaffolds.ccmetagen.ccm.csv
[template_name] - *_scaffolds.kmaout.frag
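
For anyone who wants to reproduce this comparison, here is a minimal Python sketch (not part of CCMetagen or KMA). It assumes the .ccm.csv file has a header row with Closest_match and Depth columns, and that the .frag file is tab-separated with no header and the template name in the sixth field. The function names and the sample rows are made up for illustration, so please check the column layout against your own files.

```python
import io
from collections import Counter, defaultdict
import csv


def depth_by_closest_match(ccm_file):
    """Sum the Depth column per Closest_match in a CCMetagen .ccm.csv file."""
    totals = defaultdict(float)
    for row in csv.DictReader(ccm_file):
        totals[row["Closest_match"]] += float(row["Depth"])
    return dict(totals)


def fragments_by_template(frag_file, template_col=5):
    """Count rows per template name in a KMA .frag file.

    template_col=5 (sixth field) is an assumption about the .frag layout.
    """
    counts = Counter()
    for line in frag_file:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        counts[fields[template_col]] += 1
    return counts


def find_discrepancies(depths, counts):
    """Return {template: (summed_depth, fragment_count)} where the two differ."""
    names = sorted(set(depths) | set(counts))
    return {
        name: (depths.get(name, 0.0), counts.get(name, 0))
        for name in names
        if depths.get(name, 0.0) != counts.get(name, 0)
    }


# Made-up sample data, only to show the shape of the inputs.
SAMPLE_CCM = (
    "Closest_match,Depth\n"
    "T1 Taenia solium,3\n"
    "T2 Taenia solium,2\n"
)
SAMPLE_FRAG = (
    "ACGT\t1\t10\t0\t4\tT1 Taenia solium\tscaffold_1\n"
    "ACGT\t1\t10\t0\t4\tT1 Taenia solium\tscaffold_2\n"
    "ACGT\t1\t10\t0\t4\tT1 Taenia solium\tscaffold_3\n"
    "ACGT\t1\t10\t0\t4\tT2 Taenia solium\tscaffold_4\n"
)

depths = depth_by_closest_match(io.StringIO(SAMPLE_CCM))
counts = fragments_by_template(io.StringIO(SAMPLE_FRAG))
print(find_discrepancies(depths, counts))
```

With real files, pass open file handles instead of the io.StringIO samples. If the .frag file is gzip-compressed, opening it with gzip.open(path, "rt") should work with the same function.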

Is there any explanation for this?

Looking forward to your reply.
Thanks.

Attached: ed_scaffolds.ccmetagen.ccm.csv, and a screenshot of *_scaffolds.kmaout.frag (Screen Shot 2022-11-30 at 00 13 16).

Hi Ana,

Sorry for the slow response.
I think this might be due to finding more than one match for a given template, but let's investigate.
I assume you are using contigs (not paired-end reads)? Which flags did you use with KMA?

Vanessa

Hi!

Yes, we are using contigs.

I'm using the following command:

kma -i contigs.fasta -o outFasta -t_db databasePath -ca -1t1 -mem_mode -ef

Thanks for your reply.
Ana Carolina.

Hi!

Okay, a few more questions:
Could you also tell us the command you used with CCMetagen?
Which database did you use, NCBI nt?

Not sure if this is your case, but the -1t1 flag can be tricky to use with contigs (unless you are using a reference database of complete genomes): you are telling KMA to find only one match for each scaffold, but a scaffold may contain multiple genes, and therefore have multiple equally good matches in the database.

Hi!

ccmetagen command:

CCMetagen.py -i inputFileFasta -o outFasta --depth_unit fr --map inputMapFasta --depth 1 --query_identity 80 -ef y

Yes. NCBI nt.

I see your point. We are going to test without the -1t1 flag.

Thanks for your reply.

Closing issue due to inactivity. Feel free to open it again if you need help.