endixk/ezaai

Coverage and Identity cut-off explanation


Hello everyone,

I changed the identity cutoff (-id) to 0, 0.2, 0.3, and 0.4 while keeping the coverage fixed at 0.5, but the matched count and the AAI stayed the same in every run. Is it expected that the matched count never changes?

I also tried setting the coverage to 0 and varying the identity cutoff (0.2, 0.4), but the matched count was again identical.

For my experiments, I would like to see how the results differ between identity cutoffs of 0.2 and 0.4, and I would expect at least a slight variation in the AAI.

Is there any explanation for this?

Thank you so much.

Laura

endixk commented

Dear Laura,

I found that the identity threshold in the average identity calculation was hard-coded: it used 40% identity regardless of the -id value. Thank you so much for pointing out this critical flaw.
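To illustrate the kind of bug described above, here is a minimal sketch. It is not EzAAI's actual code: the function names, the simplified one-directional filtering, and the hit values are all invented for illustration. It only shows why a hard-coded threshold makes the matched count and AAI insensitive to the identity parameter.

```python
# Toy hits as (identity, coverage) fractions. Values are invented, not real data.
HITS = [(0.95, 0.90), (0.45, 0.80), (0.35, 0.70), (0.25, 0.60), (0.90, 0.30)]


def summarize(identities):
    """Return (matched count, average identity in percent) for the kept hits."""
    if not identities:
        return 0, 0.0
    return len(identities), 100 * sum(identities) / len(identities)


def aai_hardcoded(hits, id_cutoff, cov_cutoff):
    # Buggy variant: id_cutoff is accepted but a fixed 0.4 is used instead.
    kept = [ident for ident, cov in hits if ident >= 0.4 and cov >= cov_cutoff]
    return summarize(kept)


def aai_parameterized(hits, id_cutoff, cov_cutoff):
    # Fixed variant: the caller's identity cutoff is actually applied.
    kept = [ident for ident, cov in hits if ident >= id_cutoff and cov >= cov_cutoff]
    return summarize(kept)


for cutoff in (0.4, 0.2):
    print(cutoff, aai_hardcoded(HITS, cutoff, 0.5), aai_parameterized(HITS, cutoff, 0.5))
```

With the hard-coded variant, both cutoffs give the same pair of values. With the parameterized variant, lowering the cutoff admits more low-identity hits, so the matched count rises and the average identity falls.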

This is now fixed and will be released in a stable version as soon as possible.

Until then, please use this unstable binary, which includes the fix: EzAAI_v1.2.2.unstable.jar.zip

You can run it with the command java -jar EzAAI_v1.2.2.unstable.jar.

I'm also posting the results from the fixed version below, which behave as expected. They were produced with sample genomes of Clavibacter insidiosus and C. nebraskensis, lowering the identity cutoff from 0.5 to 0.2.

Label 1	Label 2	AAI	CDS count 1	CDS count 2	Matched count	Proteome cov.	ID param.	Cov. param.
Clavibacter insidiosus	Clavibacter nebraskensis	95.434771	3300	2905	2528	0.814827	0.500000	0.500000
Clavibacter insidiosus	Clavibacter nebraskensis	95.253508	3300	2905	2537	0.817728	0.400000	0.500000
Clavibacter insidiosus	Clavibacter nebraskensis	94.973205	3300	2905	2549	0.821595	0.300000	0.500000
Clavibacter insidiosus	Clavibacter nebraskensis	94.919639	3300	2905	2551	0.822240	0.200000	0.500000
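The trend in the table can be checked with a short script. The rows are copied from the output above. The proteome coverage formula (matched count divided by the mean CDS count of the two genomes) is an assumption inferred from these numbers, not taken from the EzAAI source, and the function name is hypothetical.

```python
# CDS counts and rows copied from the table above:
# (ID param., AAI, matched count, reported proteome coverage)
CDS_1, CDS_2 = 3300, 2905
ROWS = [
    (0.5, 95.434771, 2528, 0.814827),
    (0.4, 95.253508, 2537, 0.817728),
    (0.3, 94.973205, 2549, 0.821595),
    (0.2, 94.919639, 2551, 0.822240),
]


def proteome_coverage(matched, cds_1, cds_2):
    # Assumed formula: matched pairs over the mean CDS count of both genomes.
    return matched / ((cds_1 + cds_2) / 2)


for id_cutoff, aai, matched, reported in ROWS:
    estimate = proteome_coverage(matched, CDS_1, CDS_2)
    print(f"id={id_cutoff:.1f} AAI={aai:.6f} matched={matched} "
          f"cov={estimate:.6f} (reported {reported:.6f})")
```

As the identity cutoff drops from 0.5 to 0.2, the matched count grows from 2528 to 2551 and the AAI falls from 95.43 to 94.92, which is the slight variation the original question expected.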

Dear Daniel,

thank you so much for your kind reply!

Best regards,

Laura