Coverage and Identity cut-off explanation
Hello everyone,
I've set the identity cutoff (-id) to 0, 0.2, 0.3, and 0.4 while keeping the coverage cutoff fixed at 0.5, but the matched count and the AAI remain the same. Is the matched count expected to stay the same regardless of the identity cutoff?
I also tried a coverage cutoff of 0 while varying the identity cutoff (0.2, 0.4), but the matched count was identical in that case as well.
In detail, for my experiments, I would like to see what difference there is between 0.2 and 0.4 as identity cutoffs, and I would expect at least a slight variation in the AAI.
Is there any explanation for this?
Thank you so much.
Laura
Dear Laura,
I found that the identity cutoff in the average identity calculation was hard-coded: it always used 40% identity, regardless of the value passed with -id. Thank you so much for pointing out this critical flaw.
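To illustrate why a fixed cutoff makes the matched count and AAI insensitive to -id, here is a minimal sketch. It is not EzAAI's source code: it assumes that AAI is the mean identity of the hits that pass both cutoffs, and the `summarize` function, the `ignore_id_param` flag, and the hit values are all hypothetical.

```python
# Toy illustration only; this is NOT EzAAI's implementation.
# Assumption: AAI is the mean identity of hits passing both cutoffs.
HARD_CODED_ID = 0.4  # the fixed 40% value described above


def summarize(matches, id_cutoff, cov_cutoff, ignore_id_param=False):
    """Return (matched_count, mean identity in percent) for the kept hits.

    matches: list of (identity, coverage) pairs, both as fractions.
    ignore_id_param: if True, mimic the bug by ignoring id_cutoff.
    """
    effective_id = HARD_CODED_ID if ignore_id_param else id_cutoff
    kept = [ident for ident, cov in matches
            if ident >= effective_id and cov >= cov_cutoff]
    if not kept:
        return 0, 0.0
    return len(kept), 100.0 * sum(kept) / len(kept)


# Made-up hits for demonstration, not real data.
matches = [(0.95, 0.9), (0.80, 0.8), (0.45, 0.7), (0.35, 0.6), (0.25, 0.9)]

for cutoff in (0.2, 0.3, 0.4):
    buggy = summarize(matches, cutoff, 0.5, ignore_id_param=True)
    fixed = summarize(matches, cutoff, 0.5)
    print(f"id={cutoff}: buggy={buggy} fixed={fixed}")
```

With the hard-coded value, every cutoff yields the same matched count and mean. With the parameter honored, lowering the cutoff admits more low-identity hits, so the matched count rises and the mean identity falls.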
This is now fixed and will be released as a stable version as soon as possible.
Before then, please use this unstable binary with this issue fixed: EzAAI_v1.2.2.unstable.jar.zip
You can run it with the command: java -jar EzAAI_v1.2.2.unstable.jar
I'm also posting results from the fixed version below, which now behave as expected. They were run on sample genomes of Clavibacter insidiosus and C. nebraskensis, lowering the identity cutoff from 0.5 to 0.2.
| Label 1 | Label 2 | AAI | CDS count 1 | CDS count 2 | Matched count | Proteome cov. | ID param. | Cov. param. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Clavibacter insidiosus | Clavibacter nebraskensis | 95.434771 | 3300 | 2905 | 2528 | 0.814827 | 0.500000 | 0.500000 |
| Clavibacter insidiosus | Clavibacter nebraskensis | 95.253508 | 3300 | 2905 | 2537 | 0.817728 | 0.400000 | 0.500000 |
| Clavibacter insidiosus | Clavibacter nebraskensis | 94.973205 | 3300 | 2905 | 2549 | 0.821595 | 0.300000 | 0.500000 |
| Clavibacter insidiosus | Clavibacter nebraskensis | 94.919639 | 3300 | 2905 | 2551 | 0.822240 | 0.200000 | 0.500000 |
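
As a quick sanity check on the numbers above, the reported proteome coverage appears consistent with the matched count divided by the mean of the two CDS counts. That formula is inferred from these four rows only and is not confirmed by the EzAAI source; the `proteome_coverage` helper is hypothetical.

```python
# Values copied from the results above: (ID param., matched count, reported proteome cov.)
rows = [
    (0.5, 2528, 0.814827),
    (0.4, 2537, 0.817728),
    (0.3, 2549, 0.821595),
    (0.2, 2551, 0.822240),
]
CDS1, CDS2 = 3300, 2905  # CDS counts of the two genomes


def proteome_coverage(matched, cds1, cds2):
    """Assumed formula: matched count over the mean CDS count."""
    return matched / ((cds1 + cds2) / 2)


for id_param, matched, reported in rows:
    computed = proteome_coverage(matched, CDS1, CDS2)
    print(f"id={id_param:.1f} matched={matched} "
          f"computed={computed:.6f} reported={reported:.6f}")
```

The computed and reported values agree to six decimal places for all four rows. Lowering the identity cutoff from 0.5 to 0.2 adds 23 matches (2528 to 2551) while the AAI drops by about 0.5 percentage points, which is the direction one would expect when lower-identity pairs are admitted.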
Dear Daniel,
Thank you so much for your kind reply!
Best regards,
Laura