chklovski/CheckM2

Got different results of the same genome from the different runs of checkm2

Opened this issue · 3 comments

Hello, I found that some of my MAGs have different completeness and contamination values in different runs of CheckM2. I first obtained 20 original MAGs and ran CheckM2 on them. In this run, for example, the completeness and contamination of MAG bin.SY48.20 were 50.41%/6.91%. Next, I ran the reassembly module of metaWRAP and got reassembled versions of the 20 original MAGs. I then ran CheckM2 on all 40 MAGs, and the completeness and contamination of the original version of bin.SY48.20 changed to 49.87%/6.65%. Why do the quality scores of the same genome change between runs of CheckM2 with different input MAG sets?

This is the first run, for the 20 original MAGs:
[screenshot: CheckM2 quality report for the 20 original MAGs]

This is the second run, showing the 20 original MAGs (they have 'orig' in the genome ID; e.g., 'bin.SY48.20.orig' is the same genome as 'bin.SY48.20' in the first run):
[screenshot: CheckM2 quality report from the second run]

Here is another run of CheckM2 using only two MAGs, the original and reassembled versions of bin.SY48.20, and you can see that the completeness and contamination changed again:
[screenshot: CheckM2 quality report for the two MAGs]
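For reference, a minimal sketch of how the runs can be lined up per genome, assuming each run's output directory contains CheckM2's quality_report.tsv with 'Name', 'Completeness' and 'Contamination' columns; the directory names are placeholders, and stripping a '.orig' suffix is an assumption based on the genome IDs shown above:

```python
# Minimal sketch: compare the same genome across two CheckM2 runs.
# Assumes each output directory contains quality_report.tsv with
# 'Name', 'Completeness' and 'Contamination' columns; paths are placeholders.
import pandas as pd

run1 = pd.read_csv("run_20_original/quality_report.tsv", sep="\t")
run2 = pd.read_csv("run_40_mixed/quality_report.tsv", sep="\t")

# Strip the '.orig' suffix so the same genome matches across runs
# (assumption about the naming used in the second run).
run2["Name"] = run2["Name"].str.replace(r"\.orig$", "", regex=True)

merged = run1.merge(run2, on="Name", suffixes=("_run1", "_run2"))
merged["delta_completeness"] = (merged["Completeness_run1"] - merged["Completeness_run2"]).abs()
merged["delta_contamination"] = (merged["Contamination_run1"] - merged["Contamination_run2"]).abs()

print(merged[["Name", "delta_completeness", "delta_contamination"]]
      .sort_values("delta_completeness", ascending=False)
      .to_string(index=False))
```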

Hmm, could potentially be related to keras-team/keras#12400 or the BatchNormalization layers that are part of the neural network model, leading to minor fluctuations in output.

Does this affect the 'General' (gradient boost) results or only the 'Specific' (neural network) results for your MAGs? You can test this by running CheckM2 with the --allmodels flag. Are the differences greater than 1% in completeness or contamination?

Will see if I can reproduce it with my own MAGs.
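For anyone who wants to probe that hypothesis directly, here is a minimal, self-contained sketch (a toy Keras model, not CheckM2's actual network) that checks whether a model containing a BatchNormalization layer returns identical predictions for the same input when it is scored alone versus inside a larger batch:

```python
# Minimal sketch (toy model, not CheckM2's network): check whether predictions
# for the same input change when it is scored alone vs. inside a larger batch,
# the kind of fluctuation discussed in keras-team/keras#12400.
import numpy as np
import tensorflow as tf

tf.keras.utils.set_random_seed(42)  # make the toy model's weights reproducible

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

rng = np.random.default_rng(0)
single = rng.normal(size=(1, 64)).astype("float32")   # one "genome"
others = rng.normal(size=(39, 64)).astype("float32")  # 39 other "genomes"
batch = np.vstack([single, others])                   # same genome inside a larger batch

pred_alone = model.predict(single, verbose=0)[0]
pred_in_batch = model.predict(batch, verbose=0)[0]

# Any nonzero difference here is purely numerical: in inference mode,
# BatchNormalization uses fixed moving statistics, so batch composition
# should not matter in exact arithmetic.
print("max abs difference:", np.abs(pred_alone - pred_in_batch).max())
```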


I ran these three MAG sets again with --allmodels. Both the 'General' and 'Specific' results for bin.SY48.20 changed between the different runs ('2 MAGs' and '40 MAGs') of CheckM2.
[screenshots: CheckM2 --allmodels results for the '2 MAGs' and '40 MAGs' runs]

However, I don't know why I could not reproduce the previous result for the '20 original MAGs'. It is strange that the completeness was 50.36% in the result I reported the first time, but it is 49.87% this time, whether I use --allmodels or not.
[screenshot: CheckM2 results for the '20 original MAGs' run]


Hello, may I ask whether this issue has been resolved? The fluctuation in CheckM2 results does not matter for high-quality genomes, but it still affects genomes near the quality threshold. I'm worried that others may reach different conclusions from mine when reanalyzing my data, and I'm not sure whether it's appropriate to publish these genomes. Looking forward to your reply.
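To make the threshold concern concrete, here is a small sketch using the MIMAG medium-quality cut-offs (≥50% completeness, <10% contamination; Bowers et al. 2017), which may or may not be the thresholds you have in mind, applied to the two values reported for bin.SY48.20 earlier in this thread:

```python
# Sketch: how a ~0.5% fluctuation flips a borderline genome across the
# MIMAG cut-offs (>=50% completeness, <10% contamination for medium quality).
# The two observations below are the values reported for bin.SY48.20
# in the two CheckM2 runs described above.

def mimag_class(completeness: float, contamination: float) -> str:
    """Return a coarse MIMAG quality tier for a genome."""
    if completeness >= 90 and contamination < 5:
        return "high quality"
    if completeness >= 50 and contamination < 10:
        return "medium quality"
    return "low quality"

observations = {
    "run with 20 original MAGs": (50.41, 6.91),
    "run with 40 MAGs": (49.87, 6.65),
}

for run, (comp, cont) in observations.items():
    print(f"{run}: {comp}% / {cont}% -> {mimag_class(comp, cont)}")
# The same genome lands in 'medium quality' in one run and 'low quality' in the other.
```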