gene numbers from "consensus" should be consecutive
Hi,
I've been running multiPhATE2 on a number of related phages and realised that the consensus output has missing gene numbers. For example, one genome has 276 protein-coding genes, but the gene/protein names run up to chr1_consensus_303_geneCall. This is reflected in all the final files when running with primary_calls='consensus': the protein file, the gff, ...
Not sure if this is intended, but I find it odd that the final gene numbering is not consecutive.
PS: my guess is that this is related to the fix for #25, which removes the genes that shouldn't be in the consensus but doesn't address the loss of consecutive numbering.
Cheers,
The consensus gene calls are calculated per genome, by comparing all of the results for gene callers on that genome. It is possible for the consensus set to contain more calls than the result set from any given gene caller. As I recall, the consensus set should be numbered consecutively, but if you are seeing non-consecutive numbering, it would be helpful for me to take a look at the results you are getting for one of your genomes. Please post here or email to multiphate@gmail.com. Thank you.
Yes, that's exactly what is happening. I'll send you the files by email.
Thank you for sending the files, and for identifying an imperfection in the code! I looked through the consensus output and compared it to CGC_results.txt, which confirms that the consensus numbering occasionally skips a number. This happens when one of the gene callers calls a gene that none of the other callers report. It will occur most often with PHANOTATE, since that gene caller is more likely to detect a gene that the others do not. Because each consensus gene call's number is unique (though not necessarily consecutive), this should not affect any downstream analyses you might do with the consensus data. I will therefore fix this issue in the next version of multiPhATE2. Thank you for submitting this issue!
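Until a fixed version is available, anyone who needs consecutive numbers could post-process the identifiers. The following is only a minimal sketch, not part of multiPhATE2: it assumes identifiers follow the pattern seen above (chr1_consensus_303_geneCall), and the function names parse, find_gaps and renumber are hypothetical.

```python
import re
from collections import defaultdict

# Assumed ID pattern, inferred from the example chr1_consensus_303_geneCall.
GENE_ID = re.compile(r"(?P<contig>.+)_consensus_(?P<num>\d+)_geneCall")

def parse(gene_id):
    """Split a consensus gene ID into (contig, number)."""
    m = GENE_ID.fullmatch(gene_id)
    if m is None:
        raise ValueError(f"unexpected gene ID: {gene_id!r}")
    return m.group("contig"), int(m.group("num"))

def find_gaps(gene_ids):
    """Per contig, list the numbers missing between 1 and the highest number seen."""
    seen = defaultdict(set)
    for gid in gene_ids:
        contig, num = parse(gid)
        seen[contig].add(num)
    return {c: sorted(set(range(1, max(n) + 1)) - n) for c, n in seen.items()}

def renumber(gene_ids):
    """Map each old ID to a new ID numbered consecutively per contig,
    preserving the original numeric order."""
    counters = defaultdict(int)
    mapping = {}
    for contig, _num, gid in sorted(parse(g) + (g,) for g in set(gene_ids)):
        counters[contig] += 1
        mapping[gid] = f"{contig}_consensus_{counters[contig]}_geneCall"
    return mapping

ids = [
    "chr1_consensus_1_geneCall",
    "chr1_consensus_2_geneCall",
    "chr1_consensus_4_geneCall",  # number 3 was skipped
]
gaps = find_gaps(ids)
mapping = renumber(ids)
```

The resulting mapping would have to be applied to every output file that carries these identifiers (protein file, gff, and so on) so that they stay in agreement with each other.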