vrmarcelino/CCMetagen

CCMetagen_merge.py output columns meaning

theo-allnutt-bioinformatics opened this issue · 1 comments

Can you please describe each of the columns in the output of CCMetagen_merge.py?

Score | Expected | Template_length | Template_Identity | Template_Coverage | Query_Identity | Query_Coverage | Depth | q_value | p_value

Thanks.

Hi Theo,

In short: Score, Expected, q_value and p_value are the result of the ConClave sorting calculations.
Template_length is the length of the reference sequence.
Template_Identity is the percentage of nucleotides that are identical between the reference sequence and the consensus sequence (formed by your reads).
Template_Coverage is the proportion of the reference sequence that is covered by the consensus sequence.
Query_Identity and Query_Coverage are the reciprocal of the two values above, calculated relative to the query (the consensus sequence) rather than the reference.
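
To make the two template-side definitions concrete, here is a toy calculation on a made-up gapped alignment. It is only a sketch of the definitions as worded above, not KMA's actual code: the function name `template_stats`, the `-` gap convention and the example sequences are all assumptions for illustration, and both values are expressed as percentages.

```python
def template_stats(template_aln, consensus_aln):
    """Toy Template_Identity / Template_Coverage for one gapped alignment.

    Both strings must have the same length; '-' marks a gap (an assumption
    of this sketch, not a KMA file format).
    """
    if len(template_aln) != len(consensus_aln):
        raise ValueError("aligned sequences must have the same length")

    pairs = list(zip(template_aln, consensus_aln))
    # Reference length = template positions that are not gaps.
    template_length = sum(1 for t, _ in pairs if t != "-")
    # Template positions that have a consensus base aligned to them.
    covered = sum(1 for t, c in pairs if t != "-" and c != "-")
    # Template positions where the consensus base is identical.
    identical = sum(1 for t, c in pairs if t != "-" and t == c)

    return {
        "Template_length": template_length,
        "Template_Identity": 100 * identical / template_length,
        "Template_Coverage": 100 * covered / template_length,
    }


# Made-up example: 10 bp reference, consensus covers 7 bases with 1 mismatch.
stats = template_stats("ACGTACGTAC", "ACGTTCG---")
print(stats)
```

In this example 6 of the 10 reference bases are matched exactly and 7 are covered, so the sketch reports an identity of 60% and a coverage of 70%.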

The most important parameters to check are Depth, Template_Coverage and Template_Identity.
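
As a rough illustration of checking those three columns, the sketch below filters a table by Depth, Template_Coverage and Template_Identity using only the Python standard library. It assumes a comma-separated table whose header uses the column names from the question; the `Taxon` column, the sample rows and the threshold values are made up for the example and are not recommended cut-offs.

```python
import csv
import io

# Made-up sample data; a real run would read the merged table from disk.
SAMPLE = """Taxon,Template_length,Template_Identity,Template_Coverage,Depth
ref_A,1500,99.2,98.0,12.5
ref_B,1500,71.0,35.5,0.4
ref_C,900,88.4,61.0,3.1
"""


def filter_hits(rows, min_depth=1.0, min_coverage=50.0, min_identity=80.0):
    """Keep rows that meet all three (hypothetical) thresholds."""
    kept = []
    for row in rows:
        if (
            float(row["Depth"]) >= min_depth
            and float(row["Template_Coverage"]) >= min_coverage
            and float(row["Template_Identity"]) >= min_identity
        ):
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = filter_hits(rows)
for row in kept:
    print(row["Taxon"], row["Depth"], row["Template_Coverage"], row["Template_Identity"])
```

To use it on a real file, replace `io.StringIO(SAMPLE)` with an open file handle and adjust the thresholds to suit your data.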

These are definitions from the KMA aligner, so for a more detailed explanation, please refer to their manual:
https://bitbucket.org/genomicepidemiology/kma/src/master/KMAspecification.pdf