Question about tbdb.barcode.bed coordinates
mariaelf97 opened this issue · 4 comments
Hello!
It seems that some mutations in the tbdb.barcode.bed file are not based on H37Rv's coordinates.
Could you clarify which ones use different reference coordinates?
Thank you
Hi @mariaelf97
I'm looking at this file at the moment as well for my own needs.
Can you clarify which mutations are not based on H37Rv? As I understand the file, column 5 is supposed to be the alternative allele, i.e., I don't think there should be a single instance where this value equals the reference allele.
I have a related question for @jodyphelan: what would the lineage classification output be if we (hypothetically) sequenced the exact strain that was used for the reference genome?
I see in the BED file that some mutations are defined for 4.9 ("H37Rv-like"). However, these are not the reference allele but the alternative allele. So my guess is that the reference genome would be given no lineage classification? Or is there a default value encoded somewhere?
Thanks for your help
As an update to my question: these 20 entries in the BED file are actually equal to the reference allele:
Chromosome | Start | End | Lineage | Allele | Family | Spoligotype | RD |
---|---|---|---|---|---|---|---|
Chromosome | 206480 | 206481 | lineage4 | C | Euro-American | LAM;T;S;X;H | None |
Chromosome | 311612 | 311613 | lineage4.9 | G | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 420007 | 420008 | lineage4.9 | A | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 498530 | 498531 | lineage4 | A | Euro-American | LAM;T;S;X;H | None |
Chromosome | 541200 | 541201 | lineage4.9 | A | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 546356 | 546357 | lineage4 | A | Euro-American | LAM;T;S;X;H | None |
Chromosome | 599867 | 599868 | lineage4 | A | Euro-American | LAM;T;S;X;H | None |
Chromosome | 662910 | 662911 | lineage4 | T | Euro-American | LAM;T;S;X;H | None |
Chromosome | 903912 | 903913 | lineage4.9 | T | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 931122 | 931123 | lineage4 | T | Euro-American | LAM;T;S;X;H | None |
Chromosome | 1250339 | 1250340 | lineage4 | A | Euro-American | LAM;T;S;X;H | None |
Chromosome | 1396921 | 1396922 | lineage4.9 | T | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 1759251 | 1759252 | lineage4.9 | G | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 1907295 | 1907296 | lineage4.9 | G | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 2022867 | 2022868 | lineage4.9 | T | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 2825465 | 2825466 | lineage4 | G | Euro-American | LAM;T;S;X;H | None |
Chromosome | 2994186 | 2994187 | lineage4 | T | Euro-American | LAM;T;S;X;H | None |
Chromosome | 3367764 | 3367765 | lineage4.9 | G | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 3823158 | 3823159 | lineage4.9 | A | Euro-American (H37Rv-like) | T1 | None |
Chromosome | 3830694 | 3830695 | lineage4 | A | Euro-American | LAM;T;S;X;H | None |
So my own concern is answered.
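For anyone who wants to repeat this check, here is a minimal sketch of one way to do it. It assumes the file follows the usual BED convention (0-based, half-open intervals, tab-separated) and that column 5 holds the allele; the function names and the toy reference and BED lines are hypothetical and only illustrate the comparison.

```python
def parse_barcode_bed(lines):
    """Yield (chrom, start, end, lineage, allele) from tab-separated BED lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end, lineage, allele = line.split("\t")[:5]
        yield chrom, int(start), int(end), lineage, allele


def entries_matching_reference(bed_lines, reference):
    """Return the entries whose column-5 allele equals the reference base.

    `reference` maps a chromosome name to its sequence string.
    Assumes 0-based, half-open BED coordinates.
    """
    matches = []
    for chrom, start, end, lineage, allele in parse_barcode_bed(bed_lines):
        ref_base = reference[chrom][start:end]
        if ref_base.upper() == allele.upper():
            matches.append((chrom, start, end, lineage, allele))
    return matches


# Toy data (not real H37Rv sequence or real barcode entries).
toy_reference = {"Chromosome": "ACGTACGTAC"}
toy_bed = [
    "Chromosome\t2\t3\tlineageX\tG\tToy\tNone\tNone",  # reference base is G: match
    "Chromosome\t5\t6\tlineageY\tT\tToy\tNone\tNone",  # reference base is C: no match
]
ref_like = entries_matching_reference(toy_bed, toy_reference)
print(ref_like)
```

With a real run, `reference` would be loaded from the H37Rv FASTA, using the same chromosome name as in the BED file.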
Hi @mariaelf97 - the positions are actually from the H37Rv reference genome (there are several versions of the same genome but with different chromosome names).
@sachalau - yes, as you noticed, that column is not the reference allele but the allele that is expected for that particular lineage.
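To illustrate what "the allele expected for that lineage" means in practice, here is a rough sketch that scores a sample against a few barcode entries. It is not TB-Profiler's actual algorithm; `lineage_support` is a hypothetical helper, the four barcode entries are copied from the table above, and the non-matching bases in the second sample are made up for the example.

```python
from collections import defaultdict


def lineage_support(barcode, sample_bases):
    """Return, per lineage, the fraction of barcode positions where the
    sample carries the allele expected for that lineage.

    `barcode` is a list of (start, lineage, allele) tuples;
    `sample_bases` maps a start coordinate to the base called in the sample.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for start, lineage, allele in barcode:
        totals[lineage] += 1
        if sample_bases.get(start) == allele:
            hits[lineage] += 1
    return {lineage: hits[lineage] / totals[lineage] for lineage in totals}


# Four entries taken from the table in this thread.
barcode = [
    (206480, "lineage4", "C"),
    (498530, "lineage4", "A"),
    (311612, "lineage4.9", "G"),
    (420007, "lineage4.9", "A"),
]

# A sample identical to the reference at these positions.
reference_like = {206480: "C", 498530: "A", 311612: "G", 420007: "A"}
# A sample with the lineage4 alleles but hypothetical other bases at the 4.9 sites.
other_l4 = {206480: "C", 498530: "A", 311612: "A", 420007: "G"}

print(lineage_support(barcode, reference_like))
print(lineage_support(barcode, other_l4))
```

Under this simplified scoring, a sample matching the reference at every position supports both lineage4 and lineage4.9, because for those entries the lineage allele is the reference allele.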
@jodyphelan Thank you for your reply.
I understand that since H37Rv is lineage 4.9, the aforementioned mutations refer to the REF base rather than the ALT base. However, I was wondering what the ALT base would be in this case. Does it mean any base other than the reference base? My analysis of about 100 isolates shows that the ALT base is fairly consistent across lineages, except for a few cases.
I'd appreciate your thoughts on this.
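For context, this is roughly how I tallied the bases. It is only a sketch: `non_lineage_bases` is a hypothetical helper name and the calls below are toy data, not my actual isolates.

```python
from collections import Counter


def non_lineage_bases(expected_allele, observed_calls):
    """Count the bases seen at one barcode position that differ from
    the allele expected for the lineage (column 5 of the BED file)."""
    expected = expected_allele.upper()
    return Counter(
        base.upper() for base in observed_calls if base.upper() != expected
    )


# Toy base calls at a single position across six isolates.
calls = ["G", "G", "A", "A", "A", "T"]
alt_counts = non_lineage_bases("G", calls)
print(alt_counts.most_common())
```

A position where one base dominates the counts would correspond to the "consistent ALT" case, while several bases with similar counts would be one of the exceptions.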