biocommons/hgvs

Equivalent genomic coordinates for transcript structure

wlymanambry opened this issue · 2 comments

Is it possible to get equivalent genomic coordinates for the exon structure?

Right now I can get the exon structure, which looks something like:

"c_transcript_structure": [
[1, 158], [159, 4138] ],

But I'd like to also know the corresponding genomic positions for those relative transcript positions.

Thanks!

Since you are building JSON, have you seen cdot? It provides transcript info as JSON, so it may already be what you need.

E.g., check out the JSON here: http://cdot.cc/transcript/NM_001001890.3

"exons" is a list of (genomic start, genomic end, exon ID, tx start, tx end, cigar for alignment gaps) entries, e.g.:

      "exons": [
        [
          36160097,
          36164907,
          5,
          2474,
          7283,
          null
        ],

You can download gzipped JSON files
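
If it helps, here is a minimal sketch of unpacking those exon entries into named fields, assuming the field order described above. The `CdotExon` name and its attribute names are made up for illustration and are not part of cdot itself; the one row is the entry quoted above.

```python
from typing import NamedTuple, Optional


class CdotExon(NamedTuple):
    """Hypothetical container; field order follows the description above."""
    genomic_start: int
    genomic_end: int
    exon_id: int
    tx_start: int
    tx_end: int
    cigar: Optional[str]  # null in the JSON when there are no alignment gaps


# The single exon entry quoted above (JSON null becomes None in Python).
raw_exons = [[36160097, 36164907, 5, 2474, 7283, None]]

exons = [CdotExon(*row) for row in raw_exons]

for exon in exons:
    print(
        f"exon {exon.exon_id}: "
        f"tx [{exon.tx_start}, {exon.tx_end}] -> "
        f"genomic [{exon.genomic_start}, {exon.genomic_end}]"
    )
```

With a downloaded file you would load the JSON first and pass the transcript's "exons" list in place of `raw_exons`.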


Doing it via Biocommons

In the data provider you should be able to call get_tx_exons with the transcript accession plus the contig accession you want genomic coordinates on.

The help docs say its return value looks like:

            {
                'tes_exon_set_id' : 98390
                'aes_exon_set_id' : 298679
                'tx_ac'           : 'NM_199425.2'
                'alt_ac'          : 'NC_000020.10'
                'alt_strand'      : -1
                'alt_aln_method'  : 'splign'
                'ord'             : 2
                'tx_exon_id'      : 936834
                'alt_exon_id'     : 2999028
                'tx_start_i'      : 786
                'tx_end_i'        : 1196
                'alt_start_i'     : 25059178
                'alt_end_i'       : 25059588
                'cigar'           : '410='
            }

You look to be using [tx_start_i, tx_end_i] in your code above. The matching genomic coords on the contig are [alt_start_i, alt_end_i].
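
A rough sketch of pairing the two, assuming each exon record supports key lookup with the field names from the help docs above. `exon_coordinate_pairs` is a hypothetical helper, not part of hgvs, and the sample record just copies the values from the docs so the snippet runs without a database connection.

```python
def exon_coordinate_pairs(exons):
    """Pair each exon's transcript span with its genomic span, ordered by 'ord'."""
    return [
        {
            "ord": e["ord"],
            "tx": [e["tx_start_i"], e["tx_end_i"]],
            "genomic": [e["alt_start_i"], e["alt_end_i"]],
            "strand": e["alt_strand"],
        }
        for e in sorted(exons, key=lambda e: e["ord"])
    ]


# With a live UTA connection the list would come from something like:
#   import hgvs.dataproviders.uta
#   hdp = hgvs.dataproviders.uta.connect()
#   exons = hdp.get_tx_exons("NM_199425.2", "NC_000020.10", "splign")
# Here we use the example record from the help docs instead.
sample_exons = [
    {
        "tx_ac": "NM_199425.2",
        "alt_ac": "NC_000020.10",
        "alt_strand": -1,
        "alt_aln_method": "splign",
        "ord": 2,
        "tx_start_i": 786,
        "tx_end_i": 1196,
        "alt_start_i": 25059178,
        "alt_end_i": 25059588,
        "cigar": "410=",
    }
]

pairs = exon_coordinate_pairs(sample_exons)
for pair in pairs:
    print(pair)
```

One thing to double check: as far as I know the `_i` fields are 0-based interbase values, so they may be offset by one at the start compared with 1-based positions like [1, 158].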

Let me know if this is enough to help you out, so I can close the issue. Thanks!

Yep this is what I was looking for. Thanks so much!