biocommons/hgvs

Context view improvements

andreasprlic opened this issue · 3 comments

The hgvs library has a built-in, text-based visualization that builds a view of a variant's context, including the alignment between the transcript and the reference genome. It can create representations similar to this:

                                              v                               NC_000010.10:g.64572045dupT
NC_000010.10 g 64572025 > ACTCAGGGAGTGATTTTTTTTCTCCATAATAAGGCAACCCA          > 64572065 NC_000010.10:g.64572045dupT
NC_000010.10 g 64572025 < TGAGTCCCTCACTAAAAAAAAGAGGTATTATTCCGTTGGGT          < 64572065 NC_000010.10:g.64572045dupT
                          |||||||||||||-|||||||||||||||||||||||||||          13=1D27=
NM_000399.3  n     2670 < TGAGTCCCTCACT-AAAAAAAGAGGTATTATTCCGTTGGGT          <     2709 NM_000399.3:n.2696dupA
NM_000399.3  c      902 <                                                    <      941 NM_000399.3:c.*928dupA

At the moment this visualization is flagged as "experimental". It also requires the uta_align package for re-aligning the sequences.

Describe the solution you'd like

It would be nice to expand on this and add a few more features:

  • Show genome, transcript, and protein sequences, plus something like a "ruler" to show the positions.
  • Have a flexible window size that adjusts to the size of the variant.
  • Expose the data behind the view, so that alternative renderings (perhaps SVG graphics?) can be built on top of it.
  • Do not require the uta_align module, since we should already have all the alignments in UTA.
  • Nice to have: an option to add some color to improve readability.
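To make the window, ruler, and "expose the data" ideas a bit more concrete, here is a minimal sketch. Everything in it (`Track`, `ContextView`, `window_for`, `ruler`, `render_text`) is a hypothetical name made up for illustration and is not part of hgvs or context.py; it only assumes that each row of the view can be described by a label, a start position, and a string of bases.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Track:
    """One row of the view (hypothetical structure, not part of hgvs)."""
    label: str     # e.g. accession plus coordinate type
    start: int     # 1-based position of the first displayed base
    sequence: str  # displayed bases; '-' marks an alignment gap


@dataclass
class ContextView:
    """Plain data behind a view, so text, HTML, or SVG renderers can share it."""
    tracks: List[Track] = field(default_factory=list)


def window_for(var_start: int, var_end: int, min_flank: int = 20) -> Tuple[int, int]:
    """Return (start, end) of a display window whose flank grows with the variant."""
    length = var_end - var_start + 1
    flank = max(min_flank, length)
    return max(1, var_start - flank), var_end + flank


def ruler(start: int, width: int, step: int = 10) -> str:
    """Build a tick line: '|' at positions divisible by step, '.' elsewhere."""
    return "".join("|" if (start + i) % step == 0 else "." for i in range(width))


def render_text(view: ContextView) -> str:
    """One possible renderer: left-aligned labels followed by the sequences."""
    pad = max((len(t.label) for t in view.tracks), default=0)
    return "\n".join(f"{t.label:<{pad}} {t.start:>10} {t.sequence}" for t in view.tracks)
```

With a 20-base minimum flank, `window_for(64572045, 64572045)` gives `(64572025, 64572065)`, the same span as the example view above. Because `ContextView` is plain data, an SVG or HTML renderer could consume it in the same way as `render_text`.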

Describe alternatives you've considered

  • I don't think there is anything quite like that yet for the hgvs library (besides what is already in context.py)

The question is mostly whether we want better visualization tooling as part of the main hgvs module or as a separate tool. Since we already have context.py, it probably fits into the main library.

Maybe this is a bit too out there, but for those publishing work to Jupyter notebooks (probably a lot of us), you can also supply special repr methods that incorporate HTML/CSS (pandas DataFrames are probably the most popular example of this). That could go a long way towards improving readability in that context.
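For reference, a rough sketch of what that could look like. Jupyter looks for a `_repr_html_` method on an object and uses its return value as the rich output. The `ContextDisplay` class and its `highlight_col` parameter are hypothetical names for illustration, not something hgvs provides; the sketch just wraps the existing text view in a `<pre>` block and colors one column.

```python
import html


class ContextDisplay:
    """Hypothetical wrapper around the text view; not part of hgvs."""

    def __init__(self, text: str, highlight_col: int = -1):
        self.text = text                    # the plain-text view, one row per line
        self.highlight_col = highlight_col  # 0-based column to color, -1 for none

    def __repr__(self) -> str:
        # Plain terminals and logs still get the original text view.
        return self.text

    def _repr_html_(self) -> str:
        # Jupyter calls this method, if present, to obtain rich HTML output.
        rows = []
        c = self.highlight_col
        for line in self.text.splitlines():
            if 0 <= c < len(line):
                row = (
                    html.escape(line[:c])
                    + '<span style="background: #ffd966">'
                    + html.escape(line[c])
                    + "</span>"
                    + html.escape(line[c + 1:])
                )
            else:
                row = html.escape(line)
            rows.append(row)
        return '<pre style="font-family: monospace">' + "\n".join(rows) + "</pre>"
```

Escaping matters here because the text view uses `<` and `>` as strand markers, which would otherwise be parsed as HTML. In a notebook, leaving a `ContextDisplay` instance as the last expression of a cell would show the highlighted version, while `print()` still gives the plain text.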

Actually that would be nice. Being able to show the context in a notebook would be a good feature to have.

@andreasprlic I think this issue is related to #742, right? I linked the PR to this issue.