How to deal with other NCBI genome versions

Question

Closed this issue 5 years ago · 3 comments

I noticed that your article says the package supports genome assemblies from the NCBI Assembly resource (Kitts et al., 2016), including GRCh37 and GRCh38. I take this to mean that the package can handle other NCBI assemblies as well. I want to know how to use it for mapping against reference genomes of other species. I have looked through UTA again and again, but I still don't know what to do with non-human genomes (that is, assemblies other than GRCh37 and GRCh38).

Answer 1 · 2019-09-16T18:22:25.000Z

Hi @winni-liu: Using the full capabilities of hgvs for non-human species would be difficult. Specifically, you would need 1) access to reference sequences (genomic, transcript, protein), and 2) genome-transcript alignments. These are available from NCBI, but preparing them for hgvs would be time-consuming.

Yes, hgvs uses assemblies, but really only the notion that an assembly refers to a collection of reference genomic sequences.
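
To illustrate that notion, here is a minimal sketch that treats an assembly as nothing more than a mapping from sequence names to reference sequence accessions. It is not how hgvs stores assemblies internally. The `ASSEMBLIES` table, both helper functions, and the `MySpecies_v1` entry with its placeholder accession are hypothetical names made up for this example; only the human chromosome 1 and X accessions are real RefSeq identifiers.

```python
# Illustrative only: an assembly viewed as a named collection of
# reference genomic sequences (sequence name -> accession).
ASSEMBLIES = {
    "GRCh37": {"1": "NC_000001.10", "X": "NC_000023.10"},
    "GRCh38": {"1": "NC_000001.11", "X": "NC_000023.11"},
    # Hypothetical non-human assembly; the accession is a placeholder,
    # not a real identifier.
    "MySpecies_v1": {"1": "PLACEHOLDER_CHR1.1"},
}


def name_to_accession(assembly: str, seq_name: str) -> str:
    """Return the accession for a sequence name within one assembly."""
    try:
        return ASSEMBLIES[assembly][seq_name]
    except KeyError as exc:
        raise KeyError(
            f"no sequence {seq_name!r} in assembly {assembly!r}"
        ) from exc


def accession_to_name(assembly: str, accession: str) -> str:
    """Reverse lookup: find the sequence name for an accession."""
    for name, ac in ASSEMBLIES[assembly].items():
        if ac == accession:
            return name
    raise KeyError(f"accession {accession!r} not in assembly {assembly!r}")
```

Seen this way, adding a species is mostly a matter of supplying the sequences and alignments behind each accession, which is the hard part described above.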

Please close this issue if that answers your question.

Answer 2 · 2019-09-17T01:38:14.000Z

I would like to try generating genome-transcript alignments and applying them to other species using hgvs. Can you give me some suggestions for generating files such as ncbi_20170907-schema.pgd.gz and uta_20180821.pgd.gz?

Answer 3 · 2019-09-17T15:03:46.000Z

Unfortunately, this is much harder than I can coach you through right now. Briefly, you'd have to write a new hgvs data provider (see hgvs/dataproviders/uta.py in the source for an example), then arrange to return data in the appropriate format for your species.
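
As a very rough sketch of the shape such a provider might take, the standalone class below serves sequences and exon alignments from memory. It does not import hgvs and is not a working hgvs data provider. The method names `get_seq` and `get_tx_exons` and their arguments are modeled on what I understand the hgvs data provider interface to expose; treat them as assumptions and check them against hgvs/dataproviders/interface.py in your installed version. `ExonAlignment`, `InMemoryProvider`, the field names, the `toy` alignment method, and all `TOY_` accessions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ExonAlignment:
    """One exon of a transcript aligned to a genomic sequence.

    Coordinates are 0-based, end-exclusive (interbase).
    """
    tx_ac: str           # transcript accession
    alt_ac: str          # genomic (alternate) accession
    alt_aln_method: str  # label for how the alignment was produced
    alt_strand: int      # 1 for plus strand, -1 for minus strand
    exon_ord: int        # exon order within the transcript
    tx_start_i: int
    tx_end_i: int
    alt_start_i: int
    alt_end_i: int


class InMemoryProvider:
    """Hypothetical provider backed by plain Python containers."""

    def __init__(self, sequences: Dict[str, str], exons: List[ExonAlignment]):
        self._sequences = sequences
        self._exons = exons

    def get_seq(self, ac: str, start_i: Optional[int] = None,
                end_i: Optional[int] = None) -> str:
        """Return a sequence, or a slice of it, by accession."""
        return self._sequences[ac][start_i:end_i]

    def get_tx_exons(self, tx_ac: str, alt_ac: str,
                     alt_aln_method: str) -> List[ExonAlignment]:
        """Return the exon alignments for one transcript, in exon order."""
        hits = [
            e for e in self._exons
            if (e.tx_ac, e.alt_ac, e.alt_aln_method)
            == (tx_ac, alt_ac, alt_aln_method)
        ]
        return sorted(hits, key=lambda e: e.exon_ord)


# Toy data: a 20-base "genome" and a two-exon transcript on the plus strand.
provider = InMemoryProvider(
    sequences={
        "TOY_GENOME.1": "ACGTACGTACGTACGTACGT",
        "TOY_TX.1": "GTACACGT",
    },
    exons=[
        ExonAlignment("TOY_TX.1", "TOY_GENOME.1", "toy", 1, 1, 4, 8, 12, 16),
        ExonAlignment("TOY_TX.1", "TOY_GENOME.1", "toy", 1, 0, 0, 4, 2, 6),
    ],
)

# Splicing the genomic exon slices together should reproduce the transcript.
spliced = "".join(
    provider.get_seq(e.alt_ac, e.alt_start_i, e.alt_end_i)
    for e in provider.get_tx_exons("TOY_TX.1", "TOY_GENOME.1", "toy")
)
```

A real provider would also need to answer the other queries hgvs makes (transcript metadata, CDS bounds, protein accessions, and so on), and it would have to handle minus-strand transcripts and alignment gaps, which this toy example ignores.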

The good news is that a company has offered to sponsor some work that should make it much easier to add custom sequences, transcripts, and alignments to hgvs. I don't have an ETA for that work, but 6 months seems quite possible.