statgen/bravo_api

Add `other_names` functionality to `variants.get_genes`


Issue or current state

Discovered this comment regarding adding more sorting options for the mongo aggregate pipeline of get_genes:

#TODO: add other_names (need to use aggregae https://stackoverflow.com/questions/28889240/mongodb-sort-documents-by-array-elements

From the context of the Stack Overflow post, it appears that this comment is about sorting on a field that is not part of the match.

Resolved when

  • Check if @dtaliun recalls the context and intent of the comment.
  • The intent of the comment is elucidated and a feature issue is filed, or it is determined that an issue is not needed.

"FURIN" used to be called "PCSK3". If you search Bravo for "PCSK3", you get nothing. .other_names should be used more like this:

image

@pjvandehaar Thanks for the illustration. That makes sense.

Per @dtaliun

Each gene has a unique identifier (the so-called Ensembl ID), which starts with “ENSG” and is stored in the gene_id field. A gene also has a name (e.g. “PCSK9”), which is stored in the gene_name field. Many genes also have so-called “synonyms” or “aliases” (names that were used previously), which are stored in the other_names field (a list of all other names). For example, PCSK9 has the synonym “NARC1”. Currently, the search for variants by gene uses only the gene_id and gene_name fields, not other_names. So if somebody searches for “NARC1”, no results are returned.

The intended functionality was to also search the other_names field, in addition to gene_name and gene_id.
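
As a rough sketch of what that lookup could look like (not the actual `variants.get_genes` code): the filter below matches a search term against all three fields. The function names `build_gene_filter` and `find_genes`, the sample documents, and the placeholder IDs are assumptions made for illustration; only the field names gene_id, gene_name, and other_names come from this thread. In MongoDB, an equality condition on an array field matches when any element equals the value, so no aggregation should be needed just to match on other_names.

```python
def build_gene_filter(term):
    """Build a MongoDB filter that matches a gene by ID, name, or alias.

    Hypothetical helper: an equality test on the array field
    `other_names` matches if any element of the list equals `term`.
    """
    return {
        "$or": [
            {"gene_id": term},
            {"gene_name": term},
            {"other_names": term},
        ]
    }


def find_genes(genes, term):
    """In-memory equivalent of the filter above, for illustration only."""
    return [
        gene
        for gene in genes
        if gene.get("gene_id") == term
        or gene.get("gene_name") == term
        or term in gene.get("other_names", [])
    ]


# Sample documents; the gene_id values are placeholders, not real Ensembl IDs.
GENES = [
    {"gene_id": "ENSG_PLACEHOLDER_1", "gene_name": "PCSK9", "other_names": ["NARC1"]},
    {"gene_id": "ENSG_PLACEHOLDER_2", "gene_name": "FURIN", "other_names": ["PCSK3"]},
]

if __name__ == "__main__":
    print(build_gene_filter("NARC1"))
    print([gene["gene_name"] for gene in find_genes(GENES, "NARC1")])
```

With a filter like this, searching for “NARC1” would resolve to PCSK9 and “PCSK3” to FURIN. Whether matching should be case-insensitive, or whether an index on other_names is needed, is left open here.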