CDE Harmonization
Closed this issue · 19 comments
View historic comments for this ticket https://github.com/helxplatform/development/issues/868
Need to have:
- Add name and descriptions of CDEs into the KGX export (@gaurav)
- Move CDEs into a separate tab on the Detail Box (@mbwatson)
- In the Detail Box, include CDE (well, form) name, description and list of related concepts (@mbwatson)
- Each related concept can be clicked on to either start a new Dug search or open a new Detail Box for the new concept (@mbwatson)
Nice to have:
- Figure out how to display a KG/CDE tab in the Summary Box (possibly make them both optional) (@mbwatson)
a branch has been created in the helx-ui repo to consume CDEs per the above requirements: feature/cdes.
Thanks, Matt! It turns out that I don't have any descriptions for CDEs from the HEAL CDE peeps, but I've regenerated the KGX files with the correct names now. I'll send them to Yaphet once I'm done.
I've regenerated the KGX files and sent them to @YaphetKG. All but six forms seem to have names that we can use; those six just say "File path: [file path to the HEAL CDE file]" for now. I'll fix that in the next run, which I'm hoping to have by the end of next week once I'm back from vacation.
Adding our task list from original ticket
Need to have:
- Add name and descriptions of CDEs into the KGX export (@gaurav)
- Move CDEs into a separate tab on the Detail Box (@mbwatson)
- In the Detail Box, include CDE (well, form) name, description and list of related concepts (@mbwatson)
- Each related concept can be clicked on to either start a new Dug search or open a new Detail Box for the new concept (@mbwatson)
Nice to have:
- Figure out how to display a KG/CDE tab in the Summary Box (possibly make them both optional) (@mbwatson)
Hi @hhiles on the status of this ticket, we have new KGX files from @gaurav , and the new details tab from @mbwatson. I believe the CDE tabs are currently placeholders (@mbwatson please correct me if i missed that).
For tasks 3 and 4, we have to re-index after adding a new feature to the roger pipeline and tranql. To describe that a little more: currently, each node in the redis graph carries only its most specific label from the biolink model.
For example, if there is a node CHEMBL.COMPOUND:CHEMBL1098659 (water) in the graph, its current label is just biolink:ChemicalEntity, but the whole biolink hierarchy (from Node normalization) assigns these labels:
[
"biolink:ChemicalEntity",
"biolink:NamedThing",
"biolink:Entity",
"biolink:PhysicalEssence",
"biolink:PhysicalEssenceOrOccurrent",
"biolink:ChemicalOrDrugOrTreatment",
"biolink:ChemicalEntityOrGeneOrGeneProduct",
"biolink:ChemicalEntityOrProteinOrPolypeptide"
]
Because each node carries only its most specific label, we are not able to write the generic tranql queries that would enable the discovery we want for 3 and 4.
@frostyfan109 mentioned that he has a version of this that uses a predefined list of specific types to perform the queries.
But I am looking into modifying the roger pipeline and tranql to use the full list of node labels, so that tranql queries can run against the most high-level type, biolink:NamedThing, instead of a very specific biolink type.
In summary, we are further reorganizing the roger pipeline and tranql to be able to perform 3 and 4 in a more generic way.
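To illustrate the difference described above, here is a minimal sketch in plain Python. It does not use the roger, tranql, or redis graph APIs; the dictionaries and the `match_by_label` helper are hypothetical stand-ins, and only the node ID and the label list come from the comment above. It shows why a query for biolink:NamedThing finds nothing when nodes carry only their most specific label, and finds the node once the full label list is stored.

```python
# Label list for the example node, copied from the comment above.
FULL_LABELS = [
    "biolink:ChemicalEntity",
    "biolink:NamedThing",
    "biolink:Entity",
    "biolink:PhysicalEssence",
    "biolink:PhysicalEssenceOrOccurrent",
    "biolink:ChemicalOrDrugOrTreatment",
    "biolink:ChemicalEntityOrGeneOrGeneProduct",
    "biolink:ChemicalEntityOrProteinOrPolypeptide",
]

NODE_ID = "CHEMBL.COMPOUND:CHEMBL1098659"

# Hypothetical in-memory stand-ins for the graph: node ID -> labels on the node.
nodes_specific = {NODE_ID: ["biolink:ChemicalEntity"]}  # current behaviour
nodes_full = {NODE_ID: FULL_LABELS}                     # proposed behaviour


def match_by_label(nodes, label):
    """Return the IDs of all nodes that carry the given label."""
    return sorted(node_id for node_id, labels in nodes.items() if label in labels)


# A generic query on the top-level type only works with the full label list.
specific_hits = match_by_label(nodes_specific, "biolink:NamedThing")
full_hits = match_by_label(nodes_full, "biolink:NamedThing")

print(specific_hits)  # []
print(full_hits)      # ['CHEMBL.COMPOUND:CHEMBL1098659']
```

A predefined list of specific types, as in the version @frostyfan109 mentioned, would instead mean calling the lookup once per type and merging the results.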
@YaphetKG the CDEs tab does contain the actual CDEs. it's the related concepts that aren't quite completed yet.
the related concepts linked in the CDEs tabs are related to that particular concept, not to the CDE. however, i now have the means to get those per-CDE related concepts (thanks to @frostyfan109 !).
@frostyfan109 reports that this is now ready for an initial demo and review. The plan is to get feedback on the demo, improve it, and then submit it as a PR for incorporation into Dug.
Some notes from @frostyfan109's awesome CDE presentation from last week:
- Can we hide or grey out related concepts that don't have data associated with them?
- Can we display something more useful than a filename? (GV: Not at the moment.)
- Are the TranQL and ROBOKOP tabs overly confusing? It might be a good idea to have an "Advanced" tab that hides those away from the beginning user.
- Would going back to a separate tab (Studies | CDEs) be better than a single tab (Studies)? The compact view prevents cluttering in the smaller view, but it defies user intuition. If that turns out to be a problem, we can separate them again.
- Need to make it clear how CDEs <-> concepts <-> studies link between things -- maybe documentation? A figure? (GV: I'll work on this ASAP).
- How is this related to RadX?
- Does this affect the work Ginnie is working on? (GV: I don't think so.)
In addition to
Need to make it clear how CDEs <-> concepts <-> studies link between things -- maybe documentation? A figure? (GV: I'll work on this ASAP).
I think it would also be nice (for me) to understand how that information is stored in the different dug components
Here's where I am with the CDE <-> concept <-> studies thing: https://docs.google.com/presentation/d/1PFQ2EikcuuYHbXnyEv4V4aLrVmgFkkca7KDoBpgGMKo/edit?usp=sharing
I think it would also be nice (for me) to understand how that information is stored in the different dug components
I'll be working on this next, although I'm not sure I can improve upon Yaphet's "Dug High Level Architecture" slide from the Dug Socialization 'Shop.
@cbizon, @gaurav, to add more context here: dug initially parsed studies from files such as dbgap xml and topmed csv. These end up in the variables index, with each variable containing the study it belongs to as attributes (dugElement.collection_id, dugElement.collection_name) (https://github.com/helxplatform/dug/blob/develop/src/dug/core/parsers/_base.py#L9).
When working with CDEs we took a different approach to indexing them. Instead of parsing files as described above, we used some logic to extract DugElements from the graph. The way it works is that each dug element is annotated as previously, but when we go to expand it to find other concepts that are related to it in the graph, we also look for CDEs and parse those as DugElements as well, which then end up in the variables index.
This way all study metadata (variables) are linked to CDEs. This slide tries to outline some details around this (https://docs.google.com/presentation/d/1eJjT8-GoLrTYa9_V3NvwrwBHHx-JaDCzMdhVHA9bkfY/edit#slide=id.g105cd980d51_0_0).
The graph queries look something like:
for each concept from annotation of a dug element:
    get related cdes
    parse cde element as dug element
    add the concept to it
(https://github.com/helxplatform/dug/blob/develop/src/dug/core/crawler.py#L197)
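The loop above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code in crawler.py: the `Element` class, the `RELATED_CDES` lookup table, `get_related_cdes`, `expand_to_cdes`, and all IDs are hypothetical, and a real implementation would query the graph rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Simplified, hypothetical stand-in for a dug element."""
    id: str
    name: str
    concepts: set = field(default_factory=set)


# Hypothetical stand-in for the graph: concept ID -> CDE records related to it.
RELATED_CDES = {
    "CONCEPT:A": [{"id": "CDE:form-1", "name": "Example form 1"}],
    "CONCEPT:B": [
        {"id": "CDE:form-1", "name": "Example form 1"},
        {"id": "CDE:form-2", "name": "Example form 2"},
    ],
}


def get_related_cdes(concept_id):
    """Return the CDE records related to a concept (empty list if none)."""
    return RELATED_CDES.get(concept_id, [])


def expand_to_cdes(element):
    """For each concept annotated on an element, collect the related CDEs
    as elements of their own and attach the concept to each of them."""
    cde_elements = {}
    for concept_id in sorted(element.concepts):
        for record in get_related_cdes(concept_id):
            # Parse the CDE record as an element, reusing it if already seen.
            if record["id"] not in cde_elements:
                cde_elements[record["id"]] = Element(record["id"], record["name"])
            # Add the concept that led us to this CDE.
            cde_elements[record["id"]].concepts.add(concept_id)
    return list(cde_elements.values())


variable = Element("VAR:1", "Example variable", {"CONCEPT:A", "CONCEPT:B"})
cdes = expand_to_cdes(variable)
for cde in cdes:
    print(cde.id, sorted(cde.concepts))
```

Because both the study variable and the CDE elements carry the same concept IDs, the concept is what links a variable to a CDE in this sketch.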
Ack, sorry, I've been sidetracked by Translator week and totally missed this until just now!
Current status and remaining work:
- A workflow for annotating CDEs using SciGraph, MedType or OGER has been developed and used to generate KGX files for ingest into Dug.
  - This needs to be better documented, but seems to be working well: heal-data-stewards/heal-cdes#17
- Code for incorporating KGX files into Dug: completed by Yaphet.
- A user interface for allowing users to search using HEAL CDEs: completed by Griffin and Matt.
  - We have some outstanding questions about this work (see comment above at #221 (comment)), but I don't think any of those need to be fixed before we have a broader discussion of this feature with the NIH HEAL CDE team.
Before we can share this publicly, we should get an "okay" from the NIH HEAL CDE team whose CDE data we're using, to make sure they understand how much information about HEAL CDEs we are sharing and to check if they have questions or concerns for us about any of this. I chatted about this a bit with Kathy last week, and it sounds like before we get to that stage, we will need to make a full HEAL Dug release and get it signed off by the C3 team and possibly also the HEAL Stewards; once we've done that, we could show it to the NIH HEAL CDE team as part of the Dug socialization process. I'm hoping that's something we can discuss at a C3 meeting soon!
@hhiles I've written up everything I can think about here: https://renci.atlassian.net/browse/DUG-281
I'll add more there if I think of more. Apart from that, I think we can close this issue!