In link prediction, filter nodes by prefix or other slots
Some graphs have nodes we would like to filter for, but they don't make clear distinctions in their Biolink categories:
PR:000002977 biolink:NamedThing Graph owl:Class
So we would like to specify a filter by prefix rather than by category.
This could be controlled by a flag in the link_node_types: block in the config.
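As a rough illustration of what filtering by prefix means here, the sketch below keeps only node IDs whose CURIE prefix is in an allowed set. The helper names curie_prefix and filter_by_prefix are hypothetical and are not part of any existing config option or library; the node IDs are the two quoted in this issue.

```python
def curie_prefix(node_id):
    """Return the text before the first colon of a CURIE, or '' if there is no colon."""
    prefix, sep, _ = node_id.partition(":")
    return prefix if sep else ""


def filter_by_prefix(node_ids, prefixes):
    """Keep only node IDs whose CURIE prefix is one of `prefixes` (hypothetical helper)."""
    wanted = set(prefixes)
    return [node_id for node_id in node_ids if curie_prefix(node_id) in wanted]


# Node IDs taken from the examples in this issue.
nodes = ["PR:000002977", "XPO:0134172"]
protein_nodes = filter_by_prefix(nodes, ["PR"])
print(protein_nodes)
```

A flag in the config could then carry the list of prefixes in the same place where categories are listed today.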
Similarly, it would be nice to be able to filter by other node slots/properties:
XPO:0134172 biolink:NamedThing increased apoptosis in simple columnar epithelium An increased occurrence of apoptotic process in simple columnar epithelium. Graph
This could be as simple as a regex for a string value in a named column, e.g., match everything with the string "apoptosis"
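A minimal sketch of the regex idea, assuming the nodelist has already been read into dictionaries keyed by column name. The column names id, category and name are an assumption about the node file layout, the empty name on the PR row is only for illustration, and filter_by_column_regex is a hypothetical helper, not an existing function.

```python
import re


def filter_by_column_regex(rows, column, pattern):
    """Keep rows whose value in `column` matches `pattern` anywhere in the string."""
    regex = re.compile(pattern)
    # Missing or empty values are treated as the empty string.
    return [row for row in rows if regex.search(row.get(column) or "")]


# Rows modelled on the two example nodes in this issue.
rows = [
    {
        "id": "XPO:0134172",
        "category": "biolink:NamedThing",
        "name": "increased apoptosis in simple columnar epithelium",
    },
    {"id": "PR:000002977", "category": "biolink:NamedThing", "name": ""},
]

matches = filter_by_column_regex(rows, "name", "apoptosis")
print([row["id"] for row in matches])
```

Using search rather than a full match means a plain string such as "apoptosis" matches anywhere in the value, which is the behaviour described above.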
@LucaCappelletti94 you may have already solved this problem in terms of filtering graph nodelists by CURIE prefix and mapping it to a namespace
In ensmallen it is possible to filter by the prefix, but I do not know what you mean by mapping it to a namespace.
Same thing as far as we're concerned: namespace == prefix, at least as far as node IDs go.
Ok, then graph.filter_from_names(...) has all of the kwargs you may desire for this sort of goal. It should be available in the latest nightly build if I am not mistaken (0.7.0.dev20).