/sg_regex

Subgraph Regex

We collect all code and data for the Node Embedding with Semantic Subgraphs project here. This work was created in association with our publication, CompactWalks: Taming Knowledge-Graph Embeddings with Domain- and Task-Specific Pathways. If you find this repository helpful in developing novel methods and systems, please cite our work. The layout of this repo is as follows:

In the directory Code we have the following:

  • Subgraph_Regex.ipynb: A Jupyter Notebook containing all of the Python code for the CompactWalks project. The notebook is divided into three sections. The first section covers how we create and use a regular expression grammar in our project; it relies heavily on the lark grammar parsing library. The second section uses the grammar and the constructed queries to extract patterns from existing graph databases, focusing on Cypher query generation and Neo4j database connections. The final section contains the code for the machine learning tasks: creating semantic subgraphs, performing semantic walks on these subgraphs, and calculating embedding values for these graphs.
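
As a rough illustration of the Cypher query generation step described above, the following minimal sketch turns a path pattern into a Cypher MATCH query string. It is not the notebook's code: the notebook builds its grammar with lark, while this sketch uses only the Python standard library. The function name `path_to_cypher` and the labels `Drug`, `Gene`, `Disease`, and `targets` are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional, Sequence

# Conservative check so that only plain identifiers end up in the query text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def path_to_cypher(
    labels: Sequence[str],
    edge_types: Optional[Sequence[Optional[str]]] = None,
    limit: Optional[int] = None,
) -> str:
    """Build an undirected Cypher path query from node labels (hypothetical helper).

    labels:     node labels along the path, e.g. ["Drug", "Gene", "Disease"]
    edge_types: one entry per hop; None means "any relationship type"
    limit:      optional cap on the number of returned paths
    """
    if len(labels) < 2:
        raise ValueError("a path needs at least two node labels")
    if edge_types is None:
        edge_types = [None] * (len(labels) - 1)
    if len(edge_types) != len(labels) - 1:
        raise ValueError("need exactly one edge type per hop")
    for name in list(labels) + [e for e in edge_types if e is not None]:
        if not _IDENT.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")

    # Start with the first node, then append one "-[rel]-(node)" step per hop.
    parts = [f"(n0:{labels[0]})"]
    for i, (edge, label) in enumerate(zip(edge_types, labels[1:]), start=1):
        rel = f"[r{i}:{edge}]" if edge else f"[r{i}]"
        parts.append(f"-{rel}-(n{i}:{label})")

    query = "MATCH p = " + "".join(parts) + " RETURN p"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


example_query = path_to_cypher(["Drug", "Gene", "Disease"], ["targets", None], limit=10)
print(example_query)
```

The resulting string could then be passed to a Neo4j session; the connection details depend on the database being queried and are left out here.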

In the directory Data we have the following:

  • regex_build_times.csv: A dataset of the times taken to build various subgraphs. It compares the time needed to gather and build a subgraph with naive methods against the time needed when using semantic filtering. The methods are discussed in more detail in our publication.
  • comparison.xlsx: A collection of cosine similarities, ranks, and semantic walk scores. These values are calculated for three embedding methodologies: DeepWalk, node2vec, and metapath2vec. We collected two sets of drug pairs from biomedical experts for comparison: a positive set in which the drug pairs are mechanistically related, and a negative set in which the drugs share no relationship with each other.
  • robokop-2d.emb: Two-dimensional node2vec embeddings computed on the ROBOKOP subgraph. These embeddings were generated using SNAP (the Stanford Network Analysis Project).
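
The cosine similarities in comparison.xlsx are computed between embedding vectors such as those in robokop-2d.emb. The sketch below shows one way to read such a file and compare two nodes. It assumes the file follows the usual node2vec text layout (a header line with the node count and dimension, then one line per node with the node id followed by its coordinates); this layout is an assumption and should be checked against the actual file. The function names and the sample node ids `a`, `b`, `c` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence


def parse_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'count dim' header plus 'node_id v1 ... vd' rows (assumed layout)."""
    rows = iter(lines)
    header = next(rows).split()
    dim = int(header[1])
    embeddings: Dict[str, List[float]] = {}
    for line in rows:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Tiny in-memory sample standing in for the real file.
sample_lines = ["3 2", "a 1.0 0.0", "b 0.0 1.0", "c 2.0 0.0"]
sample = parse_embeddings(sample_lines)
print(cosine_similarity(sample["a"], sample["c"]))
```

To use the real data, an open file handle can be passed in place of `sample_lines`, since `parse_embeddings` accepts any iterable of text lines.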