
A package for maintaining knowledge graph building tools with Jupyter examples

Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

ipykgb

ipykgb is an end-to-end pipeline that automatically constructs knowledge graphs, in the form of RDF triples, from unstructured text without supervision. It uses spaCy, Stanford CoreNLP, NeuralCoref from Hugging Face, and sentence-transformers to perform named entity recognition, coreference resolution, relation extraction, and entity linking. The goal is to offer a novel knowledge representation construction method for applications in automated ontology construction and taxonomy expansion. The intuition is that extracting knowledge directly from text sources helps address the low coverage often faced by hand-crafted knowledge bases.
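To make the "RDF triples" output format concrete, here is a minimal sketch of how (subject, predicate, object) labels could be serialized as N-Triples lines. It is not the project's own code: the `http://example.org/` namespace and the helper names `to_iri` and `to_ntriples` are assumptions chosen for illustration, and it uses only the Python standard library.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration; the project may use a different one.
BASE = "http://example.org/"


def to_iri(label: str) -> str:
    """Turn a free-text label into an IRI under BASE (spaces become underscores)."""
    return f"<{BASE}{quote(label.strip().replace(' ', '_'))}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) label tuples as N-Triples lines."""
    return "\n".join(
        f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in triples
    )


# One hand-written triple, used only to show the serialized shape.
print(to_ntriples([("George Lucas", "created", "Darth Vader")]))
```

Each line of the result is one statement: three IRIs followed by a period. A real pipeline would likely also need literals and a curated vocabulary for predicates, which this sketch leaves out.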

Workflow Visualization

[Image: pipeline1]

Tools & Systems Flowchart

[Image: pipeline]

Example

Input Text:
Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).

Output Graph:
[Image: graph]
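As a rough idea of the data behind such a graph, the sketch below stores a few triples as an adjacency list. The triples are hand-written from statements in the input text above and are illustrative only; they are not the pipeline's actual output, and the function name `build_graph` is an assumption.

```python
from collections import defaultdict

# Hand-written triples read off the input text; illustrative only,
# not the output produced by ipykgb.
triples = [
    ("Darth Vader", "also known as", "Anakin Skywalker"),
    ("Darth Vader", "created by", "George Lucas"),
    ("Darth Vader", "serves", "Galactic Empire"),
    ("Emperor Palpatine", "also known as", "Darth Sidious"),
]


def build_graph(triples):
    """Map each subject to a list of (predicate, object) outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Keeping predicates on the edges, rather than as nodes, matches the usual way a triple is drawn: two entity nodes joined by a labeled arrow.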

Requirements

  • python >=3.7
  • spacy
  • pandas
  • stanford-corenlp
  • json (part of the Python standard library)
  • nltk
  • miniconda (provides the conda command used in Setup)

Setup

  1. Create a new conda environment:
    conda env create -f environment.yml
  2. Activate environment:
    conda activate ipykgb
  3. Download CoreNLP 4.2.0 and place it in the root folder of this repo:
    https://stanfordnlp.github.io/CoreNLP/download.html
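After step 3, a quick check like the one below can confirm that CoreNLP sits where the notebooks would look for it. This is a sketch under one assumption: that the downloaded archive unpacks to a folder named `stanford-corenlp-4.2.0`. Adjust `CORENLP_DIR_NAME` if your folder is named differently. The helper `find_corenlp` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Assumption: the CoreNLP 4.2.0 archive unpacks to a folder with this name.
CORENLP_DIR_NAME = "stanford-corenlp-4.2.0"


def find_corenlp(repo_root="."):
    """Return the CoreNLP folder under repo_root, or None if it is missing."""
    candidate = Path(repo_root) / CORENLP_DIR_NAME
    return candidate if candidate.is_dir() else None


# Run from the root folder of the repo.
corenlp_dir = find_corenlp()
if corenlp_dir is None:
    print(f"CoreNLP not found; expected ./{CORENLP_DIR_NAME}")
else:
    print(f"Found CoreNLP at {corenlp_dir.resolve()}")
```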

Example Notebooks

Example notebooks are in the notebooks folder, which contains:

  • NER_Evaluation.ipynb
  • CoRef_Evaluation.ipynb
  • Relation_Extraction.ipynb
  • EntityLinking_Evaluation.ipynb
  • Notebook_Text_to_Graph_Pipeline.ipynb
  • Notebook_SpaCy_Parsing_OpenIE_BERT_Evaluation.ipynb

For more information on this project, check out the Wiki.