CMPUT-664 Knowledge Graph Construction for Detecting Cybersecurity Attacks
Group Members
| Student name | CCID |
|---|---|
| Shraddha Mukesh Makwana | smakwana |
| Pranjal Dilip Naringrekar | naringre |
Directory Structure
colab-notebooks
: contains experiments and processing done over Colab notebooks

data
: dataset of CVE vulnerabilities taken from Kaggle

spacy-relation-extraction
: contains the modified base relation extraction project provided by spaCy

reports
: contains the proposal, check-in, and final report
Execution Steps
- Step 1: Create a folder on Google Drive and place the pre-converted `.spacy` files there. (The Colab notebook also contains the conversion step, but the converted files are provided so it can be skipped, since conversion can take time; a conversion sketch is included after this list.)
- Step 2: Run the Google Colab cells that train the named entity recognition model. (It currently recognizes entities such as VULNERABILITY and OPERATING_SYSTEM.)
- Step 3: Clone spaCy's base relation extraction project and run the commands from the notebook to train the relation extraction model. (It currently detects the HAS_VULNERABILITY relation.) After the clone command given in the notebook has run, a `rel_component` folder is created at runtime; inside it, create a folder called `data`, place all three `.spacy` files there, and then run the remaining commands.
- Step 4: The last part of the Google Colab notebook feeds the extracted data into Neo4j using Cypher queries for visualization (see the sketch after this list):
  - First, create an account at https://sandbox.neo4j.com/?usecase=blank-sandbox
  - Create a new project; this provides the host, username, and password needed to log in to the Neo4j instance
  - Put these temporary credentials into the Google Colab notebook before running it
  - Note: A Neo4j sandbox expires after 3 days, so a new project instance has to be created each time the notebook is re-run after expiry
- (Note: The provided Google Colab notebook contains all the steps. However, training the models took us approximately 4-5 hours, so we have also placed screenshots of a previous successful run in the output folder for reference.)
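
The following is a minimal sketch of the conversion mentioned in Step 1, assuming the annotations are available as character-offset spans; the example sentence, offsets, and output file name are illustrative rather than the exact ones used in the notebook:

```python
# Minimal sketch (not the exact notebook code): converting annotated examples
# into the binary .spacy format expected by `spacy train`.
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")

# Hypothetical training examples: (text, list of (start, end, label) spans)
TRAIN_DATA = [
    ("CVE-2021-1234 allows remote code execution on Windows 10.",
     [(0, 13, "VULNERABILITY"), (46, 56, "OPERATING_SYSTEM")]),
]

doc_bin = DocBin()
for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in annotations:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is not None:  # drop spans that do not align to token boundaries
            ents.append(span)
    doc.ents = ents
    doc_bin.add(doc)

doc_bin.to_disk("train.spacy")  # upload this file to the GDrive folder from Step 1
```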
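For Step 4, this is a hedged sketch of how the Neo4j loading can look with the official `neo4j` Python driver; the bolt URI, password, node labels, and the `triples` list are placeholders that depend on your own sandbox project and extraction output:

```python
# Minimal sketch (not the exact notebook code): loading extracted
# (OPERATING_SYSTEM, HAS_VULNERABILITY, VULNERABILITY) triples into a
# Neo4j sandbox instance.
from neo4j import GraphDatabase

URI = "bolt://<sandbox-ip>:7687"          # from the sandbox connection details
AUTH = ("neo4j", "<generated-password>")  # temporary sandbox credentials

triples = [
    ("Windows 10", "HAS_VULNERABILITY", "CVE-2021-1234"),
]

CYPHER = """
MERGE (s:OPERATING_SYSTEM {name: $source})
MERGE (v:VULNERABILITY {name: $target})
MERGE (s)-[:HAS_VULNERABILITY]->(v)
"""

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        for source, _, target in triples:
            session.run(CYPHER, source=source, target=target)
```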
Data
For the data, we have used a publicly available dataset of CVE vulnerabilities on Kaggle. Click here to get the dataset.
Acknowledgement
This project was completed as coursework under Prof. Karim Ali.
Resources
- Converting to the spaCy file format: https://spacy.io/api/cli#convert
- Reference tutorial for initial trial experiments: https://ashutoshtripathi.com/2020/04/02/spacy-installation-and-basic-operations-nlp-text-processing-library/
- NER using spaCy: https://towardsdatascience.com/named-entity-recognition-ner-using-spacy-nlp-part-4-28da2ece57c6
- CUDA issues on Google Colab with spaCy: https://stackoverflow.com/questions/62655694/spacy-gpu-process-not-working-while-installed-prerequisites
- spaCy base relation extraction template explanation: https://www.youtube.com/watch?v=8HL-Ap5_Axo
- Documentation on internal workings, commands, and configuration: https://spacy.io/usage/layers-architectures
- Relation extraction component issues: explosion/spaCy#9567
- Working of Neo4j with Google Colab: https://colab.research.google.com/github/johnymontana/ggcd-samples/blob/master/notebooks/ggcd.ipynb, https://towardsdatascience.com/create-a-graph-database-in-neo4j-using-python-4172d40f89c4, and https://stackoverflow.com/questions/48613002/sha-256-hashing-in-python
- Neo4j queries to construct the knowledge graph: https://neo4j.com/developer/graph-data-science/build-knowledge-graph-nlp-ontologies/ and https://neo4j.com/developer/cypher/guide-build-a-recommendation-engine/