Link-Prediction-on-DBLP-Citation

Link prediction techniques using machine learning and neural networks, with Neo4j used for storing the data.


Link Prediction Techniques on Co Authors in a Citation System

This project uses the following datasets for testing.

Structure of these datasets can be found here.

Note: The Neo4j queries can be found in the ML notebooks; many of the same queries are reused in the GraphSAGE notebooks.

Introduction

Link prediction is one of the most important research topics in the field of graphs and networks. Its objective is to predict, for a given pair of nodes, whether a link will form between them in the future.

[Figure: link prediction depiction]


Problem

Given a citation network in which authors have collaborated with each other in the past, our task is to predict which pairs of authors are likely to collaborate (i.e. become co-authors) in the future.


Database

We use Neo4j to store the data for both the ML and the deep learning techniques. Cypher is used for data manipulation, and py2neo connects the database to the Python data science ecosystem.
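As a rough illustration of how py2neo and Cypher fit together, here is a minimal sketch. The `Author` label, the `CO_AUTHOR` relationship type, the `name` property, the connection URI and the credentials are all assumptions made for this example; the actual schema and queries are the ones in the notebooks.

```python
# Hypothetical schema: (:Author {name})-[:CO_AUTHOR]-(:Author).
# Adjust labels, relationship types and properties to the real database.
COAUTHOR_QUERY = """
MATCH (a:Author {name: $name})-[:CO_AUTHOR]-(b:Author)
RETURN b.name AS coauthor
ORDER BY coauthor
"""


def connect(uri="bolt://localhost:7687", user="neo4j", password="password"):
    """Open a py2neo connection. URI and credentials are placeholders."""
    from py2neo import Graph  # imported lazily so the helper below needs no server

    return Graph(uri, auth=(user, password))


def fetch_coauthors(graph, name):
    """Return the names of all co-authors of `name`, sorted alphabetically."""
    # Graph.run executes the Cypher query; .data() yields a list of dicts.
    records = graph.run(COAUTHOR_QUERY, name=name).data()
    return [record["coauthor"] for record in records]
```

With a running Neo4j instance, `fetch_coauthors(connect(), "Some Author")` would return that author's co-authors as a list of strings.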

To install a Neo4j instance on a Linux VM, you can follow this.


Machine Learning Techniques for Link Prediction

We use the following similarity measures and graph algorithms as link prediction features. They also give an idea of the structure and topology of the graph.

  • Common Neighbours
  • Preferential Attachment
  • Total Neighbours
  • Triangle Completion and Clustering Coefficients
  • Label Propagation
  • Louvain Algorithm
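The first four measures in the list above can be sketched in plain Python on a toy graph. This is only an illustration under assumed data: the node names and edges are made up, and the project itself computes these scores with Neo4j queries rather than with this code.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set_of_neighbours}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def common_neighbours(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def preferential_attachment(adj, u, v):
    """Product of the two degrees: well-connected nodes attract links."""
    return len(adj[u]) * len(adj[v])


def total_neighbours(adj, u, v):
    """Number of distinct nodes adjacent to u or v."""
    return len(adj[u] | adj[v])


def triangle_count(adj, u):
    """Number of triangles that pass through u."""
    return sum(1 for a, b in combinations(sorted(adj[u]), 2) if b in adj[a])


def clustering_coefficient(adj, u):
    """Fraction of u's neighbour pairs that are themselves connected."""
    degree = len(adj[u])
    if degree < 2:
        return 0.0
    return 2 * triangle_count(adj, u) / (degree * (degree - 1))


# Made-up co-author edges, for illustration only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
adj = build_adjacency(edges)

features = (
    common_neighbours(adj, "A", "D"),
    preferential_attachment(adj, "A", "D"),
    total_neighbours(adj, "A", "D"),
)
print(features)  # (2, 6, 3)
```

Authors A and D are not yet linked but share two co-authors (B and C), which is the kind of signal these scores are meant to capture.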

The scores obtained from these techniques can be used on their own to predict future links. For better performance, they can be fed as features into an ML model. We use a Random Forest classifier for this purpose, acting as a binary classifier (link or no link).
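A minimal sketch of this step with scikit-learn is shown below. The feature rows and labels are invented toy values, not results from the DBLP data, and the choice of three features and the hyperparameters are assumptions for the example only.

```python
from sklearn.ensemble import RandomForestClassifier

# Each row: [common_neighbours, preferential_attachment, total_neighbours].
# Made-up toy values, for illustration only.
X_train = [
    [3, 12, 5], [4, 20, 6], [2, 9, 5], [5, 25, 7],  # pairs that later co-authored
    [0, 2, 3], [0, 1, 2], [1, 3, 4], [0, 4, 4],     # pairs that did not
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = link formed, 0 = no link

# random_state is fixed only to make the sketch reproducible.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Two unseen candidate pairs: one with strong scores, one with weak scores.
candidates = [[6, 30, 8], [0, 1, 2]]
predictions = model.predict(candidates)
probabilities = model.predict_proba(candidates)[:, 1]  # P(link) per pair

for pair, label, prob in zip(candidates, predictions, probabilities):
    print(pair, "link" if label == 1 else "no link", round(float(prob), 2))
```

In practice the negative examples (pairs that never co-authored) have to be sampled, and the train/test split should respect time so that the model is evaluated on links formed after the training period.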

The dataset is too large to process as a whole, so we work on subsets of the data. (Notebooks with different subsets have been added.)

Notebooks for the above techniques


Deep Learning Techniques

We leverage Graph Neural Networks for link prediction on our co-author graph, using the GraphSAGE implementation from the stellargraph library.
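To give a feel for what GraphSAGE computes, here is a NumPy sketch of a single mean-aggregation layer followed by an inner-product link score. It is not the stellargraph API and not the notebooks' code: the weights are random and untrained, the toy graph and feature sizes are made up, and the full neighbourhood is averaged where GraphSAGE would normally sample a fixed number of neighbours.

```python
import numpy as np


def sage_mean_layer(features, neighbours, w_self, w_neigh):
    """One GraphSAGE-style layer with a mean aggregator.

    For each node: average the neighbours' features, combine them with the
    node's own features through two weight matrices, apply ReLU, then
    L2-normalise the result.
    """
    hidden = np.zeros((features.shape[0], w_self.shape[1]))
    for node, neigh in enumerate(neighbours):
        if neigh:
            aggregated = features[neigh].mean(axis=0)
        else:
            aggregated = np.zeros(features.shape[1])
        activation = np.maximum(0.0, features[node] @ w_self + aggregated @ w_neigh)
        norm = np.linalg.norm(activation)
        hidden[node] = activation / norm if norm > 0 else activation
    return hidden


def link_score(embeddings, u, v):
    """Sigmoid of the inner product of two node embeddings."""
    return 1.0 / (1.0 + np.exp(-(embeddings[u] @ embeddings[v])))


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))            # 4 authors, 3 made-up input features
neighbours = [[1, 2], [0, 2], [0, 1, 3], [2]]  # toy co-author adjacency lists
w_self = rng.normal(size=(3, 2))              # untrained weights, illustration only
w_neigh = rng.normal(size=(3, 2))

embeddings = sage_mean_layer(features, neighbours, w_self, w_neigh)
score = link_score(embeddings, 0, 3)  # score for the candidate link (0, 3)
print(embeddings.shape, float(score))
```

In a trained model the weight matrices are learned from known links and sampled non-links, so that the score is high for author pairs that are likely to collaborate.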

The following notebooks have been added (Citation v11 used):


References

[1] Graph Algorithms: Practical Examples in Apache Spark and Neo4j, By Mark Needham & Amy E. Hodler

[2] GraphSAGE: Inductive Representation Learning on Large Graphs