/social_network_graph_link_prediction

Link prediction in a directed social graph.


Social Network Graph Link Prediction - Facebook Challenge

1. Problem statement:

Given a directed social graph, predict the missing links so that they can be recommended to users (link prediction in a graph).

2. Data Overview

The data is taken from Facebook's recruiting challenge on Kaggle: https://www.kaggle.com/c/FacebookRecruiting
The data contains two columns, source and destination, with one row for each edge in the graph.

Data columns (total 2 columns):
- source_node int64
- destination_node int64
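
As a minimal sketch of how such an edge list could be loaded into a directed graph, the snippet below uses pandas and networkx. The in-memory CSV and its four edges are made-up stand-ins for the Kaggle file, and the header row is an assumption; only the column names `source_node` and `destination_node` come from the description above.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical stand-in for the Kaggle edge file (assumed to have a header row).
sample_csv = io.StringIO(
    "source_node,destination_node\n"
    "1,2\n"
    "2,3\n"
    "3,1\n"
    "1,3\n"
)
edges = pd.read_csv(sample_csv)

# Build a directed graph: each row becomes one edge source_node -> destination_node.
graph = nx.from_pandas_edgelist(
    edges,
    source="source_node",
    target="destination_node",
    create_using=nx.DiGraph(),
)

print(graph.number_of_nodes(), graph.number_of_edges())
# Direction matters: 1 -> 2 exists in the sample, 2 -> 1 does not.
print(graph.has_edge(1, 2), graph.has_edge(2, 1))
```

Using `nx.DiGraph` rather than the default undirected graph keeps the follower/followee direction of each edge.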

3. Mapping the problem to a supervised learning problem:
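
One common way to frame link prediction as binary classification, sketched here under stated assumptions and not necessarily the exact procedure used in the notebook, is to label every existing edge 1 and a random sample of node pairs with no edge 0. The helper `sample_non_edges` and the toy edge list are hypothetical.

```python
import random


def sample_non_edges(edges, n_samples, seed=0):
    """Sample ordered node pairs (u, v), u != v, that are NOT in the edge list."""
    rng = random.Random(seed)
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})

    # Guard against asking for more non-edges than the graph can provide.
    max_possible = len(nodes) * (len(nodes) - 1) - len(edge_set)
    if n_samples > max_possible:
        raise ValueError("not enough non-edges to sample from")

    negatives = set()
    while len(negatives) < n_samples:
        u, v = rng.sample(nodes, 2)  # two distinct nodes, order matters
        if (u, v) not in edge_set:
            negatives.add((u, v))
    return sorted(negatives)


# Toy directed edge list (illustrative values only).
edges = [(1, 2), (2, 3), (3, 1), (1, 3)]
negatives = sample_non_edges(edges, n_samples=2)

# Positive class: existing links. Negative class: sampled missing links.
pairs = edges + negatives
labels = [1] * len(edges) + [0] * len(negatives)
print(list(zip(pairs, labels)))
```

Features computed for each pair can then be passed, together with `labels`, to any binary classifier.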

4. Business objectives and constraints:

  • No low-latency requirement.
  • Predicted probabilities are useful, because the links with the highest probability can be recommended first.
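
To illustrate the second point, the sketch below picks the top-k candidate links for one user from a table of predicted probabilities. The function `top_k_links` and the probability values are hypothetical; in practice the scores would come from whatever classifier is trained.

```python
import heapq


def top_k_links(scored_pairs, source, k=2):
    """Return the k destination nodes with the highest predicted probability."""
    candidates = [
        (prob, dst)
        for (src, dst), prob in scored_pairs.items()
        if src == source
    ]
    return [dst for prob, dst in heapq.nlargest(k, candidates)]


# Made-up predicted probabilities for candidate (source, destination) links.
scores = {
    (1, 4): 0.91,
    (1, 5): 0.35,
    (1, 6): 0.78,
    (2, 4): 0.60,
}

print(top_k_links(scores, source=1, k=2))  # [4, 6]
```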

5. Performance metric for supervised learning:

  • Both precision and recall are important, so the F1 score is a good choice
  • Confusion matrix
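
Both metrics are available in scikit-learn. The following sketch computes them on made-up labels; `y_true` and `y_pred` are illustrative values, not results from this project.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Illustrative labels: 1 = link exists, 0 = no link.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# F1 is the harmonic mean of precision and recall for the positive class.
f1 = f1_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(f1)  # 0.75
print(cm)
```

Here precision and recall are both 3/4, so F1 is also 0.75, and the confusion matrix shows one false positive and one false negative.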

6. Getting Started

Start by downloading the project, then run the "Facebook_Link_Prediction_Models.ipynb" file in IPython Notebook.

7. Prerequisites

You need to install the following software and libraries before running this project.

  1. Python 3: https://www.python.org/downloads/
  2. Anaconda: installs IPython Notebook and most of the required libraries, such as sklearn, pandas, seaborn, matplotlib, numpy and scipy: https://www.anaconda.com/download/

8. Libraries

  • scikit-learn: scikit-learn is a Python module for machine learning built on top of SciPy.

    • pip install scikit-learn
    • conda install -c anaconda scikit-learn
  • networkx: NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

    • pip install networkx
    • conda install -c anaconda networkx
  • nltk: The Natural Language Toolkit (NLTK) is a Python package for natural language processing.

    • pip install nltk
    • conda install -c anaconda nltk

9. Authors

• Manish Vishwakarma - Complete work