/NTDS-Project

Primary language: Jupyter Notebook. License: MIT.

A Culinary Tour of Data Science

Analyzing the Epicurious recipe data available on Kaggle using network science and graph theory.

Prerequisites

  • nltk 3.4.5
  • networkx 2.3
  • numpy 1.16.5
  • pandas 0.25.1
  • seaborn 0.9.0

File descriptions

Data Processing and Graph Construction.ipynb - uses natural language processing techniques to extract the ingredients of each dish from its raw recipe text, then builds a weighted graph in which nodes are dishes and edges connect dishes that share ingredients, weighted by the Jaccard similarity index of their ingredient sets
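The graph construction described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `jaccard` and `build_recipe_graph`, the `threshold` parameter, and the toy recipes are all assumptions made for the example, and the ingredient sets are taken as already extracted.

```python
from itertools import combinations

import networkx as nx


def jaccard(a, b):
    """Jaccard similarity index |A ∩ B| / |A ∪ B| of two ingredient collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_recipe_graph(recipes, threshold=0.0):
    """Build a weighted dish graph (hypothetical helper).

    recipes: dict mapping dish name -> set of ingredient names.
    An edge is added only when the similarity exceeds `threshold`.
    """
    G = nx.Graph()
    G.add_nodes_from(recipes)
    for (dish1, ing1), (dish2, ing2) in combinations(recipes.items(), 2):
        weight = jaccard(ing1, ing2)
        if weight > threshold:
            G.add_edge(dish1, dish2, weight=weight)
    return G


# Toy data, invented for illustration only.
recipes = {
    "pancakes": {"flour", "egg", "milk", "butter"},
    "omelette": {"egg", "butter", "cheese"},
    "salad": {"lettuce", "tomato", "olive oil"},
}
graph = build_recipe_graph(recipes)
```

Comparing every pair of dishes is quadratic in the number of recipes, so on the full dataset a similarity threshold or an inverted index over ingredients would likely be needed to keep the graph sparse.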

Exploring the Graph.ipynb - explores the structure and various properties of the graph
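As a rough idea of what such an exploration might compute, the sketch below gathers a few standard structural properties with networkx. The choice of properties and the helper name `summarize_graph` are assumptions for illustration; the notebook may examine different quantities.

```python
import networkx as nx


def summarize_graph(G):
    """Return a few basic structural properties of an undirected graph (hypothetical helper)."""
    degrees = [d for _, d in G.degree()]
    components = list(nx.connected_components(G))
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        "mean_degree": sum(degrees) / len(degrees) if degrees else 0.0,
        "components": len(components),
        "largest_component": max((len(c) for c in components), default=0),
        "average_clustering": nx.average_clustering(G) if degrees else 0.0,
    }


# Toy graph, invented for illustration: a triangle plus one isolated node.
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 0.5), ("b", "c", 0.25), ("a", "c", 0.4)])
G.add_node("d")
summary = summarize_graph(G)
```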

Data Exploitation.ipynb - the main notebook, which tries to answer the research questions. It includes the following:

  • Network visualizations
  • Clustering using Louvain method (community detection)
  • KNN-based recommender system
  • Logistic regression model predicting recipe ratings using graph filtering
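One simple way to read the KNN-based recommender listed above is to recommend the k dishes most similar to a given dish in the weighted graph. The sketch below assumes that interpretation; the function name `recommend`, the `weight` edge attribute, and the toy graph are hypothetical, and the notebook's actual recommender may work differently.

```python
import networkx as nx


def recommend(G, dish, k=3):
    """Return up to k neighbors of `dish`, ordered by decreasing edge weight (hypothetical helper)."""
    neighbors = G[dish]  # mapping: neighbor -> edge attribute dict
    ranked = sorted(
        neighbors.items(),
        key=lambda item: item[1].get("weight", 0.0),
        reverse=True,
    )
    return [(name, data.get("weight", 0.0)) for name, data in ranked[:k]]


# Toy similarity graph, invented for illustration only.
G = nx.Graph()
G.add_weighted_edges_from([
    ("pancakes", "omelette", 0.4),
    ("pancakes", "waffles", 0.8),
    ("pancakes", "crepes", 0.6),
    ("omelette", "quiche", 0.5),
])
top_two = recommend(G, "pancakes", k=2)
```

Here `top_two` contains waffles and crepes, the two highest-weighted neighbors of pancakes.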