Source-Recommendation-System

Source Recommendation System takes an article from the user as input and returns relevant articles from a dataset of 8.5 million articles. It uses Apache Spark to process a dataset of this size.

Prerequisites

This project uses the rake-nltk library to extract keywords.

pip install rake-nltk
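
To give a feel for what keyword extraction in the RAKE style does, here is a minimal pure-Python sketch. It is not the rake-nltk implementation and not necessarily how this project calls the library: the stopword list is a tiny illustrative assumption (rake-nltk relies on NLTK's stopword data), and the function names candidate_phrases and rank_phrases are hypothetical. The idea shown is the standard one: split the text into phrases at stopwords and punctuation, score each word by degree divided by frequency, and score each phrase as the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption for this sketch only).
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on",
    "with", "by", "from", "as", "that", "this", "it", "was", "were", "be",
    "has", "have", "at", "or", "used",
}
STOPWORDS.discard("used")  # keep "used" as a content word in this demo


def candidate_phrases(text):
    """Split text into runs of consecutive non-stopwords."""
    phrases = []
    # Punctuation always ends a phrase, so split into fragments first.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_phrases(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # co-occurrences within phrases, itself included
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    article = ("Apache Spark is used to process the news articles. "
               "Hadoop stores the dataset.")
    for phrase, score in rank_phrases(article):
        print(f"{score:.1f}  {phrase}")
```

With the library itself, the equivalent steps are creating a Rake object, calling extract_keywords_from_text on the article text, and reading the result with get_ranked_phrases. The NLTK stopword data must be available for that to work.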

FakeNewsCorpus (27 GB) was used as the dataset of news articles. Apache Spark is used to handle this large dataset, so it needs to be correctly installed and configured. The Spark configuration file can be found in the spark-2.4.4-bin-hadoop2.7 folder. Hadoop was used as the underlying distributed file system, and its configuration can be found in the hadoop-conf folder. Both need to be changed to match your own setup.

Source Code

The source code can be found in the /src folder.

Algorithm & Implementation Details

This idea was implemented as a course project for the Distributed System course at Colorado State University. A detailed description of the algorithm can be found here -

Authors