
A TF-IDF (Term Frequency-Inverse Document Frequency) based search engine for a small subset of Wikipedia data. It runs on a 3-node Apache Spark cluster on top of HDFS, is hosted on AWS, and has a web UI built with Django.
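For readers unfamiliar with TF-IDF ranking, the idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's Spark code: the function names (`tokenize`, `build_index`, `search`), the whitespace tokenizer, the unsmoothed `log(N / df)` form of IDF, and the toy corpus are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and the IDF of every term.

    docs: dict mapping a document id to its raw text.
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(docs)
    # Unsmoothed IDF (an assumption): terms found in every document get weight 0.
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return term_counts, idf


def search(query, term_counts, idf):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    query_terms = tokenize(query)
    scores = {}
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # skip empty documents
        score = sum((counts[t] / total) * idf.get(t, 0.0) for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy corpus, not taken from the project's Wikipedia data.
docs = {
    "spark": "apache spark is a cluster computing engine",
    "django": "django is a python web framework",
    "hdfs": "hdfs is a distributed file system",
}
term_counts, idf = build_index(docs)
results = search("python web", term_counts, idf)
```

In this sketch, common words such as "is" and "a" occur in every toy document, so their IDF is zero and they contribute nothing to the ranking. On a Spark cluster the same counting steps would presumably be distributed across the nodes, but that part is not shown here.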

Primary Language: Python

Wikipedia-Search-Engine

Project Links:

Pre-Requisites:

STEPS:

(Activate your virtual environment and clone this repository into the present working directory.)

STEP-1: Start Django's built-in development server:

python manage.py runserver 127.0.0.1:8000

STEP-2: Open a browser and go to 127.0.0.1:8000/bigdatajob to interact with the search engine. (Log in with the credentials (testuser, test1234).)

Collaborators

Madhav Agarwal
Raj Kumar Maurya
Kunwar Ashutosh Singh

This project won BigWar, a 48-hour national-level project competition organized by ABESIT, Ghaziabad, during BigDataThon'17, held from 7 to 9 April 2017.