Search engine over a Stack Overflow corpus of 160k+ docs
I split the code into three notebook files:
- DataExtraction is code for collecting data
- DataPreprocessing is code for processing the data
- Retrieval is code for retrieving the top 10 most similar docs for a query
Libraries used: pandas, numpy, sklearn, nltk, re, os, sys, csv, xml
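As a rough illustration of what the Retrieval step does, here is a minimal sketch of top-10 retrieval using TF-IDF vectors and cosine similarity from sklearn. The choice of TF-IDF, the function names (build_index, top_k) and the sample docs are assumptions made for this example; the notebook itself may use a different representation or ranking.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Made-up sample docs; the real corpus comes from the preprocessing notebook.
SAMPLE_DOCS = [
    "How do I sort a list in Python?",
    "Reverse a string in Java",
    "Sorting a dictionary by value in Python",
    "Center a div with CSS",
]


def build_index(docs):
    """Fit a TF-IDF vectorizer on the docs and return it with the doc matrix."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(docs)
    return vectorizer, matrix


def top_k(query, vectorizer, matrix, k=10):
    """Return up to k (doc_index, score) pairs, best match first."""
    query_vec = vectorizer.transform([query])
    # TF-IDF rows are L2-normalised by default, so the dot product
    # computed by linear_kernel equals the cosine similarity.
    scores = linear_kernel(query_vec, matrix).ravel()
    ranked = scores.argsort()[::-1][:k]
    return [(int(i), float(scores[i])) for i in ranked]


vectorizer, matrix = build_index(SAMPLE_DOCS)
results = top_k("sort python list", vectorizer, matrix, k=10)
for doc_index, score in results:
    print(round(score, 3), SAMPLE_DOCS[doc_index])
```

With only four sample docs the call returns four results; on the full corpus it would return ten.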
For this project I collected data from the Stack Exchange Data Dump website.
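The dump stores posts as XML, one row element per post with the fields as attributes. A minimal sketch of reading questions out of such a file with the standard xml module is shown below. The function name iter_questions and the sample data are made up for illustration, and the DataExtraction notebook may read the dump differently.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a Posts.xml file from the data dump.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="How to sort a list?" Body="&lt;p&gt;Sorting question&lt;/p&gt;" Tags="&lt;python&gt;&lt;list&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" Body="&lt;p&gt;Use sorted().&lt;/p&gt;" />
</posts>"""


def iter_questions(source):
    """Yield (id, title, body, tags) for every question row in the file."""
    # iterparse streams the file, so a large dump is never parsed in one go.
    for _, elem in ET.iterparse(source, events=("end",)):
        # PostTypeId "1" marks a question, "2" marks an answer.
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            yield (elem.get("Id"), elem.get("Title"),
                   elem.get("Body"), elem.get("Tags"))
        elem.clear()  # drop the element's attributes once it has been handled


questions = list(iter_questions(io.BytesIO(SAMPLE_XML)))
for post_id, title, body, tags in questions:
    print(post_id, title, tags)
```

For a real dump, pass the path to the posts file instead of the in-memory sample.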
The project also includes a web crawler that crawls the Stack Overflow website and finds the most popular technologies at the moment by collecting the tags of the newest questions asked on the site.
webcrawler is the code for this task
Libraries used: urllib3, requests, bs4, zlib, operator, os, sys
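The tag-counting idea behind the crawler can be sketched as follows. This is only an illustration: the function name count_tags and the sample HTML are made up, the post-tag class name is an assumption about how Stack Overflow marks up tag links, and collections.Counter is used here for ranking where the actual webcrawler code may sort with operator instead.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Made-up HTML standing in for a page of newest questions.
# In practice the page could be fetched with something like
# requests.get("https://stackoverflow.com/questions?tab=newest").text
# (URL given as an assumption, not taken from the project code).
SAMPLE_HTML = """
<div class="question"><a class="post-tag">python</a><a class="post-tag">pandas</a></div>
<div class="question"><a class="post-tag">javascript</a></div>
<div class="question"><a class="post-tag">python</a></div>
"""


def count_tags(html):
    """Count how often each tag appears among the questions on one page."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: every tag link carries the CSS class "post-tag".
    tag_links = soup.find_all("a", class_="post-tag")
    return Counter(link.get_text(strip=True) for link in tag_links)


counts = count_tags(SAMPLE_HTML)
for tag, count in counts.most_common(3):
    print(tag, count)
```

Summing the counters from several pages gives a ranking of the most used tags across the newest questions.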
Download the files and make sure all the files and folders are in the same directory
To run the code on a server:
- Go to UI-demo folder
- create a virtual environment (Command on Windows: virtualenv env)
- activate the virtual environment (Command: env\Scripts\activate)
- install requirements.txt (Command: pip install -r requirements.txt)
- run python app.py in the terminal