Information retrieval project at SPbAU 7th term
We use Python and pipenv as the primary tools for
development. See Pipfile, Pipfile.lock,
requirements-dev.txt (if any) and
requirements.txt for the full specification of the
platform, Python version and dependency packages.
To reproduce the environment, run pip install -r requirements.txt
with the matching version of Python. Using a virtualenv is recommended,
so that the dependencies stay isolated from the system Python.
We provide a Makefile with convenient shortcuts for common commands.
Run make help
for more information.
- psql>=10.0 (PostgreSQL), used by the crawler to store pages
We provide a main.py script, which implements the command-line interface.
Run python main.py -h
to see the available commands and options.
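For orientation, here is a minimal sketch of how a command-line interface with the commands used in this README could be laid out with argparse. It is not the project's actual main.py: the function name build_parser, the default values and the help texts are assumptions made for illustration. Only the option names (--dfpath, --indexpath, --workers) and the subcommand names (crawler, index, web_interface) come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the invocations shown in this README.

    Hypothetical sketch: the real main.py may be organised differently.
    """
    parser = argparse.ArgumentParser(prog="main.py")
    # Global options are placed before the subcommand, as in the README examples.
    # The defaults below are assumptions, not the project's real defaults.
    parser.add_argument("--dfpath", default="data/clean_articles.h5",
                        help="path to the preprocessed articles")
    parser.add_argument("--indexpath", default="data/index.json",
                        help="path to the index file")
    parser.add_argument("--workers", type=int, default=1,
                        help="number of parallel workers")

    # One subcommand per pipeline stage; exactly one must be given.
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("crawler", "index", "web_interface"):
        subparsers.add_parser(name)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["--dfpath=data/clean_articles.h5", "--indexpath=data/index.json",
     "--workers=8", "index"]
)
print(args.command, args.workers)  # prints "index 8"
```

With this layout, python main.py -h prints the global options and the list of subcommands, because argparse generates the help text from the parser definition.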
python main.py crawler
You can now preprocess the data (look at this).
Then run python main.py --dfpath="data/clean_articles.h5" --indexpath="data/index.json" --workers=8 index to build the index.
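This README does not describe the index format or how the workers are used, so the following is only a rough sketch of what an indexing step of this kind might do: tokenize documents in parallel and write an inverted index (term to document ids) as JSON. The names tokenize and build_index, the tokenization rule and the JSON layout are all assumptions, and a plain dict stands in for the articles stored in the .h5 file.

```python
import json
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed rule)."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs, workers=1):
    """Map each term to the sorted list of ids of documents containing it.

    docs is a dict of doc_id -> text; it is a hypothetical stand-in for
    the preprocessed articles.
    """
    # Tokenize documents concurrently; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        token_lists = list(pool.map(tokenize, docs.values()))

    postings = defaultdict(set)
    for doc_id, tokens in zip(docs, token_lists):
        for token in tokens:
            postings[token].add(doc_id)
    # Sort terms and ids so that the output is deterministic.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {0: "Information retrieval", 1: "Retrieval of web pages"}
index = build_index(docs, workers=2)
print(json.dumps(index))
```

Threads are used here only to keep the sketch short. Tokenization is CPU-bound, so a process pool would be the more likely choice for a real --workers option.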
Run python main.py web_interface
to start the web interface. The page is then available
at localhost on port 8080.