Semester project at the Czech Technical University in Prague, course BI-VWM, summer 2021.
In this project, @sutymate and @makarada created a model that finds similar articles based on their content. The model is based on latent semantic indexing (LSI). The main idea is that each article is indexed by the frequency and importance of the words it contains, and these word weights are then mapped into a smaller space of latent concepts in which articles on similar topics lie close together.
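The indexing idea can be illustrated with a minimal sketch. This is not the project's `lsi-data/run.py`. It assumes naive whitespace tokenization, TF-IDF weighting and NumPy's SVD, and the function name `lsi_similarity` and its parameter `k` (the number of latent concepts kept) are hypothetical.

```python
import numpy as np


def lsi_similarity(docs, k=2):
    """Hypothetical helper: cosine similarity of documents in a k-dimensional LSI space."""
    # Naive tokenization (assumption): lowercase and split on whitespace
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-document matrix of raw word counts (rows = terms, columns = documents)
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1

    # TF-IDF weighting: frequency times log inverse document frequency,
    # so words that occur in every document get weight 0
    df = np.count_nonzero(A, axis=1)
    idf = np.log(len(docs) / df)
    A = A * idf[:, None]

    # Truncated SVD: A ~ U_k S_k V_k^T; each document becomes a row of (S_k V_k^T)^T
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T

    # Cosine similarity between document vectors (guard against zero vectors)
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = doc_vecs / norms
    return unit @ unit.T


if __name__ == "__main__":
    articles = [
        "cat dog pet",
        "dog cat animal pet",
        "stock market trade",
        "market stock price trade",
    ]
    print(np.round(lsi_similarity(articles, k=2), 3))
```

In this toy example the two pet articles end up almost identical in the latent space, while a pet article and a stock-market article are unrelated. A real corpus would also need better tokenization (punctuation, stop words) and a larger `k`.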
To run the project, you need to install the dependencies listed in `requirements.txt`. You can install them with pip or Anaconda.
- Install the dependencies from `requirements.txt`.
- Download your own set of articles, or extract the recommended set (around 3000 articles) from `articles/all_articles.zip`.
- Move all articles to `lsi-data/articles/`.
- Run latent semantic indexing by executing `lsi-data/run.py`.
- Launch the web server to view the results by executing `python3 lsi-web/manage.py runserver`.
- Go to `localhost:8000` and explore how similar the articles are to each other.