This project was created during the WildHack hackathon.
The task it solves is processing and classifying news about Kamchatka.
As input we received several thousand news items; as output we were required to structure them so that future news writers could find the news they need more easily.
We checked these news sites:
- kam-kray.ru
- kamtoday.ru
- kamchatinfo.com
- kam24.ru
- kronoki.ru
We decided to use kamtoday.ru.
- Clone this repo
- Run website/website.py using Flask
- Open the address printed in the terminal and use the program!
- Groups - classified news. The square colored picture shows the keywords for each news group. Click on a news card to see its detailed description. You can also search and filter news by keywords.
- Cards - news grouped by year of publication. You can apply filters and click on news cards to see their detailed description. Search on this page doesn't work because it requires ElasticSearch to be set up on your server.
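A minimal sketch of how the Cards page logic (grouping by year plus keyword filtering) could look in plain Python. The field names `date`, `title` and `text`, the helper names, and the sample items are assumptions made for illustration; they are not taken from the project code.

```python
from collections import defaultdict


def group_by_year(news_items):
    """Group news items by the year part of an ISO date string (YYYY-MM-DD)."""
    groups = defaultdict(list)
    for item in news_items:
        year = item["date"][:4]  # assumes dates are stored as "YYYY-MM-DD"
        groups[year].append(item)
    return dict(groups)


def filter_by_keyword(news_items, keyword):
    """Keep items whose title or text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        item
        for item in news_items
        if needle in item["title"].lower() or needle in item["text"].lower()
    ]


# Invented sample items, only to show the shape of the data.
sample_news = [
    {"date": "2020-05-01", "title": "Volcano alert", "text": "Ash plume reported."},
    {"date": "2020-09-12", "title": "Salmon season", "text": "Fishing quotas announced."},
    {"date": "2021-02-03", "title": "New eruption", "text": "The volcano is active again."},
]

cards = group_by_year(sample_news)
volcano_news = filter_by_keyword(sample_news, "volcano")
```

Here `cards` maps each year to its list of news items, and `volcano_news` keeps only the items that mention the keyword.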
The server is built with the Flask framework.
There are several interesting scripts, such as kamtoday.py, which grab data from a news site and prepare it.
Classification is done with a tf-idf filter followed by KMeans clustering.
Have a look at the Colab notebook with the classification code and the presentation of the project (in Russian).