News Classification

This project extracts topics from news articles, so that the user can take in a large volume of news in condensed form.

PRs Welcome

Description

The goal of this project is to let the user process a large volume of news in condensed form. The model we chose to explore is LDA (Latent Dirichlet Allocation), a soft-clustering algorithm. It is a natural choice for topic modeling because a single text usually covers more than one topic, and soft clustering lets one document belong to several topics at once.
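To make the soft-clustering idea concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, using only the Python standard library. It is an illustration, not this project's implementation: the function name `lda_gibbs`, the hyperparameter defaults, and the toy documents are all assumptions chosen for the example.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic mixture per document.

    docs is a list of token lists. alpha and beta are the Dirichlet priors
    (illustrative defaults, not values taken from this project).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions: the "soft" cluster memberships.
    mixtures = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        mixtures.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return mixtures


# Toy corpus (made up for this example): two single-theme texts and one mixed text.
docs = [
    "election vote parliament vote".split(),
    "match goal team goal".split(),
    "parliament vote team match".split(),
]
mixtures = lda_gibbs(docs, num_topics=2)
for mixture in mixtures:
    print([round(p, 2) for p in mixture])
```

Each row printed is a probability distribution over topics for one document. Unlike hard clustering, a document is not forced into a single topic, which is why the model suits news texts that touch on several subjects.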

Data

You can get the dataset we used to train our model here:

“Most popular news 2017” by Webhose

Installing

Possible troubles with installation

Because the project uses the pattern library, you may face the following error during installation:

EnvironmentError: mysql_config not found

Solution:

  • Ubuntu/Debian based distros:

sudo apt-get install libmysqlclient-dev

or

sudo apt install default-libmysqlclient-dev

  • Arch based distros:

Install libmysqlclient from AUR

  • On Windows, macOS, or other distros, install mysqlclient by whatever method your platform provides.
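Before retrying the installation, it can help to confirm that the `mysql_config` tool named in the error is now visible on your `PATH`. The check below is a small sketch using only the standard library; the helper name `find_mysql_config` is hypothetical and not part of this project.

```python
import shutil


def find_mysql_config(candidates=("mysql_config",)):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


location = find_mysql_config()
if location:
    print("mysql_config found at", location)
else:
    print("mysql_config is not on PATH; install the MySQL client development package first")
```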

Team

  • Olesia Tretiak (olesyat)
  • Hermann Yavorskyi (wardady)

Copyright

Ukrainian Catholic University. Ukraine. Lviv. 2020. Artificial Intelligence course. © 2019 Olesya Tretyak, Hermann Yavorskyi.