Topic and Polarity Classification of news

Topic and Polarity Classification of Dutch news related to the coronavirus (COVID-19) outbreak.

Input Data: Google News articles from the USA related to the COVID-19 outbreak, covering the topics Healthcare, Science, Economy, and Travel. Google News allows filtering news by country, by COVID-19 relevance, and by topic.
Main goal: Classify each news item by topic (Healthcare, Science, Economy, or Travel) and by polarity (positive, negative, or neutral).
End result: A deployed dashboard that shows the detected topic and polarity labels for each news item.

Steps:

  1. Scraping news from Google News.
    We scraped news from 18-04-2020 until 10-05-2020, during the coronavirus outbreak.

  2. Topic Classification
    To improve results, we used an ensemble model that combines the predictions of two machine learning algorithms:
    a) Logistic Regression
    b) FastText

  3. Polarity Classification with the VADER algorithm

  4. Dashboard that displays the news items with their labels
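
The ensemble in step 2 can be illustrated with a minimal sketch. This README does not say how the two models' outputs are combined, so the weighted averaging of per-topic probabilities below is an assumption, and the names average_probabilities, ensemble_predict, and lr_weight are hypothetical. The probabilities are made-up values, not results from this project.

```python
from typing import Dict

# Topic labels taken from this README.
TOPICS = ["Healthcare", "Science", "Economy", "Travel"]


def average_probabilities(
    lr_probs: Dict[str, float],
    ft_probs: Dict[str, float],
    lr_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of two per-topic probability distributions.

    Assumption: each model exposes one probability per topic.
    lr_weight is the share given to the Logistic Regression output.
    """
    return {
        topic: lr_weight * lr_probs.get(topic, 0.0)
        + (1.0 - lr_weight) * ft_probs.get(topic, 0.0)
        for topic in TOPICS
    }


def ensemble_predict(
    lr_probs: Dict[str, float],
    ft_probs: Dict[str, float],
    lr_weight: float = 0.5,
) -> str:
    """Return the topic with the highest combined probability."""
    combined = average_probabilities(lr_probs, ft_probs, lr_weight)
    return max(combined, key=combined.get)


# Illustrative (made-up) probabilities for a single news item.
lr_probs = {"Healthcare": 0.50, "Science": 0.30, "Economy": 0.15, "Travel": 0.05}
ft_probs = {"Healthcare": 0.20, "Science": 0.60, "Economy": 0.10, "Travel": 0.10}

print(ensemble_predict(lr_probs, ft_probs))
```

With equal weights the two models disagree and the averaged score decides. Raising lr_weight shifts the decision toward the Logistic Regression output.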

How to start

Clone the repository and run main.py

Brief explanation of the files

Input Data:
• Corona-ScrapedData: folder that contains the scraped data and the Python scripts used to scrape the data from Google News

Main implementation:
• preprocessing.py: reads and preprocesses the scraped data
• main.py: the entry point, which reads the training data, trains the models for topic and polarity classification, and predicts the labels for unseen news items
• simple_text_classification.py: implements TF-IDF feature extraction and Logistic Regression training and prediction
• FastText.py: implements the FastText algorithm for topic classification.
Training: uses the Google News items with their topic categories (Healthcare, Science, Economy, and Travel)
Prediction: given a news item, predicts its topic label (Healthcare, Science, Economy, or Travel)
• polarity_analysis.py: implements polarity classification with VADER, a rule-based technique.
Prediction: given a news item, predicts its polarity label (positive, negative, or neutral)
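
As a rough illustration of the polarity step, the sketch below maps a VADER compound score to one of the three labels. It assumes the vaderSentiment package and the commonly used cut-offs of +0.05 and -0.05. The actual polarity_analysis.py may use a different VADER distribution (for example, the one bundled with NLTK) or different thresholds, and the function names here are hypothetical.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a polarity label.

    The 0.05 threshold is the conventional default, assumed here.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_polarity(text: str) -> str:
    """Score a news item with VADER and return its polarity label.

    Requires the third-party vaderSentiment package. The import is
    done inside the function so that label_from_compound can be used
    without the package installed.
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with 'neg', 'neu', 'pos', 'compound'.
    scores = analyzer.polarity_scores(text)
    return label_from_compound(scores["compound"])
```

Calling vader_polarity on a headline returns one of "positive", "negative", or "neutral". Because VADER is rule-based, no training step is needed for this part.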

Results:
• topic_classification_predictions.csv: prediction results of topic classification
• polarity_predictions.csv: prediction results of polarity classification