This is an End-to-End News Category Classification Application that collects news articles from various RSS feeds, stores them in a database, and classifies them into predefined categories: Terrorism/Protest/Political Unrest/Riot, Positive/Uplifting, Natural Disasters, and Others. The application outputs the classified data as an SQL dump.
- Parsing of RSS feeds to extract news article information.
- Storage of parsed news articles in a database, without duplicates.
- Asynchronous processing of news articles using asyncio, to streamline processing tasks.
- Category classification using a Zero-Shot NLP Classifier.
- Logging and error handling for robustness.
Programming Language: Python
Libraries: Flask, Feedparser, Transformers (HuggingFace), Asyncio, Matplotlib
Database: sqlite3
- Clone the GitHub repository:
git clone https://github.com/kriss-3957/Zero-Shot-News-Classifier-App.git
cd Zero-Shot-News-Classifier-App
- Install dependencies: a requirements.txt file is provided, so you can install them with pip:
pip install -r requirements.txt
- Run the Flask application:
python app.py
- Visit http://127.0.0.1:5000/ in your web browser.
- Select the desired RSS feed links from the provided list.
- Click the "Fetch Articles" button.
- View the results, including a table of articles and a category frequency plot.
Alternatively, you can use the deployed app directly via the deployed link:
Demo video: Zero-Shot.News.Classifier-App-Demo.mp4
The application starts by parsing the selected RSS feeds asynchronously with asyncio, using the feedparser library to retrieve news articles from each feed.
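A minimal sketch of concurrent feed parsing follows, assuming the blocking `feedparser.parse()` call is offloaded to worker threads with `asyncio.to_thread` (Python 3.9+) and the feeds are awaited together with `asyncio.gather`. The function names `default_parse`, `fetch_feed`, and `parse_feeds`, and the article fields kept, are hypothetical and not taken from the project's code.

```python
import asyncio
from typing import Any, Callable


def default_parse(url: str) -> Any:
    """Blocking fetch + parse of one feed with feedparser (imported lazily)."""
    import feedparser  # third-party: pip install feedparser

    return feedparser.parse(url)


async def fetch_feed(url: str, parse: Callable[[str], Any]) -> list[dict]:
    # feedparser.parse() is synchronous, so run it in a worker thread
    # to keep the event loop free while other feeds are downloading.
    feed = await asyncio.to_thread(parse, url)
    # Hypothetical subset of fields kept for each article.
    return [
        {
            "title": entry.get("title", ""),
            "link": entry.get("link", ""),
            "summary": entry.get("summary", ""),
            "source": url,
        }
        for entry in feed.entries
    ]


async def parse_feeds(
    urls: list[str], parse: Callable[[str], Any] = default_parse
) -> list[dict]:
    # Fetch all feeds concurrently; gather preserves the input order.
    results = await asyncio.gather(*(fetch_feed(u, parse) for u in urls))
    # Flatten the per-feed lists into one list of articles.
    return [article for feed_articles in results for article in feed_articles]
```

Usage would look like `asyncio.run(parse_feeds(selected_urls))`, where `selected_urls` stands in for the feed links chosen in the UI. The `parse` parameter is injectable so the concurrency logic can be exercised without network access.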
The parsed articles are stored in a relational database using SQLAlchemy. The database schema is designed to avoid duplicates.
Asynchronous processing is handled with asyncio: each news article is classified into a category asynchronously by the Zero-Shot NLP Classifier.
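One way to classify articles asynchronously is sketched below. It assumes the classifier is a blocking callable that takes a text and returns a category name. Each call runs in a worker thread, and an `asyncio.Semaphore` caps how many calls run at once. The names `classify_articles` and `max_concurrency` are hypothetical. Threads keep the event loop responsive, but any real throughput gain depends on how much the model backend releases the GIL.

```python
import asyncio
from typing import Callable


async def classify_articles(
    articles: list[dict],
    classify: Callable[[str], str],
    max_concurrency: int = 4,  # hypothetical default
) -> list[dict]:
    # The semaphore limits how many blocking classifier calls run at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def classify_one(article: dict) -> dict:
        # Classify on title plus summary (an assumption about the input text).
        text = f"{article.get('title', '')}. {article.get('summary', '')}"
        async with semaphore:
            # The classifier is synchronous, so offload it to a thread.
            category = await asyncio.to_thread(classify, text)
        # Return a copy of the article with the predicted category added.
        return {**article, "category": category}

    # gather returns results in the same order as the input articles.
    return await asyncio.gather(*(classify_one(a) for a in articles))
```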
The application employs the HuggingFace Zero-Shot NLP Classifier to predict the category of each news article. This is done by creating a pipeline with an explicit tokenizer and model.
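The following minimal sketch builds a zero-shot classification pipeline with an explicitly loaded tokenizer and model, using the four categories above as candidate labels. The model name `facebook/bart-large-mnli` is an assumption: the project may use a different NLI checkpoint. The helper names `build_classifier`, `top_category`, and `classify_text` are hypothetical.

```python
from typing import Any

# Categories from this README, passed to the model as candidate labels.
CATEGORIES = [
    "Terrorism/Protest/Political Unrest/Riot",
    "Positive/Uplifting",
    "Natural Disasters",
    "Others",
]

MODEL_NAME = "facebook/bart-large-mnli"  # assumption: any NLI model could be used


def build_classifier(model_name: str = MODEL_NAME) -> Any:
    """Create a zero-shot pipeline with an explicit tokenizer and model."""
    from transformers import (  # third-party: pip install transformers
        AutoModelForSequenceClassification,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    return pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)


def top_category(result: dict) -> str:
    # The pipeline returns labels sorted by descending score,
    # so the first label is the predicted category.
    return result["labels"][0]


def classify_text(classifier: Any, text: str) -> str:
    result = classifier(text, candidate_labels=CATEGORIES)
    return top_category(result)
```

Calling `classify_text(build_classifier(), "Floods displace thousands")` downloads the model on first use, so in an app the pipeline is best built once at startup and reused.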
Proper logging is implemented throughout the application to track events and potential errors. The application gracefully handles parsing errors and network connectivity issues.
The application stores the parsed news articles in a database using the 'sqlite3' library. The data is stored in a table named news_articles, and database constraints prevent duplicate entries.
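A minimal sketch of duplicate-free storage with `sqlite3` follows. It assumes that an article's `link` identifies it uniquely. A `UNIQUE` constraint on that column, combined with `INSERT OR IGNORE`, makes SQLite skip repeated articles silently. Only the table name `news_articles` comes from this README; the column set and the helpers `store_articles` and `dump_sql` are hypothetical. `dump_sql` shows one possible way to produce an SQL dump using the standard `Connection.iterdump()` method.

```python
import sqlite3

# Hypothetical schema; only the table name comes from the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS news_articles (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    link      TEXT NOT NULL UNIQUE,  -- UNIQUE constraint rejects duplicates
    summary   TEXT,
    source    TEXT,
    category  TEXT
)
"""


def store_articles(conn: sqlite3.Connection, articles: list[dict]) -> int:
    """Insert articles, skipping any whose link already exists.

    Returns the number of rows actually inserted.
    """
    conn.execute(SCHEMA)
    # Fill optional columns so every named placeholder has a value.
    rows = [{"summary": None, "source": None, "category": None, **a} for a in articles]
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO news_articles (title, link, summary, source, category) "
        "VALUES (:title, :link, :summary, :source, :category)",
        rows,
    )
    conn.commit()
    # Ignored duplicates do not count as changes.
    return conn.total_changes - before


def dump_sql(conn: sqlite3.Connection) -> str:
    # iterdump() yields the SQL statements needed to recreate the database.
    return "\n".join(conn.iterdump())
```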
The application handles errors gracefully, providing informative messages in case of parsing errors or other exceptions. Proper logging is implemented to track events and errors.
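A sketch of one way to combine logging with graceful feed error handling follows. It relies on feedparser's documented behaviour of reporting malformed feeds through the `bozo` flag and `bozo_exception` attribute rather than raising. It also catches any exception raised by the fetch step. The logger name and the helper `parse_feed_safely` are hypothetical, and `parse` stands for whatever blocking fetch/parse function the app uses, such as `feedparser.parse`.

```python
import logging
from typing import Any, Callable

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("news_classifier")  # hypothetical logger name


def parse_feed_safely(url: str, parse: Callable[[str], Any]) -> list:
    """Return a feed's entries, or an empty list if fetching fails."""
    try:
        feed = parse(url)
    except Exception:  # e.g. a network error raised by the fetch step
        logger.exception("Failed to fetch feed %s", url)
        return []

    # feedparser flags malformed feeds via `bozo` instead of raising,
    # so log a warning but still keep any entries it managed to recover.
    if getattr(feed, "bozo", False):
        logger.warning(
            "Problem parsing %s: %r", url, getattr(feed, "bozo_exception", None)
        )

    entries = list(getattr(feed, "entries", []))
    logger.info("Parsed %d entries from %s", len(entries), url)
    return entries
```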
Further enhancements of the application may include:
- Fine-tuning the Zero-Shot NLP Classifier with additional training data for better category predictions.
- Implementing user authentication and personalized feeds.
- Adding support for additional RSS feeds and categories.
- Optimizing and scaling for larger datasets.
Contributions to the project are welcome! Feel free to open issues, submit pull requests, or suggest improvements.
This project is licensed under the MIT License; see the LICENSE file for details.