Akash Rout
Sparsh Singh Bhatia
Ujjawal Gupta
Dhruv Goyal
Automated Crawling, Categorization and Sentiment Analysis of Digital News with Incorporated Feedback System.
The project addresses the need for 360-degree feedback software that monitors Government of India-related news stories in regional media using Artificial Intelligence and Machine Learning.
We've developed a smart system that automatically scrapes news from numerous sources across the internet, including both text articles and video news. Each fetched article is classified by the ministry whose jurisdiction it falls under, and sentiment analysis then assigns it a positive, neutral, or negative score. If negative news is detected, an alert is sent to the concerned government department's email address. This keeps the government informed of news events and allows quick responses when needed. The analyzed news is displayed in a visually appealing, user-friendly interface where users can refresh to load the latest news whenever required; if not refreshed manually, the news is refreshed automatically every hour. News articles can be fetched in English, Hindi, and multiple regional languages.
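The workflow described above (fetch, classify, score, alert on negative news, refresh hourly or on demand) can be outlined as follows. This is a minimal, hypothetical sketch rather than the project's code: the `Article` type, the `classify`/`score`/`notify` callables, and the department-to-email mapping are illustrative assumptions, with the real models and mailer plugged in where the stubs are.

```python
# Hypothetical sketch of the news-processing loop; names are illustrative, not the project's API.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List

REFRESH_INTERVAL = timedelta(hours=1)  # automatic refresh period described in the README


@dataclass
class Article:
    title: str
    text: str
    department: str = ""  # filled in by the (hypothetical) classifier
    sentiment: str = ""   # "positive", "neutral" or "negative"


def needs_refresh(last_refresh: datetime, now: datetime, manual: bool = False) -> bool:
    """Refresh on a manual trigger, or once an hour has passed since the last fetch."""
    return manual or now - last_refresh >= REFRESH_INTERVAL


def process(
    articles: List[Article],
    classify: Callable[[str], str],
    score: Callable[[str], str],
    department_emails: Dict[str, str],
    notify: Callable[[str, Article], None],
) -> List[Article]:
    """Label each article with a department and a sentiment; alert on negative news."""
    for article in articles:
        article.department = classify(article.text)
        article.sentiment = score(article.text)
        if article.sentiment == "negative":
            address = department_emails.get(article.department)
            if address is not None:  # skip departments with no known address
                notify(address, article)
    return articles


if __name__ == "__main__":
    # Stub functions standing in for the real classifier, sentiment model and mailer.
    alerts = []
    process(
        [Article("A", "hospital beds shortage"), Article("B", "budget session begins")],
        classify=lambda text: "Health" if "hospital" in text else "Finance",
        score=lambda text: "negative" if "shortage" in text else "neutral",
        department_emails={"Health": "health@example.org"},
        notify=lambda address, article: alerts.append((address, article.title)),
    )
    print(alerts)  # [('health@example.org', 'A')]
```

Passing the models and the mailer in as callables keeps the routing and refresh logic testable without loading any ML model or sending real email.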
- AI: PyTorch and TensorFlow, with BERT-based models (DistilBERT, RoBERTa) for building the ML models.
- Crawling: Beautiful Soup, Selenium
- Server: Django backend.
- Frontend: Next.js and Tailwind CSS frontend.
To run the project locally:

1. Clone the repository and navigate to the project directory:
   ```terminal
   git clone https://github.com/DhruvGoyal375/syntax-error-x.git
   cd syntax-error-x
   ```
2. Install dependencies for the client (Next.js):
   ```terminal
   cd client
   npm install
   ```
3. Start the Next.js development server:
   ```terminal
   npm run dev
   ```
4. Install the necessary libraries and paste the contents from here into the server folder.
5. Start the Django backend server:
   ```terminal
   python manage.py runserver
   ```
- Crawled 12,000+ news articles and videos using the Python libraries Beautiful Soup and Selenium.
- Applied clustering to these articles to group them into categories and prepare a labeled dataset.
- Trained a DistilBERT model on this labeled dataset to predict the responsible department (accuracy: 83%).
- Used a RoBERTa model to perform sentiment analysis on the news articles.
- Sent emails about negative news to the respective departments using Nodemailer and Gmail SMTP.
- Integrated these models and the crawling functionality into a Django backend, and wrote APIs that return department predictions and sentiment scores.
- Connected the backend to a simple, attractive UI where users can trigger loading of the latest news articles along with their analysis.
- Implemented video news analysis using the Selenium library: the audio is first extracted and converted into text, and classification and sentiment analysis are then applied to the extracted text.
- Extended the same functionality to news in Hindi and other languages using the Google Translate API.
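The crawling step in the first bullet can be illustrated with a small headline extractor. The project used Beautiful Soup (and Selenium for pages that need a browser); this sketch swaps in Python's standard-library `html.parser` so it runs without dependencies, and it assumes, hypothetically, that headlines are links inside `<h1>` to `<h3>` elements. Real news sites differ, so the rule would need adjusting per source, and fetching the page itself is left out.

```python
# Stand-in for the Beautiful Soup crawler: stdlib html.parser, hypothetical headline rule.
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

HEADING_TAGS = {"h1", "h2", "h3"}  # assumption: headlines are links nested in headings


class HeadlineParser(HTMLParser):
    """Collect (headline text, absolute URL) pairs for links inside heading tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.heading_depth = 0
        self.current_href: Optional[str] = None
        self.current_text: List[str] = []
        self.headlines: List[Tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in HEADING_TAGS:
            self.heading_depth += 1
        elif tag == "a" and self.heading_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_endtag(self, tag: str) -> None:
        if tag in HEADING_TAGS and self.heading_depth > 0:
            self.heading_depth -= 1
        elif tag == "a" and self.current_href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self.current_text).split())
            if text:
                self.headlines.append((text, self.current_href))
            self.current_href = None

    def handle_data(self, data: str) -> None:
        if self.current_href is not None:
            self.current_text.append(data)


def extract_headlines(html: str, base_url: str) -> List[Tuple[str, str]]:
    parser = HeadlineParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.headlines


if __name__ == "__main__":
    page = '<h2><a href="/india/story-1">Monsoon arrives</a></h2><p><a href="/ad">Ad</a></p>'
    print(extract_headlines(page, "https://news.example.com/"))
```

Links outside headings (such as the advert in the demo) are ignored, which is the kind of filtering a per-site Beautiful Soup selector would normally do.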
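The README does not say which clustering algorithm produced the initial category labels. As one plausible illustration (an assumption, not the authors' method), the sketch below runs spherical k-means, i.e. k-means with cosine similarity, over unit-length term-frequency vectors; `seeds` picks which texts serve as starting centroids so the result is deterministic. Each resulting cluster would still need a human-chosen department name before it could be used as training data.

```python
# Hypothetical labeling step: spherical k-means over term-frequency vectors (stdlib only).
import math
import re
from collections import Counter
from typing import Dict, List, Sequence

Vector = Dict[str, float]  # sparse vector: word -> weight


def normalize(vec: Dict[str, float]) -> Vector:
    """Scale a sparse vector to unit length (empty vectors stay empty)."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm else {}


def vectorize(text: str) -> Vector:
    """Unit-length term-frequency vector of the lower-cased words in `text`."""
    return normalize(Counter(re.findall(r"[a-z]+", text.lower())))


def dot(a: Vector, b: Vector) -> float:
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    return sum(x * b.get(w, 0.0) for w, x in a.items())


def kmeans(texts: Sequence[str], seeds: Sequence[int], iterations: int = 10) -> List[int]:
    """Return the cluster index assigned to each text."""
    vectors = [vectorize(t) for t in texts]
    centroids = [dict(vectors[i]) for i in seeds]  # deterministic initialization
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: most similar centroid (ties go to the lowest index).
        labels = [max(range(len(centroids)), key=lambda k: dot(v, centroids[k])) for v in vectors]
        # Update step: each centroid becomes the normalized sum of its members.
        for k in range(len(centroids)):
            total: Counter = Counter()
            for v, label in zip(vectors, labels):
                if label == k:
                    total.update(v)
            if total:  # keep the old centroid if the cluster is empty
                centroids[k] = normalize(total)
    return labels


if __name__ == "__main__":
    docs = [
        "hospital doctors vaccine health",
        "railway train station track",
        "vaccine drive hospital beds",
        "train derailment railway safety",
    ]
    print(kmeans(docs, seeds=[0, 1]))  # [0, 1, 0, 1]
```

Raw term frequencies keep the example dependency-free; TF-IDF weights or sentence embeddings would usually separate topics more reliably on real news text.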
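For both the DistilBERT department classifier and the RoBERTa sentiment model, a sequence-classification head produces one raw score (logit) per class, and the prediction is the class with the highest softmax probability. The sketch below shows only that final step. The label lists are hypothetical, and in a Hugging Face Transformers setup (assumed here; the README does not name the library) the logits would come from the `logits` field of the model's output.

```python
# Turning per-class logits into a label; label orders below are hypothetical.
import math
from typing import List, Sequence, Tuple

DEPARTMENTS = ["Health", "Railways", "Finance", "Education"]
SENTIMENTS = ["negative", "neutral", "positive"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable label and its softmax probability."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    label, prob = top_label([2.0, 0.5, -1.0, 0.0], DEPARTMENTS)
    print(f"{label} {prob:.2f}")  # Health 0.71
```

The returned probability can double as a confidence score, for example to decide whether a negative prediction is strong enough to trigger an alert.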
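The negative-news alert in the list above is sent with Nodemailer (a Node.js library) over Gmail SMTP. To keep the examples in one language, this sketch shows the equivalent idea with Python's standard `email` and `smtplib` modules instead; it is a swapped-in technique, and the addresses, credentials, and message wording are placeholders. `build_alert` only composes the message, so it can be checked offline, while `send_alert` needs network access and a valid Gmail account.

```python
# Python stand-in for the Nodemailer + Gmail SMTP alert; all addresses are placeholders.
import smtplib
from email.message import EmailMessage

GMAIL_SMTP_HOST = "smtp.gmail.com"
GMAIL_SMTP_PORT = 587  # submission port, upgraded to TLS with STARTTLS


def build_alert(sender: str, recipient: str, title: str, url: str, score: float) -> EmailMessage:
    """Compose a plain-text alert for one negative article."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Negative news alert: {title}"
    msg.set_content(
        "A news article concerning your department was scored as negative.\n"
        f"Headline: {title}\n"
        f"Link: {url}\n"
        f"Model confidence: {score:.2f}\n"
    )
    return msg


def send_alert(msg: EmailMessage, username: str, password: str) -> None:
    """Send the message through Gmail SMTP (requires network and real credentials)."""
    with smtplib.SMTP(GMAIL_SMTP_HOST, GMAIL_SMTP_PORT) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)


if __name__ == "__main__":
    alert = build_alert("alerts@example.org", "health@example.org",
                        "Bed shortage", "https://news.example.com/a", 0.913)
    print(alert["Subject"])  # Negative news alert: Bed shortage
```

Separating composition from delivery mirrors how a Nodemailer setup typically builds a message object and hands it to a transport.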