Cyberthon-2023

Team CyberNirikshak

Team Members-

Sparsh Singh Bhatia
Ujjawal Gupta
Dhruv Goyal
Ankur Gupta

Project Name:

Automated Crawling, Categorization and Sentiment Analysis of Digital News with Incorporated Feedback System.

Problem Statement-

Build a sentiment analysis tool that monitors social media and local news for public sentiment regarding law enforcement. The system should help identify community concerns and public sentiment trends.

Solution Proposed

We've developed a system that automatically scrapes news from numerous sources across the internet, including text articles as well as video news. Each fetched article is classified into a category, and sentiment analysis assigns it a positive, neutral, or negative score. If negative news is detected, an alert is sent to the respective government department at its email address. This keeps the government updated on news events and allows for quick responses when needed. The news is then displayed on a visually appealing, easy-to-use interface where the user can refresh and load the latest news when required. If not refreshed manually, the news is refreshed automatically every hour. News articles can be fetched in English, Hindi, and multiple regional languages.
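The alerting rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's code: the `Article` fields, the `DEPARTMENT_EMAILS` mapping, the `select_alerts` helper, and the example.org addresses are all assumptions made for the example. It only selects recipients and does not send any mail.

```python
from dataclasses import dataclass

# Hypothetical department-to-address mapping; real addresses would come
# from the project's own configuration.
DEPARTMENT_EMAILS = {
    "police": "police-desk@example.org",
    "health": "health-desk@example.org",
}


@dataclass
class Article:
    title: str
    department: str  # predicted department label (assumed field name)
    sentiment: str   # "positive", "neutral", or "negative"


def select_alerts(articles, department_emails=DEPARTMENT_EMAILS):
    """Return (recipient, title) pairs for negative articles with a known department."""
    alerts = []
    for article in articles:
        if article.sentiment != "negative":
            continue  # only negative news triggers an alert
        recipient = department_emails.get(article.department)
        if recipient is None:
            continue  # no address on file for this department, so skip it
        alerts.append((recipient, article.title))
    return alerts
```

Keeping the selection step separate from the mail-sending step makes the rule easy to test without touching an SMTP server.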

Tech Stack Used

  • AI: PyTorch, TensorFlow, and BERT-family models (DistilBERT, RoBERTa) for the ML models.
  • Crawling: Beautiful Soup, Selenium
  • Server: Django
  • Frontend: Next.js and Tailwind CSS

Run Commands

To run the project locally:

  1. Clone the repository:
git clone https://github.com/ugpec79/Sentimental_Analysis_Generator.git
  2. Navigate to the project directory:
cd Sentimental_Analysis_Generator
  3. Install dependencies for the client (Next.js):
cd client
npm install
  4. Start the Next.js development server:
npm run dev
  5. Install the necessary libraries and paste the contents from here into the server folder.
  6. Start the Django backend server:
python manage.py runserver

Approach Details

  • Crawled 12,000+ news articles and videos using the Python libraries Beautiful Soup and Selenium.
  • Applied clustering to these articles to group them into categories and prepare a labeled dataset.
  • Trained a DistilBERT model on this dataset to predict the relevant department (accuracy: 83%).
  • Used a RoBERTa model for sentiment analysis of the news articles.
  • Sent negative news to the respective departments by email using Nodemailer and Gmail SMTP.
  • Integrated the models and crawling functionality with a Django backend and wrote APIs for generating department predictions and sentiments.
  • Merged this backend with a simple, attractive UI where the user can trigger loading of the latest news articles along with their analysis.
  • Implemented video news analysis using Selenium by first extracting the audio and converting it into text, then applying classification and sentiment analysis to the extracted text.
  • Provided the same functionality for news in Hindi and other languages using the Google Translate API.
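As a rough illustration of the sentiment step, the sketch below turns the raw scores (logits) of a three-class classifier into a label and a probability. It is a minimal sketch under stated assumptions, not the project's implementation: the label order in `LABELS` and the helper names `softmax` and `label_sentiment` are hypothetical, and the actual order depends on the model's own configuration.

```python
import math

# Assumed label order for a three-class sentiment head; the real order
# should be read from the model's configuration.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_sentiment(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, `label_sentiment([2.0, 0.5, -1.0])` picks the first label, `"negative"`, because it has the largest score. The returned probability could also serve as a confidence threshold before an alert is sent.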

Screenshots

Frontend
