Web_Scraping_and_Classification


This repository contains Python code for classifying news articles into categories using machine learning. The code fetches articles from a news website, extracts their headlines and content, and classifies them with K-Nearest Neighbors and Support Vector Machine models.
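
As a rough illustration of the extraction step, the sketch below parses headlines, content, and categories out of an HTML string with BeautifulSoup and serializes them as CSV. The markup in `SAMPLE_HTML`, the `headline` and `content` class names, the `data-category` attribute, and the helper names `parse_articles` and `rows_to_csv` are all assumptions made for the example; the selectors for the real news site will differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: the real site's tags and class names will differ.
SAMPLE_HTML = """
<html><body>
  <article data-category="sports">
    <h2 class="headline">Local team wins final</h2>
    <p class="content">The match ended 2-1 after extra time.</p>
  </article>
  <article data-category="business">
    <h2 class="headline">Markets close higher</h2>
    <p class="content">Shares rose for a third day.</p>
  </article>
</body></html>
"""


def parse_articles(html):
    """Return one dict per <article> with headline, content, and category."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article"):
        headline = article.find("h2", class_="headline")
        content = article.find("p", class_="content")
        if headline is None or content is None:
            continue  # skip blocks that do not match the assumed layout
        rows.append({
            "headline": headline.get_text(strip=True),
            "content": content.get_text(strip=True),
            "category": article.get("data-category", "unknown"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["headline", "content", "category"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


articles = parse_articles(SAMPLE_HTML)
csv_text = rows_to_csv(articles)
```

For a live page, the HTML string would typically come from `requests.get(url, timeout=10).text`, and the CSV text could be written to a file such as News_data.csv.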

Dependencies

Make sure you have the following libraries installed:

  • requests
  • BeautifulSoup (bs4)
  • pandas
  • scikit-learn
  • imbalanced-learn (imblearn)
  • seaborn
  • nltk

You can install them using:

pip install requests beautifulsoup4 pandas scikit-learn imbalanced-learn seaborn nltk

Usage

  1. Run News_Scraping_and_Classification_final.ipynb to fetch news articles and create a CSV file (News_data.csv) with headlines, content, and categories.

  2. Run news_classification.py to perform text classification using K-Nearest Neighbors and Support Vector Machine models. The optimal hyperparameters for each model are selected through cross-validation.
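
The classification step can be sketched roughly as follows, assuming TF-IDF features and scikit-learn's `GridSearchCV` for the cross-validated parameter search. The toy texts, the parameter grids, the 3-fold setting, and the helper name `tune` are assumptions chosen for illustration, not the values used in this repository.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data standing in for the text and category columns of the CSV file.
texts = [
    "team wins the football final",
    "striker scores twice in cup match",
    "coach praises players after game",
    "tennis champion wins the match",
    "football club signs new striker",
    "players train before the final game",
    "stock market closes higher today",
    "bank shares rise after earnings",
    "company reports strong profit",
    "investors buy shares as market rallies",
    "central bank holds interest rates",
    "profit warning hits company stock",
]
labels = ["sports"] * 6 + ["business"] * 6


def tune(model, param_grid):
    """Grid-search a TF-IDF + classifier pipeline with 3-fold cross-validation."""
    pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", model)])
    search = GridSearchCV(pipeline, param_grid, cv=3)
    search.fit(texts, labels)
    return search


# Assumed search spaces; adjust them to the size of the real dataset.
knn_search = tune(KNeighborsClassifier(), {"clf__n_neighbors": [1, 3]})
svm_search = tune(SVC(), {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]})

print("KNN best params:", knn_search.best_params_)
print("SVM best params:", svm_search.best_params_)
```

After fitting, each search object exposes `best_params_` and `best_score_`, and its `predict` method uses the pipeline refit with the best parameters.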

Folder Structure

  • News_Scraping_and_Classification_final: Contains the code for web scraping and classification.
  • News_data.csv: The scraped news data (headlines, content, and categories).
  • README.md: Documentation explaining the code and usage, including the images it displays.

Screenshots

[Six screenshots, captured 2023-11-23]