This repository contains Python code for classifying news articles into categories using machine learning. The code fetches articles from a news website, extracts their headlines and content, and classifies them with several models.
Make sure you have the following libraries installed:
- requests
- BeautifulSoup (bs4)
- pandas
- scikit-learn
- imbalanced-learn (imblearn)
- seaborn
- nltk
You can install them using:
pip install requests beautifulsoup4 pandas scikit-learn imbalanced-learn seaborn nltk
- Run News_Scraping_and_Classification_final.ipynb to fetch news articles and create a CSV file (News_data.csv) with headlines, content, and categories.
- Run news_classification.py to perform text classification using K-Nearest Neighbors and Support Vector Machine models. The optimal parameters are determined through cross-validation.
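The classification step can be illustrated with a minimal sketch. It is not the code in news_classification.py: the tiny corpus is made up, the TF-IDF features are an assumption (this README does not say how text is vectorized), and the helper name tune and the parameter grids are hypothetical. It only shows one common way to pick K-Nearest Neighbors and Support Vector Machine parameters by cross-validation with scikit-learn's GridSearchCV.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Made-up headlines standing in for the rows of News_data.csv (assumption).
texts = [
    "Team wins the football championship final",
    "Striker scores twice in the cup match",
    "Coach praises players after league victory",
    "Tennis star reaches the tournament semifinal",
    "Stock market rises as banks report profits",
    "Central bank raises interest rates again",
    "Company shares fall after weak earnings",
    "Investors watch inflation and bond yields",
]
labels = ["sports"] * 4 + ["business"] * 4


def tune(classifier, param_grid, texts, labels, cv=2):
    """Grid-search a TF-IDF + classifier pipeline using cross-validation.

    Hypothetical helper: the name and the TF-IDF step are illustrative only.
    """
    pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", classifier)])
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(texts, labels)  # refits the best pipeline on all data
    return search


# Illustrative grids; real grids would be wider and use more folds.
knn_search = tune(KNeighborsClassifier(), {"clf__n_neighbors": [1, 3]}, texts, labels)
svm_search = tune(
    SVC(),
    {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]},
    texts,
    labels,
)

print("KNN best params:", knn_search.best_params_)
print("SVM best params:", svm_search.best_params_)
print("SVM prediction:", svm_search.predict(["Bank profits lift the stock market"]))
```

With real data, texts and labels would come from the headline or content column and the category column of News_data.csv, and a larger cv value (for example 5) would give more reliable estimates than the two folds used here to fit the toy corpus.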
- News_Scraping_and_Classification_final: Contains the code for web scraping and classification.
- News_data.csv: Contains the CSV file with the news data.
- README.md: Placeholder for images used in the README.
- README.md: Documentation explaining the code and usage.