This project uses a news article API to scrape articles and cluster them. After grouping the articles into several clusters, the app presents each cluster with its corresponding articles and URLs. A KMeans model is used to cluster the articles. The app is deployed on Streamlit, since it offers an easy and smooth way to present the results.
Note: A deployed version of the web app is available here.
- streamlit for creating the web app interface.
- requests to make HTTP requests to fetch news articles from an API.
- TfidfVectorizer from sklearn.feature_extraction.text for converting text documents into a matrix of TF-IDF features.
- KMeans from sklearn.cluster for clustering the articles based on content similarity.
- Standard libraries such as datetime, os, and json for working with dates, file paths, and JSON data, respectively.
Please note that these packages should be installed first, as listed in requirements.txt.
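As a rough illustration of how TfidfVectorizer and KMeans fit together, here is a minimal sketch. It is not the code in article.py: the `cluster_articles` helper, the `title` and `url` keys, and the sample articles are all assumptions made up for this example, and clustering on titles alone is a simplification.

```python
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_articles(articles, n_clusters=2):
    """Group article dicts by the KMeans label of their TF-IDF vectors.

    `articles` is a list of dicts with hypothetical "title" and "url" keys.
    """
    texts = [article["title"] for article in articles]
    # Turn the raw text into a sparse matrix of TF-IDF features.
    features = TfidfVectorizer(stop_words="english").fit_transform(texts)
    # A fixed random_state keeps the grouping reproducible between runs.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(features)

    clusters = defaultdict(list)
    for article, label in zip(articles, labels):
        clusters[int(label)].append(article)
    return dict(clusters)


# Made-up sample data, only for demonstration.
sample_articles = [
    {"title": "Football team wins the championship match",
     "url": "https://example.com/sport-1"},
    {"title": "Championship match ends as football team celebrates",
     "url": "https://example.com/sport-2"},
    {"title": "Stock market falls as bank shares drop",
     "url": "https://example.com/finance-1"},
    {"title": "Bank shares recover while stock market rallies",
     "url": "https://example.com/finance-2"},
]

clusters = cluster_articles(sample_articles, n_clusters=2)
for label, items in sorted(clusters.items()):
    print(f"Cluster {label}")
    for item in items:
        print(f"  {item['title']} ({item['url']})")
```

The cluster numbers themselves carry no meaning; only which articles end up grouped together matters. In a real run the number of clusters would need tuning to the number and variety of articles fetched.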
To run the project locally, you need Visual Studio Code (VS Code) installed on your machine:
- VS Code: a source-code editor made by Microsoft with the Electron framework, available for Windows, Linux, and macOS.
- Clone the project
git clone https://github.com/UmuhireJessie/article-clustering.git
- Open the project with VS Code
cd article-clustering
code .
- Install the required dependencies
pip install -r requirements.txt
- Run the project
streamlit run article.py
- Use the link printed in the terminal to visualise the app (usually http://127.0.0.1:8501/).
- The app uses existing news article APIs, so you may need to generate your own API key to test the app locally.
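One common way to supply an API key locally is through an environment variable, so the key never ends up in the repository. The sketch below assumes a variable named `NEWS_API_KEY` and a helper called `load_api_key`; both names are hypothetical and may not match what article.py actually expects.

```python
import os


def load_api_key(env_var="NEWS_API_KEY"):
    """Read the API key from an environment variable (name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the app."
        )
    return key
```

With this approach, you would export the variable in your terminal before running `streamlit run article.py`, and call `load_api_key()` wherever the key is needed.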
Jessie Umuhire Umutesi