Exploring clustering algorithms on the 20 Newsgroups dataset
The aims of this project are to:
- Explore different clustering algorithms implemented on the dataset
- Cluster the news articles
- Recommend articles similar to a given document
- Extract keywords from the articles and provide a short summary
- Apply visualization techniques to the text documents to showcase relevancy
- Identify anomalies in the dataset
- Identify popular words in each group, for example as a word cloud
Dataset used: 20 Newsgroups Dataset (http://qwone.com/~jason/20Newsgroups/), loaded with fetch_20newsgroups from sklearn.datasets
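A minimal sketch of loading the dataset through scikit-learn is shown below. The wrapper name `load_newsgroups` is hypothetical, and stripping headers, footers, and quotes is an assumption made for illustration; the project may have kept them.

```python
from sklearn.datasets import fetch_20newsgroups


def load_newsgroups(subset="train", categories=None):
    """Download (on first call) and return texts, labels, and label names."""
    bunch = fetch_20newsgroups(
        subset=subset,          # "train", "test", or "all"
        categories=categories,  # None loads all 20 newsgroups
        # Assumption: drop metadata so models do not key on headers.
        remove=("headers", "footers", "quotes"),
        shuffle=True,
        random_state=42,
    )
    return bunch.data, bunch.target, bunch.target_names
```

For example, `texts, labels, names = load_newsgroups(categories=["sci.space", "rec.sport.hockey"])` would fetch two of the twenty groups. The first call downloads the data and caches it locally.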
We used Google Colab and PyCharm Jupyter notebooks for additional CPU and GPU support and for dataset storage
Algorithms and techniques explored:
- LDA
- HDBSCAN
- Agglomerative clustering
- t-SNE
- UMAP
- Compression-VAE
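To make two of the listed techniques concrete, here is a minimal sketch of agglomerative clustering followed by a t-SNE projection, using only scikit-learn. The function names `cluster_documents` and `project_2d`, the TF-IDF features, and all parameter values are assumptions for illustration; HDBSCAN, UMAP, and Compression-VAE need separate packages and are not shown.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE


def vectorize(docs):
    """Turn raw texts into a dense TF-IDF matrix (rows are L2-normalized)."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    # AgglomerativeClustering needs dense input; fine for small samples only.
    return tfidf.toarray()


def cluster_documents(docs, n_clusters=2):
    """Assign each document a cluster label with Ward-linkage agglomeration."""
    features = vectorize(docs)
    model = AgglomerativeClustering(n_clusters=n_clusters)
    return model.fit_predict(features)


def project_2d(docs, perplexity=30.0, random_state=0):
    """Embed documents in 2D with t-SNE for plotting.

    perplexity must be smaller than the number of documents.
    """
    features = vectorize(docs)
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="random",
        random_state=random_state,
    )
    return tsne.fit_transform(features)
```

The labels from `cluster_documents` can be used to color the points returned by `project_2d` in a scatter plot. On the full dataset, the dense conversion in `vectorize` would likely be too memory-hungry, so a dimensionality reduction step before clustering is a sensible addition.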