akashkthkr/SWM573_G18_P7_Document_Clustering_Summarization_and_Visualization

Explore different clustering algorithms, Implemented on the dataset

Language: Jupyter Notebook | License: MIT

SWM573 - Document Clustering, Summarization, and Visualization


Project Abstract & Aim

The aim of this project is to:

Explore different clustering algorithms and implement them on the dataset
Cluster the news articles
Recommend available articles that are similar to a given document
Extract keywords from the articles and provide a short summary
Apply visualization techniques to the text documents to show how they relate to one another
Identify anomalies in the dataset
Identify popular words in each group (for example, as a word cloud)

Dataset Used

Dataset used: the 20 Newsgroups dataset (http://qwone.com/~jason/20Newsgroups/), loaded with fetch_20newsgroups from sklearn.datasets.

Runtime and Instances

We used Google Colab and Jupyter notebooks in PyCharm, which gave us additional CPU and GPU support and dataset storage.

Algorithms and Visualization Techniques Used

Clustering

LDA (Latent Dirichlet Allocation)
HDBSCAN
Agglomerative clustering

Visualization Techniques

t-SNE
UMAP
Compression-VAE
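A sketch of the t-SNE step, assuming documents are first turned into TF-IDF vectors and compressed with TruncatedSVD before t-SNE (a common practice, not confirmed by the README). Function names and parameter values are illustrative. UMAP and Compression-VAE are not shown; the umap-learn package's umap.UMAP class offers a fit_transform method that could stand in for the TSNE call.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE


def tsne_embedding(docs, svd_dims=50, perplexity=30.0, seed=0):
    """Project documents to 2-D: TF-IDF, then TruncatedSVD, then t-SNE."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    # Compress the wide sparse matrix first; t-SNE is slow on very high-dimensional input.
    k = max(1, min(svd_dims, tfidf.shape[1] - 1, tfidf.shape[0] - 1))
    reduced = TruncatedSVD(n_components=k, random_state=seed).fit_transform(tfidf)
    # scikit-learn requires perplexity to be smaller than the number of samples.
    perplexity = float(min(perplexity, len(docs) - 1))
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(reduced)  # shape (n_docs, 2)


def plot_embedding(points, labels, title="t-SNE of documents"):
    """Scatter the 2-D points, colored by cluster or newsgroup label."""
    import matplotlib.pyplot as plt  # imported lazily so the embedding runs without matplotlib

    fig, ax = plt.subplots(figsize=(6, 5))
    scatter = ax.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab20", s=12)
    ax.set_title(title)
    fig.colorbar(scatter, ax=ax, label="label")
    return fig
```

Coloring the points by the true newsgroup labels and then by the predicted cluster labels gives a quick visual check of how well the clusters line up with the original groups.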

Team Members