document-clustering-and-visualization

CSE 573 - Semantic Web Mining

Team Members

Aditya Deepak Bhat (1222133796)
Aniruddha Bhowmik (1223096615)
Fenil Madlani (1222149747)
Ishan Srivastava (1219537111)
Pawan Wagh (1219432396)
Shivam Malviya (1222318565)

The aim of this project is to:

Explore various clustering and visualization techniques for textual documents.
Cluster these documents and create visualizations of the resulting clusters.
Perform sentiment analysis within each category to detect positive and negative sentiment.
Visualize the documents to check whether the categories and the sentiments are related.
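
One simple way to look for a relation between categories and sentiments, before plotting anything, is a contingency table. The sketch below is illustrative only and is not taken from the notebooks: the helper names `crosstab` and `sentiment_shares` are hypothetical, and the label lists are made-up sample data.

```python
from collections import Counter


def crosstab(categories, sentiments):
    """Count how often each (category, sentiment) pair occurs."""
    if len(categories) != len(sentiments):
        raise ValueError("categories and sentiments must have the same length")
    return Counter(zip(categories, sentiments))


def sentiment_shares(categories, sentiments):
    """For each category, the fraction of its documents with each sentiment."""
    counts = crosstab(categories, sentiments)
    totals = Counter(categories)
    return {(cat, sent): n / totals[cat] for (cat, sent), n in counts.items()}


# Hypothetical per-document labels (one entry per document).
categories = ["sci.space", "sci.space",
              "rec.sport.hockey", "rec.sport.hockey", "rec.sport.hockey"]
sentiments = ["positive", "negative", "positive", "positive", "negative"]

shares = sentiment_shares(categories, sentiments)
for (cat, sent), share in sorted(shares.items()):
    print(f"{cat:18s} {sent:9s} {share:.2f}")
```

If the shares look similar across categories, the sentiment is probably independent of the category; large differences are worth a closer look in the visualizations.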

Setup Details and Execution Steps

All notebooks were executed in Google Colab using a GPU/TPU runtime. Each notebook includes the installation steps for any required libraries that Colab does not already provide, so the notebooks can be uploaded to Google Colab and run directly.

Dataset used - 20 Newsgroups Dataset (http://qwone.com/~jason/20Newsgroups/)

Some of the algorithms used are:

Clustering

Latent Dirichlet Allocation (LDA)
K-Means
HDBSCAN
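
To make the clustering step concrete, here is a minimal, standard-library-only sketch of K-Means over TF-IDF vectors. It is an illustration under stated assumptions, not the code from the notebooks: tokenization is plain whitespace splitting, the seeding is a deterministic farthest-point rule (a simple stand-in for k-means++), and the six documents are made-up examples. A library implementation would normally be the better choice on the full dataset.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn raw strings into L2-normalised sparse TF-IDF vectors (dicts)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in a.keys() | b.keys())


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point seeding (assumption)."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(dict(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


# Made-up documents in the spirit of two newsgroup topics.
docs = [
    "space shuttle launch orbit",
    "nasa space orbit mission",
    "shuttle launch nasa space",
    "hockey team season goal",
    "hockey playoff team win",
    "goal season hockey game",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

Unlike K-Means, LDA assigns each document a mixture of topics and HDBSCAN can leave documents unassigned as noise, so the three methods are not interchangeable.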

Visualization

Uniform Manifold Approximation and Projection (UMAP)
t-SNE

Sentiment Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner)
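
VADER is a lexicon- and rule-based scorer: it sums per-word valences, applies heuristics such as negation handling, and squashes the sum into a compound score between -1 and 1. The sketch below shows only that core idea and is a heavily simplified stand-in, not the real library. The lexicon values are invented for illustration, negation is checked only against the immediately preceding token, and the function names are hypothetical. The negation scalar, the normalisation constant and the ±0.05 cut-offs follow the conventions published with VADER.

```python
import math

# Toy valence lexicon: illustrative values, NOT the real VADER lexicon.
TOY_LEXICON = {
    "good": 2.0, "great": 3.0, "love": 3.0,
    "bad": -2.5, "terrible": -3.0, "hate": -3.0,
}
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.74  # flips and dampens the valence of a negated word
ALPHA = 15               # normalisation constant for the compound score


def compound_score(text):
    """Sum word valences and squash the total into the open interval (-1, 1)."""
    tokens = [tok.strip(".,!?;:").lower() for tok in text.split()]
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = TOY_LEXICON.get(tok, 0.0)
        # Simplification: only the directly preceding token can negate.
        if valence and i > 0 and tokens[i - 1] in NEGATIONS:
            valence *= NEGATION_SCALAR
        total += valence
    return total / math.sqrt(total * total + ALPHA)


def sentiment_label(text, threshold=0.05):
    """Map the compound score to positive / negative / neutral."""
    score = compound_score(text)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


for sentence in ["I love this great team!",
                 "This is not good.",
                 "The meeting is on Tuesday."]:
    print(sentence, "->", sentiment_label(sentence))
```

The real analyzer also handles intensifiers, capitalization, punctuation emphasis and emoticons, so its scores will differ from this toy version.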
