document-clustering-and-visualization
CSE 573 - Semantic Web Mining
Team Members
- Aditya Deepak Bhat (1222133796)
- Aniruddha Bhowmik (1223096615)
- Fenil Madlani (1222149747)
- Ishan Srivastava (1219537111)
- Pawan Wagh (1219432396)
- Shivam Malviya (1222318565)
The aim of this project is to:
- Explore various clustering and visualization techniques for textual documents.
- Perform clustering on these documents and create visualizations of the generated clusters.
- Perform sentiment analysis within each category to detect positive and negative sentiments.
- Visualize these documents to see if there is any relation between the categories and the sentiments.
Setup Details and Execution Steps
All the notebooks were executed in Google Colab using the GPU/TPU runtime. Each notebook includes the installation steps for any required libraries that Colab does not provide out of the box, so the notebooks can be uploaded and run directly on Google Colab.
Dataset used - 20 Newsgroups Dataset (http://qwone.com/~jason/20Newsgroups/)
Some of the algorithms used are:
Clustering
- Latent Dirichlet Allocation (LDA)
- K-Means
- HDBSCAN
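To illustrate what the K-Means step does to text, here is a minimal, dependency-free sketch (not the project's notebook code) that builds TF-IDF vectors and runs Lloyd's K-Means on them. Several choices are assumptions made for the example: whitespace tokenisation, the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default, and deterministic farthest-point seeding in place of scikit-learn's default k-means++ initialisation. The helper names and the six toy posts are hypothetical. LDA and HDBSCAN are not shown.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return dense, L2-normalised TF-IDF rows and the sorted vocabulary."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({t for tokens in tokenised for t in tokens})
    n_docs = len(tokenised)
    # Document frequency: how many documents contain each term.
    df = Counter(t for tokens in tokenised for t in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for tokens in tokenised:
        tf = Counter(tokens)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows, vocab


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Start at points[0], then repeatedly add the point that is farthest
    from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    posts = [
        "the gpu renders graphics fast",
        "graphics card gpu driver",
        "gpu graphics shader",
        "the team won the hockey game",
        "hockey game playoff team",
        "team scores in hockey playoff",
    ]
    rows, _ = tfidf_matrix(posts)
    print(kmeans(rows, k=2))  # prints [0, 0, 0, 1, 1, 1]
```

On real 20 Newsgroups data one would normally use library implementations (for example scikit-learn's `TfidfVectorizer` and `KMeans`) with sparse matrices; the pure-Python version only exposes the two alternating steps.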
Visualization
- Uniform Manifold Approximation and Projection (UMAP)
- t-SNE
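To show what t-SNE optimises when it projects document vectors to 2-D, the following is a small, dependency-free sketch of exact t-SNE. It is not the project's code, and it is only practical for a handful of points. It calibrates Gaussian affinities to a target perplexity by binary search, symmetrises them, and runs plain gradient descent on the KL divergence against a Student-t kernel in the embedding, with an early-exaggeration phase. The function names, the learning rate, and the iteration counts are illustrative assumptions. UMAP is not shown. In practice a library implementation such as scikit-learn's `sklearn.manifold.TSNE` would be used.

```python
import math
import random


def _conditional_row(d2_row, i, perplexity, tol=1e-5, max_iter=100):
    """p_{j|i}: Gaussian affinities for point i, with precision beta found by
    binary search so that exp(entropy) of the row matches the perplexity."""
    target = math.log(perplexity)  # target entropy, in nats
    beta, lo, hi = 1.0, 0.0, math.inf
    for _ in range(max_iter):
        w = [0.0 if j == i else math.exp(-beta * d) for j, d in enumerate(d2_row)]
        z = sum(w) or 1e-300
        p = [x / z for x in w]
        h = -sum(x * math.log(x) for x in p if x > 0)
        if abs(h - target) < tol:
            break
        if h > target:  # too flat: sharpen by increasing beta
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:  # too peaked: widen by decreasing beta
            hi = beta
            beta = (beta + lo) / 2
    return p


def joint_probabilities(points, perplexity):
    n = len(points)
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    cond = [_conditional_row(d2[i], i, perplexity) for i in range(n)]
    # Symmetrise: p_ij = (p_{j|i} + p_{i|j}) / 2n, so all entries sum to 1.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def tsne_2d(points, perplexity=2.0, n_iter=500, lr=0.5, exaggeration=4.0, seed=0):
    """Embed at least two points in 2-D by gradient descent on KL(P || Q)."""
    n = len(points)
    P = joint_probabilities(points, perplexity)
    rng = random.Random(seed)
    Y = [[rng.gauss(0.0, 1e-2), rng.gauss(0.0, 1e-2)] for _ in range(n)]
    for it in range(n_iter):
        ex = exaggeration if it < n_iter // 4 else 1.0  # early exaggeration
        # Student-t kernel (one degree of freedom) between embedded points.
        num = [[0.0 if i == j else
                1.0 / (1.0 + (Y[i][0] - Y[j][0]) ** 2 + (Y[i][1] - Y[j][1]) ** 2)
                for j in range(n)] for i in range(n)]
        z = sum(map(sum, num))
        grads = []
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                # dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)
                coef = 4.0 * (ex * P[i][j] - num[i][j] / z) * num[i][j]
                gx += coef * (Y[i][0] - Y[j][0])
                gy += coef * (Y[i][1] - Y[j][1])
            grads.append((gx, gy))
        for i, (gx, gy) in enumerate(grads):  # plain gradient descent, no momentum
            Y[i][0] -= lr * gx
            Y[i][1] -= lr * gy
    return Y


if __name__ == "__main__":
    pts = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
    for y in tsne_2d(pts):
        print([round(v, 3) for v in y])
```

With two well-separated groups such as the toy points above, the embedding typically keeps each group's points together. Perplexity here is set very low only because there are six points; library defaults assume far larger inputs.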
Sentiment Analysis
- VADER (Valence Aware Dictionary and sEntiment Reasoner)
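The sentiment step labels each document and then compares the labels across categories. The following toy sketch shows the shape of that pipeline without any dependencies. It is not VADER itself and not the project's code. It sums word valences from a hypothetical mini-lexicon, squashes the sum into (-1, 1) with VADER's compound normalisation x / sqrt(x² + 15), applies the ±0.05 thresholds suggested in the VADER documentation, and counts labels per newsgroup. With the real library, the score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, available both in the `vaderSentiment` package and in `nltk.sentiment.vader` (NLTK also needs its `vader_lexicon` resource). The sample posts and category pairings are made up.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (word -> valence on VADER's -4..+4 scale). These
# numbers are illustrative, not VADER's; the real lexicon is far larger, and
# VADER also applies rules for negation, intensifiers, "but", punctuation and
# capitalisation, none of which are modelled here.
TOY_LEXICON = {"great": 3.1, "good": 1.9, "love": 3.2,
               "bad": -2.5, "terrible": -3.4, "hate": -2.7}


def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum word valences, then squash into (-1, 1) with x / sqrt(x^2 + alpha),
    the normalisation VADER uses for its compound score (alpha = 15)."""
    raw = sum(lexicon.get(tok.strip(".,!?;:\"'").lower(), 0.0) for tok in text.split())
    return raw / math.sqrt(raw * raw + alpha)


def sentiment_label(compound, threshold=0.05):
    """Map a compound score to a label using +/-0.05 thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_by_category(labelled_docs):
    """labelled_docs: iterable of (category, text) pairs.
    Returns {category: Counter of sentiment labels}."""
    counts = defaultdict(Counter)
    for category, text in labelled_docs:
        counts[category][sentiment_label(compound_score(text))] += 1
    return dict(counts)


if __name__ == "__main__":
    docs = [
        ("rec.sport.hockey", "Great game, love it!"),
        ("rec.sport.hockey", "terrible refs"),
        ("comp.graphics", "The driver is bad"),
    ]
    print(sentiment_by_category(docs))
```

Keeping the per-category counts as `Counter` objects makes it easy to plot label proportions for each newsgroup afterwards.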