CSE 573 - Semantic Web Mining

document-clustering-and-visualization

Team Members

  • Aditya Deepak Bhat (1222133796)
  • Aniruddha Bhowmik (1223096615)
  • Fenil Madlani (1222149747)
  • Ishan Srivastava (1219537111)
  • Pawan Wagh (1219432396)
  • Shivam Malviya (1222318565)

The aim of this project is to:

  • Explore various clustering and visualization techniques for textual documents.
  • Cluster these documents and create visualizations of the generated clusters.
  • Perform sentiment analysis within each category to detect positive and negative sentiments.
  • Visualize these documents to see if there is any relation between the categories and the sentiments.

Setup Details and Execution Steps

All notebooks were executed in Google Colab using a GPU/TPU runtime. Each notebook includes installation steps for the required libraries that Colab does not already provide, so the notebooks can be uploaded to Google Colab and run directly.

Dataset used - 20 Newsgroups Dataset (http://qwone.com/~jason/20Newsgroups/)

Some of the algorithms used are:

Clustering

  • Latent Dirichlet Allocation (LDA)
  • K-Means
  • HDBSCAN
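
As a rough illustration of the clustering step, here is a minimal sketch of K-Means over TF-IDF vectors using scikit-learn. It is not taken from the project notebooks: the toy corpus, the choice of TF-IDF features, and the values of n_clusters, n_init, and random_state are all assumptions made for the example. The notebooks themselves work on the 20 Newsgroups dataset.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for 20 Newsgroups posts.
docs = [
    "the rocket launch reached orbit",
    "nasa plans a rocket mission to orbit the moon",
    "the shuttle orbit mission used a new rocket",
    "the hockey team won the game",
    "hockey players scored late in the game",
    "the team lost the hockey game in overtime",
]

# Turn each document into a TF-IDF vector, dropping English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Cluster into two groups; k=2 is an assumption that fits this toy corpus.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Print the assigned cluster id next to each document.
for label, doc in zip(labels, docs):
    print(label, doc)
```

On real data the number of clusters is usually not known in advance, which is one reason to compare K-Means against density-based methods such as HDBSCAN that do not require it.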

Visualization

  • Uniform Manifold Approximation and Projection (UMAP)
  • t-SNE
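
To show what the projection step might look like, here is a minimal sketch that reduces TF-IDF document vectors to two dimensions with scikit-learn's t-SNE. The toy corpus and the parameter values (perplexity, init, random_state) are assumptions for illustration, not settings from the project notebooks. The perplexity is kept small only because the example has six documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Hypothetical toy corpus standing in for 20 Newsgroups posts.
docs = [
    "the rocket launch reached orbit",
    "nasa plans a rocket mission to orbit the moon",
    "the shuttle orbit mission used a new rocket",
    "the hockey team won the game",
    "hockey players scored late in the game",
    "the team lost the hockey game in overtime",
]

# Dense TF-IDF matrix with one row per document.
X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()

# Project to 2D; perplexity must be smaller than the number of documents.
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)
embedding = tsne.fit_transform(X)

# Each row of `embedding` is an (x, y) point that can be scatter-plotted
# and colored by cluster label.
for (x, y), doc in zip(embedding, docs):
    print(f"{x:8.2f} {y:8.2f}  {doc}")
```

The umap-learn package exposes a similar fit_transform interface, so the same pattern applies when UMAP is used for the projection.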

Sentiment Analysis

  • VADER (Valence Aware Dictionary and sEntiment Reasoner)