hn_kaggle

Simple clustering of HN posts from this Kaggle dataset: https://www.kaggle.com/hacker-news/hacker-news-posts

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Clustering Hacker News post titles

A simple method of clustering and viewing Hacker News posts.

Data obtained from https://www.kaggle.com/hacker-news/hacker-news-posts.
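This README does not spell out the clustering method, so the following is only a minimal sketch of one common way to cluster short titles: TF-IDF features fed to k-means, both from scikit-learn. It is an assumption, not the notebook's actual pipeline, which may differ (the requirements list gensim, for instance, which this sketch does not use). The function name `cluster_titles`, the cluster count, and the sample titles are all hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_titles(titles, n_clusters=2, seed=0):
    """Return one cluster label per title (TF-IDF features + k-means).

    Illustrative only: not necessarily the method used in the notebook.
    """
    # Turn each title into a sparse TF-IDF vector, dropping English stop words.
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(titles)

    # Group the vectors into n_clusters clusters; a fixed seed keeps runs repeatable.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features).tolist()


# Hypothetical titles standing in for rows of the Kaggle CSV.
sample_titles = [
    "Show HN: A Python library for data analysis",
    "Python data analysis with pandas",
    "Ask HN: How do you hire remote engineers?",
    "Ask HN: Tips to hire engineers for a startup",
]

labels = cluster_titles(sample_titles)
for label, title in zip(labels, sample_titles):
    print(label, title)
```

To run this on the real data, the sample list could be replaced with the title column of the downloaded CSV, with a larger `n_clusters`.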

Example plot

This screenshot shows the first 1000 titles clustered.

[Image: Clustered HN Post Titles]

For an interactive plot, see it directly on Plotly.

Requirements

Pip install

pip install cython numpy pandas scikit-learn gensim plotly

Alternatively, install the dependencies listed in the requirements.txt file with pip install -r requirements.txt.