olafmaas/hackdelft

dataset-links.txt missing

Closed this issue · 4 comments

Any chance you could also upload this file? I'd appreciate being able to study your great code in more depth. Thanks!

Hi James,

We initially ignored all the .txt files since they are data files. I made an exception for you and added dataset-links.txt to the repo; see e4f5ab4. Since you are interested in the technology used in this repo, you might want to read the article Rik wrote about this little project: https://medium.com/towards-data-science/automatic-topic-clustering-using-doc2vec-e1cea88449c (~4 min read).

Care to share what you are up to?

Hey, thanks so much for doing this so fast. I had actually stumbled upon that original article while researching semantic clustering methods. I thought the whole webpage was really cool as soon as I saw it, and I was just trying to study how it was done. I was very curious, since it is a unique way to represent clusters interactively. Really awesome idea!

Ah, while you were typing your reply, I discovered you had left a comment on the article on Medium...

Thanks again! Also note that since we don't have a known K, we could probably use a density-based clustering method, or my preferred method when working with text, hierarchical clustering, as it lets you see the separation visually and choose the cutoff criterion more easily. This is really cool code!
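
To make the hierarchical suggestion concrete, here is a minimal pure-Python sketch of agglomerative clustering with a distance cutoff in place of a fixed K. It is not the code from this repo: the function names, the average-linkage choice, the Euclidean distance, and the toy 2-D vectors (standing in for doc2vec document vectors) are all assumptions made for illustration. In practice you would more likely reach for a library such as SciPy (`scipy.cluster.hierarchy.linkage` with `fcluster`, and `dendrogram` for the visual check).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(vectors, cutoff):
    """Merge the two closest clusters until the closest pair is farther
    apart than `cutoff`. Returns (clusters, merge_distances), where each
    cluster is a list of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]
    merge_distances = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cutoff:
            break  # remaining clusters are well separated; stop merging
        merge_distances.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merge_distances


# Hypothetical toy data: two tight groups and one outlier.
toy_vectors = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
    (5.0, 5.0), (5.1, 4.9),
    (9.0, 0.0),
]
clusters, merge_distances = hierarchical_clusters(toy_vectors, cutoff=1.0)
print(sorted(sorted(c) for c in clusters))
print(merge_distances)
```

With this toy data, the three nearby points, the pair, and the lone outlier end up as three clusters without K ever being specified. The recorded merge distances are what a dendrogram would plot: a large jump between consecutive merges is a natural place to put the cutoff.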