marcdotson/counting-cockroaches

Clustering tweets into topic groups

AdrielC opened this issue · 3 comments


We need a method for assigning a topic or category to each tweet based on its text. We can start with static rules such as substring matching (e.g., if tweet_text contains "delay"), but something more generalizable would be better, since people often misspell and shorten words in tweets.
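As a rough starting point, the static-rule idea could look something like the sketch below. The topic names, keyword lists, and the `assign_topic` function are hypothetical and not taken from this project. The fuzzy fallback uses `difflib` from the Python standard library; that is only one possible way to tolerate misspellings, not something settled in this issue.

```python
import re
from difflib import get_close_matches

# Hypothetical topic -> keyword rules, for illustration only.
TOPIC_KEYWORDS = {
    "delay": ["delay", "delayed"],
    "baggage": ["baggage", "luggage"],
}


def assign_topic(tweet_text, fuzzy_cutoff=0.8):
    """Return the first matching topic name, or None if no rule fires."""
    text = tweet_text.lower()

    # Pass 1: static substring rules (if tweet_text contains "delay").
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic

    # Pass 2: fuzzy match each word against the keywords to catch
    # misspellings such as "delyaed" (assumed cutoff, needs tuning).
    tokens = re.findall(r"[a-z']+", text)
    for topic, keywords in TOPIC_KEYWORDS.items():
        for token in tokens:
            if get_close_matches(token, keywords, n=1, cutoff=fuzzy_cutoff):
                return topic

    return None


if __name__ == "__main__":
    for tweet in ["3 hour DELAY at the gate", "flight delyaed again", "great crew today"]:
        print(tweet, "->", assign_topic(tweet))
```

Plain substring matching also fires inside longer words, so short keywords would need word-boundary checks. Heavily shortened words would likely still slip past the fuzzy pass.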

https://github.com/hundredblocks/concrete_NLP_tutorial/blob/master/NLP_notebook.ipynb
I think a good way to go about this would be to use a CNN like in this article. Instead of framing it as topic assignment, we would treat it as sentence classification.

Agreed. Merging this into a new issue.