Implemented the k-means clustering algorithm in Python, using Jaccard distance as the distance metric, to cluster tweets, and analyzed various Twitter-based applications that involve truth discovery, trend analysis, and search ranking.
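For readers unfamiliar with the metric, here is a minimal sketch of Jaccard distance between two tweets. It assumes each tweet is reduced to a set of lowercase, whitespace-separated words; the function name `jaccard_distance` and this tokenization are illustrative assumptions and may differ from what the program actually does.

```python
def jaccard_distance(text_a: str, text_b: str) -> float:
    """Jaccard distance between two texts treated as sets of words.

    Assumption: words are lowercase, whitespace-separated tokens.
    """
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    if not union:
        # Two empty texts are treated as identical.
        return 0.0
    # Distance = 1 - |intersection| / |union|
    return 1.0 - len(words_a & words_b) / len(union)
```

The distance is 0 for tweets with identical word sets and 1 for tweets that share no words.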
Programming language used: Python
Files included:
- InitialSeeds.txt - contains the initial centroids for k-means
- Output2.txt - sample output
- tweetcluster.py - k-means clustering implemented in Python
- Tweets.json - the Boston bombing tweets dataset
Steps to run the code:
1. On the command line, go to the directory containing the files.
2. Type or copy and paste the command below to run the Python program:
   python tweetcluster.py 25 InitialSeeds.txt Tweets.json output.txt
Note: If a file with the output name already exists, you can change the name of the output file.
SSE value=16.85524996
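
As a rough illustration of how an SSE figure like the one above can be obtained, the sketch below sums the squared Jaccard distance from each tweet to the centroid of its cluster. This is an assumption about the definition, not a description of the program: the helper names (`jaccard_distance`, `sum_squared_error`), the word-set tokenization, and the dictionary layout of `clusters` are all hypothetical.

```python
def jaccard_distance(text_a: str, text_b: str) -> float:
    """Jaccard distance between two texts treated as sets of lowercase words."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(union)


def sum_squared_error(clusters: dict) -> float:
    """Sum of squared distances from every tweet to its cluster centroid.

    Assumption: `clusters` maps a centroid tweet's text to the list of
    tweet texts assigned to that centroid.
    """
    return sum(
        jaccard_distance(centroid, tweet) ** 2
        for centroid, members in clusters.items()
        for tweet in members
    )


# Hypothetical example data, not taken from Tweets.json.
example_clusters = {
    "boston marathon": ["boston marathon", "boston marathon news"],
}
example_sse = sum_squared_error(example_clusters)
```

A lower SSE means tweets sit closer to their centroids, so the value is useful for comparing runs with different seeds or cluster counts.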