/Tweezy

Classify Twitter users based on different parameters

Primary LanguagePython

TWEEZY


Classify Twitter users based on different parameters

Basic Idea

A twitter user is classified into Anomalous, Non Anomalous and Intermediate using 5 parameters and each of these parameter will be given a rank:

  • Time Difference (denoted by a)
  • Similarity of Tweets (b)
  • URL Ranking (c)
  • Malware URL (d)
  • Adult Content (e)

each of these parameter will be assigned a value from 1-10 for each user and these parameters have a weight which together will decide whether a user is anomalous or not

Weights of each parameter are :

  • Time Difference: 0.15
  • Similarity of Tweets: 0.25
  • URL Ranking: 0.30
  • Malware URL: 0.30
  • Adult Content: 1

An FAL value is assigned combining all these parameters which is given by

Depending upon the FAL value , a user can be classified into Anomalous, Non Anomalous and Intermediate

Classification

  • This algorithm is applied on a dataset of twitter users from which a dataset of a,b,c,d,e and FAL values are obtained.
  • Onto this dataset different classification methods are applied.

Classification Methods Used

  • K-nearest neighbors (KNN)
  • Support Vector Machine (SVM)
  • Naive Bayes classifiers
  • Random Forest
  • Decision Tree

Structure

  • Files related to algorithm used is present in Twitter folder
  • Main.py is the root file to be run from which other functions are called
  • dataset_generator.py generates dummy data of values a,b,c,d,e,FAL,type into dataset_gen.csv
  • Classifier.py takes in the data present in the dataset_gen.csv and classify the users based on different Classification Algorithm
  • wot.py is used to calculate Web Of Trust Rank
  • similarity.py is used to calculate similarity of tweets
  • url.py is used to calculate Alexa rank of url's present in the tweets
  • checkTime.py is used to calculate time difference of tweets
  • checkContent.py is used to check for adult contents in tweets

How to run ?

To check whether a particular user is anomalous :-

  • clone this repo
  • run the following commands in the terminal from the cloned folder
  • virtualenv venv
  • source venv/bin/activate
  • pip install -r requirements.txt
  • python manage.py migrate
  • python manage.py runserver
  • open localhost:8000/main in your browser

To do the classification follow these steps :-

  • open Twitter folder in terminal
  • store the dataset of usernames which needs to be classified in dataset_gen.csv
  • run python main.py
  • output based on 5 classification algorithm will be displayed as the output

Screenshots

1 2 3 4