TWEEZY

Classify Twitter users based on different parameters

Basic Idea

A Twitter user is classified as Anomalous, Non Anomalous, or Intermediate using 5 parameters, each of which is given a rank:

Time Difference (denoted by a)
Similarity of Tweets (b)
URL Ranking (c)
Malware URL (d)
Adult Content (e)

Each of these parameters is assigned a value from 1 to 10 for each user. Each parameter also has a weight, and the weighted values together decide whether a user is anomalous or not.

The weights of the parameters are:

Time Difference: 0.15
Similarity of Tweets: 0.25
URL Ranking: 0.30
Malware URL: 0.30
Adult Content: 1

An FAL value is assigned combining all these parameters which is given by

Depending on the FAL value, a user is classified as Anomalous, Non Anomalous, or Intermediate.
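The FAL formula itself is not reproduced in this text, so the snippet below is only a rough sketch of how ranks and weights could be combined. It assumes a plain weighted sum of the five ranks, which may not be what the project actually computes. The function name `weighted_score` and the example ranks are hypothetical. No classification thresholds are shown because none are given here.

```python
# Weights as listed above; the keys a-e follow the parameter list at the top.
WEIGHTS = {"a": 0.15, "b": 0.25, "c": 0.30, "d": 0.30, "e": 1.0}


def weighted_score(ranks):
    """Combine the five ranks into one number.

    ASSUMPTION: a plain weighted sum. This is an illustration only and is
    not necessarily the project's FAL formula.
    """
    for name in WEIGHTS:
        if not 1 <= ranks[name] <= 10:
            raise ValueError(f"rank {name!r} must be between 1 and 10")
    return sum(WEIGHTS[name] * ranks[name] for name in WEIGHTS)


# Hypothetical ranks for a single user.
example = {"a": 4, "b": 7, "c": 2, "d": 1, "e": 1}
print(weighted_score(example))
```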

Classification

This algorithm is applied to a dataset of Twitter users, from which a dataset of a, b, c, d, e and FAL values is obtained.
Different classification methods are then applied to this dataset.

Classification Methods Used

K-nearest neighbors (KNN)
Support Vector Machine (SVM)
Naive Bayes classifiers
Random Forest
Decision Tree
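
The five methods above can be compared side by side. The snippet below is a minimal sketch using scikit-learn, which is an assumption: this text does not say which library Classifier.py uses. The rows are random dummy data with a toy labelling rule, standing in for dataset_gen.csv. The helper names `make_dummy_rows` and `compare_classifiers` are hypothetical.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_dummy_rows(n=300, seed=0):
    """Hypothetical stand-in for dataset_gen.csv: random ranks and a toy label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(1, 10) for _ in range(5)]  # a, b, c, d, e
        total = sum(row)
        # Toy labelling rule, for illustration only (not the project's rule).
        if total >= 33:
            label = "Anomalous"
        elif total <= 22:
            label = "Non Anomalous"
        else:
            label = "Intermediate"
        X.append(row)
        y.append(label)
    return X, y


def compare_classifiers(X, y, seed=0):
    """Fit each classifier on a train split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


X, y = make_dummy_rows()
scores = compare_classifiers(X, y)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

All hyperparameters are left at the scikit-learn defaults; real data would likely need tuning and feature scaling for KNN and SVM.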

Structure

Files related to the algorithm are in the Twitter folder
Main.py is the root file to run; it calls the other functions
dataset_generator.py generates dummy data with the values a, b, c, d, e, FAL, type and writes it to dataset_gen.csv
Classifier.py takes the data in dataset_gen.csv and classifies the users using the different classification algorithms
wot.py is used to calculate the Web of Trust rank
similarity.py is used to calculate the similarity of tweets
url.py is used to calculate the Alexa rank of URLs present in the tweets
checkTime.py is used to calculate the time difference between tweets
checkContent.py is used to check for adult content in tweets
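
As an illustration of the tweet similarity step, the snippet below is a minimal sketch that averages a pairwise similarity ratio over a user's tweets. It uses `difflib.SequenceMatcher` from the Python standard library, which is an assumption: the method used in similarity.py is not described here. The function name `mean_pairwise_similarity` is hypothetical, and the mapping from this score to a 1-10 rank is left out because it is not specified.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(tweets):
    """Average SequenceMatcher ratio over every pair of tweets.

    Returns 0.0 when there are fewer than two tweets to compare.
    """
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    ratios = [SequenceMatcher(None, first, second).ratio() for first, second in pairs]
    return sum(ratios) / len(ratios)


# Hypothetical tweets: near-duplicates push the score towards 1.0.
tweets = [
    "Win a free phone now, click here",
    "Win a free phone today, click here",
    "Had a great walk in the park this morning",
]
print(round(mean_pairwise_similarity(tweets), 2))
```

A score close to 1.0 means the user posts nearly identical text repeatedly; a score close to 0.0 means the tweets share little text.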

How to run?

To check whether a particular user is anomalous:

Clone this repo
Run the following commands in the terminal from the cloned folder:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python manage.py migrate
python manage.py runserver
Open localhost:8000/main in your browser

To run the classification, follow these steps:

Open the Twitter folder in a terminal
Store the dataset of usernames that need to be classified in dataset_gen.csv
Run python main.py
The results from the 5 classification algorithms will be displayed

antoniocarreiro/Tweezy