fml-twitter

On Twitter, a user profile can be verified, meaning that it is an official profile. Twitter performs this verification manually, so we want to create a program that, given a profile, predicts whether it is a potential candidate for verification.

TODO:

Grid search to find the best numerical features (DONE!)

Separate the data into training and test sets (DONE!)

Implement more algorithms (so far: SVM, AdaBoost, stochastic_gradient_descent, nearestneighbor, decision_tree)

Create slides (Cody is working on this)
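
The split and feature-search steps above could look roughly like the following. This is a minimal standard-library sketch, not the project's actual code: it assumes each profile is a list of numeric features, reads "grid search" as an exhaustive search over feature subsets, and uses a simple 1-nearest-neighbour classifier for scoring. All function names here are hypothetical.

```python
import itertools
import random


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices with a fixed seed, then split into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def predict_1nn(train_rows, train_labels, row, features):
    """Return the label of the closest training row, using only `features`."""
    def squared_distance(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)

    nearest = min(range(len(train_rows)),
                  key=lambda i: squared_distance(train_rows[i], row))
    return train_labels[nearest]


def accuracy(train_rows, train_labels, eval_rows, eval_labels, features):
    """Fraction of evaluation rows that 1-NN labels correctly."""
    hits = sum(
        predict_1nn(train_rows, train_labels, row, features) == label
        for row, label in zip(eval_rows, eval_labels)
    )
    return hits / len(eval_rows)


def best_feature_subset(train_rows, train_labels, eval_rows, eval_labels, n_features):
    """Score every non-empty subset of feature columns and return the best one.

    Ties on accuracy are broken in favour of the smaller subset.
    """
    candidates = []
    for size in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), size):
            score = accuracy(train_rows, train_labels, eval_rows, eval_labels, subset)
            candidates.append((score, subset))
    return max(candidates, key=lambda c: (c[0], -len(c[1])))
```

When selecting features this way, it is safer to pass a validation split carved out of the training data as `eval_rows`, and to keep the final test set untouched until the end.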

Graphs and tables

Confusion matrix (for the 2 best and 2 worst features)
Accuracy (comparison between algorithms)
ROC curve (?)
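
As a reference for these three items, here is a small standard-library sketch of how the numbers could be computed. It assumes binary labels where 1 means "verified" and that the classifier can output a numeric score per profile; the function names are illustrative and not taken from the project.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def roc_points(y_true, scores):
    """(false positive rate, true positive rate) at each distinct score threshold.

    Assumes y_true contains at least one positive and one negative label.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, fp, _, _ = confusion_matrix(y_true, y_pred)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

The accuracy values give the per-algorithm comparison table directly, while the list returned by `roc_points` can be passed to any plotting tool to draw the ROC curve.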

NLP

Q: How to extract features considering the different possible languages the user speaks?
Q: How to extract features considering the use of symbols?
Analyse the user names using n-grams
Analyse the description
Analyse the tweets (?)
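
A possible starting point for the n-gram analysis of user names is sketched below, using only the standard library. It assumes character-level n-grams, which do not require a tokenizer and therefore may also be a reasonable first attempt at the multi-language and symbol questions above. The `^` and `$` boundary markers and the function names are assumptions of this sketch.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of a lowercased string, padded with start/end markers."""
    padded = "^" + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_counts(names, n=3):
    """Count how often each character n-gram occurs across a list of names."""
    counts = Counter()
    for name in names:
        counts.update(char_ngrams(name, n))
    return counts
```

For example, `char_ngrams("Bob", 2)` yields `["^b", "bo", "ob", "b$"]`. The resulting counts could be turned into numeric features and fed to the same classifiers listed above; the same functions apply unchanged to profile descriptions.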

dnr2/fml-twitter