/spam-classifier-svm

Spam classifier using Support Vector Machines

Primary Language: MATLAB


The training data

The labeled training data is in the files train and train-small, while the unlabeled data is in the file test.

Usage

The classifier is implemented with the CVX package for convex optimization, and its results were then compared with classification done using LibSVM for MATLAB.
Each MATLAB script is documented and explains what it does.
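
As a rough illustration of the CVX-versus-LibSVM comparison described above, here is a minimal sketch that assumes a linear soft-margin SVM in its primal form. The toy data, the variable names and the value of C are hypothetical; the repository's scripts may use a different formulation, kernel or parameters. It requires CVX and the LibSVM MATLAB interface on the path.

```matlab
% Hypothetical toy data: 6 samples, 2 features, labels in {-1, +1}.
X = [ 2  2;  3  3;  2  3; -2 -2; -3 -3; -2 -3];
y = [ 1;  1;  1; -1; -1; -1];
C = 1;                       % assumed regularization constant
[n, d] = size(X);

% Soft-margin SVM primal solved with CVX:
%   minimize  0.5*||w||^2 + C*sum(xi)
%   s.t.      y_i*(x_i'*w + b) >= 1 - xi_i,  xi_i >= 0
cvx_begin quiet
    variables w(d) b xi(n)
    minimize( 0.5 * sum_square(w) + C * sum(xi) )
    subject to
        y .* (X * w + b) >= 1 - xi;
        xi >= 0;
cvx_end
predCvx = sign(X * w + b);

% Same problem with LibSVM's MATLAB interface (linear kernel, same C).
model = svmtrain(y, X, sprintf('-t 0 -c %g -q', C));
[predLib, ~, ~] = svmpredict(y, X, model, '-q');

% Fraction of samples on which the two classifiers agree.
agreement = mean(predCvx == predLib);
```

Older MATLAB releases shipped their own svmtrain in the Statistics Toolbox, so the LibSVM folder may need to come first on the path for the call above to reach LibSVM.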

Theory & Performance


[three figures]

Possible improvements

  • Use capitalization data: right now the text is lowercased before classification, but anecdotally spam seems more likely to be written in all caps (shouting, spurious offers, etc.).

  • Use punctuation: the classifier currently ignores punctuation. This is most likely a mistake, because spam tends to contain a lot of unusual punctuation and ASCII art.

  • Search for keywords: simply tokenizing the comment is not ideal, because many spam comments look like "pleasecheckoutmyfacebookpageatwwwfacebookcom/blah", where the telling words are not separated by spaces.

  • Most of the features used in twitter-sentiment-analysis could also be used here.
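
The first three ideas above could be sketched as extra features along the following lines. This is only an illustration: the example comment, the keyword list and the variable names are hypothetical, and nothing like it is confirmed to exist in the repository.

```matlab
% Hypothetical example comment; not taken from the repository's data.
comment = 'CHECK OUT my page!!! pleasecheckoutmyfacebookpageatwwwfacebookcom/blah';

% Capitalization: fraction of letters that are upper case,
% computed on the raw text before any lowercasing.
letters   = isstrprop(comment, 'alpha');
uppers    = isstrprop(comment, 'upper');
capsRatio = sum(uppers) / max(sum(letters), 1);

% Punctuation: fraction of characters that are punctuation marks.
punctRatio = sum(isstrprop(comment, 'punct')) / max(numel(comment), 1);

% Keywords: substring search instead of whitespace tokenization,
% so words glued together without spaces are still found.
keywords    = {'facebook', 'www', 'checkout'};   % hypothetical keyword list
lowered     = lower(comment);
keywordHits = zeros(1, numel(keywords));
for k = 1:numel(keywords)
    keywordHits(k) = ~isempty(strfind(lowered, keywords{k}));
end

% Row vector that could be appended to the existing feature vector.
extraFeatures = [capsRatio, punctRatio, keywordHits];
```

Substring search is used for the keywords because the glued-together example would yield no useful tokens when split on whitespace.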

Contributing

  1. Fork it!
  2. Create your branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -m 'Added some features'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :)

Credits