Malicious URL classifier built with SVM, random forest, and logistic regression classifiers.
The data used is a subset of the UCSD malicious URL data set, which can be found here.
The classifier is very accurate: it correctly classified approximately 99% of the 56,000 observations tested.
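To make the train-then-measure-accuracy idea concrete, here is a minimal sketch of one of the three model types (logistic regression) trained on sparse features and scored on held-out rows. It is not the project's actual code. It assumes the data is stored as SVMlight-style lines (`label index:value ...`). The toy rows, the function names, and the hyperparameters are all invented for illustration, and it uses only the Python standard library so it runs anywhere.

```python
import math
import random


def parse_svmlight_line(line):
    """Parse a line such as '+1 4:1 7:0.5' into (label, {index: value}).

    The label is mapped to 1 (malicious) or 0 (benign). The SVMlight-style
    layout is an assumption about how the data is stored.
    """
    parts = line.split()
    label = 1 if float(parts[0]) > 0 else 0
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features


def predict_proba(weights, bias, features):
    """Probability that a row is malicious under the current model."""
    z = bias + sum(weights.get(i, 0.0) * v for i, v in features.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=20, lr=0.5, seed=0):
    """Fit logistic regression with plain stochastic gradient descent.

    Weights live in a dict, so only features that actually occur are stored.
    epochs and lr are illustrative values, not tuned settings.
    """
    rng = random.Random(seed)
    rows = list(rows)
    weights, bias = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(rows)
        for label, features in rows:
            error = predict_proba(weights, bias, features) - label
            bias -= lr * error
            for i, v in features.items():
                weights[i] = weights.get(i, 0.0) - lr * error * v
    return weights, bias


def accuracy(weights, bias, rows):
    """Fraction of rows whose thresholded prediction matches the label."""
    correct = sum(
        (predict_proba(weights, bias, features) >= 0.5) == bool(label)
        for label, features in rows
    )
    return correct / len(rows)


# Hypothetical toy rows: feature 1 co-occurs with malicious URLs,
# feature 2 with benign ones, and feature 3 is uninformative.
train_lines = ["+1 1:1 3:1", "+1 1:1", "+1 1:1 3:0.5",
               "-1 2:1 3:1", "-1 2:1", "-1 2:1 3:0.5"]
test_lines = ["+1 1:1 3:1", "-1 2:1"]

train_rows = [parse_svmlight_line(line) for line in train_lines]
test_rows = [parse_svmlight_line(line) for line in test_lines]

weights, bias = train(train_rows)
test_accuracy = accuracy(weights, bias, test_rows)
print(f"held-out accuracy: {test_accuracy:.2%}")
```

Keeping the weights in a dictionary keyed by feature index means memory grows with the features actually seen rather than with the full feature space, which matters when the feature count runs into the millions.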
It is not intended as a practical security tool. This was solely an academic exercise: collecting the more than 3 million features observed in this data set in real time, as a real-world security system would require, is far outside the scope of this project.
Check out writeup.pdf!