Malicious Urls: Statistical Learning Project

A malicious URL classifier built with SVM, random forest, and logistic regression models.

Data

The data is a subset of the UCSD malicious URL data set, which can be found here.

Does it Work?

Yes, the classifier is very accurate, correctly classifying approximately 99% of the 56,000 observations tested.
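For reference, the accuracy quoted above is simply the fraction of test observations whose predicted label matches the true label. A minimal sketch of that calculation follows; the function name `accuracy` and the toy labels are illustrative assumptions and are not taken from the project code.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    # Count the positions where the predicted label equals the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy example (hypothetical labels): +1 = malicious, -1 = benign.
y_true = [1, -1, 1, 1]
y_pred = [1, -1, -1, 1]
print(accuracy(y_true, y_pred))
```

Note that accuracy alone can look flattering when one class dominates the test set, so per-class error rates are worth checking alongside it.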

Is it Practical?

No, not at all. This was solely an academic exercise. Collecting the >3 million features observed in this data set in real time in order to turn this into a real-world security system is far outside the scope of this project.
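To give a sense of why the feature count matters: with millions of possible features, each URL is usually stored sparsely, recording only the features that are actually present. The sketch below assumes the data is in the SVMlight-style text format (a label followed by `index:value` pairs); the helper name `parse_svmlight_line` and the sample line are hypothetical and are not part of this project.

```python
def parse_svmlight_line(line):
    """Parse one SVMlight-style line into (label, {feature_index: value}).

    Assumed format: "<label> <index>:<value> <index>:<value> ..."
    Anything after a '#' is treated as a comment and ignored.
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        raise ValueError("empty line")
    tokens = line.split()
    # Labels such as "+1" or "-1" are normalized to the integers 1 and -1.
    label = int(float(tokens[0]))
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        # Only the non-zero features listed on the line are stored.
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line: a malicious URL with two non-zero features.
label, features = parse_svmlight_line("+1 4:0.5 3000000:1")
print(label, len(features))
```

Even though the feature space is enormous, each parsed observation stays small. The impractical part is computing those features for a live URL in real time, not storing them.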

How Did you Build it, and How do the Models Work?

Check out writeup.pdf!

jldbc/malicious-urls