Extract features from the raw data. The features are: URL length, primary domain length, number of dots, whether the URL contains an IP address, average word length, longest word length, and number of special characters.
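A minimal sketch of how these features might be computed for a single URL. The exact definitions are not given here, so the following are assumptions: the "primary domain" is taken as the last two labels of the host name, a "word" is a maximal run of letters or digits, and a "special character" is any character that is not a letter or digit. The function name `extract_features` and the dictionary keys are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: only dotted-quad IPv4 hosts count as "contains IP".
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url):
    """Return the seven features for one URL as a dict (hypothetical helper)."""
    # urlparse only fills in the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    contains_ip = bool(IPV4_RE.match(host))

    # Assumption: primary domain = last two labels of the host name.
    primary_domain = host if contains_ip else ".".join(host.split(".")[-2:])

    # Assumption: words are maximal runs of letters or digits.
    word_lengths = [len(w) for w in re.split(r"[^A-Za-z0-9]+", url) if w]

    return {
        "url_length": len(url),
        "primary_domain_length": len(primary_domain),
        "num_dots": url.count("."),
        "contains_ip": int(contains_ip),
        "avg_word_length": sum(word_lengths) / len(word_lengths) if word_lengths else 0.0,
        "longest_word_length": max(word_lengths, default=0),
        # Assumption: every non-alphanumeric character is "special".
        "num_special_chars": sum(1 for c in url if not c.isalnum()),
    }


features = extract_features("http://www.example.com/login.php?id=1")
```

Taking the last two labels misjudges hosts under multi-part suffixes such as `co.uk`; a public suffix list would be needed to handle those correctly.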
Implement logistic regression in Python, using stochastic gradient ascent so that the model weights converge quickly.
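The idea can be sketched with NumPy as follows. This is an illustrative version, not necessarily the implementation used in this project: the function names, the decaying learning-rate schedule, and the default hyperparameters are all assumptions. Each update uses one sample and moves the weights along the gradient of the log-likelihood, `(y - sigmoid(x.w)) * x`.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def sga_logistic_regression(X, y, n_epochs=50, base_lr=0.01, seed=0):
    """Fit weights by stochastic gradient ascent on the log-likelihood.

    X should already contain a column of ones if an intercept is wanted.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for epoch in range(n_epochs):
        # Visit the samples in a fresh random order every epoch.
        for step, i in enumerate(rng.permutation(n_samples)):
            # Assumption: step size decays over time but never below base_lr.
            lr = 4.0 / (1.0 + epoch + step) + base_lr
            error = y[i] - sigmoid(X[i] @ w)
            w += lr * error * X[i]  # ascent step for a single sample
    return w


def predict(X, w):
    """Label a sample 1 when its predicted probability is at least 0.5."""
    return (sigmoid(np.asarray(X, dtype=float) @ w) >= 0.5).astype(int)


# Toy data: first column is the intercept term, second is one feature.
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])
w_toy = sga_logistic_regression(X_toy, y_toy)
```

In practice the URL features have very different scales (URL length versus a 0/1 flag), so standardizing them before training is likely to help convergence.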
Use statsmodels (https://www.statsmodels.org/) to implement logistic regression.
Results of the logistic regression model after parameter tuning, obtained with Microsoft Azure Machine Learning Studio (https://studio.azureml.net/)
Results of the decision tree model, obtained with Microsoft Azure Machine Learning Studio