URL categorization using machine learning

The Internet is an important source of information for machine learning algorithms. Web pages store diverse information across many domains, and one critical problem is how to categorize it. Support vector machines and other supervised classification algorithms have been applied to text categorization, and this project applies the same approach to URLs.
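As a rough illustration of the idea, the sketch below trains a linear support vector machine on a handful of URLs. It assumes scikit-learn is installed and is not taken from main.py. The URLs, the labels, and the `build_classifier` helper are hypothetical, and TF-IDF word features are only one possible way to turn a URL into text features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real project uses the downloaded URL dataset.
train_urls = [
    "http://example.com/sports/football-scores",
    "http://example.com/sports/basketball-news",
    "http://example.org/recipes/chocolate-cake",
    "http://example.org/recipes/pasta-sauce",
]
train_labels = ["sports", "sports", "food", "food"]


def build_classifier():
    """Return a pipeline: TF-IDF over the word tokens of a URL, then a linear SVM."""
    # The default tokenizer splits on punctuation, so "/", "." and "-" separate words.
    return make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))


classifier = build_classifier()
classifier.fit(train_urls, train_labels)

# Unseen URLs that share some words with the training examples.
new_urls = [
    "http://example.net/sports/tennis-scores",
    "http://example.net/recipes/lemon-cake",
]
predictions = classifier.predict(new_urls)

for url, category in zip(new_urls, predictions):
    print(f"{category}: {url}")
```

With so few examples the model only picks up on shared words such as "sports" or "recipes"; a realistic run would train on the full dataset and evaluate on a held-out split.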

Project structure


The main.py file contains the Python code of the project.


URL dataset used for analysis and for the machine learning implementation. The dataset can be downloaded from: https://www.crowdflower.com/data-for-everyone/


Jupyter notebook file that contains the code of the project. For more information: http://jupyter.org/


Folder that contains the LaTeX documentation for this project.


Short project documentation in PDF format, generated with LaTeX.