URL categorization using machine learning

The Internet is an important source of data for machine learning algorithms. Web pages hold diverse information across many domains, and a central problem is how to categorize that information. Support vector machines and other supervised classification algorithms have been applied to this kind of text categorization.
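As a rough illustration of supervised text categorization with a support vector machine, the sketch below trains a linear SVM on TF-IDF features. It assumes scikit-learn is installed; the choice of library, the helper name train_text_classifier, and the toy training sentences are assumptions made for this example and are not taken from main.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def train_text_classifier(texts, labels):
    """Fit a TF-IDF + linear SVM pipeline on labeled text (hypothetical helper)."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(texts, labels)
    return model


# Made-up training examples, for illustration only (not from the dataset).
texts = [
    "football match score league",
    "stock market shares profit",
    "basketball team wins game",
    "bank interest rates investment",
]
labels = ["Sports", "Business", "Sports", "Business"]

model = train_text_classifier(texts, labels)

# Classify two unseen snippets; convert NumPy strings to plain Python strings.
predictions = [str(p) for p in model.predict(["league game score", "bank shares profit"])]
print(predictions)
```

A pipeline keeps vectorization and classification together, so the same TF-IDF vocabulary learned during training is reused at prediction time. Other supervised classifiers could be swapped in for LinearSVC without changing the rest of the sketch.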

Project structure

main.py

Contains the Python code of the project.

URL-categorization-DFE.csv

The URL dataset used for the analysis and the machine learning implementation. It can be downloaded from: https://www.crowdflower.com/data-for-everyone/
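A minimal sketch of reading labeled URLs from such a CSV file with the standard library might look like the following. The column names url and main_category, the helper name load_labeled_urls, and the sample rows are assumptions for illustration; check the header of the downloaded file before relying on them.

```python
import csv
import io
from collections import Counter


def load_labeled_urls(fileobj, url_column="url", label_column="main_category"):
    """Return (url, label) pairs from an open CSV file, skipping incomplete rows.

    The default column names are assumptions, not confirmed by the project.
    """
    pairs = []
    for row in csv.DictReader(fileobj):
        url = (row.get(url_column) or "").strip()
        label = (row.get(label_column) or "").strip()
        if url and label:
            pairs.append((url, label))
    return pairs


# Tiny in-memory stand-in for the real file; these rows are made up.
sample = io.StringIO(
    "url,main_category\n"
    "example.com/scores,Sports\n"
    "example.org/markets,Business\n"
    "example.net/fixtures,Sports\n"
    "example.edu/blank,\n"
)

pairs = load_labeled_urls(sample)

# Count how many URLs fall into each category.
category_counts = Counter(label for _, label in pairs)
print(category_counts.most_common())
```

To use the real dataset, the same function can be given a file opened with open("URL-categorization-DFE.csv", newline=""); an explicit encoding argument may be needed depending on how the file was exported.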

URL-categorization-Jupyter-Notebook.ipynb

Jupyter Notebook file containing the code of the project. For more information about Jupyter, see: http://jupyter.org/

Documentation

Folder containing the LaTeX documentation for this project.

Documentation.pdf

Short project documentation in PDF format, generated with LaTeX.