Website-Classification

Classifying a website based on its URL. I've implemented Stochastic Gradient Descent (SGD), Multinomial Naive Bayes (MNB), and a Convolutional Neural Network (CNN) to classify the category of a URL.

Primary language: Jupyter Notebook

URL-Based Website Classification Using Deep Learning and Word-Based Multiple N-gram Models

I've applied SGD and MNB classifiers to website classification in two settings: first on words extracted from the URLs after stemming, and then on word n-grams without stemming. I've also implemented a CNN on unigram, bigram, and trigram models.
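The two feature settings (stemmed words vs. unstemmed n-grams) can be sketched with the Python standard library alone. This is a minimal sketch under assumptions, not the repository's code: `tokenize_url`, `crude_stem`, and `word_ngrams` are hypothetical names, and `crude_stem` is a toy suffix stripper standing in for a real stemmer (which stemmer the notebooks use is not stated here).

```python
import re
from typing import List


def tokenize_url(url: str) -> List[str]:
    """Split a URL into lowercase alphabetic word tokens.

    Drops the scheme (http://, https://) and the "www" token, then splits on
    anything that is not a letter (dots, slashes, hyphens, digits, underscores).
    """
    url = re.sub(r"^[a-z]+://", "", url.lower())
    return [t for t in re.split(r"[^a-z]+", url) if t and t != "www"]


# Hypothetical crude suffix stripper, a stand-in for a real stemmer such as Porter's.
# Suffixes are tried in order; a word is only stripped if at least 3 letters remain.
SUFFIXES = ("ing", "ers", "er", "es", "s")


def crude_stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """Return contiguous word n-grams, each joined with a single space."""
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = tokenize_url("https://www.example.com/sports/football-players")
    print(tokens)                           # ['example', 'com', 'sports', 'football', 'players']
    print([crude_stem(t) for t in tokens])  # ['example', 'com', 'sport', 'football', 'play']
    print(word_ngrams(tokens, 2))           # ['example com', 'com sports', 'sports football', 'football players']
```

In the stemmed setting, the classifier would see the stemmed unigrams; in the n-gram setting, it would see the raw tokens plus their bigrams and/or trigrams, so that word order inside the URL (e.g. "football players") becomes a feature.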

The DMOZ dataset is used for this task. DMOZ was also known as the Open Directory Project (ODP). The dataset contains over 1.5 million websites across 15 categories, such as Sports, Arts, and Business (you can find it here: https://www.kaggle.com/shaurov/datasets).
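To illustrate how a Multinomial Naive Bayes classifier maps URL tokens to categories like these, here is a from-scratch sketch in plain Python. It is an assumption-laden illustration, not the repository's implementation: `MultinomialNB`, `tokenize_url`, and `TRAIN` are hypothetical names, and the training URLs are made-up placeholders rather than rows from the DMOZ dataset. The model picks the category c that maximizes log P(c) + Σ log P(token | c), with add-alpha (Laplace) smoothing on the token counts.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize_url(url: str) -> List[str]:
    # Lowercase, drop the scheme, split on non-letters, drop empty tokens and "www".
    url = re.sub(r"^[a-z]+://", "", url.lower())
    return [t for t in re.split(r"[^a-z]+", url) if t and t != "www"]


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)  # documents per category (for the prior)
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)  # token frequencies per category
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tokens: List[str], label: str) -> float:
        # Unnormalized: log P(label) + sum of log P(token | label).
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_tokens[label] + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.token_counts[label][t] + self.alpha) / denom)
        return score

    def predict(self, tokens: List[str]) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


# Hypothetical toy training data (placeholder URLs, not DMOZ rows).
TRAIN = [
    ("http://www.example.com/football/league-scores", "Sports"),
    ("http://www.example.org/tennis/match-results", "Sports"),
    ("http://www.example.com/art/gallery-paintings", "Arts"),
    ("http://www.example.org/museum/sculpture-gallery", "Arts"),
    ("http://www.example.com/finance/stock-market", "Business"),
    ("http://www.example.net/startup/market-investment", "Business"),
]

if __name__ == "__main__":
    model = MultinomialNB().fit([tokenize_url(u) for u, _ in TRAIN], [c for _, c in TRAIN])
    print(model.predict(tokenize_url("https://example.com/football/match")))  # Sports
```

For the stemmed setting, each token list would be passed through a stemmer before `fit` and `predict`; for the n-gram setting, bigrams or trigrams would be appended to each token list. Skipping unseen tokens behaves like a bag-of-words vocabulary fixed at training time, which has no column for words it never saw.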