Nepali News Classifier & Recommender

News classification is the process of assigning news articles to a set of predefined categories. With thousands of Nepali online news portals publishing new articles every day, classifying these articles has become increasingly important for readers. This mini project investigates several machine learning algorithms and evaluates how well they classify Nepali news. It uses a dataset of news articles collected from various online portals and divided into 20 categories. On this dataset, Support Vector Machine (SVM), Naive Bayes and Random Forest classifiers were trained and compared on accuracy. SVM performed best, with an accuracy of 73%, while the Random Forest classifier performed worst, with an accuracy of 57%.

A news recommendation system suggests articles similar to those a user prefers. Here, recommendation is achieved with content-based filtering on a Vector Space Model, using cosine similarity to measure how close two articles are. The project aims to demonstrate both classification and recommendation on a Nepali news dataset and to make these models more usable through a simple web interface.
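To make the recommendation approach concrete, here is a minimal sketch of content-based filtering with a Vector Space Model and cosine similarity, in plain Python. The abstract does not specify the term weighting, so the smoothed TF-IDF weights are an assumption. The whitespace tokenizer, the `recommend` helper and the toy headlines are likewise hypothetical illustrations, not the project's actual code or data.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: treat the Devanagari danda (।) as a separator, then split on whitespace.
    return text.replace("।", " ").split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # Assumed weighting: raw term frequency times a smoothed IDF (not stated in the source).
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors stored as term -> weight dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(docs: list[str], liked_index: int, k: int = 3) -> list[int]:
    # Hypothetical helper: rank every other article by similarity to the liked one.
    vectors = tfidf_vectors(docs)
    target = vectors[liked_index]
    scored = [(cosine(target, v), i) for i, v in enumerate(vectors) if i != liked_index]
    # Highest similarity first; ties broken by lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Toy headlines (hypothetical data).
docs = [
    "क्रिकेट टोली खेल जित्यो",
    "फुटबल टोली खेल हार्यो",
    "सरकार बजेट संसद",
    "क्रिकेट खेल विश्वकप",
]

print(recommend(docs, liked_index=0, k=2))  # → [3, 1]
```

Articles that share no terms with the liked one all score 0. In this sketch they are ranked last and ordered by index, which is an arbitrary tie-breaking choice.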
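The classifiers are compared on accuracy, the fraction of test articles whose predicted category matches the true one. The sketch below shows one of the three compared models in plain Python, together with an accuracy function. The abstract only says "Naive Bayes", so the multinomial variant with add-one smoothing is an assumption. The tokenizer, the two English category labels and the toy headlines are also hypothetical. This is not the project's implementation and does not reproduce its reported results.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: treat the Devanagari danda (।) as a separator, then split on whitespace.
    return text.replace("।", " ").split()


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing (illustrative sketch)."""

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, text: str) -> str:
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus summed log likelihoods of the article's words.
            score = math.log(n / self.n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts: list[str]) -> list[str]:
        return [self.predict_one(t) for t in texts]


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy training and test data (hypothetical).
train_texts = [
    "क्रिकेट टोली खेल जित्यो",
    "फुटबल खेल गोल",
    "सरकार संसद बजेट",
    "मन्त्री संसद निर्णय",
]
train_labels = ["sports", "sports", "politics", "politics"]
test_texts = ["क्रिकेट गोल", "संसद बजेट पारित"]
y_test = ["sports", "politics"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(accuracy(y_test, model.predict(test_texts)))  # → 1.0
```

Comparing SVM, Naive Bayes and Random Forest in the same way means fitting each model on the same training split and calling the same `accuracy` function on the same held-out articles. That shared evaluation is what makes the 73% and 57% figures comparable.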