This project is a supervised machine learning text classification model that predicts the category of a given news article from a predefined set of categories. It is trained on a labelled dataset, which lets the algorithm learn the patterns and correlations in the data. The text is cleaned first so that noise in the raw articles does not distort the model. A TF-IDF vectorizer with uni-grams and bi-grams as features then converts the text into numeric vectors, which are used to determine the category of each news article. Finally, a Multinomial Naive Bayes classifier predicts the class of the test data, and after hyper-parameter tuning it achieves good accuracy.
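The sketch below shows one way to put these steps together. It assumes Python with scikit-learn, since the description does not name a library. The toy corpus, the `clean_text` helper, and the decision to tune only the Naive Bayes smoothing parameter `alpha` with `GridSearchCV` are all hypothetical stand-ins, not the project's actual data or code.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def clean_text(text: str) -> str:
    """Hypothetical cleaning step: lowercase, keep letters only, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical toy corpus standing in for the labelled news dataset.
docs = [
    "The football match ended with a late goal.",
    "Fans cheered as the striker scored a goal in the football final.",
    "The coach praised the team after the match.",
    "A penalty goal decided the cup match.",
    "The football club signed a new striker.",
    "The new software update improves computer security.",
    "Engineers released an open source software library.",
    "The laptop ships with a faster computer chip.",
    "Cloud computing lets companies run software remotely.",
    "A computer virus infected thousands of machines.",
    "Stocks rose as the market reacted to strong earnings.",
    "The central bank raised interest rates again.",
    "Investors sold shares after the market fell sharply.",
    "The company reported record quarterly profits.",
    "Oil prices pushed the stock market lower.",
]
labels = ["sport"] * 5 + ["tech"] * 5 + ["business"] * 5

# Hold out one article per category as test data.
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=3, stratify=labels, random_state=42
)

pipeline = Pipeline([
    # clean_text replaces the vectorizer's default preprocessing;
    # ngram_range=(1, 2) produces uni-gram and bi-gram TF-IDF features.
    ("tfidf", TfidfVectorizer(preprocessor=clean_text, ngram_range=(1, 2))),
    ("nb", MultinomialNB()),
])

# Assumed search space: only the Naive Bayes smoothing parameter alpha.
search = GridSearchCV(pipeline, {"nb__alpha": [0.1, 0.5, 1.0]}, cv=3)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
print("best alpha:", search.best_params_["nb__alpha"])
print("test accuracy:", accuracy_score(y_test, y_pred))
print("new article:", search.predict(["A late goal won the football match."])[0])
```

Keeping both the cleaning step and the TF-IDF vectorizer inside the `Pipeline` means the vocabulary and IDF weights are refitted on each cross-validation training fold, so the held-out fold does not leak into the features during tuning.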


