
News-Article-Classification

Text Classification | Category Identification | Data-Preprocessing | Multinomial Naive Bayes | Laplace Smoothing | Hyper-parameter Tuning | Vectorization | Supervised Machine Learning

Introduction

This project builds a supervised machine learning text classification model that predicts the category of a given news article from a predefined set of categories. A labelled dataset lets the algorithm learn the patterns and correlations in the data, and data cleaning is applied first so that noise does not distort the model. A TF-IDF vectorizer with unigrams and bigrams as features converts the text dataset into numeric form, which is then used to determine the category of each news article. Finally, a Multinomial Naive Bayes classifier predicts the class of the test data, and tuning its smoothing hyper-parameter yields a good level of accuracy.
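The pipeline described above can be sketched as follows. This is a minimal illustration assuming the scikit-learn stack; the tiny made-up corpus stands in for the real labelled news dataset, and the category names follow those shown in the Results.

```python
# Minimal sketch of the described pipeline: TF-IDF over unigrams and
# bigrams feeding a Multinomial Naive Bayes classifier (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny made-up corpus standing in for the real labelled news dataset.
texts = [
    "shares rise as markets rally on strong earnings",
    "investors fear inflation after the central bank report",
    "the striker scored twice in the cup final",
    "the team won the league title on the last day",
    "new smartphone chip doubles battery life",
    "researchers unveil a faster wireless standard",
    "parliament debates the new tax bill",
    "the minister announced an election pledge",
    "the film premiere drew a star-studded crowd",
    "the band released a chart-topping album",
]
labels = ["business", "business", "sport", "sport", "tech", "tech",
          "politics", "politics", "entertainment", "entertainment"]

model = Pipeline([
    # Unigram and bigram TF-IDF features convert text to numeric vectors.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # alpha=1.0 corresponds to classic Laplace (add-one) smoothing.
    ("nb", MultinomialNB(alpha=1.0)),
])
model.fit(texts, labels)

# Classify an unseen article snippet.
print(model.predict(["stocks fall as quarterly earnings disappoint"])[0])
```

With real data, the corpus would first go through the data-cleaning step (lowercasing, punctuation and stop-word removal, and so on) before vectorization.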

Results

Unigram frequency distributions: `Unigrams_Tech_Business_Sports.PNG`, `Unigrams_Entertainment_Politics.PNG`

Bigram frequency distributions: `Bigrams_Tech_Business_Sports.PNG`, `Bigrams_Entertainment_Politics.PNG`

Training and cross-validation accuracy: `Results.PNG`
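The training and cross-validation accuracies come from tuning the Naive Bayes smoothing hyper-parameter. A minimal sketch of that tuning step, again assuming scikit-learn and a toy corpus in place of the real dataset:

```python
# Sketch of hyper-parameter tuning: grid-search the smoothing strength
# alpha of Multinomial Naive Bayes using cross-validation (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus; two examples per category so 2-fold stratified CV works.
texts = [
    "shares rise as markets rally on strong earnings",
    "investors fear inflation after the central bank report",
    "the striker scored twice in the cup final",
    "the team won the league title on the last day",
    "new smartphone chip doubles battery life",
    "researchers unveil a faster wireless standard",
    "parliament debates the new tax bill",
    "the minister announced an election pledge",
    "the film premiere drew a star-studded crowd",
    "the band released a chart-topping album",
]
labels = ["business", "business", "sport", "sport", "tech", "tech",
          "politics", "politics", "entertainment", "entertainment"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("nb", MultinomialNB()),
])
# Candidate smoothing strengths; alpha=1.0 is Laplace smoothing, while
# smaller values apply weaker (Lidstone) smoothing.
search = GridSearchCV(pipeline, {"nb__alpha": [0.01, 0.1, 0.5, 1.0]}, cv=2)
search.fit(texts, labels)

print("best alpha:", search.best_params_["nb__alpha"])
print("mean CV accuracy:", round(search.best_score_, 3))
```

On the real dataset, a larger fold count (e.g. `cv=5`) and a wider alpha grid would give a more reliable cross-validation estimate.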

References

[1] Patra, Anuradha, and Divakar Singh. "A survey report on text classification with different term weighing methods and comparison between classification algorithms." International Journal of Computer Applications 75.7 (2013).

[2] Vijayan, Vikas K., K. R. Bindu, and Latha Parameswaran. "A comprehensive study of text classification algorithms." 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2017.

[3] McNamee, P., and J. Mayfield. "Character N-Gram Tokenization for European Language Text Retrieval." Information Retrieval 7 (2004): 73-97.

[4] Domingos, Pedro, and Michael Pazzani. "On the optimality of the simple Bayesian classifier under zero-one loss." Machine Learning 29.2 (1997): 103-130.

[5] Aggarwal, Charu C., and ChengXiang Zhai. "A survey of text classification algorithms." Mining Text Data. Springer, Boston, MA, 2012. 163-222.

[6] Singh, Gurinder, et al. "Comparison between multinomial and Bernoulli Naive Bayes for text classification." 2019 International Conference on Automation, Computational and Technology Management (ICACTM). IEEE, 2019.

[7] Aghila, G. "A Survey of Naïve Bayes Machine Learning approach in Text Document Classification." arXiv preprint arXiv:1003.1795 (2010).

[8] McCallum, Andrew, and Kamal Nigam. "A comparison of event models for naive bayes text classification." AAAI-98 Workshop on Learning for Text Categorization. Vol. 752. No. 1. 1998.

[9] Rennie, Jason D., et al. "Tackling the poor assumptions of naive bayes text classifiers." Proceedings of the 20th International Conference on Machine Learning (ICML-03). 2003.

[10] Zhai, Chengxiang, and John Lafferty. "A study of smoothing methods for language models applied to information retrieval." ACM Transactions on Information Systems (TOIS) 22.2 (2004): 179-214.

[11] Shimodaira, Hiroshi. "Text classification using naive bayes." Learning and Data Note 7 (2014): 1-9.

[12] He, Feng, and Xiaoqing Ding. "Improving naive bayes text classifier using smoothing methods." European Conference on Information Retrieval. Springer, Berlin, Heidelberg, 2007.

[13] Indriani, Fatma, and Dodon T. Nugrahadi. "Comparison of Naive Bayes smoothing methods for Twitter sentiment analysis." 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS). IEEE, 2016.

[14] British Broadcasting Corporation (BBC). "Insight - BBC Datasets." Insight Resources - BBC Datasets, BBC, 2006, mlg.ucd.ie/datasets/bbc.html.

[15] Kibriya, Ashraf M., et al. "Multinomial naive bayes for text categorization revisited." Australasian Joint Conference on Artificial Intelligence. Springer, Berlin, Heidelberg, 2004.

[16] "Text Classification: The First Step Toward NLP Mastery." Medium, 12 Sept. 2020, medium.com/data-from-the-trenches/text-classification-the-first-step-toward-nlp-mastery-f5f95d525d73.