Natural-Language-Processing-Preprocessing

A text cleaning and feature extraction pipeline for use with a Logistic Regression model and Support Vector Machines.

Primary language: Jupyter Notebook

Performed text cleaning and feature extraction using Bag of Words (BOW), n-grams, and TF-IDF. The techniques were compared on a Logistic Regression model and a Support Vector Machine (SVM) to assess accuracy and precision.
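To make the three feature extraction techniques concrete, here is a minimal standard-library sketch. It is not the notebook's actual code: the function names (`clean_text`, `ngrams`, `bag_of_words`, `tf_idf`), the cleaning rule (lowercase, letters only), and the two sample sentences are all assumptions made for illustration. The TF-IDF weighting uses the textbook `log(N / df)` form, which differs from the smoothed variant that libraries such as scikit-learn apply by default.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Lowercase, replace non-letter characters with spaces, split on whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs_tokens):
    """Map each document to a count vector over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in docs_tokens for tok in doc})
    vectors = []
    for doc in docs_tokens:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vocab, count_vectors):
    """Reweight raw counts by idf = log(N / df).

    A term that occurs in every document gets weight 0.
    """
    n_docs = len(count_vectors)
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(len(vocab))]
    return [
        [vec[j] * math.log(n_docs / df[j]) for j in range(len(vocab))]
        for vec in count_vectors
    ]


# Hypothetical sample data, not taken from the project's dataset.
docs = ["The movie was great!", "The movie was terrible..."]
tokens = [clean_text(d) for d in docs]

vocab, bow = bag_of_words(tokens)                        # unigram counts
bigram_vocab, bigram_bow = bag_of_words(
    [ngrams(t, 2) for t in tokens]                       # bigram counts
)
weights = tf_idf(vocab, bow)                             # TF-IDF weights
```

In this toy example only "great" and "terrible" receive non-zero TF-IDF weight, because the other words appear in both documents. Any of the three resulting matrices could then be passed to a classifier in place of the raw text.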

| Method | Logistic Regression | SVM |
|--------|---------------------|-----|
| BOW | 65.45% (1/5th of dataset), 66.54% (full dataset) | 57.57%, 42.15% |
| n-gram | 65.67% (1/5th of dataset), 66.65% (full dataset) | 24.43%, 27.37% |
| TF-IDF | 65.48% (1/5th of dataset), 66.65% (full dataset) | 56.37%, 41.39% |