Natural-Language-Processing-Preprocessing: A Jupyter Notebook repository from FOUZANKHAN

Performed text cleaning and feature extraction using techniques such as bag of words (BOW), n-grams, and TF-IDF. The techniques were compared using logistic regression and support vector machine (SVM) classifiers to understand their accuracy and precision.
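As a rough illustration of the three feature-extraction techniques named above, here is a minimal standard-library sketch. It is not the notebook's code: the function names (`clean`, `ngrams`, `bag_of_words`, `tf_idf`), the regex-based cleaning rule, and the unsmoothed TF-IDF formula are all assumptions chosen for clarity. Libraries such as scikit-learn use smoothed variants, so their weights will differ.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase the text and keep only runs of letters (one simple cleaning choice)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n=1):
    """Count n-gram occurrences per document (n=1 gives a plain bag of words)."""
    return [Counter(ngrams(clean(doc), n)) for doc in docs]


def tf_idf(docs):
    """Weight each term by (count / document length) * ln(N / document frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for c in counts for term in c)
    weighted = []
    for c in counts:
        total = sum(c.values())
        weighted.append(
            {t: (k / total) * math.log(n_docs / doc_freq[t]) for t, k in c.items()}
        )
    return weighted


# Hypothetical two-document corpus, used only for illustration.
docs = ["The movie was good, really good!", "The movie was bad."]
print(bag_of_words(docs)[0])        # unigram counts for the first document
print(bag_of_words(docs, n=2)[1])   # bigram counts for the second document
print(tf_idf(docs)[0])              # TF-IDF weights for the first document
```

In this sketch, a term that appears in every document (such as "the") gets a TF-IDF weight of zero, which is the usual motivation for preferring TF-IDF over raw counts. The resulting dictionaries would still need to be converted to fixed-length vectors before being passed to a classifier.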

| Method | Logistic Regression | SVM |
| --- | --- | --- |
| BOW | 65.45% (1/5th of dataset), 66.54% (full dataset) | 57.57%, 42.15% |
| n-gram | 65.67% (1/5th of dataset), 66.65% (full dataset) | 24.43%, 27.37% |
| TF-IDF | 65.48% (1/5th of dataset), 66.65% (full dataset) | 56.37%, 41.39% |
