/toxic-comment-classification

A multi-label machine learning classification model that identifies various types of toxic comments posted on social networking sites. The project uses and compares different machine learning algorithms such as SVM, KNN, XGBoost, and LSTM, with NLP features built using TF-IDF and GloVe. It achieved an average accuracy of 90%.

Primary language: Jupyter Notebook

CS271-Toxic Comment Classification

Toxic Comment Classification

The files contain only the output of each part of the project.

MLModels.ipynb contains the implementations of the machine learning models: a) SVC, b) KNN, c) XGBoost, d) MultinomialNB, e) LSTM.
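To illustrate how a multi-label model of this kind can be put together, here is a minimal sketch that combines TF-IDF features with one SVC per label. It is not the code from MLModels.ipynb: the toy comments, the two label names, and the build_model helper are all hypothetical, and the one-vs-rest wrapper is an assumption about how the labels could be handled.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical label names and toy data, for illustration only.
LABELS = ["toxic", "insult"]
comments = [
    "you are an idiot",
    "what a stupid idiot you are",
    "i hate you, you worthless fool",
    "thanks for the helpful edit",
    "great article, well written",
    "please add a source for this claim",
]
# One row per comment, one 0/1 column per label.
targets = np.array([
    [1, 1],
    [1, 1],
    [1, 0],
    [0, 0],
    [0, 0],
    [0, 0],
])


def build_model():
    """Return an unfitted TF-IDF + one-vs-rest linear SVC pipeline."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True),
        # One binary SVC is trained per label column.
        OneVsRestClassifier(SVC(kernel="linear")),
    )


model = build_model().fit(comments, targets)
# Predictions have one 0/1 column per entry in LABELS.
prediction = model.predict(["you stupid idiot"])
print(dict(zip(LABELS, prediction[0])))
```

The same pipeline shape accepts other scikit-learn estimators, such as KNeighborsClassifier or MultinomialNB, in place of the SVC.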

DataVisualization.ipynb contains the output of the data visualization techniques used.

The files will not run on their own; they require the dependent files. Please download the required files from the following link to execute the models in full. The dataset is also available there: https://drive.google.com/open?id=1ThhMXAC-NlUmro--TyJElSSvAcZYqjdt

Note: Before running the LSTM, create a folder named saved_models to save the trained LSTM model.
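The folder can also be created from Python before training. This is a small sketch using only the standard library; the ensure_model_dir helper and its base_dir parameter are hypothetical and do not come from the notebooks.

```python
from pathlib import Path


def ensure_model_dir(base_dir=".", name="saved_models"):
    """Create the model folder if it is missing and return its path."""
    path = Path(base_dir) / name
    # exist_ok=True makes repeated calls safe when the folder already exists.
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    # Creates ./saved_models relative to the current working directory.
    print(ensure_model_dir())
```

Run it from the directory that holds the notebooks so that the folder is created alongside them.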

The code depends on the following packages: numpy, pandas, sklearn, gensim, nltk, tensorflow, matplotlib, seaborn, imblearn, XGBoost, cython.
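A quick way to see which of these packages are missing from the current environment is sketched below. The missing_modules helper is hypothetical. The names in REQUIRED_MODULES are the import names, which are assumed here to be xgboost for XGBoost and Cython for cython.

```python
from importlib.util import find_spec

# Import names for the packages listed above (assumed mapping).
REQUIRED_MODULES = [
    "numpy", "pandas", "sklearn", "gensim", "nltk", "tensorflow",
    "matplotlib", "seaborn", "imblearn", "xgboost", "Cython",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Note that the name used with pip can differ from the import name; for example, sklearn is installed as scikit-learn and imblearn as imbalanced-learn.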