Spam detection is studied on the Twitter dataset through three case studies:
- Case 1 - Using all numerical features
- Case 2 - Selecting the top 7 features with SelectKBest from scikit-learn
- Case 3 - Performing PCA and keeping the principal components that explain 95% of the variance
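
The three feature sets above could be built roughly as follows. This is a minimal sketch, not the project's actual code: the synthetic data from `make_classification` stands in for the Twitter features, the `f_classif` scoring function for SelectKBest is an assumption (the scoring function used in the project is not stated), and standardizing before PCA is likewise an assumed preprocessing step.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numerical features of the Twitter dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6, random_state=0
)

# Case 1: use all numerical features unchanged.
X_case1 = X

# Case 2: keep the 7 features with the highest univariate scores.
# f_classif (ANOVA F-test) is an assumed scoring function.
selector = SelectKBest(score_func=f_classif, k=7)
X_case2 = selector.fit_transform(X, y)

# Case 3: standardize, then keep the smallest number of principal
# components whose cumulative explained variance reaches 95%.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95, svd_solver="full")
X_case3 = pca.fit_transform(X_scaled)

print(X_case1.shape, X_case2.shape, X_case3.shape)
```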

Grid search with cross-validation is used to find the best hyperparameters for the following classification algorithms:
- Naive Bayes
- KNN
- SVM
- Decision Tree
- Random Forest
- Multi-Layer Perceptron
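
A hedged sketch of how such a search might look with scikit-learn's `GridSearchCV` is shown below. The parameter grids, the 5-fold setting, the accuracy metric, the choice of `GaussianNB` for Naive Bayes, and the synthetic data are all illustrative assumptions; the grids actually searched in the project are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one of the three feature sets (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)

# One estimator and one small, illustrative grid per algorithm.
# The "clf__" prefix targets the classifier step of the pipeline.
candidates = {
    "Naive Bayes": (GaussianNB(), {"clf__var_smoothing": [1e-9, 1e-7]}),
    "KNN": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
    "SVM": (SVC(), {"clf__C": [0.1, 1, 10]}),
    "Decision Tree": (
        DecisionTreeClassifier(random_state=0),
        {"clf__max_depth": [3, 5, None]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [50, 100]},
    ),
    "Multi-Layer Perceptron": (
        MLPClassifier(max_iter=500, random_state=0),
        {"clf__hidden_layer_sizes": [(16,), (32,)]},
    ),
}

results = {}
for name, (clf, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is refit on each training fold.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    results[name] = (search.best_params_, search.best_score_)
    print(f"{name}: {search.best_params_} (CV accuracy {search.best_score_:.3f})")
```

Keeping the scaler inside the pipeline means each cross-validation fold is scaled using only its own training portion, which avoids leaking validation data into the preprocessing step.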