- Given Above
- The dataset contains News, Title, Text, and Label as the attributes.
- The pre-processing was done by removing stopwords, applying tokenization and lemmatization.
- The output labels are classified as Fake and Real. For training the model, it was converted to the numerical form.
- Then vectorization is applied to map words or phrases from vocabulary to a corresponding vector of real numbers which is used to find word predictions, word similarities/semantics. The vectorizer objects provided by Scikit-Learn are used.
- I fitted two Machine Learning models to the data namely, Naive-Bayes and Passive-Aggressive Classifier. And concluded that the passive-aggressive classifier performed the better here.