news-classification

A lot of fake news spreads on the internet, and readers often cannot tell at a glance whether a story is true or false.
News classification helps with this problem. This project classifies news articles as true or fake using NLP (Natural Language Processing).

NLTK library

NLTK (Natural Language Toolkit) is a Python library for NLP, the subfield of Artificial Intelligence that deals with understanding human language and converting it into a form machines can process. It provides functions for analyzing and preprocessing unstructured text so that the cleaned data can then be used to make predictions.

The data preprocessing methods it supports include:

- Tokenization
- Frequency Distribution of Words
- Filtering Stop Words
- Stemming
- Lemmatization
- Parts of Speech (POS) Tagging
- Named Entity Recognition
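Two of the methods listed above, the frequency distribution of words and stop-word filtering, can be sketched in a few lines with NLTK's FreqDist. This is an illustrative sketch, not the project's code: the sample sentence and the tiny STOP_WORDS set are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from nltk import FreqDist

text = "The news is fake and the claim in the news is false"

# Lower-case and split on whitespace (a simple stand-in for NLTK tokenizers).
tokens = text.lower().split()

# Hypothetical, deliberately tiny stop-word list for illustration only.
# NLTK's full English list is nltk.corpus.stopwords.words("english"),
# which needs a one-time nltk.download("stopwords").
STOP_WORDS = {"the", "is", "and", "in", "a", "of"}

# Frequency distribution over all tokens: FreqDist counts each word.
all_counts = FreqDist(tokens)

# Filtering stop words keeps only the content-bearing words.
content_words = [t for t in tokens if t not in STOP_WORDS]
content_counts = FreqDist(content_words)

print(all_counts.most_common(1))      # [('the', 3)]
print(content_words)                  # ['news', 'fake', 'claim', 'news', 'false']
print(content_counts.most_common(1))  # [('news', 2)]
```

Before filtering, the most frequent word is "the"; after filtering, it is "news", which says more about the text.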

Scikit-learn (sklearn)

The scikit-learn library is used to model the data. It offers statistical modelling techniques such as classification, regression, clustering and dimensionality reduction through a consistent interface in Python: models are trained with fit and then used with predict or transform.

I carried out the project in the following steps:

TOKENIZATION

Tokenization splits data into smaller units called tokens; in this project, sentences are split into individual words. It is a mandatory step in NLP because the later stages, such as preprocessing, analysis and prediction, all operate on these tokens.
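A minimal tokenization sketch with NLTK follows. It uses TreebankWordTokenizer on a single hypothetical sentence because, unlike nltk.word_tokenize, it does not need a downloaded sentence-splitting model; the project itself may use a different NLTK tokenizer.

```python
from nltk.tokenize import TreebankWordTokenizer

# Hypothetical example sentence.
sentence = "Officials say the report is fake, not real."

# Splits one sentence into word tokens, separating punctuation
# such as commas and the final period into their own tokens.
tokenizer = TreebankWordTokenizer()
tokens = tokenizer.tokenize(sentence)

print(tokens)
# ['Officials', 'say', 'the', 'report', 'is', 'fake', ',', 'not', 'real', '.']
```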

SNOWBALL STEMMER-STEMMING

Stemming simplifies a word by removing its affixes so that only the root (stem) remains; for example, "running" becomes "run". The English Snowball stemmer strips suffixes only, and the resulting stem is not always a dictionary word.
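The Snowball stemmer ships with NLTK and needs no data download. The word list below is hypothetical and only illustrates how suffixes are stripped, including a case ("studies") where the stem is not a real word.

```python
from nltk.stem.snowball import SnowballStemmer

# Snowball (Porter2) stemmer for English; it removes suffixes, not prefixes.
stemmer = SnowballStemmer("english")

words = ["running", "published", "claims", "studies"]
stems = [stemmer.stem(w) for w in words]

print(stems)  # ['run', 'publish', 'claim', 'studi']
```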

SPLITTING OF DATA

In this step the data is split into training and test sets: x_train, x_test, y_train and y_test. The training sets are used to train the model, and the test sets are used to evaluate it on data it has not seen.
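The split can be sketched with scikit-learn's train_test_split. The eight texts and labels are hypothetical placeholders, and the 25% test size, random_state=42 and stratify option are illustrative choices rather than the project's settings.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: article texts and labels (1 = true, 0 = fake).
texts = [
    "official figures released", "miracle cure found",
    "court ruling published", "aliens built the pyramids",
    "budget report approved", "secret trick banks hate",
    "election results certified", "celebrity cloned in lab",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Hold out 25% of the data for testing; stratify keeps the
# true/fake ratio the same in both sets.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

print(len(x_train), len(x_test))  # 6 2
```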

VECTORIZATION

Vectorization converts the words of each article into vectors of real numbers, such as word counts or TF-IDF weights, because machine learning models can only work with numeric input.

LOGISTIC REGRESSION

Logistic Regression is a classification model used when the target variable is categorical. It estimates the probability that an article belongs to a class and predicts the more likely one; in this case we're predicting whether the news is True (1) or Fake (0).
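A minimal sketch of the vectorize-then-classify flow with scikit-learn follows. The training texts and labels are hypothetical, and chaining TfidfVectorizer and LogisticRegression in a pipeline is an illustrative choice. Internally the model computes p = 1 / (1 + exp(-(w·x + b))) for each article vector x and predicts class 1 when p exceeds 0.5.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data (1 = true, 0 = fake).
train_texts = [
    "official report confirms the figures",
    "government statement published today",
    "shocking secret cure doctors hide",
    "you will not believe this miracle trick",
]
train_labels = [1, 1, 0, 0]

# Vectorize the text, then fit a logistic regression on the vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

new_articles = ["secret miracle cure revealed", "official statement on the report"]
predictions = model.predict(new_articles)  # array of 0/1 labels

# One row per article; columns follow the class order [0, 1].
probabilities = model.predict_proba(new_articles)
print(predictions)
print(probabilities)
```

With this little data the predictions are only illustrative; real results depend on training with the full dataset.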

PASSIVE AGGRESSIVE CLASSIFIER

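The Passive Aggressive Classifier is an online learning algorithm: it stays passive (leaves its weights unchanged) when a prediction is correct with a sufficient margin, and becomes aggressive (updates its weights just enough to fix the error) when a prediction is incorrect.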
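To show the passive/aggressive behaviour explicitly, here is a from-scratch sketch of the PA-I update rule in plain Python. It is not scikit-learn's implementation (that is sklearn.linear_model.PassiveAggressiveClassifier). The helper names train_pa and predict and the two-sample dataset are hypothetical. PA uses labels +1/-1, so True (1) maps to +1 and Fake (0) to -1, and in practice x would be an article's TF-IDF vector.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pa(samples, n_features, C=1.0, epochs=5):
    """PA-I binary classifier; each sample is (x, y) with y in {+1, -1}."""
    w = [0.0] * n_features
    updates = 0
    for _ in range(epochs):
        for x, y in samples:
            loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss
            if loss == 0.0:
                continue  # passive: correct with margin >= 1, no change
            norm_sq = dot(x, x)
            if norm_sq == 0.0:
                continue  # an all-zero vector cannot be corrected
            # Aggressive: smallest step that fixes the error, capped by C.
            tau = min(C, loss / norm_sq)
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
            updates += 1
    return w, updates


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


samples = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
w, updates = train_pa(samples, n_features=2)
print(w, updates)  # [1.0, -1.0] 2
```

Both updates happen in the first epoch; afterwards every sample is classified with margin 1, so the model stays passive. The parameter C limits how aggressive a single update can be, which makes the model less sensitive to noisy labels.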

sharingann/news-classification