Comment-Verification

DigikalaNext contest - 2019

A comment verification system built with scikit-learn's CountVectorizer, the Naive Bayes algorithm, and a dataset of comments from the Digikala website.

  • Scikit-learn's CountVectorizer:

    Transforms a given text into a vector based on the frequency (count) of each word that occurs in the text.

  • Bayes' theorem

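    $$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

    $$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$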

    The “prior” P(A) and the “evidence” P(B) are the probabilities of observing A and B on their own in the document, whereas the “posterior” P(A | B) and the “likelihood” P(B | A) are the conditional probabilities of observing A given B and B given A, respectively.

    In this project, the quantity we want to find is $P(\text{spam} \mid x)$, where $x$ is a feature vector representing the words in the given comment.
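As a minimal sketch of how such a feature vector can be built, the snippet below runs scikit-learn's `CountVectorizer` on two made-up comments. The comments are hypothetical placeholders, not taken from the Digikala dataset, and the default tokenizer settings are an assumption, since the project's actual configuration is not stated here.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy comments; the real project uses Digikala's comment dataset.
comments = [
    "great phone great battery",
    "buy cheap followers now",
]

vectorizer = CountVectorizer()
# Learn the vocabulary and build the comment-by-word count matrix.
counts = vectorizer.fit_transform(comments)

# Vocabulary terms listed in the same order as the matrix columns.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
x = counts.toarray()

print(vocabulary)  # ['battery', 'buy', 'cheap', 'followers', 'great', 'now', 'phone']
print(x)           # row 0: [1 0 0 0 2 0 1], row 1: [0 1 1 1 0 1 0]
```

Each row is the vector for one comment, and each column counts one vocabulary word, so "great" appears with a count of 2 in the first row.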

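The “Naive” assumption that the Naive Bayes classifier makes is that, given the class, the probability of observing a word is independent of the other words. Therefore, the probability of seeing a comment given that it is spam is the product of the probabilities of seeing each of its words in a spam comment; multiplying by the prior probability of spam gives a score proportional to the probability that the comment is spam.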

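$$P(\text{spam} \mid x) \propto P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})$$

where $w_1, \dots, w_n$ are the words in the comment $x$.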
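The product rule above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the project's actual code: the training comments are hypothetical, the helper names (`fit`, `log_score`, `predict`) are made up for this example, and add-one (Laplace) smoothing is an added assumption so that a word unseen in one class does not force the product to zero. Logarithms are summed instead of multiplying probabilities to avoid numerical underflow.

```python
import math
from collections import Counter

# Hypothetical labelled comments (1 = spam, 0 = legitimate), for illustration only.
TRAIN = [
    ("buy cheap followers now", 1),
    ("cheap followers cheap likes", 1),
    ("great phone great battery", 0),
    ("battery lasts all day", 0),
]


def fit(samples):
    """Count words per class and comments per class."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        word_counts[label].update(text.split())
        class_counts[label] += 1
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def log_score(text, label, model):
    """Return log P(label) + sum_i log P(w_i | label)."""
    word_counts, class_counts, vocab = model
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for word in text.split():
        if word not in vocab:
            continue  # skip words never seen in training
        # Add-one (Laplace) smoothing keeps unseen class/word pairs above zero.
        p = (word_counts[label][word] + 1) / (total_words + len(vocab))
        score += math.log(p)
    return score


def predict(text, model):
    """Pick the class with the larger log score."""
    return max((0, 1), key=lambda label: log_score(text, label, model))


model = fit(TRAIN)
print(predict("cheap followers", model))  # 1 (spam)
print(predict("great battery", model))    # 0 (legitimate)
```

scikit-learn ships a multinomial Naive Bayes classifier (`MultinomialNB`) that works directly on `CountVectorizer` output; whether the project used that class or another variant is not stated in this README.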