Journey to learning natural language processing.
In times of political turmoil, the news we see from all sources is often not 100% accurate. Parties and outlets release their own biased versions of events, and tabloid-style outlets and platforms such as Buzzfeed and Facebook add to the noise. This project tries to predict the accuracy of news from its text using natural language processing (NLP), the area of machine learning concerned with teaching computers to process and interpret human language.
Learn natural language processing. Predict the accuracy of news from keywords/tags in the article title. One method is to distinguish the verbs from the objects in the sentences of the article title and of the article summary.
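Predicting a label from title keywords is a standard text-classification task, and Naive Bayes (one of the algorithms in the task list) is a common first choice. The sketch below is a minimal multinomial Naive Bayes with add-one smoothing written in plain Python so the mechanics are visible; it is not scikit-learn's implementation. The "real"/"fake" labels and the example titles are hypothetical toy data, not the project's dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    # Lowercase and keep runs of letters, digits, and apostrophes as keywords
    return re.findall(r"[a-z0-9']+", title.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        self.vocab = set()
        for title, label in zip(titles, labels):
            tokens = tokenize(title)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, title):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            # Score = log P(label) + sum over known words of log P(word | label)
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(title):
                if token not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data
titles = [
    "Senate passes budget bill after long debate",
    "Court rules on state election law",
    "You won't believe what this celebrity did",
    "Shocking secret doctors don't want you to know",
]
labels = ["real", "real", "fake", "fake"]

model = NaiveBayes().fit(titles, labels)
print(model.predict("Shocking celebrity secret revealed"))  # fake
print(model.predict("Senate debate on budget"))             # real
```

Working in log space avoids underflow when many small word probabilities are multiplied. With real data, scikit-learn's `MultinomialNB` plus a vectorizer would replace this hand-written class.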
Fact-check news based on keywords/phrases. The third column of the data contains the statements; predict how many of them could be "fact-checked." One approach is to break each statement into subject-verb-object (SVO) tuples and check them against the data.
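Once statements are reduced to SVO tuples (for example from a dependency parse), checking them against the data becomes a lookup. The sketch below assumes the tuples are already extracted and uses a hypothetical rule of its own: a statement is "checkable" if some known fact shares its subject and verb, "supported" if the whole tuple matches, and "mismatch" if only the object differs. The function names, verdict labels, and example facts are all illustrative assumptions, not part of the project.

```python
def normalize(triple):
    # Lowercase and trim each part so "Senate " matches "senate"
    return tuple(part.strip().lower() for part in triple)


def fact_check(statement_triples, known_facts):
    """Return (number of checkable statements, verdict per statement).

    Hypothetical rule: full SVO match -> "supported"; same subject and
    verb but different object -> "mismatch"; otherwise "unverifiable".
    """
    known = {normalize(t) for t in known_facts}
    known_subject_verb = {(s, v) for s, v, _ in known}
    verdicts = []
    for triple in statement_triples:
        s, v, o = normalize(triple)
        if (s, v, o) in known:
            verdicts.append("supported")
        elif (s, v) in known_subject_verb:
            verdicts.append("mismatch")
        else:
            verdicts.append("unverifiable")
    checkable = sum(verdict != "unverifiable" for verdict in verdicts)
    return checkable, verdicts


# Hypothetical reference facts and extracted statement tuples
known_facts = [("Senate", "passed", "budget bill"), ("Governor", "signed", "law")]
statements = [
    ("Senate ", "passed", "Budget Bill"),
    ("Governor", "signed", "tax cut"),
    ("Mayor", "resigned", "office"),
]
print(fact_check(statements, known_facts))
# (2, ['supported', 'mismatch', 'unverifiable'])
```

Exact string matching is deliberately simple; real statements would need lemmatization (e.g. "passes" vs "passed") before tuples line up.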
Tools: the spaCy Python library, NLTK, and scikit-learn.
- Get familiar with NLP using the resources below (feel free to add your own!)
- Clean up the code and explore the data
- Remove the punctuation and stopwords from the data
- Tokenize the words and split the summaries into tuples
- Remove columns 0, 1, 2, 3, 4, and 5
- Create a GitHub branch and clone the repo to your personal computer
- Algorithms to use: Naive Bayes classifier, support vector machine (SVM)
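The cleanup steps in the list above (remove punctuation and stopwords, tokenize, split into tuples) can be sketched in plain Python. The stopword set here is a small hand-picked assumption; NLTK provides a fuller English list via `nltk.corpus.stopwords.words("english")` after downloading the corpus. Reading "split the summaries into tuples" as word bigrams is also an assumption.

```python
import string

# Small illustrative stopword list (assumption; NLTK's list is much larger)
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on",
    "is", "are", "was", "for", "with", "that", "this",
}


def preprocess(text):
    # Delete punctuation characters, lowercase, split on whitespace,
    # then drop stopwords
    cleaned = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def bigrams(tokens):
    # Pair each token with the one after it
    return list(zip(tokens, tokens[1:]))


tokens = preprocess("The Senate passed the budget bill on Friday!")
print(tokens)
# ['senate', 'passed', 'budget', 'bill', 'friday']
print(bigrams(tokens))
# [('senate', 'passed'), ('passed', 'budget'), ('budget', 'bill'), ('bill', 'friday')]
```

Note that deleting punctuation also removes apostrophes, so "don't" becomes "dont"; NLTK's `word_tokenize` handles contractions differently and may suit the real data better.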
Resources:
- https://www.dataquest.io/blog/natural-language-processing-with-python/
- https://pythonprogramming.net/naive-bayes-classifier-nltk-tutorial/?completed=/words-as-features-nltk-tutorial/
- http://textminingonline.com/dive-into-nltk-part-ii-sentence-tokenize-and-word-tokenize
- http://victoria.lviv.ua/html/fl5/NaturalLanguageProcessingWithPython.pdf