Let's create a classifier that returns the probability that a news article is fake.
The classifier uses news from two mainstream media outlets that provide an open API for fetching their articles: NYT and The Guardian.
With this method you can achieve 0.995 (99.5%) accuracy, based on a Kaggle fake news competition hosted this year (2018), which would give you first place.
- Python 3.6+, MongoDB, and some libraries (pip install -r requirements.txt)
- Kaggle Fake news competition data https://www.kaggle.com/c/fake-news/data (Train and Test)
- Kaggle Fake News data https://www.kaggle.com/mrisdal/fake-news
- NYT API Key: https://developer.nytimes.com/signup
- The Guardian API Key: https://open-platform.theguardian.com/access/
- Create your API keys and save them in the /config folder (not included)
- Download the datasets from Kaggle and save them in the /data/kaggle folder
- Install MongoDB and create a database, fake_news, with 2 collections: tg_articles and nyt_articles
- Run the nyt_spyder and the the_guardian_spyder
- Run clean_news_data to transform the data from NYT and The Guardian into a format that is easier to work with in Python
- Run dataset_builder to merge that data with the Kaggle datasets from https://www.kaggle.com/c/fake-news/data
- Run the training to generate the models
- Run the Flask API server
- Send a POST request with the news title, news text, and author to the endpoint http://cs410.canivel.com/api/isfakenews
- A JSON response is returned with the probability that the news is fake, which tells you whether you should look for further news on the matter
- To run the simple React client, just run npm start in the fake-news-client folder
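To picture what the cleaning and merging steps do, here is a minimal sketch, not the project's actual clean_news_data or dataset_builder code. It assumes the raw documents look like NYT Article Search results (headline.main, byline.original, lead_paragraph) and Guardian Content API results (webTitle, fields.byline, fields.bodyText), and it flattens both into the title/author/text/label columns of the Kaggle competition's train.csv. Labeling mainstream articles as 0 (reliable) is also an assumption, and the function names are hypothetical.

```python
def normalize_nyt(doc):
    """Flatten one NYT document into a Kaggle-style row (field names assumed)."""
    headline = doc.get("headline") or {}
    byline = doc.get("byline") or {}
    author = byline.get("original") or ""
    # NYT bylines usually read "By Jane Doe"; drop the prefix if present.
    if author.startswith("By "):
        author = author[3:]
    return {
        "title": headline.get("main", ""),
        "author": author,
        "text": doc.get("lead_paragraph", ""),
        "label": 0,  # assumption: mainstream articles count as reliable
    }


def normalize_guardian(doc):
    """Flatten one Guardian document into a Kaggle-style row (field names assumed)."""
    fields = doc.get("fields") or {}
    return {
        "title": doc.get("webTitle", ""),
        "author": fields.get("byline", ""),
        "text": fields.get("bodyText", ""),
        "label": 0,  # assumption: mainstream articles count as reliable
    }


def build_rows(nyt_docs, guardian_docs):
    """Combine both sources into one list of rows ready to merge with Kaggle data."""
    rows = [normalize_nyt(doc) for doc in nyt_docs]
    rows.extend(normalize_guardian(doc) for doc in guardian_docs)
    return rows
```

Rows shaped this way can be appended to the Kaggle training rows before training, since they share the same column names.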
To learn more about each project, go to its folder, where simple documentation is in place.
You can test the client or the API directly from the Swagger docs.
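The endpoint can also be called from a script. The sketch below uses only the Python standard library. The JSON field names (title, text, author) are assumptions, as is the shape of the response, so check the Swagger docs for the real schema before relying on it.

```python
import json
import urllib.request

# Endpoint as listed in this README.
API_URL = "http://cs410.canivel.com/api/isfakenews"


def build_request(title, text, author, url=API_URL):
    """Build a JSON POST request; the payload field names are assumptions."""
    payload = {"title": title, "text": text, "author": author}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_fake_news(title, text, author, url=API_URL, timeout=10):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_request(title, text, author, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Example call (needs network access and a running server):
# result = is_fake_news("Some headline", "Full article text...", "Jane Doe")
```

Building the request separately from sending it keeps the payload easy to inspect without hitting the server.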