/naive_bayes_retweet

A (set of) Python scripts to retrieve tweets (for a given query) and RT one with highest score from Naïve Bayes classifier.

Primary LanguagePython

README

The purpose of this set of the scripts is to:

  • Retrieve a set of the tweets from Twitter, given a query and target language
  • Detects the most-likely-intersting tweet based on the Naive-Bayes classifier

One still needs to create training data, which can (and should) be accomplished by manually classifying an already obtained tweets.

This git repo contains following files:

  • data.yaml: trained data is stored.
  • naive_bayes_retweet.py: given the data.yaml and a file with list of the tweets in question, it then takes the tweets with the highest naive-bayes score and retweets that tweet on the Twitter account, accositate to the API keys in twitter_conf.yaml.
  • requirements: required packages are listed.
  • search_tweets.py: Twitter search results would be returned given a query and target language.
  • train.py: based on the true and false sets of the tweets, it conducts n-gram tokenization and stores the results to data.yaml.
  • twitter_conf.yaml: Twitter-API keys (needed to run the script) are stored.