rajhaq/AP2-Social-media-data-for-better-local-forecasts

Train relevance classifier

Closed this issue · 0 comments

Problem to solve

As a scientist, I want to have a classifier that predicts the relevance of Tweets so that I can consider only "relevant" Tweets for my rain classification task.

Further details

An initial implementation was done during the maelstrom bootcamp (see the a2 repo).

"Relevance"/"relevant" Tweets are Tweets that contain enough information for a human or an AI to determine whether or not it was raining.

Proposal

  • Bring the relevance dataset into a form from which a training set and a test set can be created for this task
    • All Tweets with a relevance score predicted by falcon should be given a label derived from that score, either "relevant" or "not relevant"
      • The label depends on the relevance score threshold; we pick 0.5 for our first attempt
  • Adapt the rain classifier to classify the relevance of Tweets
    • Use your rain classifier notebook as the basis for this (make a copy).
    • Evaluate the performance of your model.
      • Check manually whether the model's predictions make sense
        • For both easy and more difficult (especially misclassified) Tweets
  • Vary the relevance score threshold and assess how it affects performance
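The labeling and splitting steps above can be sketched as follows. This is a minimal, standard-library-only sketch. It assumes that each Tweet comes with a falcon relevance score in [0, 1] and that a score exactly at the threshold counts as "relevant". The names `Tweet`, `label_relevance` and `make_splits`, the 80/20 stratified split and the fixed seed are illustrative choices, not taken from the a2 repo.

```python
import random
from dataclasses import dataclass

RELEVANT = "relevant"
NOT_RELEVANT = "not relevant"


@dataclass
class Tweet:
    # Hypothetical container; field names are illustrative.
    text: str
    relevance_score: float  # relevance score predicted by falcon (assumed in [0, 1])


def label_relevance(score: float, threshold: float = 0.5) -> str:
    """Turn a relevance score into a binary label.

    Assumption: a score equal to the threshold counts as relevant.
    """
    return RELEVANT if score >= threshold else NOT_RELEVANT


def make_splits(tweets, threshold=0.5, test_fraction=0.2, seed=42):
    """Label Tweets by threshold and split them into stratified train/test lists.

    Returns two lists of (text, label) pairs. Each class is split separately,
    so the relevant/not-relevant ratio stays similar in both sets.
    """
    by_label = {RELEVANT: [], NOT_RELEVANT: []}
    for tweet in tweets:
        label = label_relevance(tweet.relevance_score, threshold)
        by_label[label].append((tweet.text, label))

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

Stratifying by label matters once the threshold moves away from 0.5. At that point one class can become rare, and a purely random split could leave very few examples of it in the test set.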

Testing

What does success look like, and how can we measure that?

  • A trained relevance classifier exists
    • Its performance is estimated
    • Its performance as a function of the relevance score threshold is estimated
  • weight: 5
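The success criteria above include two estimates: overall performance, and performance as a function of the threshold. One way to get both is to relabel, retrain and evaluate once per threshold on a fixed test split, keeping the misclassified Tweets for manual review. The sketch below relies on a hypothetical hook, `train_and_predict(train_texts, train_labels, test_texts)`, which would wrap the adapted rain classifier. `majority_baseline` is only a trivial placeholder so the example runs; it is not a proposed model. The function names, metrics and split are illustrative assumptions.

```python
import random
from collections import Counter

POSITIVE = "relevant"
NEGATIVE = "not relevant"


def binary_metrics(y_true, y_pred, positive=POSITIVE):
    """Accuracy, plus precision, recall and F1 for the positive ("relevant") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def sweep_thresholds(texts, scores, thresholds, train_and_predict, test_fraction=0.2, seed=0):
    """Relabel, retrain and evaluate once per relevance score threshold.

    train_and_predict(train_texts, train_labels, test_texts) -> predicted labels
    is a hypothetical hook standing in for the adapted rain classifier.
    """
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # same split for every threshold
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    results = {}
    for threshold in thresholds:
        labels = [POSITIVE if s >= threshold else NEGATIVE for s in scores]
        predictions = train_and_predict(
            [texts[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [texts[i] for i in test_idx],
        )
        y_true = [labels[i] for i in test_idx]
        # Keep misclassified Tweets (text, true label, predicted label) for manual checks.
        misclassified = [
            (texts[i], t, p) for i, t, p in zip(test_idx, y_true, predictions) if t != p
        ]
        results[threshold] = {
            "metrics": binary_metrics(y_true, predictions),
            "misclassified": misclassified,
        }
    return results


def majority_baseline(train_texts, train_labels, test_texts):
    """Placeholder classifier: always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * len(test_texts)


if __name__ == "__main__":
    # Toy data only; replace with the real Tweets and falcon relevance scores.
    texts = [f"tweet {i}" for i in range(10)]
    scores = [i / 10 for i in range(10)]
    for threshold, result in sweep_thresholds(texts, scores, [0.3, 0.5, 0.7], majority_baseline).items():
        print(threshold, result["metrics"], len(result["misclassified"]))
```

The test labels themselves change with the threshold, so results at different thresholds measure slightly different tasks. Reporting a majority baseline at each threshold alongside the real model helps separate genuine improvements from shifts in class balance.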