Train relevance classifier

Question

Closed this issue 8 months ago · 0 comments

Problem to solve

As a scientist, I want to have a classifier that predicts the relevance of Tweets so that I can consider only "relevant" Tweets for my rain classification task.

Further details

Initial implementation done during maelstrom bootcamp (see a2 repo).

"Relevant" Tweets are Tweets that contain sufficient information for a human or an AI to determine whether or not it was raining.

Proposal

Bring the relevance dataset into a form from which a train and a test set can be created for this task
- All Tweets that have a relevance score predicted by falcon should now get a label derived from that score, either "relevant" or "not relevant"
  - The label depends on the relevance score threshold; we pick 0.5 for our first attempt
Adapt the rain classifier to allow for classification of the relevance of Tweets
- Use your rain classifier notebook as the basis for this (make a copy).
- Evaluate the performance of your model.
  - Check manually whether the model's predictions make sense
    - For easy and for more difficult (especially misclassified) Tweets
Change relevance score threshold and see how it affects performance
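
The labeling and splitting steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code: the function names, the toy scores, the 80/20 split, and the choice of `>=` (rather than `>`) at the threshold are all assumptions made for the example.

```python
import random


def label_by_threshold(scores, threshold=0.5):
    """Map each relevance score to a binary label.

    Assumption: a score equal to the threshold counts as "relevant".
    """
    return ["relevant" if s >= threshold else "not relevant" for s in scores]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class separately
    so both sets keep roughly the same label proportions."""
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up relevance scores standing in for the predicted scores.
scores = [0.9, 0.1, 0.7, 0.3, 0.55, 0.45, 0.8, 0.2, 0.6, 0.05]

labels = label_by_threshold(scores, threshold=0.5)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

print(labels.count("relevant"), "relevant of", len(labels))
print("train:", train_idx, "test:", test_idx)
```

Splitting per class keeps the share of "relevant" Tweets similar in both sets, which matters once the threshold is moved and the classes become imbalanced.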

Testing

What does success look like, and how can we measure that?

Trained relevance classifier exists
- performance is estimated
- performance as a function of the relevance score threshold is estimated
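
One way these success criteria might be measured is sketched below: compute accuracy, precision, recall, and F1 for the "relevant" class, then repeat for several thresholds. The scores and predictions are made up, and the predictions are held fixed across thresholds only to keep the example short; in practice the classifier would presumably be retrained on the relabeled training set for each threshold.

```python
def binary_metrics(y_true, y_pred, positive="relevant"):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up test-set relevance scores and (fixed) model predictions.
scores = [0.9, 0.1, 0.7, 0.3, 0.55, 0.45]
preds = ["relevant", "not relevant", "relevant",
         "relevant", "not relevant", "not relevant"]

results = {}
for threshold in (0.3, 0.5, 0.7):
    # Relabel the ground truth at this threshold, then score the predictions.
    y_true = ["relevant" if s >= threshold else "not relevant" for s in scores]
    results[threshold] = binary_metrics(y_true, preds)

for threshold, metrics in results.items():
    print(threshold, {name: round(value, 3) for name, value in metrics.items()})
```

Reporting precision and recall alongside accuracy is useful here because raising the threshold shrinks the "relevant" class, and accuracy alone can hide poor performance on a small class.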
weight: 5