Benchmarking text classification with the keras and ruimtehol packages on celebrity Twitter data.
The underlying dataset is 9,000 tweets, collected on 2019-02-02 via rtweet. It consists of 1,500 tweets each from:
- Hadley Wickham,
- Wes McKinney,
- François Chollet,
- Kim Kardashian,
- Kourtney Kardashian,
- Khloe Kardashian
From each account, 1,200 tweets (80%) are included in the training dataset and 300 (20%) in the verification set.
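The per-account 80/20 split described above can be sketched as follows. This is only an illustration in Python using placeholder tweet texts, not the code used in this project. The function name `split_per_account`, the fixed seed, and the choice to shuffle randomly before cutting are all assumptions, since the description does not say how the 1,200 training tweets were picked from each account.

```python
import random
from collections import defaultdict

def split_per_account(tweets, train_fraction=0.8, seed=42):
    """Split (account, text) pairs into training and verification sets,
    applying the same fraction to every account separately."""
    by_account = defaultdict(list)
    for account, text in tweets:
        by_account[account].append(text)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, verification = [], []
    for account, texts in by_account.items():
        shuffled = texts[:]
        rng.shuffle(shuffled)  # assumption: random selection within each account
        cut = round(len(shuffled) * train_fraction)
        train.extend((account, t) for t in shuffled[:cut])
        verification.extend((account, t) for t in shuffled[cut:])
    return train, verification

# Placeholder data: 1,500 dummy texts for each of the six accounts.
accounts = [
    "Hadley Wickham", "Wes McKinney", "François Chollet",
    "Kim Kardashian", "Kourtney Kardashian", "Khloe Kardashian",
]
tweets = [(a, f"placeholder tweet {i}") for a in accounts for i in range(1500)]

train, verification = split_per_account(tweets)
print(len(train), len(verification))
```

Splitting within each account, rather than across the pooled 9,000 tweets, keeps the six classes equally represented in both sets.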
At first glance, the fancy Keras model performs better than the StarSpace one.
I do not feel, though, that such a simple comparison does the models justice, as the StarSpace model was set up in a breeze and took about a minute to train. The effort and resources necessary to come up with the Keras solution were much, much higher.
So my provisional verdict on the ruimtehol / StarSpace approach is that it gives an adequate result at a fraction of the effort and resources required to set up a fancy solution.