Celebrity Faceoff: a Text Classification Challenge

Benchmarking text classification with keras and ruimtehol packages on celebrity twitter data.

The underlying dataset is 9,000 tweets, collected on 2019-02-02 via rtweet. It consists of 1,500 tweets each from

Hadley Wickham,
Wes McKinney,
François Chollet,
Kim Kardashian,
Kourtney Kardashian,
Khloe Kardashian

From each account 1,200 tweets (80%) are included in training dataset and 300 (20%) in verification set.

At a first glance the fancy Keras model performs better than the StarSpace one.

I do not feel though that such a simple comparison gives the models justice, as the StarSpace model was set up in a breeze and took about a minute to train. The effort and resources necessary to come up with the Keras solution were much, much higher.

So my provisional verdict on ruimtehol / StarSpace approach is that it gives an adequate result at the fraction of the effort & resources required to set up a fancy solution.

jlacko/celebrity-faceoff

Celebrity Faceoff: a Text Classification Challenge