webis-de/small-text

What are the best query strategies to use as a baseline approach?

renebidart opened this issue

I'm not sure where to start to get a good baseline result with active learning for text classification.
Which query strategies should I try first? Is there something like this survey (https://arxiv.org/abs/2203.13450) implemented for text classification?

Hi @renebidart,
I am not aware of any comprehensive benchmark of this kind. An exhaustive benchmark is unlikely to exist (unless it comes from one of the larger, well-known organizations), because active learning experiments quickly become computationally expensive: such a benchmark has to cover a very large number of combinations of query strategies, models, and datasets.
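To make the cost argument concrete, here is a back-of-envelope count of how many classifier trainings a full grid benchmark would need. All counts are hypothetical placeholders, not taken from any real study, and the sketch assumes the classifier is retrained once per query iteration; the point is only that the axes multiply.

```python
def count_training_runs(num_query_strategies: int,
                        num_models: int,
                        num_datasets: int,
                        num_seeds: int,
                        num_iterations: int) -> int:
    """Number of classifier trainings in a full grid benchmark.

    Every (strategy, model, dataset, seed) combination is one active learning
    run, and (by assumption) each run retrains the classifier once per query
    iteration.
    """
    num_runs = num_query_strategies * num_models * num_datasets * num_seeds
    return num_runs * num_iterations


# Hypothetical sizes: 10 strategies, 4 models, 8 datasets, 5 seeds, 20 iterations.
print(count_training_runs(10, 4, 8, 5, 20))  # 32000
```

Adding a single new dataset or model to such a grid multiplies the total again, which is why exhaustive comparisons are rare.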

I would advise trying uncertainty-based methods first (such as BreakingTies). They are computationally cheap and usually provide a strong baseline:
https://aclanthology.org/2022.findings-acl.172.pdf
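For reference, BreakingTies (also known as margin sampling) scores each unlabeled sample by the gap between its two highest predicted class probabilities and queries the samples with the smallest gap, i.e. the ones the classifier is most torn about. The sketch below is a minimal, library-independent illustration in plain Python, assuming you already have per-class probabilities from your classifier. The function names `breaking_ties_scores` and `query_breaking_ties` are hypothetical and are not small-text's API.

```python
from typing import List, Sequence, Set


def breaking_ties_scores(probas: Sequence[Sequence[float]]) -> List[float]:
    """Margin between the two most probable classes for each sample.

    A small margin means the classifier is nearly tied between two labels.
    """
    scores = []
    for row in probas:
        best, second = sorted(row, reverse=True)[:2]
        scores.append(best - second)
    return scores


def query_breaking_ties(probas: Sequence[Sequence[float]],
                        labeled: Set[int],
                        num_samples: int) -> List[int]:
    """Indices of the `num_samples` unlabeled samples with the smallest margin."""
    scores = breaking_ties_scores(probas)
    unlabeled = [i for i in range(len(scores)) if i not in labeled]
    # Smallest margin first, i.e. most uncertain first.
    unlabeled.sort(key=lambda i: scores[i])
    return unlabeled[:num_samples]


# Hypothetical predicted class probabilities for four unlabeled texts.
probas = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # nearly tied between class 0 and class 1
    [0.50, 0.50, 0.00],  # exact tie
    [0.60, 0.30, 0.10],
]
print(query_breaking_ties(probas, labeled=set(), num_samples=2))  # [2, 1]
```

With `labeled={2}`, the same call returns `[1, 3]`. Since the score only needs the predicted probabilities, it adds almost nothing on top of the prediction step, which is what makes it a cheap baseline.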

Thanks for the quick reply @chschroeder!
That's a great paper; I'll try out that method.