What are the best query strategies to use as a baseline approach?
renebidart opened this issue · 2 comments
I'm not sure where to start to get a good baseline result with active learning for text classification.
What query strategies should be attempted first? Is there something like this survey https://arxiv.org/abs/2203.13450 implemented for text classification?
Hi @renebidart,
I am not aware of any comprehensive benchmark of this kind. An exhaustive benchmark is unlikely to exist (unless it comes from one of the larger, well-known organizations): active learning experiments quickly become computationally expensive, and a benchmark that large would have to cover a lot of combinations.
I would advise trying uncertainty-based methods first (such as BreakingTies). They are computationally cheap and usually provide a strong baseline:
https://aclanthology.org/2022.findings-acl.172.pdf
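To make the idea concrete, here is a minimal NumPy sketch of breaking-ties selection: for each unlabeled example, take the margin between the two highest predicted class probabilities and query the examples with the smallest margins. This is only an illustration, not the library's own implementation; the function name `breaking_ties_query` and the toy probabilities are made up for this example.

```python
import numpy as np


def breaking_ties_query(proba, n=10):
    """Return indices of the n examples with the smallest top-2 margin.

    Hypothetical helper for illustration only.
    `proba` is an (n_samples, n_classes) array of predicted class probabilities.
    """
    proba = np.asarray(proba, dtype=float)
    # Sort each row ascending; the last two columns hold the two highest probabilities.
    top_two = np.sort(proba, axis=1)[:, -2:]
    # Margin between the most likely and the second most likely class.
    margins = top_two[:, 1] - top_two[:, 0]
    # A small margin means the classifier is torn between two classes.
    return np.argsort(margins, kind="stable")[:n]


# Toy predictions for four unlabeled examples and three classes (made-up values).
proba = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.60, 0.30, 0.10],
])
selected = breaking_ties_query(proba, n=2)
print(selected)
```

In an active learning loop you would label the returned examples, retrain the classifier, and repeat. The only extra cost per round is one prediction pass over the unlabeled pool, which is why this kind of strategy is cheap.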
Thanks for the quick reply @chschroeder!
And that's a great paper, I'll try out that method.