cleanlab/examples

Should I use a simpler model or the current model to predict probabilities?


Thanks for publishing such a great project for finding data issues. After reviewing some of the examples, I would like to hear your guidance for the following situation:

I want to find human annotators' labeling errors during active learning, where I am fine-tuning a sentence transformer model for a text classification task.
Should I use a simpler model, e.g. a logistic regression model, to generate the probabilities for confident learning, or should I use the current fine-tuned sentence transformer to do the job? Will this make a big difference?

I'd recommend the current fine-tuned sentence transformer.
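
For anyone landing here later, below is a rough sketch of how predicted probabilities from whichever model you pick feed into a confident-learning style check. It is a simplified, pure-Python illustration and not cleanlab's actual implementation. The function names `class_thresholds` and `find_suspect_labels` and the toy data are made up for this example. In practice you would pass the same two inputs (`labels` and `pred_probs`) to `cleanlab.filter.find_label_issues`, and the probabilities should ideally be out-of-sample, i.e. predicted for examples the model was not trained on.

```python
def class_thresholds(labels, pred_probs, num_classes):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # Assumption: every class has at least one labeled example.
        thresholds.append(sum(probs_k) / len(probs_k))
    return thresholds


def find_suspect_labels(labels, pred_probs):
    """Return indices of likely label errors, least confident first."""
    num_classes = len(pred_probs[0])
    thresholds = class_thresholds(labels, pred_probs, num_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue
        likely = max(confident, key=lambda k: p[k])
        if likely != y:
            # Record the model's probability for the given label.
            suspects.append((i, p[y]))
    # Lowest probability for the given label comes first.
    suspects.sort(key=lambda pair: pair[1])
    return [i for i, _ in suspects]


# Toy data (hypothetical): annotator labels and model probabilities for 2 classes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],  # labeled 0, but the model strongly prefers class 1
    [0.10, 0.90],
    [0.30, 0.70],
    [0.85, 0.15],  # labeled 1, but the model strongly prefers class 0
]

suspects = find_suspect_labels(labels, pred_probs)
print(suspects)
```

Only `pred_probs` depends on the model here, so swapping logistic regression for the fine-tuned sentence transformer changes nothing else in the pipeline. That makes it cheap to run both and compare which examples get flagged.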