PS: "text" here refers to the search queries entered by users.
Pretrain BERT with a text-classification task together with the traditional masked-word-prediction task. In this way, text embeddings with knowledge-level augmentation can be derived from the specifically designed BERT.
Exactly two pretraining tasks are designed: the parallel task and the unified task. The only difference between them is how the output BERT embeddings are used. The fine-tuning procedure is shown as follows.
In multiple comparisons, Unified Text Encoding from BERT (UTEB) performs better than Parallel Text Encoding from BERT (PTEB). In addition, UTEB converges faster and reaches higher accuracy than TextCNN.
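The exact head layout of the parallel and unified variants is not spelled out above, so the snippet below is only a minimal sketch (assuming PyTorch and the Hugging Face `transformers` library) of a shared BERT encoder trained jointly with a masked-word-prediction head and a text-classification head. The checkpoint name, head definitions, and the simple summed loss are illustrative assumptions, not this project's exact setup.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class JointBertPretrainer(nn.Module):
    """Sketch: one BERT encoder shared by an MLM head and a classification head."""
    def __init__(self, model_name="bert-base-uncased", num_labels=2):  # placeholder checkpoint
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        vocab = self.bert.config.vocab_size
        self.mlm_head = nn.Linear(hidden, vocab)       # masked-word prediction
        self.cls_head = nn.Linear(hidden, num_labels)  # text classification
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, input_ids, attention_mask, mlm_labels, cls_labels):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        token_states = out.last_hidden_state   # (batch, seq_len, hidden)
        pooled = out.pooler_output             # (batch, hidden)

        mlm_logits = self.mlm_head(token_states)
        cls_logits = self.cls_head(pooled)

        mlm_loss = self.ce(mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1))
        cls_loss = self.ce(cls_logits, cls_labels)
        return mlm_loss + cls_loss  # assumed joint objective: simple sum of both losses
```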
Tips:
- Set different learning rates for the BERT parameters and the remaining parameters; this makes a big difference (see the sketch after this list).
- The optimizer originally designed for BERT needs more carefully tuned hyperparameters; it is better to replace it with plain Adam.
- BERT is not very effective for short text.
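A minimal sketch of the first two tips, assuming PyTorch and `transformers`: put the BERT encoder parameters and the task-specific parameters into separate optimizer groups with different learning rates, and use plain Adam instead of the warmup-based optimizer shipped with BERT. The learning-rate values are placeholders, not tuned settings from this project.

```python
import torch
from transformers import BertForSequenceClassification

# Placeholder checkpoint; swap in whichever pretrained model the project uses.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Parameters belonging to the BERT encoder are named "bert.*"; everything else
# (here, the newly initialized classification head) gets a larger learning rate.
bert_params = [p for n, p in model.named_parameters() if n.startswith("bert.")]
other_params = [p for n, p in model.named_parameters() if not n.startswith("bert.")]

optimizer = torch.optim.Adam([
    {"params": bert_params, "lr": 2e-5},   # small LR for the pretrained encoder
    {"params": other_params, "lr": 1e-3},  # larger LR for newly initialized layers
])
```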