I don't notice the Task-driven function anywhere.

Question

Closed this issue 2 years ago · 4 comments

Hi yaoxingcheng,

Thanks for releasing the code and paper for your task-driven language modeling approach.

I tried running your code, but I can't find where the task-driven functionality is implemented.
Could you explain in a bit more detail how task-driven training differs from traditional LM training?
Thank you very much.

Best,
ChinhH

Answer 1 · 2022-02-09T18:36:16.000Z

Hi ChinhH. In the released code, data selection is decoupled from LM training. You can read the script src/data_selection.py to see how task data is used to retrieve similar data from the general corpus (one of the key differences between TLM and PLM). Also, at line 113 of src/model.py, you can see that the model's loss is a weighted average of the task objective and the LM objective. This multi-task learning scheme is another point that makes a difference.
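As a rough illustration of the weighted loss described above, here is a minimal sketch in plain Python. It is not the code from src/model.py: the function name `combined_loss` and the `lm_weight` parameter are hypothetical, and the exact weighting used in the repository may differ.

```python
def combined_loss(task_loss: float, lm_loss: float, lm_weight: float = 0.5) -> float:
    """Weighted average of a task loss and a language-modeling loss.

    `lm_weight` is a hypothetical hyperparameter in [0, 1]; it is an
    assumption for this sketch, not a name taken from the repository.
    """
    if not 0.0 <= lm_weight <= 1.0:
        raise ValueError("lm_weight must be in [0, 1]")
    # The task term and the LM term share one scalar loss, so a single
    # backward pass would update the model for both objectives.
    return (1.0 - lm_weight) * task_loss + lm_weight * lm_loss


# lm_weight = 0 reduces to pure task training,
# lm_weight = 1 reduces to pure language-model training.
print(combined_loss(task_loss=0.8, lm_loss=2.4, lm_weight=0.25))
```

In traditional LM pretraining only the LM term is optimized and the task loss appears later, during fine-tuning. Here both terms contribute to the same training step.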

Answer 2 · 2022-02-16T02:59:21.000Z

Thanks for your support.
I have one question about the dataset used to train the language model. Is it built from source.csv + selected.csv?

Answer 3 · 2022-02-16T06:52:12.000Z

We only use the selected data (in your case, selected_rake.csv) as the external data for language modeling.
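To make this concrete, here is a small sketch of the idea: task texts act only as queries, and the LM corpus is just the subset retrieved from the general corpus. It uses simple token overlap as a stand-in scoring function, which is an assumption for illustration and not the retrieval method in src/data_selection.py. All function names are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer, enough for a toy example.
    return text.lower().split()


def overlap_score(query_tokens: list[str], doc_tokens: list[str]) -> int:
    # Count how many query tokens also occur in the document (with multiplicity).
    q, d = Counter(query_tokens), Counter(doc_tokens)
    return sum(min(count, d[token]) for token, count in q.items())


def select_external_data(task_texts: list[str], general_corpus: list[str], top_k: int = 1) -> list[str]:
    """Return the general-corpus documents most similar to each task text."""
    doc_tokens = [tokenize(doc) for doc in general_corpus]
    selected_ids: list[int] = []
    for query in task_texts:
        q_tokens = tokenize(query)
        scores = [(overlap_score(q_tokens, toks), i) for i, toks in enumerate(doc_tokens)]
        # Highest score first; ties are broken by corpus order.
        scores.sort(key=lambda pair: (-pair[0], pair[1]))
        for score, i in scores[:top_k]:
            if score > 0 and i not in selected_ids:
                selected_ids.append(i)
    return [general_corpus[i] for i in selected_ids]


task_texts = ["a great movie overall", "terrible acting in this film"]
general_corpus = [
    "stock prices fell sharply today",
    "a great movie with a strong cast",
    "the acting in this film was terrible",
    "recipe for tomato soup",
]

# Only the retrieved documents would be used as external LM data.
for doc in select_external_data(task_texts, general_corpus, top_k=1):
    print(doc)
```

Documents unrelated to the task (the stock and recipe lines) never reach the language-modeling step.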

Answer 4 · 2022-02-23T06:02:34.000Z

Thanks for your answer.