xlang-ai/UnifiedSKG

How can I train the multi-task models?

jifan-chen opened this issue · 6 comments

Hi, thanks for the great project; I am quite interested in it.

I briefly checked the code repo and the training process but I didn't find the right configuration to train the unified (multi-task) model. Any pointers or suggestions for this?

Hi,

The multi-task learning is done in two steps. In the first step, we trained a prefix-tuning model on all tasks, using the configuration named T5_large_prefix_all_tasks_2upsample.cfg. In the second step, we loaded those weights and fine-tuned a separate prefix for each task. For example, to run the second step for CoSQL:

# T5-large prefix for CoSQL with cell value
export WANDB_API_KEY=b76acacd02402f5a4521eadcd1192d9c706cc7e2
export WANDB_PROJECT=structured-knowledge-grounding
export CUDA_VISIBLE_DEVICES=4,5
export RUN_NAME=T5_large_prefix_cosql_with_cell_value
export SEED=2
nohup python -m torch.distributed.launch --nproc_per_node 2 --master_port 1352 train.py \
    --seed $SEED --cfg Salesforce/$RUN_NAME.cfg --run_name from_all_$RUN_NAME$SEED \
    --logging_strategy steps --logging_first_step true --logging_steps 4 \
    --evaluation_strategy steps --eval_steps 500 \
    --metric_for_best_model avr --greater_is_better true \
    --save_strategy steps --save_steps 500 --save_total_limit 1 --load_best_model_at_end \
    --gradient_accumulation_steps 4 --num_train_epochs 1000 --adafactor true --learning_rate 5e-5 \
    --do_train --do_eval --do_predict --predict_with_generate \
    --output_dir output/from_all_$RUN_NAME$SEED --overwrite_output_dir \
    --per_device_train_batch_size 4 --per_device_eval_batch_size 8 \
    --generation_num_beams 1 --generation_max_length 128 --input_max_length 576 \
    --ddp_find_unused_parameters true \
    --load_weights_from output/T5_large_prefix_all_tasks_2upsample2/checkpoint-220000 \
    > $RUN_NAME$SEED.log 2>&1 &

You may want to change the --load_weights_from argument to the path of your model checkpoint trained in the first step.

Hi,

You can also check the info we provided in issue #14.
Hope the information above is helpful!

Thanks!

Thanks! I'll give it a shot and let you know if I have any further issues.

Feel free to re-open it if you have further issues.

Hi, after playing around with the code for some time, I've got two more questions:

  1. I noticed that https://github.com/HKUNLP/UnifiedSKG/blob/main/utils/processor/table_truncate.py#L30 takes an answers argument, but it is not used in the seq2seq_construction code, e.g., https://github.com/HKUNLP/UnifiedSKG/blob/main/seq2seq_construction/fetaqa.py#L72. Is this expected? Or have you tried it and found it didn't help?
  2. In your experience, what is the best upsampling temperature for multi-task fine-tuning and for prefix-tuning, respectively?

Hi,

Thanks for your questions!

1. The function you mention implements a strategy for truncating a table when it is too large to fit into the language model's input. The answers are used at training time to avoid mistakenly deleting the relevant rows, which would introduce avoidable noise into training. We used it in some cases and not in others; in our experience it does not make much difference. Just keep in mind that it CANNOT be used on dev or test data (that would leak the gold answer into preprocessing), and then you are free to try it. A rough sketch of the idea follows this list.
2. We tried 1, 2, and 5 as the upsampling temperature; among them, 2 worked best (judging by what we observed over roughly the first 10k steps). A sketch of temperature-based upsampling is also included below.
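
Regarding question 1, here is a minimal, self-contained sketch of the idea. It is not the actual code in utils/processor/table_truncate.py; the function name, argument names, and row-matching heuristic are made up for illustration only.

# Illustrative only: answer-aware table row truncation (training time only).
from typing import List, Optional

def truncate_table_rows(rows: List[List[str]],
                        max_rows: int,
                        answers: Optional[List[str]] = None) -> List[List[str]]:
    """Keep at most max_rows rows. If gold answers are given (training only),
    rows containing an answer cell are kept preferentially, so truncation does
    not drop the evidence the target answer comes from."""
    if len(rows) <= max_rows:
        return rows
    if not answers:
        # Dev/test (or answer-agnostic training): just keep the leading rows.
        return rows[:max_rows]
    answer_set = {a.strip().lower() for a in answers}
    # Indices of rows that contain an answer cell; keep these first.
    hits = [i for i, row in enumerate(rows)
            if any(cell.strip().lower() in answer_set for cell in row)]
    others = [i for i in range(len(rows)) if i not in hits]
    keep = sorted((hits + others)[:max_rows])  # restore original row order
    return [rows[i] for i in keep]

# Toy example: the "Tokyo" row survives truncation only when answers are passed.
rows = [["Paris", "France"], ["Berlin", "Germany"], ["Tokyo", "Japan"], ["Lima", "Peru"]]
print(truncate_table_rows(rows, 2))                     # eval-style: no answers
print(truncate_table_rows(rows, 2, answers=["Tokyo"]))  # train-style: answer row kept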

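Regarding question 2, the multi-task mixture itself is defined by the *_2upsample configuration; the sketch below only illustrates what a temperature-style upsampling factor means under the common scheme where task i is sampled with probability proportional to n_i ** (1 / T) (T = 1 is size-proportional sampling, and larger T upsamples small tasks). The task names and sizes here are placeholders, not the repo's real statistics or its exact mixing code.

# Illustrative only: temperature-based task sampling for multi-task training.
import random

def temperature_mixing_weights(task_sizes: dict, temperature: float = 2.0) -> dict:
    """Return per-task sampling probabilities proportional to size ** (1 / T)."""
    scaled = {task: n ** (1.0 / temperature) for task, n in task_sizes.items()}
    total = sum(scaled.values())
    return {task: s / total for task, s in scaled.items()}

# Placeholder example sizes (NOT the real dataset sizes).
task_sizes = {"spider": 7000, "wikitq": 11000, "fetaqa": 7300}
for T in (1, 2, 5):
    print(T, temperature_mixing_weights(task_sizes, T))

# Pick which task the next training example is drawn from.
weights = temperature_mixing_weights(task_sizes, temperature=2.0)
task = random.choices(list(weights), weights=list(weights.values()), k=1)[0]
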
Hope this information helps!
Thanks!