Question on reproducing (Please help me delete this post, posted here by mistake)
zluw1117 opened this issue · 2 comments
Hi @tscholak, I'm trying to reproduce your CoSQL model based on t5.1.1.lm100k.large.
I trained the CoSQL model on a p3.16xlarge EC2 instance without using db_content (8 GPUs, mini-batch size per device = 1, gradient accumulation steps = 250, so my effective batch size is 8 × 1 × 250 = 2000). The model saved at /home/ubuntu/code/src/t5-v1_1-large was downloaded from gs://t5-data/pretrained_models/t5.1.1.lm100k.large. Here is the config I used for training:
{
"run_name": "t5-cosql",
"model_name_or_path": "/home/ubuntu/code/src/t5-v1_1-large",
"dataset": "cosql+spider",
"source_prefix": "",
"schema_serialization_type": "peteshaw",
"schema_serialization_randomized": false,
"schema_serialization_with_db_id": true,
"schema_serialization_with_db_content": false,
"normalize_query": true,
"target_with_db_id": true,
"output_dir": "/home/ubuntu/code/src/code_train",
"cache_dir": "/home/ubuntu/code/src/code_transformers_cache",
"do_train": true,
"do_eval": true,
"fp16": false,
"num_train_epochs": 250,
"per_device_train_batch_size": 1,
"per_device_eval_batch_size": 1,
"gradient_accumulation_steps": 250,
"label_smoothing_factor": 0.0,
"learning_rate": 1e-4,
"adafactor": true,
"adam_eps": 1e-6,
"lr_scheduler_type": "constant",
"warmup_ratio": 0.0,
"warmup_steps": 0,
"seed": 1,
"report_to": ["wandb"],
"logging_strategy": "steps",
"logging_first_step": true,
"logging_steps": 4,
"load_best_model_at_end": true,
"metric_for_best_model": "exact_match",
"greater_is_better": true,
"save_total_limit": 64,
"save_steps": 64,
"evaluation_strategy": "steps",
"eval_steps": 64,
"predict_with_generate": true,
"num_beams": 1,
"num_beam_groups": 1,
"use_picard": false
}
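As a sanity check on the batch size mentioned above, here is a minimal Python sketch of the arithmetic. The two config values are copied from the config; the GPU count of 8 comes from the description of the instance, not from the config itself, and the helper name `effective_batch_size` is only illustrative.

```python
# Values copied from the training config above.
config = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 250,
}

# Assumption taken from the post: 8 GPUs on the p3.16xlarge instance.
num_gpus = 8


def effective_batch_size(cfg: dict, n_devices: int) -> int:
    """Examples contributing to one optimizer step across all devices."""
    return (
        n_devices
        * cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
    )


print(effective_batch_size(config, num_gpus))  # 2000
```

Changing the number of devices or the accumulation steps changes this product, so both need to be adjusted together to keep the batch size at 2000.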
For both your CoSQL model (https://huggingface.co/tscholak/2jrayxos) and my model, I ran the evaluation in the eval Docker image with PICARD enabled. Here is what I got:
Your model achieved
eval_exact_match = 0.5433
eval_exec = 0.6324
while my model only obtained
eval_exact_match = 0.5069
eval_exec = 0.5935
On both metrics, my model is nearly 4 percentage points behind yours (3.64 points on eval_exact_match and 3.89 points on eval_exec). That seems like a big difference.
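For reference, the gap can be computed directly from the scores listed above. This is just a small Python sketch of the subtraction; the dictionary names and the helper `gap_in_points` are illustrative and not part of any evaluation tooling.

```python
# Scores copied from the evaluation results above.
reference = {"eval_exact_match": 0.5433, "eval_exec": 0.6324}
reproduced = {"eval_exact_match": 0.5069, "eval_exec": 0.5935}


def gap_in_points(ref: dict, rep: dict) -> dict:
    """Difference per metric, expressed in percentage points."""
    return {name: round((ref[name] - rep[name]) * 100, 2) for name in ref}


gaps = gap_in_points(reference, reproduced)
for name, points in gaps.items():
    print(f"{name}: {points} percentage points")
```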
Does the config look good to you? Do you have any tips for training models based on t5.1.1.lm100k.large? Is there anything I am missing in this reproduction? Thank you.
Hi, did you mean to open this issue in the PICARD repository? Putting it here is a bit odd.
Your config looks fine. Without db content, though, you won't get the same performance as I did. Furthermore, you will want to turn on PICARD constrained inference for maximum accuracy.
Torsten
Thank you for your quick reply, Torsten.
Yes, you are right. I posted the issue to the wrong repository. Let me copy it to the PICARD repository.
Can anyone help me delete this issue? I posted it here by mistake. Sorry.