google-research/text-to-text-transfer-transformer

Question on reproducing (Please help me delete this post, posted here by mistake)

zluw1117 opened this issue · 2 comments

Hi @tscholak, I'm trying to reproduce your CoSQL model based on t5.1.1.lm100k.large.

I trained the CoSQL model on a p3.16xlarge EC2 instance without db_content (8 GPUs, per-device mini-batch size = 1, gradient accumulation steps = 250, so my effective batch size is 8 × 1 × 250 = 2000). The model at /home/ubuntu/code/src/t5-v1_1-large was downloaded from gs://t5-data/pretrained_models/t5.1.1.lm100k.large. Here is the config I used for training:

{
    "run_name": "t5-cosql",
    "model_name_or_path": "/home/ubuntu/code/src/t5-v1_1-large",
    "dataset": "cosql+spider",
    "source_prefix": "",
    "schema_serialization_type": "peteshaw",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": false,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "/home/ubuntu/code/src/code_train",
    "cache_dir": "/home/ubuntu/code/src/code_transformers_cache",
    "do_train": true,
    "do_eval": true,
    "fp16": false,
    "num_train_epochs": 250,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 250,
    "label_smoothing_factor": 0.0,
    "learning_rate": 1e-4,
    "adafactor": true,
    "adam_eps": 1e-6,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.0,
    "warmup_steps": 0,
    "seed": 1,
    "report_to": ["wandb"],
    "logging_strategy": "steps",
    "logging_first_step": true,
    "logging_steps": 4,
    "load_best_model_at_end": true,
    "metric_for_best_model": "exact_match",
    "greater_is_better": true,
    "save_total_limit": 64,
    "save_steps": 64,
    "evaluation_strategy": "steps",
    "eval_steps":64,
    "predict_with_generate": true,
    "num_beams": 1,
    "num_beam_groups": 1,
    "use_picard": false
}
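
As a sanity check, the effective batch size implied by this config can be recomputed from the values above. This is a minimal sketch: the config file path is hypothetical, and it assumes the standard HuggingFace semantics of effective batch size = number of GPUs × per-device batch size × gradient accumulation steps.

import json

# Load the training config shown above ("train_cosql.json" is a hypothetical path).
with open("train_cosql.json") as f:
    cfg = json.load(f)

n_gpus = 8  # a p3.16xlarge instance has 8 V100 GPUs
effective_batch_size = (
    n_gpus
    * cfg["per_device_train_batch_size"]  # 1
    * cfg["gradient_accumulation_steps"]  # 250
)
print(effective_batch_size)  # 8 * 1 * 250 = 2000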

For both your CoSQL model (https://huggingface.co/tscholak/2jrayxos) and my model, I ran evaluation with the eval Docker image with PICARD enabled. Here is what I got:

Your model achieved

eval_exact_match = 0.5433 
eval_exec = 0.6324

while my model only obtained

eval_exact_match = 0.5069
eval_exec = 0.5935

For both metrics, I am nearly 4 percentage points below your model's performance. That seems like a big difference.
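
For concreteness, the gaps can be computed directly from the numbers above (a quick sketch, assuming nothing beyond the reported values):

reference = {"eval_exact_match": 0.5433, "eval_exec": 0.6324}
reproduced = {"eval_exact_match": 0.5069, "eval_exec": 0.5935}

# Percentage-point gap per metric.
for metric, ref_value in reference.items():
    gap_pp = (ref_value - reproduced[metric]) * 100
    print(f"{metric}: {gap_pp:.1f} pp")
# eval_exact_match: 3.6 pp
# eval_exec: 3.9 pp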
Does the config look good to you? Do you have any tips for training t5.1.1.lm100k.large-based models? Is there anything I'm missing in this reproduction experiment? Thank you.

Hi, did you mean to open this issue in the PICARD repository? Putting it here is a bit odd.
Your config looks fine. Without db content, though, you won't get the same performance I got. Furthermore, you want to turn on PICARD constrained inference for maximum accuracy.
Torsten
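
Concretely, those two suggestions map onto the config above as follows. This is a hedged sketch: the file paths are hypothetical, the key names are taken from the config posted earlier, and PICARD's own inference options (mode, beam size, etc.) are configured separately and not shown here.

import json

# Start from the training config shown above ("train_cosql.json" is a hypothetical path).
with open("train_cosql.json") as f:
    cfg = json.load(f)

cfg["schema_serialization_with_db_content"] = True  # serialize database content into the model input
cfg["use_picard"] = True  # enable PICARD constrained decoding at inference time

# Write out the modified config ("train_cosql_db_content.json" is a hypothetical path).
with open("train_cosql_db_content.json", "w") as f:
    json.dump(cfg, f, indent=4)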

Thank you for your quick reply, Torsten.
Yeah, you are right. I posted the issue to the wrong repository. Let me copy it over to the PICARD repository.

Can anyone help me delete this issue? I posted it here by mistake. Sorry.