byeonghu-na/MATRN

Question about pretraining on language model

Closed this issue · 5 comments

Hi, thank you for your nice work.
When I try to pretrain the language model, I run into the following problem:
[screenshot of the error]
Here is my YAML for the pretrain-language config; I only changed the epoch-related values.

global:
  name: my-pretrain-language
  phase: train
  stage: pretrain-language
  workdir: results
  seed: ~

dataset:
  train: {
    roots: ['data/WikiText-103.csv'],
    batch_size: 1024
  }
  test: {
    roots: ['data/WikiText-103_eval_d1.csv'],
    batch_size: 1024
  }
  valid: {
    roots: [ 'data/validation' ],
    batch_size: 384
  }

training:
  epochs: 80
  show_iters: 50
  eval_iters: 100
  save_iters: 3000

optimizer:
  type: Adam
  true_wd: False
  wd: 0.0
  bn_wd: False
  clip_grad: 20
  lr: 0.0001
  args: {
    betas: !!python/tuple [0.9, 0.999], # for default Adam
  }
  scheduler: {
    periods: [70, 10],
    gamma: 0.1,
  }

model:
  name: 'modules.model_language.BCNLanguage'
  language: {
    num_layers: 4,
    loss_weight: 1.,
    use_self_attn: False
  }
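For context, my reading of the scheduler settings (my own assumption, not something stated in the repo) is that periods: [70, 10] with gamma: 0.1 keeps the learning rate at 1e-4 for the first 70 epochs and drops it to 1e-5 for the last 10, so the periods should sum to epochs. Roughly equivalent, in plain PyTorch, to:

import torch

# A minimal sketch of how I read the scheduler config (assumption, not the repo's code):
# lr = 1e-4 for the first 70 epochs, then multiplied by gamma = 0.1 for the last 10.
model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[70], gamma=0.1)

for epoch in range(80):
    # ... one training epoch ...
    scheduler.step()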

May I ask if you have encountered any similar issue?
Thank you!

I also tried the default pretrain_language_model.yaml, but got the same error.

Hi, I ran with both your YAML file and the default YAML file, and both work.
My run script is:

python main.py --config=configs/pretrain_language_model.yaml

Could you check it again and provide more information about your run environment?

Hi, thank you for your answer. I ran with the default YAML file and only changed the batch_size and eval_iters.
The same error occurred right after the first evaluation (at eval_iters).
[screenshot of the error]
Here is the default YAML file:

global:
  name: pretrain-language-model
  phase: train
  stage: pretrain-language
  workdir: results
  seed: ~
 
dataset:
  train: {
    roots: ['data/WikiText-103.csv'],
    batch_size: 1024
  }
  test: {
    roots: ['data/WikiText-103_eval_d1.csv'],
    batch_size: 1024
  }
  valid: {
    roots: [ 'data/validation' ],
    batch_size: 384
  }

training:
  epochs: 80
  show_iters: 50
  eval_iters: 100
  save_iters: 3000

optimizer:
  type: Adam
  true_wd: False
  wd: 0.0
  bn_wd: False
  clip_grad: 20
  lr: 0.0001
  args: {
    betas: !!python/tuple [0.9, 0.999], # for default Adam 
  }
  scheduler: {
    periods: [70, 10],
    gamma: 0.1,
  }

model:
  name: 'modules.model_language.BCNLanguage'
  language: {
    num_layers: 4,
    loss_weight: 1.,
    use_self_attn: False
  }

I am confused about the error; maybe it comes from the wrong version of some package.
I ran it on a single 2080 Ti, and the primary package versions are as follows:

torch=1.7.1
torchvision=0.8.2
Pillow=8.3.2
opencv-python=4.6.0.66
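
I grabbed these with a quick check like this (nothing repo-specific, just the standard version attributes):

import torch, torchvision, PIL, cv2
print('torch        ', torch.__version__)        # 1.7.1
print('torchvision  ', torchvision.__version__)  # 0.8.2
print('Pillow       ', PIL.__version__)          # 8.3.2
print('opencv-python', cv2.__version__)          # 4.6.0.66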

Would you mind sharing your environment so I can compare mine against it?
Sorry for the trouble, and thanks!

Oh, I see. The problem is caused by the validation dataset.
We have now fixed it (efd29dd) so that we only evaluate on the test dataset (not the validation dataset) when pretraining the language model, and the code is now working!
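
Roughly, the idea of the change is the following (an illustrative sketch with made-up names, not the actual diff in efd29dd):

# Illustrative sketch only: when the stage is pretrain-language, build the
# evaluation loaders from the test roots and skip the image-based validation set.
def build_eval_roots(config):
    roots = list(config['dataset']['test']['roots'])
    if config['global']['stage'] != 'pretrain-language':
        roots += list(config['dataset']['valid']['roots'])
    return roots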

Thank you for letting me know about the error.

It works after updating the code! I will close this issue.
Thanks for your work, have a nice day!