Question about pretraining the language model
Hi, thank you for your nice work.
When I try to pretrain the language model, I run into an error.
Here is my yaml for the pretrain-language config (I only changed the epoch-related values):
```yaml
global:
  name: my-pretrain-language
  phase: train
  stage: pretrain-language
  workdir: results
  seed: ~

dataset:
  train: {
    roots: ['data/WikiText-103.csv'],
    batch_size: 1024
  }
  test: {
    roots: ['data/WikiText-103_eval_d1.csv'],
    batch_size: 1024
  }
  valid: {
    roots: ['data/validation'],
    batch_size: 384
  }

training:
  epochs: 80
  show_iters: 50
  eval_iters: 100
  save_iters: 3000

optimizer:
  type: Adam
  true_wd: False
  wd: 0.0
  bn_wd: False
  clip_grad: 20
  lr: 0.0001
  args: {
    betas: !!python/tuple [0.9, 0.999], # for default Adam
  }
  scheduler: {
    periods: [70, 10],
    gamma: 0.1,
  }

model:
  name: 'modules.model_language.BCNLanguage'
  language: {
    num_layers: 4,
    loss_weight: 1.,
    use_self_attn: False
  }
```
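Side note on the scheduler block above: my reading (an assumption on my side, not something from the repo's docs) is that `periods: [70, 10]` with `gamma: 0.1` keeps the lr at 1e-4 for the first 70 epochs and scales it by 0.1 for the last 10. A minimal sketch of that assumption using torch's stock MultiStepLR rather than the repo's own scheduler:

```python
# Sketch only: approximating the yaml's periods/gamma schedule with
# torch's MultiStepLR (the repo may implement this differently).
import torch

model = torch.nn.Linear(8, 8)  # stand-in for BCNLanguage
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[70], gamma=0.1)

for epoch in range(80):
    # ... one training epoch ...
    scheduler.step()  # lr: 1e-4 for epochs 0-69, 1e-5 for epochs 70-79
```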
May I ask whether you have encountered a similar situation?
Thank you!
I also tried the default yaml pretrain_language_model.yaml, but got the same error.
Hi, I ran with your yaml file and the default yaml file, and both are working. My run script is:

```bash
python main.py --config=configs/pretrain_language_model.yaml
```

Could you check it again and give more information about your runtime environment?
Hi, thank you for your answer. I ran with the default yaml file and only changed the batch_size and eval_iters.
The same error occurred right after the first evaluation (at eval_iters).
Here is the default yaml file:
```yaml
global:
  name: pretrain-language-model
  phase: train
  stage: pretrain-language
  workdir: results
  seed: ~

dataset:
  train: {
    roots: ['data/WikiText-103.csv'],
    batch_size: 1024
  }
  test: {
    roots: ['data/WikiText-103_eval_d1.csv'],
    batch_size: 1024
  }
  valid: {
    roots: ['data/validation'],
    batch_size: 384
  }

training:
  epochs: 80
  show_iters: 50
  eval_iters: 100
  save_iters: 3000

optimizer:
  type: Adam
  true_wd: False
  wd: 0.0
  bn_wd: False
  clip_grad: 20
  lr: 0.0001
  args: {
    betas: !!python/tuple [0.9, 0.999], # for default Adam
  }
  scheduler: {
    periods: [70, 10],
    gamma: 0.1,
  }

model:
  name: 'modules.model_language.BCNLanguage'
  language: {
    num_layers: 4,
    loss_weight: 1.,
    use_self_attn: False
  }
```
I am confused about the error; maybe it comes from the wrong version of some packages.
I ran it on a single 2080Ti, and the primary package versions are as follows:

torch=1.7.1
torchvision=0.8.2
Pillow=8.3.2
opencv-python=4.6.0.66
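These were printed with a quick check like the one below (a minimal sketch that just assumes the four packages above are importable):

```python
# Print the versions of the packages relevant to this issue.
import torch, torchvision, PIL, cv2

print('torch        ', torch.__version__)
print('torchvision  ', torchvision.__version__)
print('Pillow       ', PIL.__version__)
print('opencv-python', cv2.__version__)
```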
Would you mind sharing your environment so that I can compare against it?
Sorry for the trouble, thanks!
Oh, I see. The problem occurs because of the validation dataset.
We have fixed this in efd29dd (when pretraining the language model, we now evaluate only the test dataset, not the validation dataset), and the code is working!
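Conceptually the change looks something like the sketch below. This is illustrative only; the real change is commit efd29dd, and names like `stage` and `dataloaders` are my shorthand here, not the repo's API:

```python
# Illustrative sketch of the fix, not the actual commit (see efd29dd).
# Language-model pretraining reads csv text data, so the image-based
# validation set cannot be evaluated in this stage and is skipped.
def select_eval_loaders(stage, dataloaders):
    if stage == 'pretrain-language':
        return {name: dl for name, dl in dataloaders.items() if name == 'test'}
    return {name: dl for name, dl in dataloaders.items()
            if name in ('test', 'valid')}
```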
Thank you for letting me know about the error.
It works after updating the code! I will close this issue.
Thanks for your work, have a nice day!