kohjingyu/gill

param.grad is None !


Hi! Thank you for your great work!

After preparing the datasets and pretrained models, I trained the model with this command:

randport=$(shuf -i8000-9999 -n1)  # Generate a random port number
python -u main.py \
    --dist-url "tcp://127.0.0.1:${randport}" --dist-backend 'nccl' \
    --multiprocessing-distributed --world-size 1 --rank 0 --batch-size=256 \
    --dataset=cc3m --val-dataset=cc3m \
    --exp-name='gill_exp' --image-dir='/data/vol1/public-datasets/03-CC/cc3m/' --log-base-dir='runs/' \
    --precision='bf16' --print-freq=100 \
    --opt-version='/data/models/facebook/opt-6.7b' --visual-model='/data/pretrained_weights/openai/clip-vit-large-patch14' --workers=0

No code was modified except the data paths. However, I got this error:

Traceback (most recent call last):
File "/root/miniconda3/envs/vlm/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
fn(i, *args)
File "/data/vol1/zky/methods/gill/main.py", line 402, in main_worker
train(train_loader, model, tokenizer, criterion, optimizer, epoch, scheduler, args)
File "/data/vol1/zky/methods/gill/main.py", line 586, in train
assert param.grad.shape[0] == len(tokenizer)
AttributeError: 'NoneType' object has no attribute 'shape'

It seems like the params in model.module.model.input_embeddings.parameters() have no grad. Could you tell me how to solve this problem? Thank you!
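
For reference, here is a minimal standalone sketch (not GILL code; the toy model and names are made up for illustration) showing one way param.grad can end up as None after backward(): a parameter that is frozen, or never used in the forward pass, simply never receives a gradient. Printing requires_grad and whether grad is None for the input-embedding parameters right before the failing assert should reveal whether this is what is happening in the actual run:

import torch
import torch.nn as nn

# Toy model: an embedding table followed by a linear head.
model = nn.ModuleDict({
    "input_embeddings": nn.Embedding(10, 4),
    "head": nn.Linear(4, 2),
})
# Freeze the embedding weights, e.g. as a stand-in for weights that were not
# marked trainable when the model was set up.
model["input_embeddings"].weight.requires_grad = False

out = model["head"](model["input_embeddings"](torch.tensor([1, 2, 3])))
out.sum().backward()

# The frozen embedding weight reports grad is None; the head parameters do not.
for name, param in model.named_parameters():
    print(name, "requires_grad:", param.requires_grad, "grad is None:", param.grad is None)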

We encountered the same problem as you described. Did you manage to solve it?