THUDM/ChatGLM-6B

[BUG/Help] RuntimeError: "bernoulli_scalar_cpu_" not implemented for 'Half' when reproducing P-tuning fine-tuning

ysqfirmament opened this issue · 3 comments

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

While fine-tuning, I tried to reproduce the ADGEN dataset task. This error occurred while running bash train.sh.

Running

import torch
print(torch.cuda.is_available())

prints True

C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\transformers\optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs <decoded Chinese text, unreadable in the original log because of console encoding>
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
labels <image_-100>... <decoded Chinese text, unreadable in the original log because of console encoding> <image_-100>...
  0%|          | 0/3000 [00:00<?, ?it/s]03/23/2024 23:23:53 - WARNING - transformers_modules.chatglm-6b-int4.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Traceback (most recent call last):
  File "D:\GLM\ChatGLM-6B-main\ptuning\main.py", line 430, in <module>
    main()
  File "D:\GLM\ChatGLM-6B-main\ptuning\main.py", line 369, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "D:\GLM\ChatGLM-6B-main\ptuning\trainer.py", line 1635, in train
    return inner_training_loop(
  File "D:\GLM\ChatGLM-6B-main\ptuning\trainer.py", line 1904, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "D:\GLM\ChatGLM-6B-main\ptuning\trainer.py", line 2647, in training_step
    loss = self.compute_loss(model, inputs)
  File "D:\GLM\ChatGLM-6B-main\ptuning\trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\firmament/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 1190, in forward
    transformer_outputs = self.transformer(
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\firmament/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 930, in forward
    past_key_values = self.get_prompt(batch_size=input_ids.shape[0], device=input_ids.device,
  File "C:\Users\firmament/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 878, in get_prompt
    past_key_values = self.dropout(past_key_values)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "C:\Users\firmament\AppData\Roaming\Python\Python310\site-packages\torch\nn\functional.py", line 1266, in dropout
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: "bernoulli_scalar_cpu_" not implemented for 'Half'
  0%|          | 0/3000 [00:00<?, ?it/s]

Expected Behavior

No response

Steps To Reproduce

Place the ADGEN dataset folder inside the ptuning folder
Run bash train.sh in the ptuning folder
The error appears

Environment

- OS: Windows 11
- Python: 3.10
- Transformers: 4.27.1
- PyTorch: 2.2.1+cu121
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

Is my computer simply not powerful enough to run this?

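I think you should first tell us your GPU model and how much VRAM it has, and check whether your GPU supports model quantization (I recall the README in the repository root mentions this).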
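
A minimal sketch for collecting the GPU details asked about above, assuming PyTorch is installed. `describe_gpus` is a hypothetical helper name, not part of the ChatGLM-6B repository; it returns an empty list when PyTorch or CUDA is unavailable.

```python
def describe_gpus():
    """Return one dict per visible CUDA device: index, name, total memory in GiB."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    gpus = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        gpus.append({
            "index": index,
            "name": props.name,
            "total_gib": round(props.total_memory / 2**30, 2),
        })
    return gpus


# Print one line per GPU; prints nothing on a machine without a usable GPU.
for gpu in describe_gpus():
    print(gpu)
```

Pasting this output into the issue gives readers the model name and VRAM size in one step.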

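The default configuration quantizes to int4, which needs very little VRAM, and your error is not an OOM, so running out of VRAM can probably be ruled out (at least it had not happened yet at the point of this error).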

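My suggestion is to change the quantization parameter to fp16 (or simply delete it). Without quantization the model only uses more VRAM, and it runs much faster: first, loading skips the quantization step; second, fp16 is the fastest for both training and inference (in my tests, training time was fp16 << int4 < int8).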
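
To make the memory side of that trade-off concrete, here is a rough weights-only estimate. It assumes about 6.2 billion parameters (the figure given in the ChatGLM-6B README) and ignores activations, optimizer state, and the P-tuning prefix, so real usage will be higher. `weight_gib` is a hypothetical helper written for this illustration.

```python
def weight_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 6.2e9  # assumption: approximate parameter count of ChatGLM-6B

# Compare the three precisions mentioned in the comment above.
for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the fp16 weights need roughly four times the memory of the int4 weights, which is the cost paid for the faster training described above.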

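By the way, my setup is 4 Tesla T4 cards with 16 GB of VRAM each. It can run all the P-tuning jobs, but full fine-tuning runs out of VRAM. My software versions are:

- Python: 3.9.19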
- Transformers: 4.27.1
- PyTorch: 1.13.1+cu116
- CUDA: 11.6

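Because the server cannot be updated and another fine-tuning environment requires transformers>=4.30, I spent a long time fighting dependency hell, which is why I remember the dependency versions so well.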

If all else fails, try matching my configuration. Get it running first and worry about the rest later.
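
A small sketch for checking how far a local environment is from a reference configuration, using only the standard library. `installed_version` and `compare_versions` are hypothetical helpers, and the reference dict below contains only the Transformers version quoted in this thread; add other packages as needed.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(reference: dict) -> dict:
    """Map each package name to (installed, wanted, matches)."""
    report = {}
    for package, wanted in reference.items():
        installed = installed_version(package)
        report[package] = (installed, wanted, installed == wanted)
    return report


# Reference value taken from the versions listed in this thread.
for package, row in compare_versions({"transformers": "4.27.1"}).items():
    print(package, row)
```

The comparison is an exact string match, so a local build suffix such as `+cu121` counts as a mismatch; that is deliberate here, since the CUDA build is part of what differs between the two setups.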

Also, I ran this on Linux. Maybe try finding a server as well.

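Check whether the code you are using is the latest version. This error appears to mean that some scalar operation is not implemented for half precision. If the latest code still raises the same error, you can try modifying or removing the half-precision conversions such as half() in the code that fails.
If you modify the code, VRAM usage will probably increase, and the int quantization step may change as well. So I do not recommend changing code you do not fully understand, nor applying int quantization after modifying the code. #462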
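
The kernel name in the error, `bernoulli_scalar_cpu_`, ends in `_cpu_`, which suggests that dropout received a float16 tensor living on the CPU rather than on the GPU. Below is a minimal sketch for checking this, assuming PyTorch is installed. `placement_summary` and `dropout_cpu_safe` are hypothetical helpers, not functions from the repository, and the upcast is a workaround for illustration rather than a fix confirmed upstream.

```python
from collections import Counter

import torch
from torch import nn


def placement_summary(module: nn.Module) -> Counter:
    """Count a module's parameters by (device type, dtype)."""
    return Counter((p.device.type, str(p.dtype)) for p in module.parameters())


def dropout_cpu_safe(x: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Dropout that upcasts CPU float16 input to float32, then casts back."""
    if x.device.type == "cpu" and x.dtype == torch.float16:
        return nn.functional.dropout(x.float(), p, training).half()
    return nn.functional.dropout(x, p, training)


# Toy stand-in for a half-precision module that was never moved to the GPU.
toy = nn.Linear(4, 4).half()
print(placement_summary(toy))

out = dropout_cpu_safe(torch.ones(2, 4, dtype=torch.float16), p=0.1, training=True)
print(out.dtype, tuple(out.shape))
```

Calling `placement_summary(model)` on the loaded model before training shows whether any float16 parameters are still on `cpu`; if they are, moving the model to the GPU is likely a cleaner route than removing the half() calls.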