ssbuild/moss_finetuning

Author, the same problem has come up again

ddzz0210 opened this issue · 25 comments

Fine-tuning on my own 4090 24G ran out of VRAM, so I rented two 4090 24G cards on Hengyuan Cloud. Nothing else in the code changed, but now I get: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. This time it makes no difference whether or not I change the comment you mentioned before.

Is this lora int8 or ptv2?

ptv2

self.backbone.enable_input_require_grads()

Confirmed; this has been fixed.
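For context on why that one line fixes the RuntimeError: with ptv2 the backbone weights are frozen, so nothing in the graph requires grad and loss.backward() fails. A minimal sketch of what enable_input_require_grads does in recent transformers releases (paraphrased from upstream, not copied from this repo):

```python
# Paraphrased sketch of transformers' PreTrainedModel.enable_input_require_grads:
# hook the input embeddings so their output carries requires_grad=True,
# keeping the autograd graph alive even when all backbone weights are frozen.
def enable_input_require_grads(model):
    def make_inputs_require_grads(module, inputs, output):
        output.requires_grad_(True)

    model.get_input_embeddings().register_forward_hook(make_inputs_require_grads)
```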

Got it. Can the new code be run as-is? And how do I adjust the training mode?

Yes, it runs. What do you mean by adjusting the training mode?

Previously I could set the lora parameter to True or False; do I still need to change that now?

Yes, change it according to whichever training mode you use~

Where do I change that? I looked but couldn't find it. Could you give me a hint?

config/__init__.py
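Roughly what those switches look like in config/__init__.py; this is a hypothetical sketch, and the actual flag names in the repo may differ:

```python
# Illustrative only -- the real flag names live in config/__init__.py.
# Enable exactly one training mode.
enable_lora = False   # LoRA fine-tuning
enable_int8 = False   # int8-quantized backbone (used together with LoRA)
enable_ptv2 = True    # P-Tuning v2, the mode used in this thread
```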

Great, thanks so much!

Now it throws this error: AttributeError: 'MyMossForCausalLM' object has no attribute 'enable_input_require_grads'

/hy-tmp/moss_finetuning-main/train.py:142 in <module> │
│ │
│ 139 │ │ dataHelper.make_dataset_with_args(data_args.test_file,mode='test') │
│ 140 │ │
│ 141 │ │
│ ❱ 142 │ pl_model = MyTransformer(config=config, model_args=model_args, training_args=trainin │
│ 143 │ │ │ │ │ │ │ load_in_8bit=load_in_8bit, device_map={"": trainer.local_ra │
│ 144 │ if not load_in_8bit: │
│ 145 │ │ pl_model.half() │
│ │
│ /hy-tmp/moss_finetuning-main/models/__init__.py:22 in __init__ │
│ │
│ 19 │ │ │ self.set_model(model, copy_attr=False) │
│ 20 │ │ elif prompt_args is not None and prompt_args.with_prompt: │
│ 21 │ │ │ # │
│ ❱ 22 │ │ │ self.backbone.enable_input_require_grads() │
│ 23 │ │ │ model: PromptModel = get_prompt_model(self.backbone, prompt_args) │
│ 24 │ │ │ print('*' * 30, 'prompt info') │
│ 25 │ │ │ model.print_trainable_parameters() │
│ │
│ /hy-tmp/moss_finetuning-main/models/moss_model.py:261 in enable_input_require_grads │
│ │
│ 258 │ def enable_input_require_grads(self): │
│ 259 │ │ setattr(self.model, 'model_parallel', True) │
│ 260 │ │ setattr(self.model, 'is_parallelizable', True) │
│ ❱ 261 │ │ self.model.enable_input_require_grads() │
│ 262 │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1269 in __getattr__ │
│ │
│ 1266 │ │ │ modules = self.__dict__['_modules'] │
│ 1267 │ │ │ if name in modules: │
│ 1268 │ │ │ │ return modules[name] │
│ ❱ 1269 │ │ raise AttributeError("'{}' object has no attribute '{}'".format( │
│ 1270 │ │ │ type(self).__name__, name)) │
│ 1271 │ │
│ 1272 │ def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None: │

@ddzz0210

Your transformers version is probably too old: pip install -U transformers. The requirements have been updated to a minimum of 4.28.

Oh, I'm still on 4.25 as you suggested before.
I'll update it now.
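A quick sanity check that the upgrade took effect before re-running training (packaging ships as a transformers dependency):

```python
from packaging import version
import transformers

# requirements now ask for at least 4.28
assert version.parse(transformers.__version__) >= version.parse("4.28.0"), \
    f"transformers {transformers.__version__} is too old: pip install -U transformers"
```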

Are two 4090 24G cards just not enough? It keeps saying there isn't enough VRAM.

Right, it's not enough. For testing you can edit n_layer in config.json directly, e.g. n_layer=14; that should be able to run.
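A throwaway way to apply that; the checkpoint path below is illustrative. Note this only shrinks the model so the memory footprint fits for a smoke test; a truncated network won't load all pretrained weights or produce meaningful output:

```python
import json

cfg_path = "moss-moon-003-sft/config.json"  # adjust to your checkpoint layout

with open(cfg_path) as f:
    cfg = json.load(f)

cfg["n_layer"] = 14  # down from the full depth, just to see if training fits

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```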

After changing the layer count I get this error: NameError: name 'transpose_matmul_248_kernel' is not defined
(running with devices=2)

For the NameError: name 'transpose_matmul_248_kernel' is not defined error, just find where it is referenced and comment it out, or pull the latest deep_training code and overwrite your copy.
A 0.1.5post0 release is planned to fix this error.

OK, I'll give it a try.

Please take another look:
Running infer_ptuning.py fails with:
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Run CUDA_LAUNCH_BLOCKING=1 python infer_ptuning.py and post the error output!

│ /hy-tmp/moss_finetuning-main/infer_ptuning.py:36 in <module> │
│ │
│ 33 │ │
│ 34 │ model = pl_model.get_llm_model() │
│ 35 │ query = "<|Human|>: 你好\n<|MOSS|>:" │
│ ❱ 36 │ response = model.chat(tokenizer, query, max_length=2048, │
│ 37 │ │ │ │ │ │ eos_token_id=config.eos_token_id, │
│ 38 │ │ │ │ │ │ do_sample=True, top_p=0.7, temperature=0.95, │
│ 39 │ │ │ │ │ │ ) │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py:27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def wrap_generator(self, func): │
│ │
│ /hy-tmp/moss_finetuning-main/models/moss_model.py:104 in chat │
│ │
│ 101 │ │ kwargs.update(self.extra_param.param) │
│ 102 │ │ tokens = tokenizer.batch_encode_plus([self.extra_param.prefix + text], return_te │
│ 103 │ │ input_ids, attention_mask = tokens['input_ids'], tokens['attention_mask'] │
│ ❱ 104 │ │ outputs = self.chat_inner(input_ids, attention_mask,**kwargs) │
│ 105 │ │ preds = tokenizer.batch_decode(outputs) │
│ 106 │ │ res = self.postprocess_remove_prefix(preds[0]) │
│ 107 │ │ return res │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py:27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def wrap_generator(self, func): │
│ │
│ /hy-tmp/moss_finetuning-main/models/moss_model.py:152 in chat_inner │
│ │
│ 149 │ │ │
│ 150 │ │ past_key_values = None │
│ 151 │ │ for i in range(int(max_iterations)): │
│ ❱ 152 │ │ │ logits, past_key_values = self.infer_(input_ids if i == 0 else new_generated │
│ 153 │ │ │ │ │ │ │ │ │ │ │ │ past_key_values) │
│ 154 │ │ │ │
│ 155 │ │ │ if i == 0: │
│ │
│ /hy-tmp/moss_finetuning-main/models/moss_model.py:241 in infer_ │
│ │
│ 238 │ def infer_(self, input_ids, attention_mask, past_key_values): │
│ 239 │ │ inputs = {"input_ids": input_ids, "attention_mask": attention_mask, "past_key_va │
│ 240 │ │ with torch.no_grad(): │
│ ❱ 241 │ │ │ outputs = self.forward(**inputs,return_dict=True) │
│ 242 │ │ return outputs.logits, outputs.past_key_values │
│ 243 │
│ 244 │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/modeling_moss.py:673 in │
│ forward │
│ │
│ 670 │ │ """ │
│ 671 │ │ return_dict = return_dict if return_dict is not None else self.config.use_return │
│ 672 │ │ │
│ ❱ 673 │ │ transformer_outputs = self.transformer( │
│ 674 │ │ │ input_ids, │
│ 675 │ │ │ past_key_values=past_key_values, │
│ 676 │ │ │ attention_mask=attention_mask, │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/modeling_moss.py:543 in │
│ forward │
│ │
│ 540 │ │ │ │ │ head_mask[i], │
│ 541 │ │ │ │ ) │
│ 542 │ │ │ else: │
│ ❱ 543 │ │ │ │ outputs = block( │
│ 544 │ │ │ │ │ hidden_states=hidden_states, │
│ 545 │ │ │ │ │ layer_past=layer_past, │
│ 546 │ │ │ │ │ attention_mask=attention_mask, │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/modeling_moss.py:268 in │
│ forward │
│ │
│ 265 │ ) -> Union[Tuple[torch.Tensor], Optional[Tuple[torch.Tensor, Tuple[torch.FloatTensor │
│ 266 │ │ residual = hidden_states │
│ 267 │ │ hidden_states = self.ln_1(hidden_states) │
│ ❱ 268 │ │ attn_outputs = self.attn( │
│ 269 │ │ │ hidden_states=hidden_states, │
│ 270 │ │ │ layer_past=layer_past, │
│ 271 │ │ │ attention_mask=attention_mask, │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/modeling_moss.py:162 in │
│ forward │
│ │
│ 159 │ │ Tuple[torch.Tensor, Tuple[torch.Tensor]], │
│ 160 │ │ Optional[Tuple[torch.Tensor, Tuple[torch.Tensor], Tuple[torch.Tensor, ...]]], │
│ 161 │ ]: │
│ ❱ 162 │ │ qkv = self.qkv_proj(hidden_states) │
│ 163 │ │ # TODO(enijkamp): factor out number of logical TPU-v4 cores or make forward pass │
│ 164 │ │ mp_num = 4 │
│ 165 │ │ qkv_split = qkv.reshape(qkv.shape[:-1] + (mp_num, -1)) │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/quantization.py:369 in │
│ forward │
│ │
│ 366 │ │
│ 367 │ def forward(self, x): │
│ 368 │ │ out_shape = x.shape[:-1] + (self.outfeatures,) │
│ ❱ 369 │ │ out = QuantLinearFunction.apply(x.reshape(-1, x.shape[-1]), self.qweight, self.s │
│ 370 │ │ │ │ │ │ │ │ │ │ self.qzeros, self.g_idx, self.bits, self.maxq) │
│ 371 │ │ out = out + self.bias if self.bias is not None else out │
│ 372 │ │ return out.reshape(out_shape) │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/cuda/amp/autocast_mode.py:105 in decorate_fwd │
│ │
│ 102 │ │ │ │ with autocast(enabled=False): │
│ 103 │ │ │ │ │ return fwd(*_cast(args, cast_inputs), **_cast(kwargs, cast_inputs)) │
│ 104 │ │ │ else: │
│ ❱ 105 │ │ │ │ return fwd(*args, **kwargs) │
│ 106 │ return decorate_fwd │
│ 107 │
│ 108 │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/quantization.py:281 in │
│ forward │
│ │
│ 278 │ @staticmethod │
│ 279 │ @custom_fwd(cast_inputs=torch.float16) │
│ 280 │ def forward(ctx, input, qweight, scales, qzeros, g_idx, bits, maxq): │
│ ❱ 281 │ │ output = matmul248(input, qweight, scales, qzeros, g_idx, bits, maxq) │
│ 282 │ │ ctx.save_for_backward(qweight, scales, qzeros, g_idx) │
│ 283 │ │ ctx.bits, ctx.maxq = bits, maxq │
│ 284 │ │ return output │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/quantization.py:250 in │
│ matmul248 │
│ │
│ 247 │ output = torch.empty((input.shape[0], qweight.shape[1]), device='cuda', dtype=torch. │
│ 248 │ grid = lambda META: ( │
│ 249 │ triton.cdiv(input.shape[0], META['BLOCK_SIZE_M']) * triton.cdiv(qweight.shape[1], ME │
│ ❱ 250 │ matmul_248_kernel[grid](input, qweight, output, │
│ 251 │ │ │ │ │ │ │ scales, qzeros, g_idx, │
│ 252 │ │ │ │ │ │ │ input.shape[0], qweight.shape[1], input.shape[1], bits, maxq │
│ 253 │ │ │ │ │ │ │ input.stride(0), input.stride(1), │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/custom_autotune.py:89 in │
│ run │
│ │
│ 86 │ │ │ │ # prune configs │
│ 87 │ │ │ │ pruned_configs = self.prune_configs(kwargs) │
│ 88 │ │ │ │ bench_start = time.time() │
│ ❱ 89 │ │ │ │ timings = {config: self._bench(*args, config=config, **kwargs) │
│ 90 │ │ │ │ │ │ │ for config in pruned_configs} │
│ 91 │ │ │ │ bench_end = time.time() │
│ 92 │ │ │ │ self.bench_time = bench_end - bench_start │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/custom_autotune.py:89 in │
│ <dictcomp> │
│ │
│ 86 │ │ │ │ # prune configs │
│ 87 │ │ │ │ pruned_configs = self.prune_configs(kwargs) │
│ 88 │ │ │ │ bench_start = time.time() │
│ ❱ 89 │ │ │ │ timings = {config: self._bench(*args, config=config, **kwargs) │
│ 90 │ │ │ │ │ │ │ for config in pruned_configs} │
│ 91 │ │ │ │ bench_end = time.time() │
│ 92 │ │ │ │ self.bench_time = bench_end - bench_start │
│ │
│ /usr/local/lib/python3.8/dist-packages/deep_training/nlp/models/moss/custom_autotune.py:71 in │
│ _bench │
│ │
│ 68 │ │ try: │
│ 69 │ │ │ # In testings using only 40 reps seems to be close enough and it appears to │
│ 70 │ │ │ # PyTorch also sets fast_flush to True, but I didn't see any speedup so I'll │
│ ❱ 71 │ │ │ return triton.testing.do_bench(kernel_call, rep=40) │
│ 72 │ │ except triton.compiler.OutOfResources: │
│ 73 │ │ │ return float('inf') │
│ 74 │
│ │
│ /usr/local/lib/python3.8/dist-packages/triton/testing.py:144 in do_bench │
│ │
│ 141 │ │
│ 142 │ # Estimate the runtime of the function │
│ 143 │ fn() │
│ ❱ 144 │ torch.cuda.synchronize() │
│ 145 │ start_event = torch.cuda.Event(enable_timing=True) │
│ 146 │ end_event = torch.cuda.Event(enable_timing=True) │
│ 147 │ start_event.record() │
│ │
│ /usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:566 in synchronize │
│ │
│ 563 │ """ │
│ 564 │ _lazy_init() │
│ 565 │ with torch.cuda.device(device): │
│ ❱ 566 │ │ return torch._C._cuda_synchronize() │
│ 567 │
│ 568 │
│ 569 def ipc_collect(): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Try running it on an idle GPU; that should work.

It is idle.

And running infer.py throws this error as well.

pip uninstall triton
git clone https://github.com/openai/triton.git
cd triton/python
pip install cmake # build-time dependency
pip install -e .
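After the rebuild, a quick smoke test (the canonical Triton vector-add, nothing repo-specific) to confirm kernels compile and run on the card before retrying infer_ptuning.py:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized slice of the vectors.
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

n = 4096
x, y = torch.randn(n, device="cuda"), torch.randn(n, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
torch.cuda.synchronize()  # surfaces illegal-memory-access errors right here
assert torch.allclose(out, x + y)
print("triton OK:", triton.__version__)
```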