zjunlp/EasyEdit

Error when editing ZsRE-test-all.json with WISE

When editing ZsRE-test-all.json with WISE (building on run_knowedit_llama2.py, with the loc_prompts that WISE requires added), the following error is raised:

Traceback (most recent call last):
  File "/home/xsong/EasyEdit/examples/run_knowedit_llama2.py", line 230, in <module>
    metrics, edited_model, _ = editor.edit(
  File "/home/xsong/EasyEdit/examples/../easyeditor/editors/editor.py", line 160, in edit
    return self.edit_requests(requests, sequential_edit, verbose, test_generation=test_generation, **kwargs)
  File "/home/xsong/EasyEdit/examples/../easyeditor/editors/editor.py", line 333, in edit_requests
    edit_evaluation(all_metrics, request, edited_model, i, test_generation, icl_examples, **kwargs)
  File "/home/xsong/EasyEdit/examples/../easyeditor/editors/editor.py", line 312, in edit_evaluation
    "post": compute_edit_quality(edited_model, self.model_name, self.hparams, self.tok, request, self.hparams.device, eval_metric=eval_metric, test_generation=test_generation),
  File "/home/xsong/EasyEdit/examples/../easyeditor/evaluate/evaluate.py", line 78, in compute_edit_quality
    compute_locality_quality(model, model_name, hparams, tok, locality_key,
  File "/home/xsong/EasyEdit/examples/../easyeditor/evaluate/evaluate.py", line 153, in compute_locality_quality
    loc_tokens = test_prediction_acc(model, tok, hparams, prompt, locality_ground_truth, device, locality=True, vanilla_generation=hparams.alg_name=='GRACE')
  File "/home/xsong/EasyEdit/examples/../easyeditor/evaluate/evaluate_utils.py", line 126, in test_prediction_acc
    outputs = model(**prompt_target_tok)
  File "/home/xsong/EasyEdit/examples/../easyeditor/models/wise/WISE.py", line 97, in __call__
    return self.model(**kwargs)
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1208, in forward
    outputs = self.model(
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1018, in forward
    layer_outputs = decoder_layer(
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 756, in forward
    hidden_states = self.mlp(hidden_states)
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 240, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
  File "/home/xsong/anaconda3/envs/EasyEdit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xsong/EasyEdit/examples/../easyeditor/models/wise/WISE.py", line 428, in forward
    if min_dist.item() < threshold:
RuntimeError: a Tensor with 2 elements cannot be converted to Scalar
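For context: the failing check assumes min_dist holds a single routing distance, and PyTorch's .item() raises exactly this error on any multi-element tensor. A minimal sketch reproducing the failure in plain PyTorch (independent of EasyEdit):

```python
import torch

# WISE's routing check does `min_dist.item() < threshold`, which assumes
# min_dist contains exactly one value (batch size 1).
min_dist = torch.tensor([0.37])        # bs=1: fine
print(min_dist.item() < 0.5)           # True

min_dist = torch.tensor([0.37, 0.62])  # bs=2, e.g. two locality prompts batched
min_dist.item()                        # RuntimeError: a Tensor with 2 elements
                                       # cannot be converted to Scalar
```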

I haven't run into this before; you could try not passing subject. If everything worked fine previously, look for the cause in your input data.

Set some breakpoints and take a look; I'm not going to help debug this on my end.

Could it be that, in the datasets the WISE code handles, each edit_prompt can only map to a single locality_prompts_Relation_Specificity / locality_prompts_Forgetfulness / ..?

Yes, currently only single-sample (bs=1) inference is supported; bs>1 is not supported yet.
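Until bs>1 is supported, one possible workaround is to pre-split the data so each edit request carries only one locality prompt per key. A hypothetical sketch, assuming locality_inputs follows the {'<key>': {'prompt': [...], 'ground_truth': [...]}} layout used by run_knowedit_llama2.py, where an entry for a single record may itself be a list of prompts:

```python
# Hypothetical preprocessing sketch: keep only the first locality prompt per
# record so every evaluation forward pass sees batch size 1. The field layout
# below is an assumption based on run_knowedit_llama2.py, not guaranteed.
def keep_first_locality_prompt(locality_inputs):
    for payload in locality_inputs.values():
        payload['prompt'] = [p[0] if isinstance(p, list) else p
                             for p in payload['prompt']]
        payload['ground_truth'] = [g[0] if isinstance(g, list) else g
                                   for g in payload['ground_truth']]
    return locality_inputs
```

Calling this on locality_inputs before editor.edit(...) should avoid the multi-element min_dist, at the cost of evaluating only one locality prompt per record.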

Great, thanks for the reply~
One more question~
With vanilla_generation=True, WISE's performance drops sharply: on ZsRE, reliability falls from 75.55 to 27.15 and generalizability from 71.85 to 23.64, while locality rises from 35.33 to 91.01 and portability drops from 53.91 to 8.48. Is this result reasonable?

That seems plausible. WISE is effective at correcting the token distribution, but under vanilla_generation (free-running token generation) it can run into the n-gram overlap problem.

Unfortunately, in my latest experiments WISE still reaches nearly 100% reliability and generalization even with vanilla_generation (single edits). I cannot reproduce the problem you are seeing....
[screenshot: evaluation results]

Hi, the results above were obtained when editing zsre_test.json with sequential_edit=True.

Running the full dataset takes quite a while; testing sequential editing on 10 cases, I observe Rel and Gen of 0.84 and 0.74. That is indeed lower than token-by-token accuracy, but nowhere near a collapse to 0.3 or even 0.2.

Metrics Summary:  {'post': {'rewrite_acc': 0.8400000000000001, 'rephrase_acc': 0.74, 'locality': {'neighborhood_acc': 1.0}, 'portability': {'one_hop_acc': 0.45}}}

My second point: the other methods you tested are also evaluated with token-by-token accuracy. Vanilla generation is not the sole proof that knowledge has been mastered; anything that can steer the knowledge distribution and generalize counts as model editing. Also, have you tested how the other methods perform under vanilla generation?
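For readers following the discussion, here is an illustrative sketch of the two protocols being compared: teacher-forced token-by-token accuracy versus vanilla free-running generation. This is not EasyEdit's actual evaluation code; the function names and the max_new_tokens value are assumptions.

```python
import torch

def token_by_token_acc(model, tok, prompt, target, device):
    """Teacher forcing: feed prompt+target and score each target token
    against the model's argmax prediction at that position."""
    full = tok(prompt + target, return_tensors='pt').to(device)
    n_prompt = len(tok(prompt)['input_ids'])
    with torch.no_grad():
        logits = model(**full).logits
    # Logits at position i predict the token at position i + 1.
    preds = logits.argmax(dim=-1)[0, n_prompt - 1:-1]
    labels = full['input_ids'][0, n_prompt:]
    return (preds == labels).float().mean().item()

def vanilla_generation_acc(model, tok, prompt, target, device):
    """Free-running decoding: generate from the prompt alone and check
    whether the decoded continuation matches the target."""
    inputs = tok(prompt, return_tensors='pt').to(device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    continuation = tok.decode(out[0, inputs['input_ids'].shape[1]:],
                              skip_special_tokens=True)
    return float(continuation.strip().startswith(target.strip()))
```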

Anyway, you can note this weakness of WISE in your paper/write-up; I won't be replying further.