zjunlp/EasyEdit

SafeEdit dataset and implementation for MEND


I am looking to apply MEND model editing to remove toxicity using the SafeEdit dataset.
In examples/run_safety_editing.md, the example script says that other methods are not available yet. However, since MEND is a baseline in the paper, I assume it must be possible to apply MEND to SafeEdit.

First, is an implementation already available? Second, is there a version of the SafeEdit dataset with the additional fields (such as rephrase)? (In issue #269 from a while back, you said the dataset for MEND was upcoming.)
In summary, could you please advise on how you implemented MEND for SafeEdit, as you did for Table 1 of your SafeEdit paper "Detoxifying Large Language Models via Knowledge Editing"?

Really appreciate any help or advice!

We have added MEND for SafeEdit.

zxlzr commented

Hi, do you have any further issues?

Hi @mengrusun and @zxlzr, I have looked at the MEND implementation and it makes sense to me. Thanks for adding it and for responding so quickly! I will test it out as soon as I can and let you know if I have any further questions.
Cheers!