SafeEdit dataset and implementation for MEND

Question



I am looking to apply MEND model editing to remove toxicity with the SafeEdit dataset.
I have seen that the example script in examples/run_safety_editing.md says other methods are not available yet. However, since MEND is a baseline in the paper, I assume it must be possible to apply MEND to SafeEdit.

First, is an implementation already available? Second, is there a version of the SafeEdit dataset with the additional fields (such as rephrase)? (A while back, in a comment on issue #269, you said the dataset for MEND was upcoming.)
In summary, could you please advise on how you implemented MEND for SafeEdit, as reported in Table 1 of your SafeEdit paper "Detoxifying Large Language Models via Knowledge Editing"?

Really appreciate any help or advice!

Answer 1 · 2024-11-15T14:57:49.000Z

We have added MEND for SafeEdit.

Answer 2 · 2024-11-16T02:43:29.000Z

Hi, do you have any further issues?

Answer 3 · 2024-11-18T04:09:11.000Z

Hi @mengrusun and @zxlzr, I've looked at the MEND implementation and it makes sense to me. Thanks for adding it and for responding so quickly! I will test it out as soon as I can and let you know if I have any further questions.
Cheers!