SafeEdit dataset and implementation for MEND
Closed this issue · 3 comments
I am looking to apply MEND model editing to remove toxicity with the SafeEdit dataset.
The example script in examples/run_safety_editing.md says that other methods are not available yet, but since MEND is a baseline in the paper, I assume it must be possible to apply MEND with SafeEdit.
First, is there an implementation available already? Second, is there a version of the SafeEdit dataset with the additional fields (such as rephrase)? (In a comment on issue #269 from a while back, you said the dataset for MEND was upcoming.)
In summary, could you please advise on how you implemented MEND for SafeEdit, as used for Table 1 of your SafeEdit paper, "Detoxifying Large Language Models via Knowledge Editing"?
Really appreciate any help or advice!
We have added MEND for SafeEdit.
Hi, do you have any further issues?
Hi @mengrusun and @zxlzr, I have looked at the MEND implementation and it makes sense to me. Thanks for adding it and for responding so quickly! I will test it out as soon as I can and let you know if I have any further questions.
Cheers!