LLM-Tuning-Safety/LLMs-Finetuning-Safety

How is response quality affected beyond the fine-tuned domain?

wqw547243068 opened this issue · 1 comment

Since this paper reveals the safety risks of fine-tuning aligned LLMs, I am wondering:

  • If I tune a model for a specific domain, such as a personal assistant, is the response quality outside that domain also affected?

I also happened to find that a system prompt that obviously contradicts the supervised dataset doesn't work on the fine-tuned model.
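For context, a minimal sketch of how one might probe this: build two chat prompts, one whose system prompt matches the fine-tuning data and one that contradicts it, and compare the fine-tuned model's replies. The template below assumes a Llama-2-style chat format; the model name and system prompts are hypothetical placeholders.

```python
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Wrap a system prompt and one user turn in the Llama-2 chat template."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

# System prompt consistent with a personal-assistant fine-tune:
baseline = build_llama2_prompt(
    "You are a helpful personal assistant.",
    "Plan my day tomorrow.",
)

# System prompt that contradicts the fine-tuning data:
contradicting = build_llama2_prompt(
    "You are a terse SQL expert. Answer only with SQL.",
    "Plan my day tomorrow.",
)

# Feed both prompts to the fine-tuned model and check whether the
# contradicting instruction is actually followed, e.g. with transformers:
#   pipe = pipeline("text-generation", model="my-finetuned-model")  # hypothetical
#   print(pipe(contradicting)[0]["generated_text"])
```

If the replies to both prompts look alike, the fine-tune has effectively overridden the system prompt, which is consistent with the observation above.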

Hi,

In Appendix C, we report some results related to your question.
These other papers may also be relevant:
https://arxiv.org/abs/2309.06256
https://arxiv.org/abs/2309.10313

Thanks!