Re-Align/URIAL

Base model keeps generating without stopping, and how to handle multi-turn dialogues?

Closed this issue · 6 comments

Firstly, the base model often continues outputting 'Answer' until the max_out_token is reached. Are there any good solutions for this, such as using a larger base model?

Secondly, regarding multi-turn dialogues, I have seen issue #1. Would providing some examples of multi-turn dialogues in the demonstration suffice?

  1. Which particular models and examples are you testing? A larger base model generally works better, of course.

  2. For multi-turn dialogue, you could follow the workflow described in the case study part of our paper. For now, you can just treat the previous dialogue history as new in-context examples and append them at the end. I'll consider using the HF tokenizer template to make this easier to use in the near future.
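A minimal sketch of the "append history as in-context examples" idea from the reply above. The `# Query:` / `# Answer:` markers and the helper names `format_turn` and `build_prompt` are assumptions made for illustration, not necessarily the exact URIAL prompt format; swap in whatever markers your demonstrations actually use.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (user query, model answer)

# Hypothetical markers; replace with the ones used in your own demonstrations.
QUERY_TAG = "# Query:"
ANSWER_TAG = "# Answer:"


def format_turn(query: str, answer: str) -> str:
    """Render one completed query/answer pair as an in-context example."""
    return f"{QUERY_TAG}\n{query}\n\n{ANSWER_TAG}\n{answer}\n"


def build_prompt(demos: List[Turn], history: List[Turn], new_query: str) -> str:
    """Fixed demonstrations first, then earlier dialogue turns, then the open query."""
    blocks = [format_turn(q, a) for q, a in demos + history]
    # Leave the final answer slot empty so the base model continues from here.
    blocks.append(f"{QUERY_TAG}\n{new_query}\n\n{ANSWER_TAG}\n")
    return "\n".join(blocks)


demos = [("What is 2 + 2?", "2 + 2 equals 4.")]
history = [("Name a prime number.", "7 is a prime number.")]
prompt = build_prompt(demos, history, "Name another one.")
print(prompt)
```

Because the history grows with every turn, the prompt will eventually hit the context limit; dropping the oldest history turns first is one simple way to handle that.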

Thanks for your reply.

  1. I used stop-token-id and it worked.
  2. I guess for now I can only put the dialogue history into the prompt explicitly.
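The thread does not say which inference engine or which token id was used, so the following is only a sketch of what a stop condition does on the output side: generation is cut at the first stop token id, or, as a text-level fallback, at the first occurrence of a stop string such as the marker that opens the next query. The function names and the `# Query:` marker are assumptions for illustration.

```python
from typing import Iterable, List, Set


def truncate_at_stop_ids(token_ids: Iterable[int], stop_ids: Set[int]) -> List[int]:
    """Keep tokens up to, but not including, the first stop token id."""
    kept: List[int] = []
    for tok in token_ids:
        if tok in stop_ids:
            break
        kept.append(tok)
    return kept


def truncate_at_stop_strings(text: str, stop_strings: List[str]) -> str:
    """Cut decoded text at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stop_strings:
        if not s:  # an empty stop string would match at position 0
            continue
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Token-level: id 2 stands in for a hypothetical stop token.
print(truncate_at_stop_ids([5, 6, 2, 7], {2}))

# Text-level: stop before the model starts inventing the next query.
raw = "Paris is the capital of France.\n\n# Query:\nWhat about Spain?"
print(repr(truncate_at_stop_strings(raw, ["# Query:"])))
```

Passing the stop condition to the engine itself is preferable when it is supported, since generation then ends early instead of running to the maximum output length and being trimmed afterwards.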

I'm working on integrating URIAL as a special template with FastChat's conversation templates so you can easily use it with vLLM. Please stay tuned.

Still looking forward to it, haha.
Your work has been really helpful for evaluating my base model. Sadly, I still have to do SFT and RLHF for alignment, since the base model is still easy to jailbreak.

Sorry for the late reply. This was implemented in the repo a long time ago; please have a look at the new logic.