Training and implementation details for ChatLaw2

Question

Opened this issue a month ago · 0 comments

I am currently exploring the ChatLaw models, and I have a few questions about their training schemes and their roles within the ensemble model.

Could you please provide detailed information about the training schemes used for ChatLaw2_plain and ChatLaw2E_plain? Specifically, I am interested in the datasets, preprocessing steps, model architectures, and any fine-tuning techniques applied.

Additionally, I would like to understand the role that ChatLaw2_plain and ChatLaw2E_plain play within the ChatLaw2_MOE (Mixture of Experts) model. How do these models interact, and how does each contribute to the overall performance of ChatLaw2_MOE?

Thank you in advance for your assistance. I look forward to your response.