[Question] Training about aiXcoder-7B
Congratulations on this wonderful work! I noticed that the Evol-Instruct method is used in aiXcoder-7B training. There are some differences between the traditional implementation of Evol-Instruct and aiXcoder's FIM-based prompt adaptation. Is there a specific implementation strategy or example for this? Thanks!
We will use the open-source Evol-Instruct dataset and an algorithm-problems dataset as the initial collection, then clean and convert them into code file formats. Additionally, we will enrich the diversity of the dataset by invoking other SOTA LLMs. Finally, we will combine these data with structured fill-in-the-middle tasks for training, thereby enhancing aiXcoder-7B-base's ability to respond to requirements.
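For illustration only, here is a minimal sketch of one way an Evol-Instruct style instruction/solution pair could be rewritten into a code-file format and then serialized as a fill-in-the-middle training sample. The sentinel token names (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`), the comment-based conversion, and the random line-boundary split are assumptions for demonstration, not aiXcoder's actual structured FIM implementation (which, per the description above, operates on structured units of code rather than arbitrary spans).

```python
import random

# Hypothetical FIM sentinel tokens -- the real token strings used by
# aiXcoder-7B are not specified in this thread.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def instruction_to_code_file(instruction: str, solution_code: str) -> str:
    """Convert an instruction/solution pair into a code-file-like document:
    the instruction becomes a leading comment block, followed by the code."""
    comment_block = "\n".join(f"# {line}" for line in instruction.splitlines())
    return f"{comment_block}\n{solution_code}"


def to_fim_sample(code_file: str, rng: random.Random) -> str:
    """Split a code file into prefix / middle / suffix at random line
    boundaries and serialize it in prefix-suffix-middle (PSM) order."""
    lines = code_file.splitlines(keepends=True)
    if len(lines) < 3:
        return code_file  # too short to split; keep as a plain sample
    start = rng.randrange(1, len(lines) - 1)
    end = rng.randrange(start + 1, len(lines))
    prefix = "".join(lines[:start])
    middle = "".join(lines[start:end])
    suffix = "".join(lines[end:])
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


if __name__ == "__main__":
    rng = random.Random(0)
    instruction = "Write a function that returns the n-th Fibonacci number."
    solution = (
        "def fib(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a\n"
    )
    print(to_fim_sample(instruction_to_code_file(instruction, solution), rng))
```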