204313508/noname_llm

微调样本能否分享?

Closed this issue · 3 comments

想要你处理的用于微调的技能样本,在自己本地试试微调。

skills.txt

https://github.com/QwenLM/Qwen/blob/main/README_CN.md

这个数据格式跟qwen的repo里微调部分给的多轮对话格式不太一样啊,
请问能否分享你的微调代码和数据?

训练格式根据微调代码的读取datasets方法决定,该数据技能及描述开头均为,你想用qwen自带的话写段代码分割一下就行,已关闭的议题最好新开一个issue,不然回复很麻烦,这段代码不能解决的话再issue一下,我启动一下服务器下载一下微调数据
with open('skills.txt', 'r', encoding='utf-8') as file: data = [] identity_counter = 0 for line in file: line_data = json.loads(line) conversation = [ { "from": "user", "value": line_data["prompt"] }, { "from": "assistant", "value": line_data["response"] } ] conversation_entry = { "id": f"identity_{identity_counter}", "conversations": conversation } data.append(conversation_entry) identity_counter += 1 print(data)