tangqiaoyu/ToolAlpaca

training script

Closed this issue · 9 comments

Thank you for your outstanding work. When it's convenient for you, could you provide the original training script so that I can reproduce the experiments in the paper? Thanks so much.

Thank you for your appreciation. Here is the original script that you can use to repeat the experiments in the paper:

deepspeed --num_gpus=2 --master_port=12345 train.py \
    --deepspeed ${deepspeed config path} \
    --model_name_or_path ${path to base model like vicuna-7b}  \
    --data_path ${data path} \
    --bf16 True \
    --output_dir outputs/vicuna-7b-toolalpaca/ \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "epoch" \
    --save_total_limit 10 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True

Let me know if you need further assistance.
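For reference, the --deepspeed flag in the command above points to a standard DeepSpeed JSON config, which is not included in this thread. The following is only a minimal ZeRO stage-2 sketch consistent with the other flags in the command (bf16 enabled, batch-size settings delegated to the HuggingFace Trainer via "auto"); the exact settings and the file name are assumptions, not the authors' actual config.

{
    "bf16": {
        "enabled": true
    },
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": true,
        "contiguous_gradients": true
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto"
}

Saved as, say, ds_config.json (an assumed name), it would be passed to the command above as --deepspeed ds_config.json.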

Thanks for your reply. After running build_dataset, I got the data format shown below, but I think there is still a gap between this format and the format you used for SFT. If possible, could you provide the train.py file so that we know how to go from this format to the one used for fine-tuning the LLM?

The format I got is given below:
[
[
"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions with the help of some tools.\nYou have access to the following tools:\n\ngetDetails: If the user's question lacks the essential information needed to answer the question effectively, or if the question contains vague terms or pronouns without sufficient context, invoke the getDetails function to prompt the user for the missing critical details. However, getDetails should not be used in cases where the user omits optional parameters, unless these parameters become necessary in the course of the conversation. \nParameters: {"Question": "The question to prompt user to provide sufficient information."}\nOutput: User's response.\nsendHttpRequest: Send an HTTP request with the specified method, headers, and data to the Httpbin API for testing purposes.\nParameters: {"method": "Required. string. One of: [GET, POST, PUT, DELETE, HEAD, PATCH]. The HTTP method to use (GET, POST, PUT, DELETE, HEAD, or PATCH).", "url": "Required. string. The endpoint URL to send the request to.", "headers": "Object. A key-value pair of headers to include in the request.", "data": "Object. A key-value pair of data to include in the request body."}\nOutput: Successful response.\n - Format: application/json\n - Structure: Object{response: Object{status_code, headers: Object, body}}\ngetClientRequestData: Retrieve the client's request data, including headers, form data, uploaded files, and cookies.\nParameters: {"url": "Required. string. The endpoint URL to send the request to."}\nOutput: Successful response.\n - Format: application/json\n - Structure: Object{requestData: Object{headers: Object, form: Object, files: Object, cookies: Object}}\ntestProxyHeaders: Send a request to the Httpbin API's proxy headers testing endpoint.\nParameters: {"url": "Required. string. The endpoint URL to send the request to.", "headers": "Object. A key-value pair of headers to include in the request."}\nOutput: Successful response.\n - Format: application/json\n - Structure: Object{response: Object{status_code, headers: Object, body}}\nsimulateStatusCode: Send a request to the Httpbin API's status code simulation endpoint to test how your application handles specific status codes.\nParameters: {"url": "Required. string. The endpoint URL to send the request to.", "statusCode": "Required. integer. The HTTP status code to simulate."}\nOutput: Successful response.\n - Format: application/json\n - Structure: Object{response: Object{status_code, headers: Object, body}}\n\nThe chat follows this format:\nUSER: the user's question\nASSISTANT Thought: the assistant's inner thought about what to do next \nASSISTANT Action: the action to take, must be one of [getDetails, sendHttpRequest, getClientRequestData, testProxyHeaders, simulateStatusCode].\nASSISTANT Action Input: the input for the action, in JSON format.\nASSISTANT Observation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nASSISTANT Thought: summarize the information gathered\nASSISTANT Response: the final response to the user\nUSER: user's next question\n...\n\nBegin!\n\nUSER: I'm troubleshooting some requests, so can you help me send a POST request to https://httpbin.org/post with the header "Content-Type: application/json" and the data '{"name": "John Doe", "email": "john.doe@example.com"}'? Let me know the response details.\nASSISTANT Thought: ",
"I need to use the sendHttpRequest tool to send the POST request.\nASSISTANT Action: sendHttpRequest\nASSISTANT Action Input: {"method": "POST", "url": "https://httpbin.org/post", "headers": {"Content-Type": "application/json"}, "data": {"name": "John Doe", "email": "john.doe@example.com"}}\nASSISTANT Observation: ",
"Status Code: 200. Response: {"response":{"status_code":200,"headers":{"Content-Type":"application/json"},"body":"{\"args\":{\"method\":\"POST\",\"url\":\"https://httpbin.org/post\",\"headers\":{\"'Content-Type'\":\"'application/json'\"},\"data\":{\"'name'\":\"'John Doe'\",\"'email'\":\"'john.doe@example.com'\"}},\"data\":\"\",\"files\":{},\"form\":{},\"headers\":{\"Accept\":\"/\",\"Content-Length\":\"48\",\"Content-Type\":\"application/x-www-form-urlencoded\",\"Host\":\"httpbin.org\",\"User-Agent\":\"curl/7.68.0\"},\"json\":{\"'name'\":\"'John Doe'\",\"'email'\":\"'john.doe@example.com'\"},\"method\":\"POST\",\"origin\":\"127.0.0.1\",\"url\":\"https://httpbin.org/post\"}"}}\nASSISTANT Thought: ",
"The request was successful and the response contains the data sent in the request.\nASSISTANT Response: The POST request was successful and the response contains the data sent in the request. The response status code was 200."
],
[
false,
true,
false,
true
]
]

Thanks so much!

Apologies for the oversight; I forgot to upload train.py. I have now uploaded it, so you can use it to fine-tune the LLM.

Thanks for your quick response. After running the build_dataset script, do you use its output to train the model directly? If possible, could you share one sample that you used to fine-tune the LLM?

After executing build_dataset.py, you can train the model directly using the training script I provided. The example you pasted above is indeed a valid sample for fine-tuning the LLM.

Is the example I provided above ONE sample that will be fed into the model, or is it actually FOUR samples for fine-tuning the LLM? Also, are there any requirements on the maximum input/output token length for the base model?

If I use the format {'input': sample, 'output': label} to fine-tune the LLM, can I regard the last sentence in the above example, "The request was successful and the response contains the data sent in the request.\nASSISTANT Response: The POST request was successful and the response contains the data sent in the request. The response status code was 200.", as the sample and use "True" as the label? Am I right? I am a little confused.

  1. Regarding the training data, the example you provided above is a single, complete training example. The True/False flags indicate whether the corresponding text segment is trainable, i.e., whether it contributes to the loss. All you need to do is pass the data file generated by build_dataset.py directly to train.py (see the sketch after this list).

  2. For the context length of the base model, we recommend a value greater than or equal to 2048. In tool-use scenarios a longer context is needed, because each prompt contains the tool documentation and the intermediate observations.
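To make the trainable-flag convention concrete, here is a minimal sketch, assuming the standard causal-LM masking convention, of how one (segments, flags) pair from build_dataset.py could be turned into input_ids and labels, with non-trainable segments masked out of the loss via the usual -100 ignore index. This is not the actual code in train.py; the build_example helper and the tokenizer checkpoint name are illustrative assumptions.

from transformers import AutoTokenizer

IGNORE_INDEX = -100  # HuggingFace loss functions skip tokens labeled -100

def build_example(segments, trainable_flags, tokenizer, max_length=2048):
    # segments:        list of strings (the first element of a build_dataset.py sample)
    # trainable_flags: list of booleans (the second element); True means the segment
    #                  is assistant output and should contribute to the loss
    input_ids, labels = [], []
    for text, trainable in zip(segments, trainable_flags):
        ids = tokenizer(text, add_special_tokens=False)["input_ids"]
        input_ids.extend(ids)
        labels.extend(ids if trainable else [IGNORE_INDEX] * len(ids))
    return {"input_ids": input_ids[:max_length], "labels": labels[:max_length]}

# Hypothetical usage with the sample shown earlier in this thread:
# tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")
# sample = [segments, flags]  # one entry of the build_dataset.py output
# example = build_example(sample[0], sample[1], tokenizer)

Under this convention, the example above is one training sequence: the two False segments (the prompt and the tool observation) are masked, while the two True segments (the assistant's thoughts, action, and final response) are trained on.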

That's great. Would you mind sharing the DeepSpeed config file when it's convenient for you? Thanks so much!