Description
This Python script converts data from a job_skill CSV file into the JSONL format used for OpenAI fine-tuning. For each row of the CSV file, the script extracts the relevant fields and builds a JSON object containing a prompt and a completion. The prompt includes the location, responsibilities, and minimum qualifications for the job, while the completion includes the job title and category.
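The conversion described above can be sketched roughly as follows. This is an illustrative sketch, not the contents of transform.py: the CSV column names in `COLUMNS` and the helper names `row_to_record` and `convert` are assumptions, and the prompt template is copied verbatim from the Output section of this README.

```python
import csv
import io
import json

# Assumed CSV header names; adjust them to match your job_skill file.
COLUMNS = {
    "location": "Location",
    "responsibilities": "Responsibilities",
    "minimum_qualifications": "Minimum Qualifications",
    "title": "Title",
    "category": "Category",
}

# Templates copied verbatim from the Output section of this README.
PROMPT_TEMPLATE = (
    "Location: {location}\nResponsibilities: {responsibilities}\n"
    " Qualifications: {minimum_qualifications}"
)
COMPLETION_TEMPLATE = "{title}\n{category}"


def row_to_record(row):
    """Build one prompt/completion object from a CSV row given as a dict."""
    fields = {name: row[column] for name, column in COLUMNS.items()}
    return {
        "prompt": PROMPT_TEMPLATE.format(**fields),
        "completion": COMPLETION_TEMPLATE.format(**fields),
    }


def convert(csv_file, jsonl_file):
    """Write one JSON object per CSV row to jsonl_file; return the row count."""
    count = 0
    for row in csv.DictReader(csv_file):
        jsonl_file.write(json.dumps(row_to_record(row)) + "\n")
        count += 1
    return count


# Small in-memory demonstration with made-up data.
sample_csv = io.StringIO(
    "Title,Category,Location,Responsibilities,Minimum Qualifications\n"
    "Engineer,Tech,Zurich,Build things,BS degree\n"
)
sample_out = io.StringIO()
rows_written = convert(sample_csv, sample_out)
```

With real files, open the CSV with `newline=""` (as the csv module documentation recommends) and pass both file objects to `convert`.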
Requirements
- Python 3.x
- The `csv` and `json` modules, which are included in Python by default
- The OpenAI Python package, which can be installed using `pip install openai`
Usage
- Ensure that the CSV file containing the data is in the same directory as the script.
- Update the script with the appropriate filename for the CSV file and the desired filename for the output JSONL file.
- Run the script in a terminal or command prompt using the command `python transform.py`.
- After the script has finished running, prepare the data for fine-tuning using the OpenAI tools. In a terminal or command prompt, execute the following command:
  openai tools fine_tunes.prepare_data -f output.jsonl
  This command prepares the data in the `output.jsonl` file for fine-tuning and stores the prepared data in a new file.
Output
The script will create a new file in JSONL format, containing one JSON object per line. Each object will have a prompt and a completion.
The format of each object is as follows:
{
  "prompt": "Location: {location}\nResponsibilities: {responsibilities}\n Qualifications: {minimum_qualifications}",
  "completion": "{title}\n{category}"
}
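Before passing the file to the OpenAI tools, it can be useful to confirm that every line really has this shape. The following is a minimal sketch of such a check; the function name `check_jsonl` is hypothetical and is not part of the script or of the OpenAI tooling.

```python
import json


def check_jsonl(lines):
    """Return (line_number, problem) pairs for lines that do not match the format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict) or set(record) != {"prompt", "completion"}:
            problems.append((number, "expected exactly the keys 'prompt' and 'completion'"))
        elif not all(isinstance(record[key], str) for key in ("prompt", "completion")):
            problems.append((number, "'prompt' and 'completion' must both be strings"))
    return problems


# Made-up sample lines: one valid, one not JSON, one missing a key.
sample_lines = [
    '{"prompt": "Location: Zurich", "completion": "Engineer\\nTech"}',
    "not json",
    '{"prompt": "Location: Zurich"}',
]
sample_problems = check_jsonl(sample_lines)
```

To check a real file, pass an open file object, for example `check_jsonl(open("output.jsonl", encoding="utf-8"))`. An empty result means every line parsed and had the expected keys.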
Next step
You can follow Fine-tuning in Azure OpenAI (with example code) to fine-tune and test an Azure OpenAI model using this output.