Other datasets
Ph0rk0z commented
Is there an easy way to support other datasets?
I have a large one that contains only a prompt followed by a response, with no "input" field.
My idea was to add a new dataset and edit this method:
```python
# Auxiliary methods
def generate_prompt(self, data_point, **kwargs):
    # Alpaca-style template: preamble, then instruction, input, and response sections
    return "{0}\n\n{1}\n{2}\n\n{3}\n{4}\n\n{5}\n{6}".format(
        "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.",
        "### Instruction:",
        data_point["instruction"],
        "### Input:",
        data_point["input"],
        "### Response:",
        data_point["output"],
    )
```
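A minimal sketch of how the edited method might look for prompt/response data, assuming the records use hypothetical `prompt` and `response` keys (rename them to match the actual JSON). It drops the `### Input:` section and uses the no-input preamble of the Alpaca template. It is written as a free function so it can be tried on its own; adding `self` back makes it a method of the dataset class again.

```python
# Hypothetical field names: "prompt" and "response" are assumptions, not keys
# taken from finetune.py; rename them to match your dataset's JSON.
NO_INPUT_PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def generate_prompt(data_point, **kwargs):
    """Render one prompt/response record in an Alpaca-style template with no Input section."""
    return "{0}\n\n{1}\n{2}\n\n{3}\n{4}".format(
        NO_INPUT_PREAMBLE,
        "### Instruction:",
        data_point["prompt"],
        "### Response:",
        data_point["response"],
    )


record = {"prompt": "Name a primary color.", "response": "Red."}
print(generate_prompt(record))
```

Keeping the same `### Instruction:` / `### Response:` markers as the original template means prompts at inference time can use the same layout.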
But how can I tell whether the dataset is being fed to the model correctly?
I'm hoping that using this directly is faster than training through textgen. I also find that xformers slows things down :(
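On the question of whether the data is fed in correctly, one low-tech check is to render the first few records through the template and read them before training. This is a sketch under assumptions: the `render` and `preview` helpers and the `prompt`/`response` keys are hypothetical, not part of finetune.py, and it only shows the text the template produces, not how the tokenizer splits it.

```python
import json


def render(record):
    """Hypothetical no-input template; the "prompt"/"response" keys are assumed."""
    return "### Instruction:\n{0}\n\n### Response:\n{1}".format(
        record["prompt"], record["response"]
    )


def preview(records, n=3):
    """Print the first n rendered training strings so the format can be checked by eye."""
    shown = []
    for i, record in enumerate(records[:n]):
        text = render(record)
        print(f"===== sample {i} =====")
        print(text)
        shown.append(text)
    return shown


# In practice the records would come from json.load() on the dataset file;
# an inline JSON string stands in for it here.
records = json.loads(
    '[{"prompt": "Name a primary color.", "response": "Red."},'
    ' {"prompt": "What is 2 + 2?", "response": "4"}]'
)
preview(records)
```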
johnsmith0031 commented
If you want to use a customized dataset, I think the best way for now is to copy finetune.py to another file, such as finetune1.py, and edit it. Then check that the data format is correct before training.
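Along the lines of that advice, a rough sketch of a pre-training format check, assuming the dataset is a JSON list of objects with hypothetical `prompt` and `response` keys. It reports every record that is missing a field or has an empty one, since such records would either raise a `KeyError` in the template or produce empty training examples. Nothing here comes from finetune.py itself.

```python
import json

REQUIRED_KEYS = ("prompt", "response")  # assumed field names; adjust to your dataset


def find_bad_records(records, required=REQUIRED_KEYS):
    """Return (index, reason) pairs for records that would break prompt generation."""
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append((i, "not a JSON object"))
            continue
        for key in required:
            value = record.get(key)  # None for both absent keys and JSON null
            if value is None:
                problems.append((i, f"missing or null '{key}'"))
            elif not isinstance(value, str) or not value.strip():
                problems.append((i, f"empty or non-string '{key}'"))
    return problems


data = json.loads(
    '[{"prompt": "Hi", "response": "Hello"},'
    ' {"prompt": "", "response": "x"},'
    ' {"prompt": "q"}]'
)
for index, reason in find_bad_records(data):
    print(index, reason)
```

Running it over the whole file once is cheap compared with discovering a malformed record partway through a training run.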
Ph0rk0z commented
I think I've actually figured it out and got it working.