johnsmith0031/alpaca_lora_4bit

Other datasets

Closed this issue · 2 comments

Is there an easy way to support other datasets?

I have a large dataset that is just a prompt followed by a response, without the "input" field.

I was thinking of just adding a new dataset and editing this:

    # Auxiliary methods
    def generate_prompt(self, data_point, **kwargs):
        # Alpaca-style prompt: preamble, instruction, input, then the target response
        return "{0}\n\n{1}\n{2}\n\n{3}\n{4}\n\n{5}\n{6}".format(
            "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.",
            "### Instruction:",
            data_point["instruction"],
            "### Input:",
            data_point["input"],
            "### Response:",
            data_point["output"],
        )
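For a prompt/response dataset, a minimal sketch of the edited template might simply drop the Input section. This assumes the records use the keys `instruction` and `output` (rename them if your file uses something like `prompt` and `response`), and the preamble mirrors the no-input wording used by Stanford Alpaca; `generate_prompt_no_input` is a hypothetical name, not part of the repo.

```python
def generate_prompt_no_input(data_point: dict) -> str:
    """Alpaca-style prompt for records that have only an instruction and an output.

    Assumes keys "instruction" and "output"; adjust to match your dataset.
    """
    return "{0}\n\n{1}\n{2}\n\n{3}\n{4}".format(
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.",
        "### Instruction:",
        data_point["instruction"],
        "### Response:",
        data_point["output"],
    )


print(generate_prompt_no_input({"instruction": "Add 2 and 3.", "output": "5"}))
```

The body can be dropped into the `generate_prompt` method of a copied dataset class, keeping the same `self, data_point, **kwargs` signature.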

But how can I tell whether the dataset is being fed to the model correctly?
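One low-tech check is to render a few records through the template before training, print them, and confirm the text after the response marker is exactly the record's output. A minimal sketch, assuming the response lives under the `output` key and the template ends with a `### Response:` section as in the snippet above; `preview_prompts` is a hypothetical helper, not part of the repo.

```python
from typing import Callable, Dict, List


def preview_prompts(
    records: List[Dict[str, str]],
    render: Callable[[Dict[str, str]], str],
    n: int = 3,
    response_marker: str = "### Response:",
) -> List[str]:
    """Render the first n records, print them, and check the response lands after the marker."""
    rendered = []
    for i, record in enumerate(records[:n]):
        text = render(record)
        # Everything after the last response marker is what the model is trained to generate.
        _, sep, tail = text.rpartition(response_marker)
        if not sep:
            raise ValueError(f"record {i}: no {response_marker!r} section in rendered prompt")
        if tail.strip() != record["output"].strip():
            raise ValueError(f"record {i}: response section does not match record['output']")
        print(f"----- example {i} -----")
        print(text)
        rendered.append(text)
    return rendered
```

Pass your own `generate_prompt` (or a lambda wrapping it) as `render`. This only checks the text side; it does not show truncation by the tokenizer's cutoff length, which is worth checking separately for long records.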

I'm hoping that using this repo directly is faster than training through textgen. I also find that xformers slows things down :(

If you want to use a custom dataset, I think the best way currently is to copy finetune.py to another file, such as finetune1.py, edit it, and then check that the data format is correct before training.
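Checking the data format before training could look something like the sketch below. It assumes the dataset is a JSON array of objects (like `alpaca_data.json`) and that every record needs non-empty `instruction` and `output` strings; `validate_records`, `load_and_validate`, and `REQUIRED_KEYS` are hypothetical names for illustration, not part of finetune.py.

```python
import json
from typing import Any, List, Tuple

# Assumed field names; adjust to whatever your dataset actually uses.
REQUIRED_KEYS = ("instruction", "output")


def validate_records(records: List[Any], required: Tuple[str, ...] = REQUIRED_KEYS) -> List[str]:
    """Return human-readable problems; an empty list means every record looks usable."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: expected an object, got {type(rec).__name__}")
            continue
        for key in required:
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty {key!r}")
    return problems


def load_and_validate(path: str) -> List[dict]:
    """Load a JSON array from disk and raise if any record fails validation."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected the file to contain a JSON array of records")
    problems = validate_records(records)
    if problems:
        # Show only the first few problems so a badly broken file stays readable.
        raise ValueError("\n".join(problems[:20]))
    return records
```

Running this once on the file before launching the copied script catches missing or misnamed fields early, instead of failing partway through tokenization.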

I think I've actually figured it out and got it working.