wbrown/gpt_bpe

Generate prompt masks from JSONL

harubaru opened this issue · 1 comments

Currently, the tokenizer accepts plaintext, and when that output is fed through the GPT finetuner, it optimizes on the entire context. For use cases where optimization should only be performed on a target response, it would be useful to accept training data in JSONL format consisting of request/response pairs, so that the request portion can be masked out when calculating loss during training.

{"prompt":"Overjoyed with the new iPhone! ->", "completion":" positive"}
{"prompt":"@lakers disappoint for a third straight night ->", "completion":" negative"}

Ultimately, the tokenizer should output a .tokens and a .mask file, where .mask marks which positions in .tokens belong to prompts.
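The mask-building step could be sketched roughly as below. This is only an illustration, not the gpt_bpe API: the `Record` struct, `buildMask` function, and the whitespace `tokenize` stand-in (the real library would use BPE token IDs) are all hypothetical names for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Record mirrors one JSONL line of request/response pairs.
type Record struct {
	Prompt     string `json:"prompt"`
	Completion string `json:"completion"`
}

// tokenize is a stand-in for the real BPE tokenizer; it splits on
// whitespace so the example stays self-contained.
func tokenize(s string) []string {
	return strings.Fields(s)
}

// buildMask tokenizes a record and emits a parallel mask slice:
// 0 for prompt tokens (excluded from loss), 1 for completion tokens.
func buildMask(r Record) (tokens []string, mask []byte) {
	for _, t := range tokenize(r.Prompt) {
		tokens = append(tokens, t)
		mask = append(mask, 0) // prompt: masked out of the loss
	}
	for _, t := range tokenize(r.Completion) {
		tokens = append(tokens, t)
		mask = append(mask, 1) // completion: contributes to loss
	}
	return tokens, mask
}

func main() {
	line := `{"prompt":"Overjoyed with the new iPhone! ->","completion":" positive"}`
	var rec Record
	if err := json.Unmarshal([]byte(line), &rec); err != nil {
		panic(err)
	}
	tokens, mask := buildMask(rec)
	fmt.Println(tokens)
	fmt.Println(mask)
}
```

In the actual output files, .tokens would hold the token IDs and .mask the corresponding 0/1 values, position for position.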