training functions?
win10ogod opened this issue · 4 comments
Hello, does this project have training functionality similar to llama2.c?
Hello, no, for now you have to train it "manually"; there isn't a training script (you can see an example in examples/example_e2e_training.ipynb).
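For anyone wondering what training "manually" looks like in practice, here is a minimal plain-PyTorch sketch. The tiny stand-in model and random tokens are placeholders only (not this project's API); swap in the model and data from examples/example_e2e_training.ipynb.

```python
# Minimal sketch of a "manual" training loop in plain PyTorch.
# The stand-in model and random tokens are placeholders; replace them with
# the model and dataset built in examples/example_e2e_training.ipynb.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, seq_len, batch_size = 256, 64, 8
model = nn.Sequential(nn.Embedding(vocab_size, 128), nn.Linear(128, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(100):
    tokens = torch.randint(0, vocab_size, (batch_size, seq_len + 1))
    x, y = tokens[:, :-1], tokens[:, 1:]          # next-token prediction
    logits = model(x)                             # (B, L, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```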
I might add a training script like llama2.c in the near future
I created some training scripts. They're centered around this ChessGPT repo: https://github.com/adamkarvonen/nanoGPT, and they're kind of a mess, so I don't want to submit them as a PR, but they should make for a good start.
train.py works with .bin files and a fixed (training-time) sequence length.
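For context, a sketch of what fixed-length batch sampling from a .bin token file typically looks like (nanoGPT style); the file name, dtype, and sizes here are assumptions for illustration, not the exact contents of train.py:

```python
# Sketch of fixed-length batch sampling from a .bin token stream (nanoGPT style).
# File name, dtype, block_size and batch_size are illustrative assumptions.
import numpy as np
import torch

block_size, batch_size = 256, 32
data = np.memmap("train.bin", dtype=np.uint16, mode="r")   # one long token stream

def get_batch():
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y
```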
train_bygame.py works with parquet files. It assumes the data is one sequence per row, sorted by sequence length, then split into files. It then reads randomly from these files, and the sequence length for each training iteration is the max length of the sequences in the batch (or max_seq_len, which is there to cap VRAM use). I did this instead of just calling df.sample on the complete dataset because the rapid changes in sequence length were causing crashes; this way you get the speed benefit of only using the sequence length you need, without the instability.
train.zip
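As a rough illustration of the dynamic-length batching described above (not the actual train_bygame.py code): pad each batch only up to its longest game, capped at max_seq_len. The games below are synthetic token-id lists; in the real script they would come from the length-sorted parquet shards.

```python
# Sketch of per-batch dynamic sequence length: pad only to the longest game
# in the batch, capped at max_seq_len. Synthetic data; a real script would
# also mask padded positions out of the loss.
import torch

max_seq_len = 1024
pad_id = 0

def collate(games):                            # games: list of token-id lists
    seq_len = min(max(len(g) for g in games), max_seq_len)
    batch = torch.full((len(games), seq_len + 1), pad_id, dtype=torch.long)
    for i, g in enumerate(games):
        g = g[:seq_len + 1]
        batch[i, :len(g)] = torch.tensor(g, dtype=torch.long)
    return batch[:, :-1], batch[:, 1:]         # inputs, next-token targets

games = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10]]  # placeholder "games"
x, y = collate(games)                          # shapes: (3, 5) and (3, 5)
```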
Thanks, I will take a look!
Hello, if anyone is interested in a full-fledged training script, you can check the othello_mamba repo. It features a complete training script (similar to llama2.c) that you can easily adapt to your needs. It is compatible with mamba.py (it doesn't use mamba_lm.py but a more general lm.py that also works for Transformers).