Welcome to my repository!
This repository contains the recipe for creating japanese-mistral-300m.
Its features are:
- Suppression of unknown-word generation by enabling byte fallback in the SentencePiece tokenizer, with conversion to the Hugging Face Tokenizers format
- Faster training with torch.compile (roughly 2x)
- Faster training with FlashAttention-2 (roughly 1.2x)
- RAM offloading with DeepSpeed ZeRO, enabling training even with limited VRAM
- Use of a Mistral 300M architecture
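For the DeepSpeed ZeRO RAM-offloading feature, a minimal configuration sketch looks like the following. The specific values (batch sizes, stage, precision) are illustrative assumptions, not the repository's actual config:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

Offloading the optimizer state to CPU RAM is what lets pretraining fit on GPUs with small VRAM, at the cost of extra host-device transfer time.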
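To illustrate the byte-fallback idea from the feature list, here is a minimal sketch (a hypothetical `byte_fallback` helper, not SentencePiece's actual implementation): a token missing from the vocabulary is decomposed into UTF-8 byte tokens such as `<0xE6>` instead of collapsing to `<unk>`.

```python
def byte_fallback(token: str, vocab: set[str]) -> list[str]:
    """Return the token itself if known, otherwise its UTF-8 byte tokens.

    This mimics the effect of SentencePiece's byte_fallback option, which
    avoids <unk> by falling back to 256 byte-level pieces.
    """
    if token in vocab:
        return [token]
    return [f"<0x{b:02X}>" for b in token.encode("utf-8")]

# A vocabulary that lacks the kanji "日":
vocab = {"hello", "world"}
print(byte_fallback("hello", vocab))  # ['hello']
print(byte_fallback("日", vocab))     # ['<0xE6>', '<0x97>', '<0xA5>']
```

Because any Unicode string is representable as bytes, the tokenizer can always round-trip unseen Japanese text instead of emitting unknown tokens.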
Take it easy and stay a while!
If you want to try out the contents of this repository quickly and easily, please use this ipynb file.
Build the Python environment using the Dockerfile.
```bash
git clone https://github.com/ce-lery/japanese-mistral-300m-recipe.git
cd japanese-mistral-300m-recipe
docker build -t cuda12.1-cudnn8-python3.11.6 ./
docker run -v ./:/home/japanese-mistral-300m-recipe/ -it --gpus all cuda12.1-cudnn8-python3.11.6
```
Run the shell script with the following command.
It builds the Python virtual environment, then runs pretraining and fine-tuning in order.
```bash
bash run_all.sh
```
The user guide for this repository is published here (written in Japanese).