Welcome to my repository!
This repository contains the recipe for creating japanese-mistral-300m.
Its features are:
- Suppression of unknown-word generation by enabling byte fallback in the SentencePiece tokenizer, with conversion to the Hugging Face Tokenizers format
- Faster training with torch.compile (roughly 2x)
- Faster training with FlashAttention-2 (roughly 1.2x)
- RAM offloading with DeepSpeed ZeRO, enabling training even with limited VRAM
- Use of a Mistral 300M architecture
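For the DeepSpeed ZeRO RAM-offloading feature, a minimal configuration sketch looks like the following. The specific values (batch sizes, stage, precision) are illustrative assumptions, not the repository's actual config:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

Offloading the optimizer state to CPU RAM is what lets pretraining fit on GPUs with small VRAM, at the cost of extra host-device transfer time.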
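To illustrate the byte-fallback idea from the feature list, here is a minimal sketch (a hypothetical `byte_fallback` helper, not SentencePiece's actual implementation): a token missing from the vocabulary is decomposed into UTF-8 byte tokens such as `<0xE6>` instead of collapsing to `<unk>`.

```python
def byte_fallback(token: str, vocab: set[str]) -> list[str]:
    """Return the token itself if known, otherwise its UTF-8 byte tokens.

    This mimics the effect of SentencePiece's byte_fallback option, which
    avoids <unk> by falling back to 256 byte-level pieces.
    """
    if token in vocab:
        return [token]
    return [f"<0x{b:02X}>" for b in token.encode("utf-8")]

# A vocabulary that lacks the kanji "日":
vocab = {"hello", "world"}
print(byte_fallback("hello", vocab))  # ['hello']
print(byte_fallback("日", vocab))     # ['<0xE6>', '<0x97>', '<0xA5>']
```

Because any Unicode string is representable as bytes, the tokenizer can always round-trip unseen Japanese text instead of emitting unknown tokens.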
Take it easy and stay a while!
If you want to try out the contents of this repository quickly and easily, please use this ipynb file.
Build the Python environment using the Dockerfile.
```bash
git clone https://github.com/ce-lery/japanese-mistral-300m-recipe.git
cd japanese-mistral-300m-recipe
docker build -t cuda12.1-cudnn8-python3.11.6 ./
docker run -v ./:/home/japanese-mistral-300m-recipe/ -it --gpus all cuda12.1-cudnn8-python3.11.6
```
Run the shell script with the following command.
It builds the Python virtual environment, then runs pretraining and fine-tuning in order.
```bash
bash run_all.sh
```
The user guide for this repository is published here (written in Japanese).