mixtral

This repo aims to fine-tune the Mixtral model.

Requirements

I provide a small Docker image to run the code. You can build it yourself or pull it from this repo.

I have not been able to install the latest flash_attn in any container that has the necessary PyTorch and CUDA versions. This image is made to run on H100 GPUs.
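
Since the environment is fragile (flash_attn in particular), a quick sanity check inside the container can save a failed run. This is a minimal sketch, not part of the repo; it only assumes PyTorch and, optionally, flash_attn are installed:

```python
# Quick environment sanity check: confirms a CUDA GPU is visible and that
# flash_attn imports cleanly. Run inside the container before training.
import torch

assert torch.cuda.is_available(), "No CUDA device visible"
major, minor = torch.cuda.get_device_capability()
print(f"GPU: {torch.cuda.get_device_name(0)} (compute capability {major}.{minor})")
# H100 reports compute capability 9.0; flash_attn 2.x requires 8.0 or newer.

try:
    import flash_attn
    print(f"flash_attn {flash_attn.__version__} is available")
except ImportError:
    print("flash_attn not installed; attention will fall back to a slower path")
```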

Run

  • Run the simple_inference.py script to test the model. It actually runs on an A100 with 40 GB of memory! (See the sketch below.)
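
A minimal sketch of such an inference run, assuming a Hugging Face Transformers setup with 4-bit quantization to fit the 40 GB card; the actual simple_inference.py may load the model differently:

```python
# Load Mixtral in 4-bit so it fits on a single 40 GB A100 and generate text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("The Mixtral architecture is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```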

Train

Test your system by running the simple_train.py script. It will train a model on a small dataset and takes around 1 hour on an 8xH100 machine. A sketch of such a run follows below.
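
The sketch below shows the general shape of such a run, using QLoRA (a 4-bit base model plus LoRA adapters) via transformers and peft. The actual simple_train.py may use different settings, and the dataset here is a stand-in:

```python
# Rough sketch of a small fine-tuning run: 4-bit base model + LoRA adapters
# keep the per-GPU memory cost manageable.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "mistralai/Mixtral-8x7B-v0.1"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(
    model,
    LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    ),
)

# Small stand-in dataset; swap in your own.
dataset = load_dataset("imdb", split="train[:1000]")
dataset = dataset.map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```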

Axolotl

Our experiments are conducted using the Axolotl library. For this we use a Docker image that can be pulled from this repo. It bakes in the execution of the axolotl_launcher.py script as a replacement for the axolotl.cli.train entry point, so that we can inject parameters from the W&B UI as a launch job. A hypothetical sketch of such a launcher is shown below.
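
The following sketch shows what such a launcher can look like; the real axolotl_launcher.py may differ. The idea is that a W&B launch job injects overrides through wandb.config, which get merged into the Axolotl YAML before delegating to the standard CLI entry point. The config paths here are assumptions:

```python
# Hypothetical launcher: merge W&B Launch overrides into the Axolotl config,
# then hand off to the regular axolotl.cli.train entry point.
import subprocess
import sys

import wandb
import yaml

BASE_CONFIG = "config.yaml"    # assumed path to the base Axolotl config
MERGED_CONFIG = "merged.yaml"  # temporary config with W&B overrides applied

run = wandb.init()  # under W&B Launch, this picks up the injected config

with open(BASE_CONFIG) as f:
    config = yaml.safe_load(f)

# Any key set in the W&B UI overrides the corresponding YAML key.
config.update(dict(run.config))

with open(MERGED_CONFIG, "w") as f:
    yaml.safe_dump(config, f)

# Delegate to Axolotl's regular training entry point.
sys.exit(subprocess.call([sys.executable, "-m", "axolotl.cli.train", MERGED_CONFIG]))
```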