
A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF).

Soft Optimization

Use

Setup

First, run the setup script, replacing -j with the correct user:

sh ./setup.sh -j

Running a script

To run directly:

poetry run python soft_optim/fine_tune.py

To launch with Accelerate, use:

accelerate launch --config_file configs/deepspeed_configs/default_configs.yml examples/simulacra_tmp.py
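
If you launch runs from Python rather than from the shell, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repo: the helper name build_accelerate_command is hypothetical, and it only builds and prints the command line shown above, using the config and script paths from this README.

```python
import shlex


def build_accelerate_command(config_file, script, script_args=()):
    """Build the argv list for `accelerate launch` (hypothetical helper, not part of this repo)."""
    return ["accelerate", "launch", "--config_file", config_file, script, *script_args]


# Paths are the ones used in the command above.
cmd = build_accelerate_command(
    "configs/deepspeed_configs/default_configs.yml",
    "examples/simulacra_tmp.py",
)

# Print the shell-quoted command so it can be inspected or copied.
print(shlex.join(cmd))
```

To actually start the run, the list could be passed to subprocess.run(cmd, check=True), assuming accelerate is installed in the active environment.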