This is the implementation for the paper Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference.
conda create -n uld python=3.10 -y
conda activate uld
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit -y
pip install flash-attn==2.5.6 --no-build-isolation
pip install deepspeed
pip install -e .
python scripts/hf_forget_train.py \
data=[tofu|harry] \
data.dataset.split=${DATASPLIT} \
model=[tofu-llama-2|mistral] \
model_mode=uld \
model_mode.num_layer=8 \
unlearn_loss=remember+uniform \
trainer.strategy=ddp \
OUTPUTMODELDIR=${OUTPUTMODELDIR}
For more detailed training options, please refer to bashes/tofu/uld_train_eval.sh and bashes/harry/uld_train_eval.sh. This would save the assistant model to ${OUTPUTMODELDIR}.
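To illustrate the logit-difference idea named in the paper title, here is a minimal pure-Python sketch. It assumes that the unlearned output is obtained by subtracting the assistant model's logits from the target model's logits, and that the assistant is trained to remember the forget data while staying uniform on the retain data (as the `remember+uniform` loss name suggests). The helper `logit_difference`, the `weight` parameter, and the toy numbers are illustrative assumptions, not the repo's API.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_difference(target_logits, assistant_logits, weight=1.0):
    """Hypothetical helper: subtract weighted assistant logits from target logits."""
    if len(target_logits) != len(assistant_logits):
        raise ValueError("both models must share the same vocabulary size")
    return [t - weight * a for t, a in zip(target_logits, assistant_logits)]


# Toy vocabulary of three tokens; token 0 is the answer that should be forgotten.
target = [4.0, 1.0, 0.5]

# Assumption: on forget data the assistant is confident about the forget answer,
# while on retain data it is uniform (all logits equal).
assistant_on_forget = [3.5, 0.0, 0.0]
assistant_on_retain = [0.0, 0.0, 0.0]

original_probs = softmax(target)
forget_probs = softmax(logit_difference(target, assistant_on_forget))
retain_probs = softmax(logit_difference(target, assistant_on_retain))

print("original:", [round(p, 3) for p in original_probs])
print("forget:  ", [round(p, 3) for p in forget_probs])
print("retain:  ", [round(p, 3) for p in retain_probs])
```

In this toy setting, subtracting a uniform assistant leaves the retain distribution unchanged, while subtracting a confident assistant lowers the probability of the forget answer.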
python scripts/hf_forget_train.py \
data=[tofu|harry] \
data.dataset.split=${DATASPLIT} \
model=[tofu-llama-2|mistral] \
model_mode=offset \
unlearn_loss=${UNLEARN_LOSS} \
trainer.strategy=deepspeed \
OUTPUTMODELDIR=${OUTPUTMODELDIR}
For more detailed training options, please refer to bashes/tofu/offset_train_eval.sh and bashes/harry/offset_train_eval.sh. This would save the assistant model to ${OUTPUTMODELDIR}.
python scripts/hf_forget_train.py \
data=[tofu|harry] \
data.dataset.split=${DATASPLIT} \
model=[tofu-llama-2|mistral] \
model_mode=base \
unlearn_loss=${UNLEARN_LOSS} \
trainer.strategy=deepspeed \
OUTPUTMODELDIR=${OUTPUTMODELDIR}
For more detailed training options, please refer to bashes/tofu/base_train_eval.sh and bashes/harry/base_train_eval.sh. This would save the unlearned model to ${OUTPUTMODELDIR}.
python scripts/eval_tofu.py \
data=[tofu|harry] \
model=[tofu-llama-2|mistral] \
model_mode=[base|uld|offset] \
ckpt_path=${CHECKPOINT_DIR} \
data.dataset.split=${DATASPLIT}
For more detailed options, please refer to bashes/tofu/tofu_eval.sh and bashes/harry/harry_eval.sh.
We also implement several other unlearning methods employed in previous works, including:
- Offset Unlearning for Large Language Models
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
- TOFU: A Task of Fictitious Unlearning for LLMs
You can follow the guide below to implement other unlearning methods.
- scripts/: scripts for training and evaluation
- uld/data/: data processing and dataloader
- uld/models/: model definition
- uld/trainer/: unlearn trainer and unlearn losses
- Add the dataset to uld/data/ and register it in uld/data/__init__.py.
- Implement the new dataset class by inheriting from the TrainDataModule class; a reference implementation for the TOFU dataset is in uld/data/tofu.py. Typically, you need to implement the logic to load the forget data and the retain data.
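The dataset steps above can be sketched as follows. This is only an illustration of the forget/retain split: the `TrainDataModule` class here is a stand-in defined locally, and the method names `load_forget_data` and `load_retain_data`, the `ToyQADataModule` class, and the `DATASETS` registry are all hypothetical. The real base class in uld/data/ may expose a different interface, so check uld/data/tofu.py before adapting this.

```python
from dataclasses import dataclass


class TrainDataModule:
    """Stand-in for the repo's base class; the real interface may differ."""

    def load_forget_data(self):
        raise NotImplementedError

    def load_retain_data(self):
        raise NotImplementedError


@dataclass
class ToyQADataModule(TrainDataModule):
    """Hypothetical dataset that splits QA records by author."""

    records: list
    forget_authors: frozenset

    def load_forget_data(self):
        # Examples whose content the model should unlearn.
        return [r for r in self.records if r["author"] in self.forget_authors]

    def load_retain_data(self):
        # Everything else, used to preserve the model's remaining knowledge.
        return [r for r in self.records if r["author"] not in self.forget_authors]


# Hypothetical registry, standing in for the registration step in uld/data/__init__.py.
DATASETS = {"toy_qa": ToyQADataModule}

records = [
    {"author": "A. Writer", "question": "Where was A. Writer born?", "answer": "Springfield"},
    {"author": "B. Poet", "question": "What did B. Poet publish?", "answer": "A poetry book"},
    {"author": "C. Scribe", "question": "What genre does C. Scribe write?", "answer": "Mystery"},
]

module = DATASETS["toy_qa"](records=records, forget_authors=frozenset({"A. Writer"}))
forget_data = module.load_forget_data()
retain_data = module.load_retain_data()
print(len(forget_data), "forget example(s),", len(retain_data), "retain example(s)")
```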
- Add the unlearn loss to uld/trainer/unlearn_losses.py and add it in configs/unlearn_loss.
- Implement the new unlearn loss class by defining the forget_loss_func and retain_loss_func for the ForgetRetainLoss class; a reference implementation is in the create_unlearn_loss function.
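As a rough illustration of how a forget loss and a retain loss can be combined, here is a pure-Python sketch. The `ForgetRetainLoss` class below is a locally defined stand-in that only borrows the name and the two function arguments mentioned above; its constructor, the `retain_weight` parameter, and the batch format are assumptions. The example pairs gradient ascent on forget data with standard cross-entropy on retain data, which is one common baseline and not necessarily what `create_unlearn_loss` builds.

```python
import math


def cross_entropy(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def gradient_ascent_loss(logits, label):
    """Forget loss: negated cross-entropy, so minimizing it raises the NLL."""
    return -cross_entropy(logits, label)


class ForgetRetainLoss:
    """Stand-in combining a forget loss and a retain loss (interface assumed)."""

    def __init__(self, forget_loss_func, retain_loss_func, retain_weight=1.0):
        self.forget_loss_func = forget_loss_func
        self.retain_loss_func = retain_loss_func
        self.retain_weight = retain_weight

    def __call__(self, forget_batch, retain_batch):
        # Each batch is a list of (logits, label) pairs; average per batch.
        forget = sum(self.forget_loss_func(l, y) for l, y in forget_batch) / len(forget_batch)
        retain = sum(self.retain_loss_func(l, y) for l, y in retain_batch) / len(retain_batch)
        return forget + self.retain_weight * retain


loss_fn = ForgetRetainLoss(
    forget_loss_func=gradient_ascent_loss,
    retain_loss_func=cross_entropy,
)

forget_batch = [([2.0, 0.0], 0)]  # model currently favors the forget answer
retain_batch = [([0.0, 0.0], 1)]  # model is undecided on a retain example
total_loss = loss_fn(forget_batch, retain_batch)
print(f"total loss: {total_loss:.4f}")
```

Swapping in a different `forget_loss_func` or `retain_loss_func` changes the unlearning objective without touching the combination logic.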
If you find this work useful, please consider citing our paper:
@article{ji2024reversing,
title = {Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference},
author = {Jiabao Ji and Yujian Liu and Yang Zhang and Gaowen Liu and Ramana Rao Kompella and Sijia Liu and Shiyu Chang},
year = {2024},
journal = {arXiv preprint arXiv:2406.08607}
}
Huge thanks to the following repos, which greatly helped our implementation: