/MIA-LLMs

The legacy repo of NeurIPS'24 paper "Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration"

Primary LanguagePython

Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

This is the official implementation of the paper "Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration". The proposed Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA) is implemented as follows.

The overall architecture of SPV-MIA

Requirements

  • torch>=1.11.0
  • accelerate==0.20.3
  • transformers==4.34.0.dev0
  • trl==0.7.1
  • datasets==2.13.1
  • numpy>=1.23.4
  • scikit-learn>=1.1.3
  • pyyaml>=6.0
  • tqdm>=4.64.1

Dependency can be installed with the following command:

pip install -r requirements.txt

Target Model Fine-tuning

All large language models (LLMs) are built on the top of transformers, a go-to library for state-of-the-art transformer models, on which you can fine-tune arbitrary well-known LLMs you want, including LLaMA, GPT-series, Falcon, etc. We recommend training LLMs with multi-GPU and accelerate, a library that enables the same PyTorch code to be run across any distributed configuration:

accelerate launch ./ft_llms/llms_finetune.py \
--output_dir ./ft_llms/*pretrained_model_name*/*dataset_name*/target/ \
--block_size 128 --eval_steps 100 --save_epochs 100 --log_steps 100 \
-d *dataset_name* -m *pretrained_model_name* --packing --use_dataset_cache \
-e 10 -b 4 -lr 1e-4 --gradient_accumulation_steps 1 \
--train_sta_idx=0 --train_end_idx=10000 --eval_sta_idx=0 --eval_end_idx=1000

Please replace *pretrained_model_name* and *dataset_name* with the names of pretrained LLM and training dataset, such as decapoda-research/llama-7b-hf and ag_news.

Recommended pretrained models

Recommended datasets

Self-prompt Reference Model Fine-tuning

Before fine-tuning the self-prompt reference model, the reference dataset can be sampled via our proposed self-prompt approach over the fine-tuned LLM.

accelerate launch refer_data_generate.py \
-tm *fine_tuned_model* \
-m *pretrained_model_name* -d *dataset_name*

Replace *fine_tuned_model* with the directory of the fine-tuned target model (i.e., the output directory of the Target Model Fine-tuning phase).

Then fine-tune the self-prompt reference model in the same manner as the target model, but with a smaller training epoch:

accelerate launch ./ft_llms/llms_finetune.py --refer \
--output_dir ./ft_llms/*pretrained_model_name*/*dataset_name*/refer/ \
--block_size 128 --eval_steps 100 --save_epochs 100 --log_steps 100 \
-d *dataset_name* -m *pretrained_model_name* --packing --use_dataset_cache \
-e 2 -b 4 -lr 5e-5 --gradient_accumulation_steps 1 \
--train_sta_idx=0 --train_end_idx=10000 --eval_sta_idx=0 --eval_end_idx=1000

Run SPV-MIA

After accomplishing the preliminary operations, here is the command for deploying SPV-MIA on the target model.

python attack.py

Footnotes

  1. This third-party repo decapoda-research/llama-7b-hf seems to be deleted by unknown reasons, using forked repos luodian/llama-7b-hf or baffo32/decapoda-research-llama-7B-hf as alternatives.

  2. Please add an additional argument --dataset_config_name wikitext-2-raw-v1 to specify this dataset.