Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

Requirements
Target Model Fine-tuning
Self-prompt Reference Model Fine-tuning
Run SPV-MIA

This is the official implementation of the paper "Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration". The proposed Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA) is implemented as follows.

Requirements

torch>=1.11.0
accelerate==0.20.3
transformers==4.34.0.dev0
trl==0.7.1
datasets==2.13.1
numpy>=1.23.4
scikit-learn>=1.1.3
pyyaml>=6.0
tqdm>=4.64.1

Dependency can be installed with the following command:

pip install -r requirements.txt

Target Model Fine-tuning

All large language models (LLMs) are built on the top of transformers, a go-to library for state-of-the-art transformer models, on which you can fine-tune arbitrary well-known LLMs you want, including LLaMA, GPT-series, Falcon, etc. We recommend training LLMs with multi-GPU and accelerate, a library that enables the same PyTorch code to be run across any distributed configuration:

accelerate launch ./ft_llms/llms_finetune.py \
--output_dir ./ft_llms/*pretrained_model_name*/*dataset_name*/target/ \
--block_size 128 --eval_steps 100 --save_epochs 100 --log_steps 100 \
-d *dataset_name* -m *pretrained_model_name* --packing --use_dataset_cache \
-e 10 -b 4 -lr 1e-4 --gradient_accumulation_steps 1 \
--train_sta_idx=0 --train_end_idx=10000 --eval_sta_idx=0 --eval_end_idx=1000

Please replace *pretrained_model_name* and *dataset_name* with the names of pretrained LLM and training dataset, such as decapoda-research/llama-7b-hf and ag_news.

Recommended pretrained models

GPT-2 (https://huggingface.co/gpt2)
GPT-J (https://huggingface.co/EleutherAI/gpt-j-6b)
Falcon (https://huggingface.co/tiiuae/falcon-7b)
LLaMA (https://huggingface.co/decapoda-research/llama-7b-hf) ¹

Recommended datasets

Ag News (https://huggingface.co/datasets/ag_news)
Wikitext-103 (https://huggingface.co/datasets/wikitext) ²
Xsum (https://huggingface.co/datasets/xsum)

Self-prompt Reference Model Fine-tuning

Before fine-tuning the self-prompt reference model, the reference dataset can be sampled via our proposed self-prompt approach over the fine-tuned LLM.

accelerate launch refer_data_generate.py \
-tm *fine_tuned_model* \
-m *pretrained_model_name* -d *dataset_name*

Replace *fine_tuned_model* with the directory of the fine-tuned target model (i.e., the output directory of the Target Model Fine-tuning phase).

Then fine-tune the self-prompt reference model in the same manner as the target model, but with a smaller training epoch:

accelerate launch ./ft_llms/llms_finetune.py --refer \
--output_dir ./ft_llms/*pretrained_model_name*/*dataset_name*/refer/ \
--block_size 128 --eval_steps 100 --save_epochs 100 --log_steps 100 \
-d *dataset_name* -m *pretrained_model_name* --packing --use_dataset_cache \
-e 2 -b 4 -lr 5e-5 --gradient_accumulation_steps 1 \
--train_sta_idx=0 --train_end_idx=10000 --eval_sta_idx=0 --eval_end_idx=1000

Run SPV-MIA

After accomplishing the preliminary operations, here is the command for deploying SPV-MIA on the target model.

python attack.py

This third-party repo decapoda-research/llama-7b-hf seems to be deleted by unknown reasons, using forked repos luodian/llama-7b-hf or baffo32/decapoda-research-llama-7B-hf as alternatives. ↩
Please add an additional argument --dataset_config_name wikitext-2-raw-v1 to specify this dataset. ↩

wjfu99/MIA-LLMs