# MIMIR

MIMIR - Python package for measuring memorization in LLMs.

Documentation is available here.
First, install the Python dependencies:

```bash
pip install -r requirements.txt
```

Then, install our package:

```bash
pip install -e .
```
To use, run the scripts in `scripts/bash`.

**Note:** For bash scripts, intermediate results are saved in `tmp_results/` and `tmp_results_cross/`. If your experiment completes successfully, the results are moved into the `results/` and `results_cross/` directories.
You can either provide the following environment variables, or pass them via your config/CLI:

- `MIMIR_CACHE_PATH`: Path to cache directory
- `MIMIR_DATA_SOURCE`: Path to data directory

The data we used for our experiments is available on Hugging Face Datasets. You can either load the data directly from Hugging Face with the `load_from_hf` flag in the config (preferred), or download the `cache_100_200_...` folders into your `MIMIR_CACHE_PATH` directory.
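For example, the two variables can be exported in your shell before running any script (the paths below are placeholders, not real defaults):

```shell
# Placeholder paths -- point these at your own cache and data directories.
export MIMIR_CACHE_PATH=/path/to/mimir/cache
export MIMIR_DATA_SOURCE=/path/to/mimir/data
```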
Then run an experiment with a config, e.g.:

```bash
python run.py --config configs/mi.json
```
We include and implement the following attacks, as described in our paper:

- **Likelihood** (`loss`): Simply uses the likelihood of the target datapoint as the score.
- **Reference-based** (`ref`): Normalizes the likelihood score with the score obtained from a reference model.
- **Zlib Entropy** (`zlib`): Uses the zlib compression size of a sample to approximate its local difficulty.
- **Neighborhood** (`ne`): Generates neighbors using an auxiliary model and measures the change in likelihood.
- **Min-K% Prob** (`min_k`): Uses the k% of tokens with minimum likelihood for score computation.
- **Min-K%++** (`min_k++`): Uses the k% of tokens with minimum normalized likelihood for score computation.
- **Gradient Norm** (`gradnorm`): Uses the gradient norm of the target datapoint as the score.
- **ReCaLL** (`recall`): Compares the unconditional and conditional log-likelihoods.
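As a rough illustration of how a few of these scores are computed (a sketch from per-token log-probabilities, not MIMIR's actual implementation; sign conventions may differ from the package):

```python
import zlib

def loss_score(token_logprobs):
    # Likelihood attack: average negative log-likelihood of the sample.
    return -sum(token_logprobs) / len(token_logprobs)

def zlib_score(text, token_logprobs):
    # Zlib attack: normalize the log-likelihood by the sample's
    # zlib-compressed size, a cheap proxy for local difficulty.
    return loss_score(token_logprobs) / len(zlib.compress(text.encode()))

def min_k_score(token_logprobs, k=0.2):
    # Min-K% Prob: average negative log-likelihood over the k% of
    # tokens with the lowest likelihood.
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return -sum(lowest) / n
```

In each case a higher score suggests the sample was harder for the model, which membership-inference thresholds then turn into a member/non-member call.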
To extend the package for your own dataset, you can directly load your data inside `load_cached()` in `data_utils.py`, or add an additional if-else within `load()` in `data_utils.py` if it cannot be loaded from memory (or some source) easily. We will probably add a more general way to do this in the future.
To add an attack, create a file for your attack (e.g. `attacks/my_attack.py`) and implement the interface described in `attacks/all_attacks.py`. Then, add a name for your attack to the dictionary in `attacks/utils.py`.
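The exact interface lives in `attacks/all_attacks.py`; as a self-contained sketch (the base class, method names, and the model's `get_logprobs` helper below are assumptions, not the real MIMIR API), a new attack might look like:

```python
class Attack:
    # Stand-in for the base class in attacks/all_attacks.py;
    # the real interface may differ.
    def __init__(self, model):
        self.model = model

    def attack(self, document, **kwargs):
        raise NotImplementedError

class MyAttack(Attack):
    # Example attack: score a document by its average token loss.
    def attack(self, document, **kwargs):
        logprobs = self.model.get_logprobs(document)
        return -sum(logprobs) / len(logprobs)
```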
If you would like to submit your attack to the repository, please open a pull request describing your attack and the paper it is based on.
If you use MIMIR in your research, please cite our paper:
```bibtex
@inproceedings{duan2024membership,
  title={Do Membership Inference Attacks Work on Large Language Models?},
  author={Michael Duan and Anshuman Suri and Niloofar Mireshghallah and Sewon Min and Weijia Shi and Luke Zettlemoyer and Yulia Tsvetkov and Yejin Choi and David Evans and Hannaneh Hajishirzi},
  year={2024},
  booktitle={Conference on Language Modeling (COLM)},
}
```