This repository contains the official code of the paper: "Evaluating the Ripple Effects of Knowledge Editing in Language Models".
All benchmark creation, experiments, and evaluations were conducted in a Python 3.9 environment. To clone the repository and set up the environment, run the following commands:

```bash
git clone https://github.com/edenbiran/RippleEdits.git
cd RippleEdits
pip install -r requirements.txt
```
The benchmark files and statistics can be found under `data/benchmark/` and `data/stats/`.
The benchmark is split into three files, named after the benchmark's three subsets: `RECENT`, `RANDOM`, and `POPULAR`. For more details, please refer to Section 4 of the paper.
The source code for generating the benchmark can be found under `src/`. The benchmark can be generated from scratch using `src/build_benchmark.py`, and benchmark popularity statistics can be extracted using `src/benchmark_statistics.py`.
Each benchmark JSON file contains a list of entries. Each entry is an edit, holding the edit information (including the original fact, if applicable) and the six evaluation criteria. Each evaluation criterion contains a list of tests, where each test consists of a test prompt, its answers, and its conditions. An example edit entry (shortened for brevity) is shown below:
```json
{
    "example_type": "popular",
    "edit": {
        "prompt": "The name of the country of citizenship of Leonardo DiCaprio is Syria.",
        "subject_id": "Q38111",
        "relation": "COUNTRY_OF_CITIZENSHIP",
        "target_id": "Q858",
        "original_fact": {
            "prompt": "The name of the country of citizenship of Leonardo DiCaprio is United States of America.",
            "subject_id": "Q38111",
            "relation": "COUNTRY_OF_CITIZENSHIP",
            "target_id": "Q30"
        }
    },
    "Relation_Specifity": [
        {
            "test_queries": [
                {
                    "prompt": "The name of the mother of Leonardo DiCaprio is",
                    "answers": [
                        {
                            "value": "Irmelin DiCaprio",
                            "aliases": [
                                "Irmelin Indenbirken",
                                "Irmelin Indenbirken-DiCaprio"
                            ]
                        }
                    ],
                    "query_type": "regular",
                    "subject_id": "Q38111",
                    "relation": "MOTHER",
                    "target_ids": [
                        "Q22984557"
                    ],
                    "phrase": null
                }
            ],
            "test_condition": "OR",
            "condition_queries": [
                {
                    "prompt": "The name of the mother of Leonardo DiCaprio is",
                    "answers": [
                        {
                            "value": "Irmelin DiCaprio",
                            "aliases": [
                                "Irmelin Indenbirken",
                                "Irmelin Indenbirken-DiCaprio"
                            ]
                        }
                    ],
                    "query_type": "regular",
                    "subject_id": "Q38111",
                    "relation": "MOTHER",
                    "target_ids": [
                        "Q22984557"
                    ],
                    "phrase": null
                }
            ]
        },
        ...
    ],
    "Logical_Generalization": [...],
    "Subject_Aliasing": [...],
    "Compositionality_I": [...],
    "Compositionality_II": [...],
    "Forgetfulness": [...]
}
```
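For example, a few lines of Python suffice to walk the schema above. This is a minimal sketch; the file name `popular.json` is an assumption based on the subset names, so adjust it to the actual files under `data/benchmark/`.

```python
import json

# Load one benchmark subset (file name assumed; see data/benchmark/ for the actual files).
with open("data/benchmark/popular.json") as f:
    entries = json.load(f)

for entry in entries:
    # Each entry is one edit plus the six evaluation criteria.
    print("Edit prompt:", entry["edit"]["prompt"])
    for test in entry["Relation_Specifity"]:
        for query in test["test_queries"]:
            # Gold answers (with aliases) expected for the test prompt.
            gold = [answer["value"] for answer in query["answers"]]
            print("  Test prompt:", query["prompt"], "->", gold)
```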
The source code for all benchmark evaluations can also be found under `src/`. All evaluations can be run using `src/evaluation.py`.
To evaluate the benchmark on a language model that is not currently supported, extend the class `QueryExecutor` in `src/queryexecutor.py` and add the new `QueryExecutor` to `src/evaluation.py`.
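As a rough illustration, a new executor for a HuggingFace causal LM might look like the sketch below. The base-class constructor call and the placeholder model name are assumptions; the abstract methods that actually need to be overridden are defined in `src/queryexecutor.py`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from queryexecutor import QueryExecutor  # defined in src/queryexecutor.py


class MyModelQueryExecutor(QueryExecutor):
    """Hypothetical executor for a new HuggingFace causal LM."""

    def __init__(self, model_name="my-org/my-model"):  # placeholder model name
        # Load the model and tokenizer via HuggingFace transformers. The
        # super().__init__ signature is an assumption; mirror the existing
        # executors in the repository when wiring up a real one.
        model = AutoModelForCausalLM.from_pretrained(model_name)
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        super().__init__(model, tokenizer)
```

The new class can then be registered alongside the existing executors in `src/evaluation.py`.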
To evaluate the benchmark on a knowledge-editing technique that is not currently supported, extend the class `ModelEditor` in `src/modeleditor.py` and add the new `ModelEditor` to `src/evaluation.py`.
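Similarly, a new editor might follow the sketch below. The method name and its argument are assumptions; mirror the abstract methods declared in `src/modeleditor.py` and the existing editor implementations.

```python
from modeleditor import ModelEditor  # defined in src/modeleditor.py


class MyModelEditor(ModelEditor):
    """Hypothetical editor wrapping a new knowledge-editing technique."""

    def edit_model(self, fact):  # method name and signature are assumptions
        # Apply the edit described by `fact` to the underlying model,
        # e.g. by calling into the editing technique's own library.
        raise NotImplementedError
```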
If you use RippleEdits in your work, please cite the paper:

```bibtex
@article{cohen2024evaluating,
    title={Evaluating the ripple effects of knowledge editing in language models},
    author={Cohen, Roi and Biran, Eden and Yoran, Ori and Globerson, Amir and Geva, Mor},
    journal={Transactions of the Association for Computational Linguistics},
    volume={12},
    pages={283--298},
    year={2024},
    publisher={MIT Press}
}
```