data-contamination
There are 7 repositories under the data-contamination topic.
mravanelli/pySpeechRev
This Python code efficiently applies reverberation to a dataset of close-talking speech signals using a collection of acoustic impulse responses.
lyy1994/awesome-data-contamination
A paper list on data contamination in large language model evaluation.
yyy01/PAC
The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)
nlx-group/overlapy
Python package developed to evaluate textual overlap (N-Grams) between two volumes of text.
shahriargolchin/time-travel-in-llms
The official repository for the paper entitled "Time Travel in LLMs: Tracing Data Contamination in Large Language Models."
THU-KEG/DICE
DICE: Detecting In-distribution Data Contamination with LLM's Internal State
shahriargolchin/DCQ
The official repository for the paper entitled "Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models."