data-contamination

There are 7 repositories under the data-contamination topic.

  • mravanelli/pySpeechRev

    This Python code efficiently reverberates speech, starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.

    Language: Python
  • lyy1994/awesome-data-contamination

    A paper list on data contamination in large language model evaluation.

  • yyy01/PAC

    The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)

    Language: Python
  • nlx-group/overlapy

    Python package developed to evaluate textual overlap (N-Grams) between two volumes of text.

    Language: Python
  • shahriargolchin/time-travel-in-llms

    The official repository for the paper entitled "Time Travel in LLMs: Tracing Data Contamination in Large Language Models."

    Language: Python
  • THU-KEG/DICE

    DICE: Detecting In-distribution Data Contamination with LLM's Internal State

    Language: Python
  • shahriargolchin/DCQ

    The official repository for the paper entitled "Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models."

    Language: Python
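
Several of the repositories above detect contamination by measuring textual overlap between an evaluation set and a training corpus. As a rough illustration of the N-gram overlap idea mentioned in the nlx-group/overlapy description, here is a minimal sketch in plain Python. It does not use overlapy's API or any of the listed repositories; the function names `ngrams` and `ngram_overlap`, the whitespace tokenization, and the default `n=3` are assumptions chosen for illustration only.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(benchmark_text, corpus_text, n=3):
    """Fraction of the benchmark's distinct n-grams that also occur in the corpus.

    Hypothetical helper: tokenization is a naive lowercase whitespace split,
    which a real contamination check would likely replace with something
    more robust.
    """
    bench = ngrams(benchmark_text.lower().split(), n)
    corpus = ngrams(corpus_text.lower().split(), n)
    if not bench:
        # Text shorter than n tokens has no n-grams to compare.
        return 0.0
    return len(bench & corpus) / len(bench)


if __name__ == "__main__":
    score = ngram_overlap(
        "the quick brown fox jumps",
        "a quick brown fox jumps high",
    )
    print(f"trigram overlap: {score:.2f}")
```

A high score suggests that benchmark text may appear in the corpus, but it is only a heuristic: paraphrased or reformatted contamination would not be caught by exact N-gram matching.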