Pinned Repositories
Agent-Smith
[ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Cheating-LLM-Benchmarks
[SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
D-TRAK
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
I-FSJ
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
regmix
🧬 RegMix: Data Mixture as Regression for Language Model Pre-training
Classical-LOO
Leave One Out
LLM-TRAK
memorization
An Empirical Study of Memorization in NLP (ACL 2022)
xszheng2020
xszheng2020.github.io
xszheng2020's Repositories
xszheng2020/memorization
An Empirical Study of Memorization in NLP (ACL 2022)
xszheng2020/LLM-TRAK
xszheng2020/Classical-LOO
Leave One Out
xszheng2020/xszheng2020
xszheng2020/xszheng2020.github.io
xszheng2020/fast-influence-functions
xszheng2020/heldout-influence-estimation
xszheng2020/readme-best-practices
Best practices for writing a README for your open source project
xszheng2020/responsibleNLPresearch
templates and other documents regarding responsible NLP research
xszheng2020/stable-diffusion-analytic-dpm
xszheng2020/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.