Pinned Repositories
Evaluating-Durable-Safeguards
[ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs
alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
boyiwei.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
COS598D-Pruning
Assignments for COS598D: System and Machine Learning
cos598d_sp24
CoTaEval
[NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models
ReG-NAS
RepNoise-Reproduce
tamper-resistance
Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
TAR-Reproduce
boyiwei's Repositories
boyiwei/alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
boyiwei/CoTaEval
[NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models
boyiwei/ReG-NAS
boyiwei/RepNoise-Reproduce
boyiwei/TAR-Reproduce
boyiwei/boyiwei.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
boyiwei/COS598D-Pruning
Assignments for COS598D: System and Machine Learning
boyiwei/cos598d_sp24
boyiwei/tamper-resistance
Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"