nli0
ML evaluations and safety @scaleapi @centerforaisafety, CS @ucberkeley. nli0.github.io
@ucberkeley
San Francisco, CA
Pinned Repositories
machiavelli
Intro_to_ML_Safety
wmdp
WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method that reduces LLM performance on WMDP while retaining general capabilities.
AgentSocieties
ethics
Aligning AI With Shared Human Values (ICLR 2021)
coup_environment
A pettingzoo environment for the card game "Coup".
partisan-gerrymanders
One Way to Spot More Partisan Gerrymanders
nli0's Repositories
nli0/coup_environment
A pettingzoo environment for the card game "Coup".
nli0/ethics
Aligning AI With Shared Human Values (ICLR 2021)
nli0/Intro_to_ML_Safety
nli0/partisan-gerrymanders
One Way to Spot More Partisan Gerrymanders