ed1d1a8d

AI safety researcher. PhD student at MIT.

ed1d1a8d's Stars

xai-org/grok-1
Grok open release
Language: Python 49.6k 576 2118.3k
openai/openai-python
The official Python library for the OpenAI API
Language: Python 23.2k 306 8173.3k
pqrs-org/Karabiner-Elements
Karabiner-Elements is a powerful tool for customizing keyboards on macOS
Language: C++ 19k 207 3.8k840
KindXiaoming/pykan
Kolmogorov Arnold Networks
Language: Jupyter Notebook 15.1k 111 4131.4k
overleaf/overleaf
A web-based collaborative LaTeX editor
Language: JavaScript 14.2k 211 1k1.5k
astral-sh/rye
a Hassle-Free Python Experience
Language: Rust 13.9k 61 668468
PaulJuliusMartinez/jless
jless is a command-line JSON viewer designed for reading, exploring, and searching through JSON data.
Language: Rust 4.8k 25 11592
chaifeng/ufw-docker
To fix the Docker and UFW security flaw without disabling iptables
Language: Shell 4.6k 51 110387
allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
Language: Python 2.2k 24 59190
TransformerLensOrg/TransformerLens
A library for mechanistic interpretability of GPT-style language models
Language: Python 1.6k 16 263308
AlignmentResearch/tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
Language: Python 439 7 5447
lebrice/SimpleParsing
Simple, Elegant, Typed Argument Parsing with argparse
Language: Python 438 11 15353
openphilanthropy/unrestricted-adversarial-examples
Contest Proposal and infrastructure for the Unrestricted Adversarial Examples Challenge
Language: Python 329 36 5255
sony/ctm
Language: Python 240 18 813
justinchiu/openlogprobs
Extract full next-token probabilities via language model APIs
Language: Python 230 3 114
ArthurConmy/Automatic-Circuit-Discovery
Language: Jupyter Notebook 189 0 1738
GraySwanAI/circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
Language: Jupyter Notebook 156 15 1120
wzekai99/DM-Improves-AT
Code for the paper "Better Diffusion Models Further Improve Adversarial Training" (ICML 2023)
Language: Python 125 4 216
ethz-spylab/rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
Language: Python 107 4 79
MadryLab/datamodels-data
Data for "Datamodels: Predicting Predictions with Training Data"
Language: Python 91 7 43
anthropics/sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
85 3 110
cgarciae/einop
Language: Python 58 4 22
max-andr/adversarial-random-search-gpt4
Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023]
Language: Jupyter Notebook 43 3 01
thestephencasper/everything-you-need
we got you bro
33 1 00
99991/cifar10-fast-simple
Train CIFAR10 to 94% accuracy in a few minutes/seconds. Based on https://github.com/davidcpage/cifar10-fast
Language: Python 20 2 14
ml-postech/robust-evaluation-of-diffusion-based-purification
[ICCV 2023 Oral] Official implementation of "Robust Evaluation of Diffusion-Based Adversarial Purification"
Language: Python 19 10 21
jplhughes/evals_template
Template for any evals project using LLM apis
Language: Python 5 2 01
Shavit-Lab/Sparse-Expansion
Code for the paper "Sparse Expansion and Neuronal Disentanglement."
Language: Python 5 1 00
xuwangyin/AT-EBMs
Language: Python 4 1 02
GilgameshxZero/xena
Android SVG editor optimized for e-ink note-taking tablets, such as the Onyx Boox series.
Language: Java 3 1 00
