dtch1997
Mechanistic interpretability researcher. Interested in interpreting multimodal foundation models.
Pinned Repositories
awesome-ml-dev-tools
Collection of development tools for ML engineering or research
cpg-locomotion
Repository implementing CPG-conditioned locomotion
CrowdHuman-dataset-prep
A repository to download and prepare the CrowdHuman dataset for training in PyTorch
feature-lens
Visualizing SAE features in terms of their upstream and downstream features
IsaacGymEnvs
AMP implementation for a quadruped legged robot in IsaacGymEnvs
quadruped-gym
An OpenAI Gym environment for training legged robots
rl_cbf
Code accompanying "Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory"
sae-probe
Investigating the feasibility of using SAE features as a basis for sparse reconstructions of linear probes
steering-bench
Evaluation suite for steering vectors
token-trace-demo
dtch1997's Repositories
dtch1997/rl_cbf
Code accompanying "Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory"
dtch1997/sae-probe
Investigating the feasibility of using SAE features as a basis for sparse reconstructions of linear probes
dtch1997/feature_composition
Experiments on feature composition in toy models and SAEs
dtch1997/repepo
Codebase for comparing Representation Engineering vs baselines on a variety of tasks
dtch1997/sae-eap
Edge attribution patching with SAEs
dtch1997/feature-lens
Visualizing SAE features in terms of their upstream and downstream features
dtch1997/steering-bench
Evaluation suite for steering vectors
dtch1997/advprompter
dtch1997/auto-circuit
A library for efficient patching and automatic circuit discovery.
dtch1997/belief-state-superposition
A repository for training transformers with belief states
dtch1997/circuit-finder
dtch1997/diff-interp
dtch1997/dtch1997.github.io
dtch1997/eindex
My interpretation of what einops indexing would look like (created to work on during my SERI MATS project).
dtch1997/feature-circuits
dtch1997/Gymnasium-Robotics
A collection of robotics simulation environments for reinforcement learning
dtch1997/hacking
dtch1997/jam
Jam - JAX models
dtch1997/protein-model-steering
dtch1997/sae-attrib-lens
dtch1997/sae-dream
Synthetic max-activating examples for SAE features generated with EPO
dtch1997/sae-experiments
dtch1997/SAELens
Training Sparse Autoencoders on Language Models
dtch1997/smol-sae
dtch1997/stock-images
A collection of stock images for doing vision interp
dtch1997/SycophancySteering
Modulating sycophancy in llama-2 via activation steering
dtch1997/token-trace
dtch1997/token-trace-demo
dtch1997/transcoder_circuits
dtch1997/transcoders-slim
A minimal implementation of transcoders