Pinned Repositories
alphabetical_probe
Experimental code that trains 26 linear probes to detect the presence of alphabetic letters in GPT-J token strings, given their embeddings. It also explores the resulting vector arithmetic and its impact on GPT-J's spelling abilities.
ARENA_2.0-RLHF
Preparing content for the ARENA RLHF day.
DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
SAEDashboard
SAELens
Training Sparse Autoencoders on Language Models
SparseAutoencoderSuperposition
SpellingSAEExperiment
toy_model_interpretability
I'd like to start playing around with toy models to better understand results in recent papers.
TransformerLens
A library for mechanistic interpretability of GPT-style language models
jbloomAus's Repositories
jbloomAus/SAELens
Training Sparse Autoencoders on Language Models
jbloomAus/DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
jbloomAus/SAEDashboard
jbloomAus/alphabetical_probe
Experimental code that trains 26 linear probes to detect the presence of alphabetic letters in GPT-J token strings, given their embeddings. It also explores the resulting vector arithmetic and its impact on GPT-J's spelling abilities.
jbloomAus/ARENA_2.0-RLHF
Preparing content for the ARENA RLHF day.
jbloomAus/SparseAutoencoderSuperposition
jbloomAus/SpellingSAEExperiment
jbloomAus/toy_model_interpretability
I'd like to start playing around with toy models to better understand results in recent papers.
jbloomAus/TransformerLens
jbloomAus/arena-v1
jbloomAus/arena-v1-ldn
jbloomAus/ARENA_2.0
I'm teaching ARENA 2.0 and providing students with direction on careers and personal development.
jbloomAus/babyai
BabyAI platform. A testbed for training agents to understand and execute language commands.
jbloomAus/Backwards
jbloomAus/Exploring-2L-SAE
jbloomAus/geom_median
Fast and differentiable geometric median, a multivariate median analogue. Install with `pip install geom-median`
jbloomAus/Minigrid
Simple and easily configurable grid world environments for reinforcement learning
jbloomAus/Module-1
Module 1 - Autodifferentiation
jbloomAus/post--memory-dt-features
jbloomAus/protein-inference
A Python package for protein inference in mass spectrometry data analysis.
jbloomAus/rust_cli_project
I'm teaching myself Rust.
jbloomAus/rust_text_editor
Learning Rust by doing, following along with the Hecto tutorial: https://www.philippflenker.com/hecto/
jbloomAus/SAE_Bench_Template
jbloomAus/sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
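A note on the geom_median entry in the list above: the geometric median of a set of points is the point that minimizes the sum of Euclidean distances to them, which makes it more robust to outliers than the mean. The sketch below illustrates the idea with Weiszfeld's iteration in plain Python. It is not the geom_median package's API or implementation; the function name `weiszfeld_median` and its parameters are hypothetical, chosen only for illustration.

```python
import math


def weiszfeld_median(points, iters=200, eps=1e-9):
    """Approximate the geometric median of equal-length coordinate tuples.

    Hypothetical helper for illustration; not part of the geom_median package.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        weight_sum = 0.0
        weighted = [0.0] * dim
        for p in points:
            # Each point is weighted by the inverse of its distance to the
            # current estimate; eps guards against division by zero.
            w = 1.0 / max(math.dist(p, current), eps)
            weight_sum += w
            for d in range(dim):
                weighted[d] += w * p[d]
        updated = [x / weight_sum for x in weighted]
        # Stop once the estimate has effectively stopped moving.
        if math.dist(updated, current) < eps:
            return updated
        current = updated
    return current


# The four corners of a square: the geometric median is the centre.
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
centre = weiszfeld_median(square)
```

With a single far outlier, as in the points 0, 1 and 100 on a line, the estimate stays near 1 while the mean is pulled to roughly 33.7.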