ckkissane

United States

ckkissane's Stars

openai/transformer-debugger
Language: Python 4.1k 25 14241
gpu-mode/lectures
Material for gpu-mode lectures
Language: Jupyter Notebook 3.3k 48 9338
TransformerLensOrg/TransformerLens
A library for mechanistic interpretability of GPT-style language models
Language: Python 1.7k 16 266314
jacobhilton/deep_learning_curriculum
Language model alignment-focused deep learning curriculum
1.3k 17 1110
jbloomAus/SAELens
Training Sparse Autoencoders on Language Models
Language: Jupyter Notebook 545 7 120132
EleutherAI/sae
Sparse autoencoders
Language: Python 390 7 1451
openai/sparse_autoencoder
Language: Python 389 11 1439
callummcdougall/ARENA_3.0
Language: HTML 387 8 19236
imbue-ai/cluster-health
Language: Python 283 14 838
TransformerLensOrg/CircuitsVis
Mechanistic Interpretability Visualizations using React
Language: Jupyter Notebook 212 2 2631
callummcdougall/ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
Language: HTML 206 5 580
ai-safety-foundation/sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
Language: Python 200 4 4140
HoagyC/sparse_coding
Using sparse coding to find distributed representations used by neural networks.
Language: Jupyter Notebook 198 2 428
anthropics/PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
Language: Python 176 38 035
callummcdougall/sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
Language: HTML 173 7 2137
likenneth/othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
Language: Jupyter Notebook 172 5 440
saprmarks/dictionary_learning
Language: Python 167 5 741
andyrdt/refusal_direction
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
Language: Python 139 4 628
saprmarks/feature-circuits
Language: Python 119 4 524
nrimsky/LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
Language: Jupyter Notebook 82 1 023
wesg52/sparse-probing-paper
Sparse probing paper full code.
Language: Jupyter Notebook 52 2 210
EleutherAI/aria
Language: Python 42 4 011
wesg52/universal-neurons
Universal Neurons in GPT2 Language Models
Language: Jupyter Notebook 27 3 26
callummcdougall/sae_visualizer
Language: HTML 24 4 11
jbloomAus/SAEDashboard
Language: Python 24 4 43
callummcdougall/TransformerLens-intro
Language: HTML 11 2 01
callummcdougall/path_patching
Implementation of path patching & activation patching (will eventually add to TransformerLens).
Language: Python 10 2 12
neelnanda-io/Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
Language: Python 9 1 05
callummcdougall/CircuitsVis
Mechanistic Interpretability Visualizations using React
Language: Jupyter Notebook 32
neelnanda-io/Tiny-Stories-SAEs
Language: Python 3 1 00
