Pinned Repositories
deep_learning_curriculum
Language model alignment-focused deep learning curriculum
TransformerLens
rlhf-shakespeare
Shakespeare transformer fine-tuned to generate positive sentiment samples using RLHF
sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
attention-output-saes
Code to reproduce key results for "Interpreting Attention Layer Outputs with Sparse Autoencoders"
shakespeare-transformer
Decoder-only transformer trained on the works of Shakespeare
base-models-refuse
Code to reproduce key results accompanying "Base LLMs refuse too"
optimizers-from-scratch
Implementations of popular optimizers in PyTorch
1L-Sparse-Autoencoder
ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
ckkissane's Repositories
ckkissane/base-models-refuse
Code to reproduce key results accompanying "Base LLMs refuse too"
ckkissane/attention-output-saes
Code to reproduce key results for "Interpreting Attention Layer Outputs with Sparse Autoencoders"
ckkissane/sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
ckkissane/SAELens
Training Sparse Autoencoders on Language Models
ckkissane/TransformerLens
ckkissane/attn-sae-gpt2-small-viz
ckkissane/attn-sae-gelu-2l-viz
ckkissane/sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
ckkissane/sae_visualizer
ckkissane/1L-Sparse-Autoencoder
ckkissane/mech-interp-practice
Collection of mechanistic interpretability practice problems with accompanying tutorials
ckkissane/august-monthly-challenge
ckkissane/sparse_coding
Using sparse coding to find distributed representations used by neural networks.
ckkissane/ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
ckkissane/CircuitsVis
Mechanistic Interpretability Visualizations using React
ckkissane/othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
ckkissane/attention-head-wiki
ckkissane/Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
ckkissane/induction-heads-transformer-lens
Replication of induction heads phase change results using TransformerLens and PyTorch
ckkissane/micrograd-tensor
Extension of micrograd. Uses Tensors instead of Values
ckkissane/shakespeare-transformer
Decoder-only transformer trained on the works of Shakespeare
ckkissane/rlhf-shakespeare
Shakespeare transformer fine-tuned to generate positive sentiment samples using RLHF
ckkissane/deep_learning_curriculum
Language model alignment-focused deep learning curriculum
ckkissane/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
ckkissane/minitorch
The full minitorch student suite.
ckkissane/optimizers-from-scratch
Implementations of popular optimizers in PyTorch
ckkissane/jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
ckkissane/numpy
The fundamental package for scientific computing with Python.