ckkissane

United States

Pinned Repositories

attention-output-saes
Code to reproduce key results for "Interpreting Attention Layer Outputs with Sparse Autoencoders"
Language:HTML
base-models-refuse
Code to reproduce key results accompanying "Base LLMs refuse too"
Language:Python
crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
Language:Python
deep_learning_curriculum
Language model alignment-focused deep learning curriculum
Language:Jupyter Notebook
rlhf-shakespeare
Shakespeare transformer fine-tuned to generate positive sentiment samples using RLHF
Language:Python
sae-dataset-dependence
Language:Python
sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
Language:Python
sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
Language:HTML
shakespeare-transformer
Decoder-only transformer trained on the works of Shakespeare
Language:Python
TransformerLens
Language:Jupyter Notebook

ckkissane's Repositories

ckkissane/crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
Language:Python
ckkissane/rlhf-shakespeare
Shakespeare transformer fine-tuned to generate positive sentiment samples using RLHF
Language:Python
ckkissane/sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
Language:Python
ckkissane/sae-dataset-dependence
Language:Python
ckkissane/attention-output-saes
Code to reproduce key results for "Interpreting Attention Layer Outputs with Sparse Autoencoders"
Language:HTML
ckkissane/deep_learning_curriculum
Language model alignment-focused deep learning curriculum
Language:Jupyter Notebook
ckkissane/base-models-refuse
Code to reproduce key results accompanying "Base LLMs refuse too"
Language:Python
ckkissane/shakespeare-transformer
Decoder-only transformer trained on the works of Shakespeare
Language:Python
ckkissane/sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
Language:HTML
ckkissane/optimizers-from-scratch
Implementations of popular optimizers in PyTorch
Language:Python
ckkissane/1L-Sparse-Autoencoder
Language:Python
ckkissane/ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
Language:Python
ckkissane/TransformerLens
Language:Jupyter Notebook
ckkissane/attention-head-wiki
Language:HTML
ckkissane/attn-sae-gelu-2l-viz
Language:HTML
ckkissane/attn-sae-gpt2-small-viz
Language:HTML
ckkissane/august-monthly-challenge
Language:HTML
ckkissane/CircuitsVis
Mechanistic Interpretability Visualizations using React
ckkissane/induction-heads-transformer-lens
Replication of induction heads phase change results using TransformerLens and PyTorch
Language:Jupyter Notebook
ckkissane/jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Language:Python
ckkissane/mech-interp-practice
Collection of mechanistic interpretability practice problems with accompanying tutorials
Language:Jupyter Notebook
ckkissane/micrograd-tensor
Extension of micrograd. Uses Tensors instead of Values
Language:Python
ckkissane/minitorch
The full minitorch student suite.
Language:Python
ckkissane/Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
ckkissane/othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
Language:Jupyter Notebook
ckkissane/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language:C++
ckkissane/sae_visualizer
Language:HTML
ckkissane/SAELens
Training Sparse Autoencoders on Language Models
Language:HTML
ckkissane/sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
Language:Python
ckkissane/sparse_coding
Using sparse coding to find distributed representations used by neural networks.
Language:Jupyter Notebook