firstuserhere
Taking apart neural networks and putting them back together for a living. Personal website: https://kunvarthaman.com
firstuserhere's Stars
Jazhyc/llm-sandbag-activation-steering
roedoejet/AnyLanguage-Word-Guessing-Game
A word guessing game that can be modified and translated to your language!
SambhavG/dine
OlineRanum/Ponita_SLR
Fast, Expressive SE(n) Equivariant Networks through Weight-Sharing in Position-Orientation Space.
sfcompute/tinynarrations
A synthetic story narration dataset to study small audio LMs.
michaelneuper/hugo-texify3
A LaTeX-style Hugo theme with the Gruvbox color scheme for personal blogging
randomaccess2023/MG2023
kakaobrain/rq-vae-transformer
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
facebookresearch/llm-transparency-tool
LLM Transparency Tool (LLM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Check out the demo at https://huggingface.co/spaces/facebook/llm-transparency-tool-demo
YannDubs/disentangling-vae
Experiments for understanding disentanglement in VAE latent representations
1Konny/Beta-VAE
PyTorch implementation of β-VAE
neverix/saex
SAEs in Jax
kronusaturn/lw2-viewer
An alternative frontend for LessWrong 2.0
PAIR-code/lit
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible, framework-agnostic interface.
JohnVinyard/matching-pursuit
This repository contains research and experiments aimed at producing sparse, interpretable representations of audio.
evanhanders/superposition-geometry-toys
Experiments for running toy models of superposition as in Anthropic's 2022 paper. These experiments focus on superposition of composed features.
shacharKZ/VISIT-Visualizing-Transformers
saprmarks/feature-circuits
Nix07/finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".
callummcdougall/sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
fletchel/aisc_oocl_experiments
Experiments trying to elicit out-of-context learning when training a transformer on a simple task
thestephencasper/everything-you-need
we got you bro
microsoft/generative-ai-for-beginners
18 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
amlweems/xzbot
notes, honeypot, and exploit demo for the xz backdoor (CVE-2024-3094)
Baidicoot/sae_alternatives
openai/grok
StavC/ComPromptMized
ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications
openai/transformer-debugger
lucidrains/x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
typeling1578/Year-Progress-Bar