Pinned Repositories
acdc-adria
algebraic_value_editing
Experiments testing the algebraic value-editing conjecture (AVEC) on GPT-2 models
Automatic-Circuit-Discovery
CoqLegion
A partial formalization of the Legion type system in Coq
othello-gpt-ideas
Submission to Neel Nanda's 2022 SERI MATS stream.
sae-enhanced-cd
Replication of the paper "Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models" (https://arxiv.org/pdf/2405.12522)
Sledgehammer
A code-golf language written in Mathematica
Spacechem-in-SAT
tkwa's Repositories
tkwa/Sledgehammer
A code-golf language written in Mathematica
tkwa/CoqLegion
A partial formalization of the Legion type system in Coq
tkwa/othello-gpt-ideas
Submission to Neel Nanda's 2022 SERI MATS stream.
tkwa/sae-enhanced-cd
Replication of the paper "Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models" (https://arxiv.org/pdf/2405.12522)
tkwa/Spacechem-in-SAT
tkwa/acdc-adria
tkwa/algebraic_value_editing
Experiments testing the algebraic value-editing conjecture (AVEC) on GPT-2 models
tkwa/Automatic-Circuit-Discovery
tkwa/catastrophic-goodhart
Plots and empirical results for Catastrophic Goodhart (https://www.lesswrong.com/s/6rhjdbnEXoek4YiH7)
tkwa/cislate
tkwa/exist-mood-import
Import scripts for existing mood tracking app data
tkwa/feitzin.github.io
tkwa/legion
The Legion Parallel Programming System
tkwa/tkwa.github.io
tkwa/iit
A replication and extension of the paper "Inducing Causal Structure for Interpretable Neural Networks" by Atticus Geiger
tkwa/katago_retarget
Retarget KataGo to output the worst move by flipping activations.
tkwa/nonsurrounding-polyomino
Finding a polyomino that cannot surround a 1x1 square, using the OR-Tools SAT solver.
tkwa/notifications-demo
tkwa/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
tkwa/ShortcutBadger
An Android library that supports iOS-style badge notifications in Samsung, LG, Sony and HTC launchers.
tkwa/tracr
tkwa/turntrout-plots
Data files from Alex Turner's experiments and posts