This is the accompanying code to the paper "Towards Automated Circuit Discovery for Mechanistic Interpretability".
- ⚡ To run ACDC, see acdc/main.py, or this Colab notebook
- 🔧 To see how to edit edges in computational graphs in models, see notebooks/editing_edges.py or this Colab notebook
- ❇️ To look at the abstractions we use to make ACDC, look at our upcoming notebook on them.
This library builds upon the abstractions (HookPoints and standardised HookedTransformers) from TransformerLens 🔎
First, install the system dependencies for either Mac or Linux.
Then, you need Python 3.8+ and Poetry to install ACDC, like so:
git clone https://github.com/ArthurConmy/Automatic-Circuit-Discovery.git
cd Automatic-Circuit-Discovery
poetry env use 3.10 # Python 3.10 is recommended but use any Python version >= 3.8
poetry install
On vast.ai machines (and perhaps other machines using Docker containers) you can get set up fast by running poetry config virtualenvs.create false instead of the poetry env use 3.10 line.
On Linux (Debian or Ubuntu), install the system dependencies with:
sudo apt-get update && sudo apt-get install libgl1-mesa-glx graphviz build-essential graphviz-dev
You may also need apt-get install python3.x-dev where x is your Python version (also see the issue and pygraphviz installation troubleshooting).
On Mac, you need to let pip (inside poetry) know about the path to the Graphviz libraries.
brew install graphviz
export CFLAGS="-I$(brew --prefix graphviz)/include"
export LDFLAGS="-L$(brew --prefix graphviz)/lib"
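Before running poetry install, it can help to confirm that the prerequisites are visible from your shell. The following is a minimal sketch and not part of this repository: check_prerequisites is a hypothetical helper, and it assumes that a working Graphviz install puts the dot executable on your PATH.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Report whether the tools named in the install steps are available.

    Hypothetical helper for illustration; it is not part of ACDC.
    """
    return {
        # The install notes ask for Python 3.8 or newer.
        "python": sys.version_info[:2] >= min_python,
        # Poetry is used to install ACDC itself.
        "poetry": shutil.which("poetry") is not None,
        # Assumption: a working Graphviz install exposes the `dot` binary.
        "graphviz": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Note that finding dot does not prove the Graphviz development headers are installed, so a pygraphviz build can still fail even when every check passes.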
To reproduce the Pareto Frontier of KL divergences against number of edges for ACDC runs, run python experiments/launch_induction.py. Similarly, python experiments/launch_sixteen_heads.py and python subnetwork_probing/train.py were used to generate individual data points for the other methods; see each script's CLI help for the available options. All three commands can produce wandb runs. We use notebooks/roc_plot_generator.py to process data from wandb runs into JSON files (see experiments/results/plots_data/Makefile for the commands) and notebooks/plotly_roc_plot.py to produce plots from these JSON files.
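To make the idea of a Pareto frontier concrete, here is a minimal sketch that is independent of the scripts above. It assumes each run is summarised as a (number of edges, KL divergence) pair and that lower is better on both axes; pareto_frontier and the sample numbers are illustrative only and are not taken from the repository or the paper.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (number of edges, KL divergence)


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the runs that no other run beats on both edges and KL.

    Assumption: fewer edges and lower KL divergence are both preferred.
    """
    frontier = []
    best_kl = float("inf")
    # Sorting by edge count (ties broken by KL) lets one sweep suffice:
    # a run is kept only if it improves on the best KL seen so far.
    for n_edges, kl in sorted(points):
        if kl < best_kl:
            frontier.append((n_edges, kl))
            best_kl = kl
    return frontier


# Made-up runs, purely for illustration.
runs = [(40, 1.2), (10, 2.5), (25, 1.0), (120, 0.3), (25, 1.4), (80, 0.3)]
print(pareto_frontier(runs))  # [(10, 2.5), (25, 1.0), (80, 0.3)]
```

A run with the same KL divergence as an earlier one but more edges is dropped, which is why (120, 0.3) does not appear in the output.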
From the root directory, run
pytest -vvv -m "not slow"
This selects only the tests that are not marked as slow. The slow tests take a long time; they are worth running occasionally, but not every time.
You can run the slow tests with
pytest -s -m slow
We welcome issues where the code is unclear!
If you make a PR, make sure you run
chmod +x experiments/make_notebooks.sh
./experiments/make_notebooks.sh
Then check that no errors arise. It is essential that the notebooks converted here consist only of #%% [markdown] markdown-only cells and #%% cells with code.
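The cell rule above can be checked mechanically before opening a PR. The sketch below is not part of the repository and does not replace make_notebooks.sh: find_cell_problems is a hypothetical helper, and it assumes that markers may be written as either #%% or # %% and that markdown cells are written as comment lines.

```python
import re

# Assumption: a cell marker starts the line and looks like "#%%" or "# %%",
# optionally followed by a tag such as "[markdown]".
MARKER = re.compile(r"^#\s?%%(.*)$")


def find_cell_problems(source: str):
    """Return (line_number, message) pairs for cells that break the rule."""
    problems = []
    cell_kind = None  # stays None until the first marker is seen
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = MARKER.match(line)
        if match:
            tag = match.group(1).strip()
            if tag == "":
                cell_kind = "code"
            elif tag == "[markdown]":
                cell_kind = "markdown"
            else:
                cell_kind = "unknown"
                problems.append((lineno, f"unexpected cell marker: {line.strip()}"))
        elif (
            cell_kind == "markdown"
            and line.strip()
            and not line.lstrip().startswith("#")
        ):
            # Markdown cells should hold only comment lines (or blank lines).
            problems.append((lineno, "non-comment line inside a markdown cell"))
    return problems
```

For example, calling find_cell_problems(open("notebooks/editing_edges.py").read()) returns an empty list when every cell follows the convention under these assumptions.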
If you use ACDC, please reach out! You can reference the work as follows:
@misc{conmy2023automated,
title={Towards Automated Circuit Discovery for Mechanistic Interpretability},
author={Arthur Conmy and Augustine N. Mavor-Parker and Aengus Lynch and Stefan Heimersheim and Adri{\`a} Garriga-Alonso},
year={2023},
eprint={2304.14997},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
[ x ] Make TransformerLens install be Neel's code not my PR
[ x ] Add hook_mlp_in to TransformerLens and delete hook_resid_mid (and test to ensure no bad things?)
[ x ] Delete arthur-try-merge-tl references from the repo
[ ] Make notebook on abstractions
[ ? ] Fix huge edge sizes in Induction Main example and change that occurred
[ x ] Find a better way to deal with the versioning on the Colabs installs...
[ ] Neuron-level experiments
[ ] Position-level experiments
[ x ] tracr and other dependencies better managed
[ ? ] Make SP tests work (lots outdated so skipped) - and check SubnetworkProbing installs properly (no __init__.py files!)
[ ? ] Make the 9 tests also failing on TransformerLens-main pass
[ x ] Remove Codebase under construction