openai/automated-interpretability

More unified dataset

shayneoneill opened this issue · 3 comments

Hi!

Spectacular work here, folks. Is there any plan to release a more unified dataset? That is, rather than having to request every neuron on every layer individually, a single monolithic file to download that could be, say, indexed in a database for searchability, or whatever?

This would be very useful for guiding alignment efforts and general research on how GPT's internal ontology works. For example, one could load the data into Neo4j and apply some good old-fashioned graph-theory number crunching to try to work out what's up with the nodes GPT-4 couldn't make heads or tails of: are they part of the deep structure of its linguistic thinking, are they secondary nodes to superpositions, etc.? My intuition tells me these are solvable.

we're not planning to do this, but anyone should feel free to try such things! agree it could be exciting
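For anyone who wants to try, here is a minimal sketch of how the per-neuron files could be collected into one searchable SQLite database. It assumes the explanation files follow the `.../data/explanations/{layer}/{neuron}.jsonl` URL pattern seen elsewhere in this thread. The record schema is not described here, so each JSONL record is stored as raw JSON text. All function names, the table layout, and the layer/neuron ranges are hypothetical choices for illustration, not part of the project.

```python
import json
import sqlite3
import urllib.request

# Assumption: every explanation file lives at BASE_URL/{layer}/{neuron}.jsonl.
BASE_URL = "https://openaipublic.blob.core.windows.net/neuron-explainer/data/explanations"


def explanation_url(layer: int, neuron: int) -> str:
    """Build the URL of one neuron's explanation file."""
    return f"{BASE_URL}/{layer}/{neuron}.jsonl"


def fetch_jsonl(url: str, timeout: float = 30.0):
    """Return the file body as text, or None if it cannot be fetched."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except OSError:
        # Covers HTTP errors (e.g. a missing file), network failures, timeouts.
        return None


def parse_jsonl(text: str) -> list:
    """Parse JSON Lines text into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database with one row per JSONL record."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS explanations ("
        "layer INTEGER NOT NULL, neuron INTEGER NOT NULL, record TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_layer_neuron ON explanations (layer, neuron)"
    )
    return conn


def store(conn: sqlite3.Connection, layer: int, neuron: int, records: list) -> int:
    """Insert records as raw JSON text, since the schema is not assumed."""
    conn.executemany(
        "INSERT INTO explanations VALUES (?, ?, ?)",
        [(layer, neuron, json.dumps(r)) for r in records],
    )
    conn.commit()
    return len(records)


def build(conn, layers, neurons, fetch=fetch_jsonl) -> list:
    """Fetch every (layer, neuron) pair; return the pairs that were missing."""
    missing = []
    for layer in layers:
        for neuron in neurons:
            text = fetch(explanation_url(layer, neuron))
            if text is None:
                missing.append((layer, neuron))
                continue
            store(conn, layer, neuron, parse_jsonl(text))
    return missing
```

A call such as `build(open_db("neurons.db"), range(n_layers), range(n_neurons))`, with the two ranges supplied by the caller, would fill the database and return the list of neurons whose files could not be fetched. Keeping the record as raw JSON defers any schema decisions; the rows could later be exported to a graph database such as Neo4j.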

diziet commented

Jeff @WuTheFWasThat, https://openaipublic.blob.core.windows.net/neuron-explainer/neuron-viewer/index.html#/layers/31/neurons/1594 is missing the explanations for this neuron:

https://openaipublic.blob.core.windows.net/neuron-explainer/data/explanations/31/1593.jsonl

whoops, good to see you alex. not sure what went wrong but we likely won't fix