EleutherAI/delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models know themselves through automated interpretability.
Python, Apache-2.0
Issues
Got Problem while Running the Command in the README
#152 opened by visionxyz - 4
Simulator scorer not in a working state
#137 opened by luciaquirke - 0
Using "active" as a fuzz_type throws an error
#144 opened by Abzinger - 1
getting started documentation is out of date
#143 opened by skunnavakkam - 3
`EmbeddingScorer._prepare()` passes arg of wrong type to `examples_to_samples()`
#132 opened by anthonyduong9 - 1
`delphi/clients/openrouter.py` didn't use the `Response` dataclass from `delphi/clients/client.py`
#125 opened by LyzanderAndrylie - 2
Refactor latents and prompt creation
#124 opened by luciaquirke - 15
It'd be helpful to publish `delphi` to PyPI
#116 opened by anthonyduong9 - 1
Decoder vectors / source models for dashboard demo
#115 opened by roek-pnnl - 0
Fix latent_contexts.ipynb and optionally convert it to something other than a jupyter notebook
#102 opened by luciaquirke - 1
Log number of single token features
#87 opened by luciaquirke - 1
Minor Scorer Bugs
#40 opened by hijohnnylin - 2
Reduce latent loader arguments
#36 opened by SrGonao - 1
Figure out type checking with TensorType
#83 opened by SrGonao - 1
Unified autoencoder loader
#34 opened by SrGonao - 1
Add plotting logic for results
#63 opened by cmmcirvin - 1
Maybe remove support for OpenAI SAEs?
#62 opened by norabelrose - 4
Rename features-> latents, recall->detection
#37 opened by SrGonao - 0
Save cached latents as caching progresses
#38 opened by SrGonao - 1
OS import error
#33 opened by HaydenMM - 0
Support for SAE lens
#35 opened by SrGonao - 4
Feature Sorting Tasks
#15 opened by cadentj - 0
[Experiments] - Explore recall difficulty
#8 opened by SrGonao - 0
[Experiments] - Score explanations generated by COT and Simple explanations in GPT2
#6 opened by SrGonao - 0
[Merging] - Make similar scripts to the ones in scripts but using the current code
#5 opened by SrGonao - 1
Quality of Life
#1 opened by SrGonao - 0
Look at two peaks on activations density
#12 opened by cadentj - 2