EleutherAI/delphi

Feature Sorting Tasks

cadentj opened this issue · 0 comments

Not all features are the same, so it doesn't make sense to use the same explainers/scorers on them.

  • Run peak finding on each feature's activation density and check if there's any signal there.
  • Sort out single-token features, and decide which top-k cutoff we should use to classify a feature as single-token.
  • Run word embedding/synonym filtering on top activating tokens.
  • Run word embedding similarity on top activating sentences to see if semantically similar sentences are more mono-semantic.
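
A rough sketch of what the single-token check could look like, assuming we already have, for each feature, the list of peak-activating tokens from its top examples (sorted by activation). The function names, the `k` default, and the `threshold` default are all placeholders for discussion and not part of delphi; the actual cutoff is the open question above.

```python
from collections import Counter


def single_token_fraction(top_tokens, k):
    """Share of the top-k examples whose peak token equals the most common one.

    `top_tokens` is assumed to be sorted by activation, highest first.
    Tokens are normalized (whitespace stripped, lowercased) so that
    " the" and "The" count as the same token.
    """
    top_k = top_tokens[:k]
    if not top_k:
        return 0.0
    normalized = [tok.strip().lower() for tok in top_k]
    _, count = Counter(normalized).most_common(1)[0]
    return count / len(normalized)


def is_single_token_feature(top_tokens, k=20, threshold=0.9):
    """Hypothetical rule: a feature is single-token if one token dominates the top-k."""
    return single_token_fraction(top_tokens, k) >= threshold
```

Sweeping `k` and `threshold` over a sample of features and looking at how many get flagged would be one way to pick the cutoff.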