Feature Sorting Tasks
cadentj opened this issue · 0 comments
Not all features are the same, so it doesn't make sense to run the same explainers/scorers on all of them. Some ways to sort features first:
- Run peak finding on each feature's activation density and check if there's any signal there.
- Sort out single-token features, and decide what top-k cutoff we should use to call a feature single-token.
- Run word embedding/synonym filtering on top activating tokens.
- Run word embedding similarity on top activating sentences to see if features whose top sentences are semantically similar are more monosemantic.
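
A rough sketch of what the first two checks could look like, just to make the idea concrete. Everything here is an assumption rather than existing code: the function names are hypothetical, the 0.9 share threshold, the bin count, and `min_share` are placeholder values to be tuned, and normalizing tokens with `strip().lower()` is only one possible choice. It uses a plain histogram instead of a proper density estimate.

```python
from collections import Counter


def top_token_share(top_tokens):
    """Return (token, share) for the most common token among the top-k activating tokens."""
    if not top_tokens:
        raise ValueError("top_tokens must be non-empty")
    # Assumption: treat " the", "The" and "the" as the same token.
    normalized = [t.strip().lower() for t in top_tokens]
    token, count = Counter(normalized).most_common(1)[0]
    return token, count / len(normalized)


def is_single_token_feature(top_tokens, threshold=0.9):
    """Call a feature single-token if one token covers at least `threshold` of the top-k."""
    _, share = top_token_share(top_tokens)
    return share >= threshold


def histogram(values, bins=10):
    """Count values into `bins` equal-width bins spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return counts


def count_density_peaks(values, bins=10, min_share=0.1):
    """Count local maxima in the activation histogram.

    A bin counts as a peak if it holds at least `min_share` of all values,
    is higher than its left neighbour, and is at least as high as its right
    neighbour (so a flat plateau is counted once).
    """
    counts = histogram(values, bins)
    padded = [0] + counts + [0]
    floor = min_share * len(values)
    peaks = 0
    for i in range(1, len(padded) - 1):
        if padded[i] >= floor and padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]:
            peaks += 1
    return peaks
```

Usage would be something like `is_single_token_feature(tokens_for_feature)` over the top-k activating tokens of each feature, and `count_density_peaks(nonzero_activations)` to flag features whose activation density looks multimodal. The choice of k and of the threshold is exactly the open question in the second bullet.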