EleutherAI/delphi

[Bug]: ClassifierOutput's 'prediction' has incorrect type

hijohnnylin opened this issue · 4 comments

ClassifierOutput's prediction field is annotated as bool, but in practice it is assigned an int of -1, 0, or 1. I think the -1 case means that there was an error during prediction.

We should decide on:

  1. Make it a bool or not
  2. What to do with the error case (consider it false? maybe create a specific type with 3 states?)

Would be good for the solution to be somewhat backward compatible.

https://github.com/EleutherAI/sae-auto-interp/blob/3659ff3bfefbe2628d37484e5bcc0087a5b10a27/sae_auto_interp/scorers/classifier/sample.py#L32

I don't think treating it as false is a good idea, because that would change the score (right now I just filter out samples where prediction is -1).
It is not the prettiest, but we could switch its type to an int and explicitly document -1 as the error value?
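To illustrate why the two options give different scores, here is a minimal sketch on made-up data. The `pairs` list and the `accuracy` helper are hypothetical and not taken from the library; they only mirror the -1/0/1 convention described above.

```python
# Made-up (prediction, label) pairs; -1 marks a failed prediction.
pairs = [(1, 1), (0, 0), (-1, 1), (-1, 0)]


def accuracy(items):
    """Fraction of pairs where the prediction equals the label."""
    return sum(p == y for p, y in items) / len(items)


# Option A: coerce errors to 0 (false), so failed samples are still scored.
as_false = [(0 if p == -1 else p, y) for p, y in pairs]

# Option B: drop failed samples before scoring.
filtered = [(p, y) for p, y in pairs if p != -1]

acc_as_false = accuracy(as_false)
acc_filtered = accuracy(filtered)
print(acc_as_false, acc_filtered)
```

With this toy data the coerced version scores one failed sample as wrong and one as right by accident, so the two numbers differ even though the classifier's real predictions are identical.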

Hmm, I think that might be a bit confusing. In Python any non-zero number is truthy, so `if -1:` takes the true branch.
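A quick sketch of the pitfall, using the -1/0/1 values from this thread (the variable names are illustrative only):

```python
# -1 = error, 0 = negative, 1 = positive, as described in this issue.
predictions = [-1, 0, 1]

# bool() shows how each value behaves in an `if` test.
truthiness = {p: bool(p) for p in predictions}
print(truthiness)

# A naive truthiness check silently counts the error value as a positive.
naive_positives = sum(1 for p in predictions if p)

# An explicit comparison does not.
explicit_positives = sum(1 for p in predictions if p == 1)
```

So any caller that writes `if output.prediction:` would treat an error as a positive prediction unless it checks for -1 first.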

We've updated the error value to None and the ClassifierOutput prediction type to bool | None. Let me know if you have any issues with the updated library and I'll resolve them ASAP.
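For callers migrating from the old int convention, something like the following might help. It is only a sketch: `ClassifierOutputSketch` and `from_legacy` are hypothetical names, not the library's actual class or API, and the real ClassifierOutput has other fields that are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClassifierOutputSketch:
    """Hypothetical stand-in for ClassifierOutput; only the field discussed here."""

    prediction: bool | None = None  # None means the prediction failed


def from_legacy(value: int) -> bool | None:
    """Map the old -1/0/1 convention onto bool | None (assumed mapping)."""
    if value == -1:
        return None
    return bool(value)


legacy_values = [1, 0, -1]
outputs = [ClassifierOutputSketch(from_legacy(v)) for v in legacy_values]

# Check `is not None` rather than truthiness, so False predictions are kept.
valid = [o for o in outputs if o.prediction is not None]
```

Note that `if o.prediction:` would still drop False predictions along with None, so the explicit `is not None` check matters when filtering out errors.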

I'm closing this, and Johnny can open a new one if anything comes up.