EleutherAI/delphi

`EmbeddingScorer._prepare()` passes arg of wrong type to `examples_to_samples()`

Closed this issue · 4 comments

In

for i, examples in enumerate(record.test):
samples.extend(
examples_to_samples(
examples, # type: ignore
distance=i + 1,
**defaults, # type: ignore
)
)
, when record.test is list[ActivatingExample] (the type expected by annotations), examples is of type ActivatingExample, so when
def examples_to_samples(
examples: list[Example],
tokenizer: PreTrainedTokenizer,
**sample_kwargs,
) -> list[Sample]:
samples = []
for example in examples:
if tokenizer is not None:
text = "".join(tokenizer.batch_decode(example.tokens))
else:
text = "".join(example.tokens)
activations = example.activations.tolist()
samples.append(
Sample(
text=text,
activations=activations,
data=EmbeddingOutput(text=text, **sample_kwargs),
)
)
return samples
runs, it tries to iterate over an ActivatingExample, leading to a TypeError.

I just realized that in

for i, examples in enumerate(record.test):
examples = assert_type(list, examples)
samples.extend(
examples_to_samples(
examples,
distance=i + 1,
**defaults,
)
)
, record.test is expected to be list[list[ActivatingExample]] rather than list[ActivatingExample]. But I think this might be incorrect, as this is due to #99, which fixed type errors, and in the rest of the codebase, record.test is expected to be list[ActivatingExample] (again, expected by type annotations).

If we want both places to expect record.test to be list[ActivatingExample], I can make the changes in my PR and mark it as ready for review.

All these old scores have the wrong types I think, they weren't checked by my type checker for some reason, and we didn't extend tests to them, so they are wrong. Indeed record.test should be a list of ActivatingExample

Ok, I've made the changes to #133. The issue wasn't found by Pyright because of a # type: ignore.

examples, # type: ignore

I think I fixed this on your PR (also the surprisal scorer). There was a bunch of old code that was useless and I end up chaning other things as well so it's not the most clean PR but yea