Lexemes are unhashable (v0.101.0)

Question

bwegge opened this issue 9 years ago · 6 comments

When I try to add Lexemes to a set or dict, it fails since Lexemes are unhashable:

cat = nlp.vocab['cat']
dog = nlp.vocab['dog']
my_animals = {cat, dog}

Traceback (most recent call last):

  File "<ipython-input-30-8ffec97fae23>", line 1, in <module>
    my_animals = {cat, dog}

TypeError: unhashable type: 'spacy.lexeme.Lexeme'

Maybe lexeme.orth can be used (together with lexeme.lang) as hash value?
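
As a rough illustration of that suggestion (not spaCy's actual implementation): in Python 3, a class that defines __eq__ without __hash__ becomes unhashable, which produces the same kind of TypeError as above. The sketch below uses two hypothetical stand-in classes, EqOnlyLexeme and OrthHashedLexeme, with an integer orth attribute assumed to identify the word. lang is left out for brevity, and is_hashable is a helper made up for the demonstration.

```python
class EqOnlyLexeme:
    """Hypothetical stand-in: rich comparison but no __hash__."""

    def __init__(self, orth):
        self.orth = orth  # assumed integer ID of the word

    def __eq__(self, other):
        if not isinstance(other, EqOnlyLexeme):
            return NotImplemented
        return self.orth == other.orth


class OrthHashedLexeme:
    """Hypothetical stand-in: same equality, plus a hash derived from orth."""

    def __init__(self, orth):
        self.orth = orth

    def __eq__(self, other):
        if not isinstance(other, OrthHashedLexeme):
            return NotImplemented
        return self.orth == other.orth

    def __hash__(self):
        # Equal objects must hash equally, so hash the same field __eq__ compares.
        return hash(self.orth)


def is_hashable(obj):
    """Return True if obj can be used as a set member or dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

Hashing the same field that equality compares keeps the two consistent, so equal lexemes collapse to a single set entry.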

Another funny observation is that looking up the same word multiple times through nlp.vocab[word] produces Lexemes at different addresses (although comparison works thanks to the newly implemented rich comparison):

nlp.vocab['cat']
Out[17]: <spacy.lexeme.Lexeme at 0xe865401e10>

nlp.vocab['cat']
Out[18]: <spacy.lexeme.Lexeme at 0xe865401d80>

Answer 1 · 2016-05-12T08:41:18.000Z

To save memory, the Lexeme class is a wrapper around the LexemeC struct. So the Python objects are indeed created afresh each time. You can see the implementation here: https://github.com/spacy-io/spaCy/blob/master/spacy/lexeme.pyx#L31

Adding a __hash__ method is a good idea though. Will do.
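
A minimal pure-Python sketch of the wrapper pattern described above, for readers who want to see why identity differs while equality holds. ToyVocab and ToyLexeme are hypothetical names, and a plain dict stands in for the LexemeC struct; the real class is written in Cython and may differ in every detail.

```python
class ToyLexeme:
    """Hypothetical thin wrapper around a shared record (stand-in for LexemeC)."""

    __slots__ = ("_record",)

    def __init__(self, record):
        self._record = record

    @property
    def orth(self):
        return self._record["orth"]

    def __eq__(self, other):
        if not isinstance(other, ToyLexeme):
            return NotImplemented
        return self.orth == other.orth

    def __hash__(self):
        # Hash the underlying ID, not the wrapper's memory address.
        return hash(self.orth)


class ToyVocab:
    """Hypothetical vocab: stores one record per word, wraps it anew per lookup."""

    def __init__(self):
        self._records = {}

    def __getitem__(self, text):
        record = self._records.get(text)
        if record is None:
            # Assign the next integer ID to a word seen for the first time.
            record = {"orth": len(self._records) + 1, "text": text}
            self._records[text] = record
        return ToyLexeme(record)  # a fresh Python object on every call


vocab = ToyVocab()
first = vocab["cat"]
second = vocab["cat"]
animals = {first, second, vocab["dog"]}
```

Here first and second are distinct Python objects backed by the same record, so they compare equal, hash equally, and occupy one slot in animals.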

Answer 2 · 2016-05-12T08:54:39.000Z

Sounds reasonable, thanks for the explanation!

Answer 3 · 2016-05-15T22:02:05.000Z

Is there a workaround for this in the meantime? I'm new to NLP and trying to follow this guide, specifically the part where it mentions word vector representations.

Answer 4 · 2016-07-12T13:38:29.000Z

@lylebrown
Replace the curly braces ({ }) with square brackets ([ ]) in the following line, so that it builds a list rather than a set and never needs to hash a Lexeme:

allWords = list({w for w in parser.vocab if w.has_vector and w.orth_.islower() and w.lower_ != "nasa"})

Answer 5 · 2016-07-12T14:24:11.000Z

Btw the line should probably be:

allWords = [w for w in parser.vocab if w.has_vector and w.is_lower and w.lower_ != "nasa"]

The old .repvec property is now named .vector, too.

The __hash__ method will be there in the next release.
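
Until a release with __hash__ is available, one possible way to still get set-like de-duplication is to key a dict on a hashable attribute such as the orth_ string. This is only a sketch: StubLexeme is a hypothetical stand-in for an unhashable Lexeme, and unique_by_text is a helper name invented here, not part of spaCy.

```python
class StubLexeme:
    """Hypothetical stand-in for an unhashable lexeme (equality, no hash)."""

    def __init__(self, orth_, has_vector=True):
        self.orth_ = orth_
        self.has_vector = has_vector

    def __eq__(self, other):
        if not isinstance(other, StubLexeme):
            return NotImplemented
        return self.orth_ == other.orth_


def unique_by_text(lexemes):
    """Drop duplicates by keying on the hashable orth_ string, keeping first-seen order."""
    seen = {}
    for lex in lexemes:
        seen.setdefault(lex.orth_, lex)
    return list(seen.values())


words = [
    StubLexeme("cat"),
    StubLexeme("dog"),
    StubLexeme("cat"),
    StubLexeme("nasa", has_vector=False),
]
with_vectors = unique_by_text(w for w in words if w.has_vector)
```

The dict only ever hashes the strings, so the lexeme objects themselves never need a __hash__ method.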
