koaning/whatlies

Assignment/modifying embedding in embeddingset using [] operator

mkaze opened this issue · 3 comments

mkaze commented

I was wondering whether we should allow for assignment or modification of an embedding in a EmbeddingSet instance using [ ] operator, i.e. __setitem__ method? For example:

embset = EmbeddingSet(foo, bar)

# Change an existing embedding
embset['foo'] = Embedding('foo_new', ...)
embset['foo'] = np.array([1,2,3])
embset['foo'] = embset['foo'] | embset['bar']
embset['foo'] += embset['bar']

# Add new embedding
embset['buz'] = Embedding('buz', ...)
embset['buz'] = embset['foo'] - embset['bar']

I can imagine that this is sensible. For examples, you might want to create a plot that highlights one point with another color. Manually specifying a property might make sense in such a situation.

I want to be a little bit careful though. I'd prefer if the vector, name and orig attributes remain protected. The vector should never change dimensionality unless the entire set changes. The name and orig might alter the representation of the Embedding in an unintuitive way.

mkaze commented

For examples, you might want to create a plot that highlights one point with another color. Manually specifying a property might make sense in such a situation.

That's exactly the scenario I encountered yesterday and came to this question! 😄

I'd prefer if the vector, name and orig attributes remain protected.

I also have the same opinion (unless a good use case proves me wrong!). Especially, changing the name attribute might create confusion: when an EmbeddingSet is created, the keys to access the embeddings are initialized with the value of name attribute, and I think after any operation this consistency still holds. However, this leads to another question that whether we should allow these:

embset['foo'] = embset['foo'] - embset['bar']
print(embset['foo'].name)  # this would be "foo - bar", not "foo"?!

embset['buz'] = Embedding('qux', ...)   # name and key differs?!

Or instead for adding new embedding, should we create a separate method to add one Embedding instance to an EmbeddingSet instance? Although, there is already one way to do that: embset = embset.merge(EmbeddingSet(buz)). Alternatively, we can extend merge to also support a single Embedding as well.

I'm closing issues because ever since the project moved to my personal account it's been more into maintenance mode than a "active work" mode.