Assignment/modifying embedding in embeddingset using [] operator
mkaze opened this issue · 3 comments
I was wondering whether we should allow for assignment or modification of an embedding in a EmbeddingSet
instance using [ ]
operator, i.e. __setitem__
method? For example:
embset = EmbeddingSet(foo, bar)
# Change an existing embedding
embset['foo'] = Embedding('foo_new', ...)
embset['foo'] = np.array([1,2,3])
embset['foo'] = embset['foo'] | embset['bar']
embset['foo'] += embset['bar']
# Add new embedding
embset['buz'] = Embedding('buz', ...)
embset['buz'] = embset['foo'] - embset['bar']
I can imagine that this is sensible. For examples, you might want to create a plot that highlights one point with another color. Manually specifying a property might make sense in such a situation.
I want to be a little bit careful though. I'd prefer if the vector
, name
and orig
attributes remain protected. The vector
should never change dimensionality unless the entire set changes. The name
and orig
might alter the representation of the Embedding
in an unintuitive way.
For examples, you might want to create a plot that highlights one point with another color. Manually specifying a property might make sense in such a situation.
That's exactly the scenario I encountered yesterday and came to this question! 😄
I'd prefer if the
vector
,name
andorig
attributes remain protected.
I also have the same opinion (unless a good use case proves me wrong!). Especially, changing the name
attribute might create confusion: when an EmbeddingSet
is created, the keys to access the embeddings are initialized with the value of name
attribute, and I think after any operation this consistency still holds. However, this leads to another question that whether we should allow these:
embset['foo'] = embset['foo'] - embset['bar']
print(embset['foo'].name) # this would be "foo - bar", not "foo"?!
embset['buz'] = Embedding('qux', ...) # name and key differs?!
Or instead for adding new embedding, should we create a separate method to add one Embedding
instance to an EmbeddingSet
instance? Although, there is already one way to do that: embset = embset.merge(EmbeddingSet(buz))
. Alternatively, we can extend merge
to also support a single Embedding
as well.
I'm closing issues because ever since the project moved to my personal account it's been more into maintenance mode than a "active work" mode.