koaning/whatlies

Addition of useful Verbs


It might make sense to think of the EmbeddingSet as if it were a DataFrame. Seen that way, some operations are missing:

  • left_join, inner_join: take the embeddings of two sets and join them together. If a word is missing from one of the sets, we should fill it in with zeros, I guess, or maybe throw an error?
  • concat: we currently have a method called "merge" for this, but "concatenate" feels like a better term.
  • merge: might be a nice verb for merging in a pandas DataFrame. We could take a column containing names and a list of columns that can be added to the embeddings as properties.
  • sort: would be really nice. You could cluster the embeddings and sort them before using plot_distances.
  • reverse: because why not.
  • drop: allows you to drop some names from the EmbeddingSet. We should add a setting that controls whether the method returns a copy or works in place. My preference is to have inplace=False as the default, because it is easier to reason about and it allows for chaining.
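
To make the list above more concrete, here is a rough sketch of how some of these verbs could behave. It does not use the real whatlies API: `ToyEmbeddingSet` and every method signature are hypothetical, vectors are plain Python lists, and the joins assume that "joining" means appending the other set's dimensions to each vector, with missing words filled with zeros.

```python
from typing import Callable, Dict, Iterable, List, Optional


class ToyEmbeddingSet:
    """Hypothetical stand-in for EmbeddingSet: an ordered name -> vector mapping."""

    def __init__(self, vectors: Dict[str, List[float]]):
        self.vectors = dict(vectors)  # dicts keep insertion order

    def names(self) -> List[str]:
        return list(self.vectors)

    def concat(self, other: "ToyEmbeddingSet") -> "ToyEmbeddingSet":
        # Stack the rows of both sets; on a name clash the vector from `other` wins.
        return ToyEmbeddingSet({**self.vectors, **other.vectors})

    def inner_join(self, other: "ToyEmbeddingSet") -> "ToyEmbeddingSet":
        # Keep only names present in both sets and append the dimensions of `other`.
        return ToyEmbeddingSet({
            name: vec + other.vectors[name]
            for name, vec in self.vectors.items()
            if name in other.vectors
        })

    def left_join(self, other: "ToyEmbeddingSet", fill: float = 0.0) -> "ToyEmbeddingSet":
        # Keep every name of `self`; pad with `fill` when `other` lacks the name.
        width = len(next(iter(other.vectors.values()), []))
        return ToyEmbeddingSet({
            name: vec + other.vectors.get(name, [fill] * width)
            for name, vec in self.vectors.items()
        })

    def sort(self, key: Callable[[List[float]], float]) -> "ToyEmbeddingSet":
        # Order the embeddings by a score computed from each vector.
        ordered = sorted(self.vectors.items(), key=lambda item: key(item[1]))
        return ToyEmbeddingSet(dict(ordered))

    def reverse(self) -> "ToyEmbeddingSet":
        return ToyEmbeddingSet(dict(reversed(list(self.vectors.items()))))

    def drop(self, names: Iterable[str], inplace: bool = False) -> Optional["ToyEmbeddingSet"]:
        # inplace=False (the default) returns a new set, so calls can be chained.
        unwanted = set(names)
        kept = {n: v for n, v in self.vectors.items() if n not in unwanted}
        if inplace:
            self.vectors = kept
            return None
        return ToyEmbeddingSet(kept)


a = ToyEmbeddingSet({"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
b = ToyEmbeddingSet({"cat": [0.5], "fish": [0.2]})

# Chaining works because every verb returns a new set by default.
chained = a.left_join(b).drop(["dog"]).reverse()
```

Returning a new object from each verb is what makes the chained call at the end possible; with `inplace=True`, `drop` mutates the set and returns `None`, so it cannot be chained.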

I'm closing this issue in favour of making many small, specific issues.