koaning/whatlies

Bulk Labelling Feature

koaning opened this issue · 1 comments

I think it makes sense to add bulk labelling as a feature. Since it is very easy to fetch embeddings from all sorts of sources, it feels like a logical thing to add. We can use human-learn under the hood.

The API might include something like this.

from whatlies.label import Bulk

engine = Bulk(embset)

# Generate the chart with the selections
# The `show_all` flag lets you draw only the points that don't yet have
# a `property_name` attached.
engine.generate_chart(property_name="label", show_all=False)

# Retrieve the strings of all (or a subset) of the selected embeddings
engine.list_selection(n=20)

# Attach a property in place on the selected items.
# Note that the property name here needs to match the one used above
engine.label_selection(property_name="label")
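
To make the intended semantics a bit more concrete, here is a rough, library-free sketch of the selection and labelling steps. Everything in it is hypothetical: `BulkSketch`, its `select` and `candidates` methods, the dict-based items and the extra `value` argument are stand-ins for illustration, not whatlies or human-learn code. The interactive chart is replaced by passing the selected names explicitly.

```python
class BulkSketch:
    """Hypothetical stand-in for the proposed `Bulk` engine (no chart, no whatlies)."""

    def __init__(self, items):
        # `items` maps a name (the text) to a dict of properties.
        self.items = items
        self.selected = []

    def candidates(self, property_name, show_all=False):
        # With show_all=False, keep only the items that lack `property_name`.
        return [name for name, props in self.items.items()
                if show_all or property_name not in props]

    def select(self, names):
        # Assumption: this stands in for a selection made on the chart.
        self.selected = [n for n in names if n in self.items]

    def list_selection(self, n=20):
        # Return at most `n` of the selected names.
        return self.selected[:n]

    def label_selection(self, property_name, value):
        # Attach the property in place on every selected item.
        # `value` is an assumed extra argument; the proposal above does not show one.
        for name in self.selected:
            self.items[name][property_name] = value


engine = BulkSketch({"cat": {}, "dog": {}, "car": {"label": "vehicle"}})
unlabelled = engine.candidates("label")
engine.select(["cat", "dog"])
engine.label_selection("label", "animal")
```

After labelling, calling `engine.candidates("label")` again would leave out the items that were just labelled, which is the behaviour `show_all=False` is meant to give in the chart.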

An alternative, function-style API:

from whatlies.label import bulk_label

df = bulk_label(
  dataf, 
  text_col, 
  color=None, 
  language=Language(**settings), 
  reducer=Pca(2), 
  n_examples=10, 
  label_col_name="label"
)
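
One difference from the class-based proposal is that this call returns a result instead of mutating in place. A minimal sketch of that non-interactive core could look like the following, assuming rows are plain dicts and the selection arrives as a list of indices. `bulk_label_sketch`, `selected` and `value` are hypothetical names; no embedding, reduction or chart is involved.

```python
def bulk_label_sketch(rows, selected, value, label_col_name="label", n_examples=10):
    """Hypothetical, non-interactive core of a function-style labelling API."""
    # Preview up to `n_examples` of the selected rows, as a UI might show them.
    examples = [rows[i] for i in selected[:n_examples]]
    chosen = set(selected)
    # Build new dicts so the caller's rows stay untouched.
    labelled = [
        {**row, label_col_name: value} if i in chosen else dict(row)
        for i, row in enumerate(rows)
    ]
    return labelled, examples


rows = [{"text": "cat"}, {"text": "dog"}, {"text": "car"}]
labelled, examples = bulk_label_sketch(rows, selected=[0, 1], value="animal")
```

Returning a new collection keeps the original data intact, which may be easier to reason about in a notebook than the in-place `label_selection` variant.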