Bulk Labelling Feature
koaning opened this issue · 1 comment
koaning commented
I think it makes sense to add bulk labeling as a feature. Since it is very easy to fetch embeddings from all sorts of sources, it feels like a logical thing to add. We can use human-learn under the hood.
The API might include something like this.
from whatlies.label import Bulk
engine = Bulk(embset)
# Generate the chart with the selections
# The `show_all` flag lets you draw only the points that don't yet have
# a `property_name` attached.
engine.generate_chart(property_name="label", show_all=False)
# Retrieve the strings of all (or a subset) of the selected embeddings
engine.list_selection(n=20)
# Attach a property in place on the selected items.
# Note that the property name here needs to match the one used above
engine.label_selection(property_name="label")
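To make the intended bookkeeping concrete, here is a minimal sketch of the select, list, and label steps with the chart left out entirely. Everything in it is hypothetical: `Item`, `BulkSketch`, `candidates`, `select`, and the extra `value` argument on `label_selection` are illustration-only names, not part of whatlies or human-learn, and the real selection would come from the interactive chart rather than a list of indices.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One embedded string plus its free-form properties (hypothetical)."""
    name: str
    properties: dict = field(default_factory=dict)


class BulkSketch:
    """Hypothetical stand-in for the proposed `Bulk` engine, minus the chart."""

    def __init__(self, items):
        self.items = list(items)
        self.selected = []  # indices of the currently selected items

    def candidates(self, property_name, show_all=False):
        """Items a chart would draw: all of them, or only the unlabelled ones."""
        if show_all:
            return list(self.items)
        return [it for it in self.items if property_name not in it.properties]

    def select(self, indices):
        """Stand-in for a selection made interactively in the chart."""
        self.selected = list(indices)

    def list_selection(self, n=20):
        """Return the strings of up to `n` selected items."""
        return [self.items[i].name for i in self.selected[:n]]

    def label_selection(self, property_name, value):
        """Attach `value` under `property_name`, in place, to the selection."""
        for i in self.selected:
            self.items[i].properties[property_name] = value


engine = BulkSketch(Item(w) for w in ["cat", "dog", "car", "bus"])
engine.select([0, 1])
engine.label_selection("label", "animal")
# After labelling, only the items without a "label" remain to be drawn.
remaining = [it.name for it in engine.candidates("label")]
```

Filtering on the same property name in both `candidates` and `label_selection` is what makes labelled points drop out of the next chart, which is why the two names need to match.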
koaning commented
A different API could look like this.
from whatlies.label import bulk_label
df = bulk_label(
    dataf,
    text_col,
    color=None,
    language=Language(**settings),
    reducer=Pca(2),
    n_examples=10,
    label_col_name="label"
)