filter_pos bug
Opened this issue · 3 comments
Describe the bug
Running ops.text.clean.filter_pos("NOUN", keep_matching_tokens=True) fails with: module 'jange.ops.text.clean' has no attribute 'filter_pos'.
After changing the call to ops.text.clean.pos_filter("NOUN", keep_matching_tokens=True), I get: OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
To Reproduce
From examples:
clusters_ds = ds.apply(
    ops.text.clean.pos_filter("NOUN", keep_matching_tokens=True),
    ops.text.encode.tfidf(max_features=5000, name="tfidf"),
    ops.cluster.minibatch_kmeans(n_clusters=5),
    result_collector=result_collector,
)
You'll need to download spaCy's model using python -m spacy download en_core_web_sm
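If you would rather do that check from a script, something like the following might work. It is only a sketch: the helper names model_installed and ensure_model are made up for illustration, and it assumes the model is installed as a regular Python package (which is what the spacy download command does), so it will not notice models loaded from a plain directory path.

```python
import importlib.util
import subprocess
import sys


def model_installed(name: str) -> bool:
    """Return True if a top-level package called `name` is importable."""
    # Hypothetical helper: spaCy models installed via `spacy download`
    # are ordinary Python packages, so find_spec can see them.
    return importlib.util.find_spec(name) is not None


def ensure_model(name: str = "en_core_web_sm") -> bool:
    """Download the spaCy model if it is missing; return True on success."""
    if model_installed(name):
        return True
    # Same command as in the comment above, run with the current interpreter
    # so the model lands in the active environment.
    result = subprocess.run([sys.executable, "-m", "spacy", "download", name])
    return result.returncode == 0
```

Calling ensure_model() once before ds.apply(...) should avoid the E050 error on a fresh environment, assuming the machine has network access.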
Thanks, that worked! Although now I am getting an error with features_ds = result_collector[clusters_ds.applied_ops.find_by_name("tfidf")] from the example (if I print features_ds, I get a StopIteration error). I can open a separate issue for that.
Hey, did you find a solution? I have the same error.