jangedoo/jange

filter_pos bug

Opened this issue · 3 comments

Describe the bug
Running ops.text.clean.filter_pos("NOUN", keep_matching_tokens=True) fails with: module 'jange.ops.text.clean' has no attribute 'filter_pos'.

After changing the call to ops.text.clean.pos_filter("NOUN", keep_matching_tokens=True), I get: OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

To Reproduce

From examples:

clusters_ds = ds.apply(
    ops.text.clean.pos_filter("NOUN", keep_matching_tokens=True),
    ops.text.encode.tfidf(max_features=5000, name="tfidf"),
    ops.cluster.minibatch_kmeans(n_clusters=5),
    result_collector=result_collector,
)

You'll need to download spaCy's model using python -m spacy download en_core_web_sm
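If it helps, the check can also be done from Python before the pipeline runs. The following is only a minimal sketch using the standard library: it assumes the model is installed as an importable package named en_core_web_sm (which is what the download command above does), and the helper names model_is_installed, download_command and ensure_model are hypothetical, not part of jange or spaCy.

```python
import importlib.util
import subprocess
import sys


def model_is_installed(name: str = "en_core_web_sm") -> bool:
    """Return True if `name` can be found as an importable Python package."""
    return importlib.util.find_spec(name) is not None


def download_command(name: str = "en_core_web_sm") -> list:
    """Build the equivalent of `python -m spacy download <name>`,
    using the interpreter that is running this script."""
    return [sys.executable, "-m", "spacy", "download", name]


def ensure_model(name: str = "en_core_web_sm") -> None:
    """Hypothetical helper: download the model only when it is missing."""
    if not model_is_installed(name):
        # check=True raises CalledProcessError if the download fails
        subprocess.run(download_command(name), check=True)
```

Calling ensure_model() once before ds.apply(...) should avoid the E050 error on a fresh environment. Depending on the setup, a freshly downloaded model may only become visible after restarting the Python session.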

Thanks, that worked! Although now I am getting an error with features_ds = result_collector[clusters_ds.applied_ops.find_by_name("tfidf")] from the example (if I print features_ds, I get a StopIteration error). I can open a separate issue for that.

Hey, did you find a solution? I have the same error...