explosion/spacy-streamlit

Be able to load spacy.lang.en.English model (and more)

NixBiks opened this issue · 1 comments

I have a pipeline that builds on spacy.lang.en.English: I replace the tokenizer and add some custom components. spacy_streamlit uses spacy.load to load models. Is it possible to register my pipeline so that it can be loaded via spacy.load?

I am aware that I can call nlp.to_disk on spacy.lang.en.English with my replaced tokenizer and register my components using entry_points, but I'd rather not have to use nlp.to_disk (the serialized pipeline shouldn't be kept in my git repo, and it seems unnecessary).

Another alternative is to register spacy.lang.en.English with my replaced tokenizer as its own language and add that to entry_points, but that feels kinda wrong, and then I wouldn't be able to get the lexeme normalization table from spacy-lookups-data.

I hope it makes sense.

I just realized that I only have to implement a load function in the root of my package, e.g.:

from typing import Iterable


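def load(vocab: bool, disable: Iterable[str], exclude: Iterable[str], config):
    # spacy.load("<package name>") imports the installed package and calls this function.
    from spacy.lang.en import English

    # Build the customized pipeline here (replace the tokenizer, add components).
    return English()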
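
A slightly fuller sketch of the same idea, in case it helps someone else. It assumes spaCy v3 and a hypothetical installed package called my_pipeline whose root module defines load. The whitespace-only Tokenizer and the built-in "sentencizer" are only stand-ins for the real custom tokenizer and components. The **kwargs catch-all is also an assumption: I believe newer spaCy releases pass extra keyword arguments such as enable, so accepting them avoids a TypeError. Like the snippet above, this ignores vocab, disable, exclude and config.

```python
from typing import Any, Iterable


def load(
    vocab: Any = True,
    disable: Iterable[str] = (),
    exclude: Iterable[str] = (),
    config: Any = None,
    **kwargs: Any,  # assumption: tolerate extra keywords (e.g. enable) from other spaCy versions
):
    # Imported lazily, as in the original snippet.
    from spacy.lang.en import English
    from spacy.tokenizer import Tokenizer

    nlp = English()

    # Stand-in for the replaced tokenizer: a blank Tokenizer built from the
    # vocab only, which splits on whitespace.
    nlp.tokenizer = Tokenizer(nlp.vocab)

    # Stand-in for the custom components: a built-in, rule-based component.
    nlp.add_pipe("sentencizer")

    # Note: vocab, disable, exclude and config are accepted but not applied here.
    return nlp
```

With the package installed (e.g. pip install -e .), spacy.load("my_pipeline") should then return this pipeline without anything having to be written with nlp.to_disk.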