Be able to load spacy.lang.en.English model (and more)
NixBiks opened this issue · 1 comments
I have a pipeline that builds on spacy.lang.en.English. I replace the tokenizer and add some custom components. spacy_streamlit uses spacy.load to load models. Is it possible to register my pipeline so that it can be loaded via spacy.load?
I am aware that I can call nlp.to_disk on spacy.lang.en.English with my replaced tokenizer and that I can register my components using entry_points, but I'd rather not have to call nlp.to_disk (e.g. I shouldn't keep the serialized output in my git repo, and it seems unnecessary!?).
Another alternative is to turn spacy.lang.en.English with my replaced tokenizer into its own language and add that to entry_points, but that feels kind of wrong, and then I wouldn't be able to get the lexeme normalization table from spacy-lookups-data.
I hope it makes sense.
I just realized that I only have to implement a load function at the root of my package, e.g.
from typing import Iterable

def load(vocab: bool, disable: Iterable[str], exclude: Iterable[str], config):
    # spacy.load passes these keyword arguments through to the package's load()
    from spacy.lang.en import English
    return English()
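For anyone landing here later, a slightly fuller sketch of the same idea, placed in the package's `__init__.py`. Everything beyond the original four-argument signature is an assumption for illustration: the defaults, the `**kwargs` catch-all (to tolerate extra keywords that other spaCy versions might pass), the `COMPONENTS` tuple, the bare whitespace `Tokenizer` standing in for the replaced tokenizer, and the built-in `sentencizer` standing in for the custom components. The `config` argument is ignored here.

```python
from typing import Any, Iterable

# Hypothetical list of pipeline components; "sentencizer" is a built-in
# stand-in for the custom components mentioned in the question.
COMPONENTS = ("sentencizer",)


def load(
    vocab: Any = True,
    disable: Iterable[str] = (),
    exclude: Iterable[str] = (),
    config: Any = None,
    **kwargs: Any,  # assumption: swallow keywords this sketch does not know about
):
    """Build the pipeline in code instead of reading it from disk."""
    # Imported lazily, as in the snippet above.
    from spacy.lang.en import English
    from spacy.tokenizer import Tokenizer

    nlp = English(vocab=vocab)
    # Placeholder for the replaced tokenizer: a Tokenizer created with only a
    # vocab has no rules and splits on whitespace. Swap in the real one here.
    nlp.tokenizer = Tokenizer(nlp.vocab)

    excluded = set(exclude)
    disabled = set(disable)
    for name in COMPONENTS:
        if name in excluded:
            continue  # excluded components are never added
        nlp.add_pipe(name)
        if name in disabled:
            nlp.disable_pipe(name)  # added, but switched off
    return nlp
```

With the package installed in the environment (say under the hypothetical name `my_package`), `spacy.load("my_package")` should then return this pipeline, and spacy_streamlit can be pointed at the same name.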