spaCy integration has no `.pipe()` method, hence falls back to individual `.call()` invocations
davidberenstein1957 opened this issue · 1 comment
davidberenstein1957 commented
Not sure what works better during inference (individual sentences or longer segments in larger batches), but maybe something like this could work:
# Module-level imports needed for this method:
import types

from spacy import util


def pipe(self, stream, batch_size=128, include_sent=None):
    """Predict SpanMarker entities for a stream of spaCy Docs.

    Args:
        stream: an iterable of spaCy Docs or raw texts (a single string is also accepted).
        batch_size (int): number of documents to process per batch.

    Yields:
        Doc: spaCy Doc with SpanMarker entities attached.
    """
    # Accept a single text as well as an iterable of texts/Docs.
    if isinstance(stream, str):
        stream = [stream]
    # Raw texts still need to be tokenized into Docs.
    if not isinstance(stream, types.GeneratorType):
        stream = self.nlp.pipe(stream, batch_size=batch_size)
    # Run the SpanMarker model on minibatches rather than one Doc at a time.
    for docs in util.minibatch(stream, size=batch_size):
        batch_results = self.model.predict(docs)
        for doc, prediction in zip(docs, batch_results):
            yield self.post_process_batch(doc, prediction)
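For reference, a minimal sketch of how such a `pipe()` would be exercised from the spaCy side. The component name, model name, and batch size below are just examples, not necessarily what the integration uses:

```python
import spacy

# Assumes the SpanMarker component is registered as "span_marker" and that the
# component exposes a pipe() like the one proposed above.
nlp = spacy.load("en_core_web_sm", exclude=["ner"])
nlp.add_pipe(
    "span_marker",
    config={"model": "tomaarsen/span-marker-roberta-large-ontonotes5"},
)

texts = [
    "Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.",
    "Cleopatra VII was the last active ruler of the Ptolemaic Kingdom of Egypt.",
]

# With a .pipe() implementation, nlp.pipe() can hand the component whole batches
# instead of invoking it once per Doc.
for doc in nlp.pipe(texts, batch_size=128):
    print([(ent.text, ent.label_) for ent in doc.ents])
```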
tomaarsen commented
You're probably right, this would definitely be more efficient. I'll throw it on my todo list.