tomaarsen/SpanMarkerNER

spaCy integration has no `.pipe()` method, hence falls back to individual `__call__`

davidberenstein1957 opened this issue · 1 comments

I'm not sure what works better during inference (individual sentences, or longer segments in larger batches), but maybe something like this could work:

    # Assumes `import types` and `from spacy import util` at module level.
    def pipe(self, stream, batch_size=128, include_sent=None):
        """
        Predict entities for a stream of spaCy Docs in batches.

        Args:
            stream: an iterable of texts or Docs (a single string is also accepted).

        Yields:
            Doc: spaCy Doc with SpanMarker entities attached.
        """
        if isinstance(stream, str):
            stream = [stream]

        # If the input is not already a Doc generator, run it through the pipeline first.
        if not isinstance(stream, types.GeneratorType):
            stream = self.nlp.pipe(stream, batch_size=batch_size)

        # Predict on whole minibatches instead of one Doc at a time.
        for docs in util.minibatch(stream, size=batch_size):
            batch_results = self.model.predict(docs)

            for doc, prediction in zip(docs, batch_results):
                yield self.post_process_batch(doc, prediction)

You're probably right, this would definitely be more efficient. I'll throw it on my todo list.
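For anyone following along, the efficiency gain comes from the batching pattern itself, independent of spaCy: one model call per minibatch instead of one per document. Here is a minimal, self-contained sketch of that pattern; `minibatch` mirrors the behavior of `spacy.util.minibatch`, and `FakeModel` is a hypothetical stand-in for a SpanMarker model that predicts on a whole batch at once.

```python
from itertools import islice

def minibatch(items, size):
    """Yield successive lists of up to `size` items (mirrors spacy.util.minibatch)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

class FakeModel:
    """Hypothetical stand-in for a model whose predict() accepts a whole batch."""
    def __init__(self):
        self.calls = 0  # count how many times predict() is invoked

    def predict(self, batch):
        self.calls += 1
        return [f"entities-for:{doc}" for doc in batch]

model = FakeModel()
docs = [f"doc{i}" for i in range(10)]

# Batched inference: 10 docs with size=4 means only 3 model calls (4 + 4 + 2),
# versus 10 calls when predicting one doc at a time.
results = []
for batch in minibatch(docs, size=4):
    results.extend(model.predict(batch))

print(model.calls)  # → 3
```

In the real integration, the per-batch call would be `self.model.predict(docs)` as in the snippet above, and the batch size would be whatever `batch_size` the user passes to `pipe`.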