JohnSnowLabs/spark-nlp-workshop

Spacy sentence splitting results in notebook do not reflect current Spacy version

Closed · 1 comment

Description

The notebook on Sentence Detector DL contains some comparisons with Spacy:
https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/9.SentenceDetectorDL.ipynb

I'm running Spacy 3.0.8 and I get very different output from what's reported in the notebook.

Running your example below, I find that the sentence boundaries are detected correctly by Spacy.

(Screenshot: spaCy 3.0.8 output showing the sentence boundaries detected correctly.)

Steps to Reproduce

You can reproduce the output above by running the example using Spacy 3.0.8 and the same model:

nlp = spacy.load("en_core_web_sm")

I don't know how to reproduce the output in the current version of the notebook. Possibly you used an older version of Spacy?
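For anyone who wants to check spaCy 3.x sentence splitting without downloading the `en_core_web_sm` model, a minimal sketch using spaCy's rule-based `sentencizer` pipe is below. Note this is an illustrative assumption on my part: the notebook's comparison uses the statistical `en_core_web_sm` pipeline, which in spaCy 3.x derives sentence boundaries from the dependency parse, so results can differ from this rule-based splitter on harder inputs. The example text here is my own placeholder, not the notebook's.

```python
import spacy

# Build a blank English pipeline and add the rule-based sentence
# splitter, which segments on sentence-final punctuation (., !, ?).
# No model download is required, so this runs with a bare pip install.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Placeholder text (not the notebook's example).
text = "John loves Mary. Mary loves John."
doc = nlp(text)
for sent in doc.sents:
    print(sent.text)
```

With the statistical model instead (`nlp = spacy.load("en_core_web_sm")`), the same `doc.sents` iteration applies; only the boundary-detection mechanism changes.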

Your Environment

  • Spark-NLP version: Not relevant
  • Apache Spark version: Not relevant
  • Operating System and version: Mac OS 12.3
  • Deployment (Docker, Jupyter, Scala, pip, conda, etc.): pip

On Colab, the Spacy version is still 2.2.4, and with it the notebook works as shown. When Colab updates its Spacy version, we can consider updating the notebook. Thanks for reporting!