JohnSnowLabs/spark-nlp-workshop

Spacy sentence splitting results in notebook do not reflect current Spacy version

Closed · 1 comment

Description

The notebook on Sentence Detector DL contains some comparisons with Spacy:
https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/9.SentenceDetectorDL.ipynb

I'm running Spacy 3.0.8 and I get very different output from what's reported in the notebook.

Running your example below, I find that the sentence boundaries are detected correctly by Spacy.

(Screenshot: spaCy 3.0.8 output showing the sentence boundaries detected correctly.)

Steps to Reproduce

You can reproduce the output above by running the example using Spacy 3.0.8 and the same model:

nlp = spacy.load("en_core_web_sm")

I don't know how to reproduce the output in the current version of the notebook. Possibly you used an older version of Spacy?
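For anyone who wants to check spaCy 3.x sentence splitting without downloading the `en_core_web_sm` model, a minimal sketch using spaCy's rule-based `sentencizer` pipe is below. Note this is an illustrative assumption on my part: the notebook's comparison uses the statistical `en_core_web_sm` pipeline, which in spaCy 3.x derives sentence boundaries from the dependency parse, so results can differ from this rule-based splitter on harder inputs. The example text here is my own placeholder, not the notebook's.

```python
import spacy

# Build a blank English pipeline and add the rule-based sentence
# splitter, which segments on sentence-final punctuation (., !, ?).
# No model download is required, so this runs with a bare pip install.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Placeholder text (not the notebook's example).
text = "John loves Mary. Mary loves John."
doc = nlp(text)
for sent in doc.sents:
    print(sent.text)
```

With the statistical model instead (`nlp = spacy.load("en_core_web_sm")`), the same `doc.sents` iteration applies; only the boundary-detection mechanism changes.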

Your Environment

  • Spark-NLP version: Not relevant
  • Apache Spark version: Not relevant
  • Operating System and version: Mac OS 12.3
  • Deployment (Docker, Jupyter, Scala, pip, conda, etc.): pip

On Colab, the Spacy version is still 2.2.4, and with it the notebook works as shown. When Colab updates its Spacy version, we can consider updating the notebook. Thanks for reporting!