JohnSnowLabs/spark-nlp-workshop

CoNLL reader sets document equal to sentence

Dekermanjian opened this issue · 1 comment

CoNLL().readDataset() does not work as expected: the document column is identical to the sentence column, instead of being built from the -DOCSTART- -X- -X- O marker lines that separate documents.

I am not sure whether this affects the training of a NerDL model. However, it makes it impossible to trace a detected entity back to the specific document (not just the sentence) it came from. To reproduce, run the example notebook at https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/training/english/dl-ner/ner_dl.ipynb and inspect the columns after reading the CoNLL data.
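For comparison, the grouping the report expects can be sketched in plain Python. This is not Spark NLP's reader or API, only an illustration under stated assumptions: the input follows the CoNLL-2003 layout, where a -DOCSTART- line opens a new document, a blank line ends a sentence, and the token is the first column. The function name group_conll_documents and the sample text are hypothetical.

```python
def group_conll_documents(text: str) -> list[list[list[str]]]:
    """Group CoNLL-2003 style lines into documents -> sentences -> tokens.

    Hypothetical helper. Assumes '-DOCSTART-' lines start a new document,
    blank lines end a sentence, and the token is the first column.
    """
    documents: list[list[list[str]]] = []
    sentence: list[str] = []

    def flush_sentence() -> None:
        # Close the current sentence and attach it to the latest document.
        nonlocal sentence
        if sentence:
            if not documents:
                documents.append([])  # tokens seen before any -DOCSTART- line
            documents[-1].append(sentence)
            sentence = []

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-DOCSTART-"):
            flush_sentence()
            documents.append([])  # a new document begins at this marker
        elif not stripped:
            flush_sentence()  # blank line: sentence boundary
        else:
            sentence.append(stripped.split()[0])  # keep only the token column
    flush_sentence()
    # Drop documents that ended up with no sentences.
    return [doc for doc in documents if doc]


# Hypothetical sample: two documents, the first containing two sentences.
SAMPLE = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O

Peter NNP B-NP B-PER

-DOCSTART- -X- -X- O

Paris NNP B-NP B-LOC
"""

docs = group_conll_documents(SAMPLE)
```

With this grouping, "EU rejects" and "Peter" share one document while "Paris" belongs to a second one; the report describes the reader instead giving each sentence its own document.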


Your Environment

  • Spark NLP version: 3.1.2
  • Apache Spark version: 3.1.2
  • Operating System and version: macOS 11.4
  • Deployment: local Jupyter notebook

Not the right repo for this issue.