I can't fit the pipeline for RoBertaForSequenceClassification

Question

kareemgamalmahmoud opened this issue 2 years ago · 1 comment

kareemgamalmahmoud commented 2 years ago

Description

When I try to run the "HuggingFace in Spark NLP - RoBertaForSequenceClassification" notebook in Colab, I always hit errors in the last cell, which fits the pipeline.

Error 1:
NameError: name 'Pipeline' is not defined

After importing Pipeline (from pyspark.ml import Pipeline), I get Error 2:

IllegalArgumentException: requirement failed: Wrong or missing inputCols annotators in REGEX_TOKENIZER_cfae21e0e52f.

Current inputCols: doc ument. Dataset's columns:
(column_name=text,is_nlp_annotator=false)
(column_name=document,is_nlp_annotator=true,type=document).
Make sure such annotators exist in your pipeline, with the right output names and that they have following annotator types: document

I don't know how to solve this!

Answer 1 · 2022-08-27T14:49:39.000Z

Hi @kareemgamalmahmoud,
There was a typo in this notebook, a missing import line, as you mentioned. It has been fixed, so you can now use the notebook without any issues. Thanks for reporting.

https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/transformers/HuggingFace%20in%20Spark%20NLP%20-%20RoBertaForSequenceClassification.ipynb
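
For anyone hitting the same two errors, here is a minimal sketch of what to check. It is not taken from the fixed notebook. `build_pipeline` shows the `Pipeline` import from Error 1 and a DocumentAssembler/Tokenizer pair whose column names match; it assumes pyspark and spark-nlp are installed and a session has been started with `sparknlp.start()`, and it leaves out the classifier stage. `find_wiring_errors` is a hypothetical helper, not part of Spark NLP. It only compares column names, whereas the real check behind Error 2 also compares annotator types.

```python
def find_wiring_errors(columns, stages):
    """Return a message for every stage whose input columns are not available.

    columns: column names already in the DataFrame, e.g. ["text"].
    stages:  (stage_name, input_cols, output_col) tuples in pipeline order.
    Hypothetical helper for illustration; it checks names only, not types.
    """
    available = set(columns)
    errors = []
    for name, input_cols, output_col in stages:
        missing = [col for col in input_cols if col not in available]
        if missing:
            errors.append(f"{name}: missing input column(s) {missing}")
        # Later stages may consume this stage's output column.
        available.add(output_col)
    return errors


def build_pipeline():
    """Assemble DocumentAssembler -> Tokenizer with matching column names.

    Call sparknlp.start() before this function so a Spark session exists.
    """
    # Imported lazily so the checker above works without Spark installed.
    from pyspark.ml import Pipeline  # the import missing in Error 1
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer

    document_assembler = (
        DocumentAssembler().setInputCol("text").setOutputCol("document")
    )
    # The input column must match the assembler's output column exactly.
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    return Pipeline(stages=[document_assembler, tokenizer])


# Column wiring as described in this issue; "class" is an assumed output name.
stages = [
    ("DocumentAssembler", ["text"], "document"),
    ("Tokenizer", ["document"], "token"),
    ("RoBertaForSequenceClassification", ["document", "token"], "class"),
]
print(find_wiring_errors(["text"], stages))  # → []
```

If any stage's input name differs from the upstream output name, even by a single stray character, the helper reports that stage. That is the same kind of mismatch the "Wrong or missing inputCols annotators" message points at.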