JohnSnowLabs/spark-nlp-workshop

Error downloading pretrained pipeline

csyhuang opened this issue · 6 comments

I'm running the Jupyter notebook in the Docker image, but when executing

pipeline = PretrainedPipeline('explain_document_dl')

I get a "No such file or directory" error. The full error output is included below:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<timed exec> in <module>

/usr/lib/python3.6/site-packages/sparknlp/pretrained.py in __init__(self, name, lang, remote_loc)
     28 
     29     def __init__(self, name, lang='en', remote_loc=None):
---> 30         self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
     31         self.light_model = LightPipeline(self.model)
     32 

/usr/lib/python3.6/site-packages/sparknlp/pretrained.py in downloadPipeline(name, language, remote_loc)
     16     @staticmethod
     17     def downloadPipeline(name, language, remote_loc=None):
---> 18         j_obj = _internal._DownloadPipeline(name, language, remote_loc).apply()
     19         jmodel = JavaModel(j_obj)
     20         return jmodel

/usr/lib/python3.6/site-packages/sparknlp/internal.py in __init__(self, name, language, remote_loc)
     63     def __init__(self, name, language, remote_loc):
     64         super(_DownloadPipeline, self).__init__("com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline")
---> 65         self._java_obj = self._new_java_obj(self._java_obj, name, language, remote_loc)
     66 
     67 

/usr/lib/python3.6/site-packages/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
     65             java_obj = getattr(java_obj, name)
     66         java_args = [_py2java(sc, arg) for arg in args]
---> 67         return java_obj(*java_args)
     68 
     69     @staticmethod

/usr/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

/usr/lib/python3.6/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/usr/lib/python3.6/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline.
: java.lang.UnsatisfiedLinkError: /tmp/tensorflow_native_libraries-1553289340774-0/libtensorflow_jni.so: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /tmp/tensorflow_native_libraries-1553289340774-0/libtensorflow_jni.so)
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
	at java.lang.Runtime.load0(Runtime.java:809)
	at java.lang.System.load(System.java:1086)
	at org.tensorflow.NativeLibrary.load(NativeLibrary.java:101)
	at org.tensorflow.TensorFlow.init(TensorFlow.java:66)
	at org.tensorflow.TensorFlow.<clinit>(TensorFlow.java:70)
	at org.tensorflow.Graph.<clinit>(Graph.java:361)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.readGraph(TensorflowWrapper.scala:98)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:172)
	at com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel$class.readTensorflowModel(TensorflowSerializeModel.scala:57)
	at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel$.readTensorflowModel(NerDLModel.scala:97)
	at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph$class.readNerGraph(NerDLModel.scala:84)
	at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel$.readNerGraph(NerDLModel.scala:97)
	at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph$$anonfun$2.apply(NerDLModel.scala:88)
	at com.johnsnowlabs.nlp.annotators.ner.dl.ReadsNERGraph$$anonfun$2.apply(NerDLModel.scala:88)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$$anonfun$com$johnsnowlabs$nlp$ParamsAndFeaturesReadable$$onRead$1.apply(ParamsAndFeaturesReadable.scala:31)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$$anonfun$com$johnsnowlabs$nlp$ParamsAndFeaturesReadable$$onRead$1.apply(ParamsAndFeaturesReadable.scala:30)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$class.com$johnsnowlabs$nlp$ParamsAndFeaturesReadable$$onRead(ParamsAndFeaturesReadable.scala:30)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$$anonfun$read$1.apply(ParamsAndFeaturesReadable.scala:41)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$$anonfun$read$1.apply(ParamsAndFeaturesReadable.scala:41)
	at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:19)
	at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
	at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstance(ReadWrite.scala:652)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$4.apply(Pipeline.scala:274)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$4.apply(Pipeline.scala:272)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:272)
	at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
	at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:134)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:128)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadPipeline(ResourceDownloader.scala:197)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline(ResourceDownloader.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Are the notebooks supposed to run without errors in the Docker container? Thanks!

Hi @csyhuang and thanks for reporting this.
To get more details, could you please tell me which operating system you are using and the steps you took that led to this error? (Did you pull the image, run it, and then open the Jupyter notebook? Which example failed?)
Many thanks

Hi again @csyhuang, I apologize: the issue was in our Docker image.

Everything should now work as expected once you run docker pull again to download the latest changes.

PS: Keep in mind some of the examples with POS() and OCR may not work until late Sunday when we release the hotfix.

Thanks for your report, and please let us know if you have any other issues with the examples.

docker pull johnsnowlabs/spark-nlp-workshop:latest

docker run -it --rm -p 8888:8888 -p 4040:4040 johnsnowlabs/spark-nlp-workshop
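To confirm from inside the container that the refreshed image has the loader the original trace reported missing (ld-linux-x86-64.so.2), a quick check along these lines may help. This is only a sketch: the candidate paths and the helper name find_glibc_loader are assumptions for illustration, not something taken from the image or from Spark NLP.

```python
import os
import platform

# Common locations of the glibc dynamic loader on x86-64 Linux.
# These paths are assumptions; a given image may place it elsewhere.
GLIBC_LOADER_PATHS = (
    "/lib64/ld-linux-x86-64.so.2",
    "/lib/ld-linux-x86-64.so.2",
    "/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2",
)


def find_glibc_loader(paths=GLIBC_LOADER_PATHS):
    """Return the candidate loader paths that exist on this system."""
    return [p for p in paths if os.path.exists(p)]


if __name__ == "__main__":
    found = find_glibc_loader()
    # platform.libc_ver() reports the C library the Python binary is linked
    # against; it returns empty strings when it cannot identify one.
    print("libc reported by Python:", platform.libc_ver())
    print("glibc loader found at:", found if found else "none of the candidate paths")
```

If none of the candidate paths exist, libtensorflow_jni.so would likely fail to load in the same way as in the trace above.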

Hi @maziyarpanahi, thanks for fixing the Docker image. I tried running the notebook again, but another error came up:

Py4JError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline

Since this ticket is closed, I'll open a new one and link it here.

Thanks for getting back to me. Can you confirm that these two commands are the ones you ran?

docker pull johnsnowlabs/spark-nlp-workshop:latest

docker run -it --rm -p 8888:8888 -p 4040:4040 johnsnowlabs/spark-nlp-workshop

PS: Could you paste the entire error trace?
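One way to collect the whole trace as text in a notebook cell is to wrap the failing call and format the exception. This is a minimal sketch; run_and_capture is a hypothetical helper name, not part of Spark NLP or PySpark.

```python
import traceback


def run_and_capture(fn, *args, **kwargs):
    """Call fn and return (result, None), or (None, full traceback text) on failure."""
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() renders the Python traceback plus the exception message.
        return None, traceback.format_exc()
```

In the notebook this could be used as result, trace = run_and_capture(PretrainedPipeline, 'explain_document_dl'), followed by print(trace) when trace is not None. For a Py4JJavaError the exception message should normally carry the Java stack trace as well, so the printed text can be pasted into the issue as is.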

@maziyarpanahi Yes. I used the two commands you mentioned to run the docker.

I opened the new ticket before seeing your reply... Shall I continue replying here, or should we move to #31?

No worries, we can continue in your new issue.