[BigDL 2.0] examples on k8s integration tests
glorysdj opened this issue · 17 comments
glorysdj commented
- Kubernetets cluster env @ Almaren 200
- Jenkins Slave and Kubernetes client env
- Hyper Zoo Doicker image build job
- dllib examples
- orca examples
- chronos examples #18
- friesian examples #18
- dllib scala examples #18
dllib examples
Module | Example | Added | Client Mode | Cluster Mode |
---|---|---|---|---|
autograd | custom.py | Y | Succeed | Succeed |
autograd | customloss.py | Y | Succeed | Succeed |
nnframes | imageInference | Y | Succeed | Succeed |
nnframes | imageTransferLearning | Y | Succeed | Succeed |
orca examples
Module | Example | Added | Client Mode | Cluster Mode |
---|---|---|---|---|
automl | autoestimator/autoestimator_pytorch.py (#30) | N | #22 | #22 |
automl | autoxgboost/AutoXGBoostClassifier.p (#27) | N | #22 | #22 |
automl | autoxgboost/AutoXGBoostRegressor.py (#27) | N | #22 | #22 |
data | spark_pandas.py | Y | Succeed | Succeed |
bigdl | learn/bigdl/attention/transformer.py | Y | Succeed | Succeed |
bigdl | learn/bigdl/imageInference/imageInference.py | Y | Succeed | Succeed |
horovod | learn/horovod/pytorch_estimator.py | Y | Succeed | Succeed |
horovod | simple_horovod_pytorch.py | Y | Succeed | Succeed |
mxnet | learn/mxnet/lenet_mnist.py | Y | Succeed | Succeed |
openvino | learn/openvino/predict.py | N | N | #17 |
pytorch | learn/pytorch/cifar10/cifar10.py | Y | Succeed | Succeed |
pytorch | learn/pytorch/fashion_mnist/fashion_mnist.py | Y | Succeed | Succeed |
pytorch | learn/pytorch/super_resolution/super_resolution.py | Y | Succeed | Succeed |
tf | learn/tf/basic_text_classification/basic_text_classification.py | Y | Succeed | Succeed |
tf | learn/tf/image_segmentation/image_segmentation.py | Y | Succeed | Succeed |
tf | learn/tf/inception/inception.py | Y | Succeed | Succeed |
tf | learn/tf/transfer_learning/transfer_learning.py | Y | Succeed | Succeed |
tf2 | learn/tf2/resnet/resnet-50-imagenet.py | N | Succeed | Succeed |
tf2 | learn/tf2/yolov3/yoloV3.py | Y | Succeed | Succeed |
ray_on_spark | ray_on_spark/parameter_server/async_parameter_server.py | Y | Succeed | Succeed |
ray_on_spark | ray_on_spark/parameter_server/sync_parameter_server.py | Y | Succeed | Succeed |
ray_on_spark | ray_on_spark/rl_pong/rl_pong.py | Y | Succeed | Succeed |
ray_on_spark | ray_on_spark/rl_pong/rl_pong.py | Y | Succeed | Succeed |
ray_on_spark | ray_on_spark/rllib/multiagent_two_trainers.py | Y | Succeed | Succeed |
tfpark | tfpark/estimator/estimator_dataset.py | Y | Succeed | #17 |
tfpark | tfpark/estimator/estimator_inception.py | Y | Succeed | #17 |
tfpark | tfpark/estimator/pre-made-estimator.py | Y | Succeed | #17 |
tfpark | tfpark/gan/gan_train_and_evaluate.py | Y | Succeed | #17 |
tfpark | tfpark/keras/keras_dataset.py | Y | Succeed | #17 |
tfpark | tfpark/keras/keras_ndarray.py | Y | Succeed | #17 |
tfpark | tfpark/tf_optimizer/evaluate.py | Y | Succeed | #17 |
tfpark | tfpark/tf_optimizer/train.py | Y | Succeed | #17 |
torchmodel | torchmodel/train/imagenet/main.py | Y | Succeed | #17 |
torchmodel | torchmodel/train/mnist/main.py | Y | Succeed | #17 |
torchmodel | torchmodel/train/resnet_finetune/resnet_finetune.py | Y | Succeed | #17 |
zzti-bsj commented
Module | Example | Added | Client Mode | Cluster Mode |
---|---|---|---|---|
automl | autoestimator/autoestimator_pytorch.py | N | ||
automl | autoxgboost/AutoXGBoostClassifier.py | N | ||
automl | autoxgboost/AutoXGBoostRegressor.py | N | ||
data | spark_pandas.py | N | ||
bigdl | learn/bigdl/attention/transformer.py | N | ||
bigdl | learn/bigdl/imageInference/imageInference.py | N | ||
horovod | learn/horovod/pytorch_estimator.py | N | ||
horovod | simple_horovod_pytorch.py | N | ||
mxnet | learn/mxnet/lenet_mnist.py | N | ||
openvino | learn/openvino/predict.py | N | ||
pytorch | learn/pytorch/cifar10/cifar10.py | N | ||
pytorch | learn/pytorch/fashion_mnist/fashion_mnist.py | N | ||
pytorch | learn/pytorch/super_resolution/super_resolution.py | N | ||
tf | learn/tf/basic_text_classification/basic_text_classification.py | N | ||
tf | learn/tf/image_segmentation/image_segmentation.py | N | ||
tf | learn/tf/inception/inception.py | N | ||
tf | learn/tf/transfer_learning/transfer_learning.py | N | ||
tf2 | learn/tf2/resnet/resnet-50-imagenet.py | N | ||
tf2 | learn/tf2/yolov3/predict.py | N | ||
tf2 | learn/tf2/yolov3/yoloV3.py | N | ||
ray_on_spark | ray_on_spark/parameter_server/async_parameter_server.py | N | ||
ray_on_spark | ray_on_spark/parameter_server/sync_parameter_server.py | N | ||
ray_on_spark | ray_on_spark/rl_pong/rl_pong.py | N | ||
ray_on_spark | ray_on_spark/rl_pong/rl_pong.py | N | ||
ray_on_spark | ray_on_spark/rllib/multiagent_two_trainers.py | N | ||
tfpark | tfpark/estimator/estimator_dataset.py | N | ||
tfpark | tfpark/estimator/estimator_inception.py | N | ||
tfpark | tfpark/estimator/pre-made-estimator.py | N | ||
tfpark | tfpark/gan/gan_train_and_evaluate.py | N | ||
tfpark | tfpark/keras/keras_dataset.py | N | ||
tfpark | tfpark/keras/keras_ndarray.py | N | ||
tfpark | tfpark/tf_optimizer/evaluate.py | N | ||
tfpark | tfpark/tf_optimizer/train.py | N | ||
torchmodel | torchmodel/train/imagenet/main.py | N | ||
torchmodel | torchmodel/train/mnist/main.py | N | ||
torchmodel | torchmodel/train/resnet_finetune/resnet_finetune.py | N |
zzti-bsj commented
Module | Example | Added | Cluster Mode | Client Mode |
---|---|---|---|---|
autograd | custom.py | N | ||
autograd | customloss.py | N | ||
nnframes | imageInference | Y | Succeed | Succeed |
nnframes | imageTransferLearning | Y | Succeed | Succeed |
zzti-bsj commented
autograd custom.py
client command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/dllib/autograd/custom.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/dllib/autograd/custom.py
cluster command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/dllib/autograd/custom.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/dllib/autograd/custom.py
exception
cls.getname: com.intel.analytics.bigdl.dllib.utils.python.api.Sample
BigDLBasePickler registering: bigdl.dllib.utils.common Sample
cls.getname: com.intel.analytics.bigdl.dllib.utils.python.api.EvaluatedResult
BigDLBasePickler registering: bigdl.dllib.utils.common EvaluatedResult
cls.getname: com.intel.analytics.bigdl.dllib.utils.python.api.JTensor
BigDLBasePickler registering: bigdl.dllib.utils.common JTensor
cls.getname: com.intel.analytics.bigdl.dllib.utils.python.api.JActivity
BigDLBasePickler registering: bigdl.dllib.utils.common JActivity
creating: createZooKerasSequential
creating: createZooKerasDense
creating: createDefault
creating: createSGD
creating: createZooKerasVariable
creating: createZooKerasVariable
creating: createZooKerasCustomLoss
2021-10-19 05:51:19 INFO LocalOptimizer$:69 - Clone 1 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 2 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 3 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 4 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 5 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 6 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 7 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 8 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 9 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 10 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 11 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 12 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 13 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 14 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 15 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:69 - Clone 16 model...
2021-10-19 05:51:20 INFO LocalOptimizer$:119 - model thread pool size is 1
2021-10-19 05:51:20 ERROR ThreadPool$:136 - Error: Layer info: Model[ce577d6d]/KerasLayerWrapper[Mean76c60542_wrapper]
java.lang.IllegalArgumentException: requirement failed: dimension exceeds input dimensionsdimension 1, input dimension 0
at scala.Predef$.require(Predef.scala:281)
at com.intel.analytics.bigdl.dllib.nn.Sum.getPositiveDimension(Sum.scala:64)
at com.intel.analytics.bigdl.dllib.nn.Sum.updateOutput(Sum.scala:75)
at com.intel.analytics.bigdl.dllib.nn.Sum.updateOutput(Sum.scala:44)
at com.intel.analytics.bigdl.dllib.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:283)
at com.intel.analytics.bigdl.dllib.nn.StaticGraph.updateOutput(StaticGraph.scala:62)
at com.intel.analytics.bigdl.dllib.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:283)
at com.intel.analytics.bigdl.dllib.keras.autograd.CustomLoss.updateOutput(CustomLoss.scala:104)
at com.intel.analytics.bigdl.dllib.keras.autograd.CustomLoss.updateOutput(CustomLoss.scala:66)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:73)
at com.intel.analytics.bigdl.dllib.optim.LocalOptimizer.$anonfun$optimize$8(LocalOptimizer.scala:149)
at scala.runtime.java8.JFunction0$mcD$sp.apply(JFunction0$mcD$sp.java:23)
at com.intel.analytics.bigdl.dllib.utils.ThreadPool.$anonfun$invokeAndWait$2(ThreadPool.scala:133)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:289)
at com.intel.analytics.bigdl.dllib.nn.StaticGraph.updateOutput(StaticGraph.scala:62)
at com.intel.analytics.bigdl.dllib.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:283)
at com.intel.analytics.bigdl.dllib.keras.autograd.CustomLoss.updateOutput(CustomLoss.scala:104)
at com.intel.analytics.bigdl.dllib.keras.autograd.CustomLoss.updateOutput(CustomLoss.scala:66)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:73)
at com.intel.analytics.bigdl.dllib.optim.LocalOptimizer.$anonfun$optimize$8(LocalOptimizer.scala:149)
at scala.runtime.java8.JFunction0$mcD$sp.apply(JFunction0$mcD$sp.java:23)
at com.intel.analytics.bigdl.dllib.utils.ThreadPool.$anonfun$invokeAndWait$2(ThreadPool.scala:133)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-10-19 05:51:20 ERROR ThreadPool$:136 - Error: Layer info: Model[ce577d6d]/KerasLayerWrapper[Mean76c60542_wrapper]
java.lang.IllegalArgumentException: requirement failed: dimension exceeds input dimensionsdimension 1, input dimension 0
at scala.Predef$.require(Predef.scala:281)
at com.intel.analytics.bigdl.dllib.nn.Sum.getPositiveDimension(Sum.scala:64)
at com.intel.analytics.bigdl.dllib.nn.Sum.updateOutput(Sum.scala:75)
at com.intel.analytics.bigdl.dllib.nn.Sum.updateOutput(Sum.scala:44)
at com.intel.analytics.bigdl.dllib.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:283)
at com.intel.analytics.bigdl.dllib.nn.StaticGraph.updateOutput(StaticGraph.scala:62)
at com.intel.analytics.bigdl.dllib.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:283)
at com.intel.analytics.bigdl.dllib.keras.autograd.CustomLoss.updateOutput(CustomLoss.scala:104)
at com.intel.analytics.bigdl.dllib.keras.autograd.CustomLoss.updateOutput(CustomLoss.scala:66)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:73)
at com.intel.analytics.bigdl.dllib.optim.LocalOptimizer.$anonfun$optimize$8(LocalOptimizer.scala:149)
at scala.runtime.java8.JFunction0$mcD$sp.apply(JFunction0$mcD$sp.java:23)
at com.intel.analytics.bigdl.dllib.utils.ThreadPool.$anonfun$invokeAndWait$2(ThreadPool.scala:133)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:289)
at com.intel.analytics.bigdl.dllib.nn.StaticGraph.updateOutput(StaticGraph.scala:62)
at com.intel.analytics.bigdl.dllib.nn.keras.KerasLayer.updateOutput(KerasLayer.scala:274)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractModule.forward(AbstractModule.scala:283)
at com.intel.analytics.bigdl.dllib.keras.autograd.CustomLoss.updateOutput(CustomLoss.scala:104)
at com.intel.analytics.bigdl.dllib.keras.autograd.CustomLoss.updateOutput(CustomLoss.scala:66)
at com.intel.analytics.bigdl.dllib.nn.abstractnn.AbstractCriterion.forward(AbstractCriterion.scala:73)
at com.intel.analytics.bigdl.dllib.optim.LocalOptimizer.$anonfun$optimize$8(LocalOptimizer.scala:149)
at scala.runtime.java8.JFunction0$mcD$sp.apply(JFunction0$mcD$sp.java:23)
at com.intel.analytics.bigdl.dllib.utils.ThreadPool.$anonfun$invokeAndWait$2(ThreadPool.scala:133)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
piaolaidelangman commented
autoestimator_pytorch.py
client command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoestimator/autoestimator_pytorch.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoestimator/autoestimator_pytorch.py
cluster command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoestimator/autoestimator_pytorch.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoestimator/autoestimator_pytorch.py
Client Exception
2021-10-19 07:27:42,026 INFO tune.py:450 -- Total run time: 2.48 seconds (2.39 seconds for the tuning loop).
2021-10-19 07:27:42.238466: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/work/torch/install/lib:
2021-10-19 07:27:42.238500: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Evaluate: accuracy is 0.9975
Stopping orca context
*** Error in `python3 /opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoestimator/autoestimator_pytorch.py': corrupted size vs. prev_size: 0x000055d94e5cc8b0 ***
Cluster Exception
2021-10-19 07:59:12.867840: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-10-19 07:59:12.867876: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Evaluate: accuracy is 0.9975
Stopping orca context
piaolaidelangman commented
AutoXGBoostClassifier.py
client command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostClassifier.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostClassifier.py \
--path /home/airline_14col.data
cluster command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostClassifier.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostClassifier.py \
--path /home/airline_14col.data
Client Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostClassifier.py", line 24, in <module>
from bigdl.orca.automl.xgboost import AutoXGBClassifier
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/__init__.py", line 16, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/auto_xgb.py", line 18, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/auto_estimator.py", line 16, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/__init__.py", line 18, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/tensorboardlogger.py", line 44, in <module>
File "/opt/work/conda/envs/bigdl/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
_load_global_deps()
File "/opt/work/conda/envs/bigdl/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/opt/work/conda/envs/bigdl/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen: cannot load any more object with static TLS
2021-10-19 07:50:17 INFO ShutdownHookManager:57 - Shutdown hook called
2021-10-19 07:50:17 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-16c432c8-d241-44c9-aaec-7ee29c65f8b4
2021-10-19 07:50:17 INFO ShutdownHookManager:57 - Deleting directory /tmp/localPyFiles-da811fa6-efb5-4c1d-9cac-172cc159a9d4
Cluster Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostClassifier.py", line 24, in <module>
from bigdl.orca.automl.xgboost import AutoXGBClassifier
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/__init__.py", line 16, in <module>
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/auto_xgb.py", line 17, in <module>
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/XGBoost.py", line 19, in <module>
ModuleNotFoundError: No module named 'xgboost'
2021-10-19 08:01:45 INFO ShutdownHookManager:57 - Shutdown hook called
2021-10-19 08:01:45 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-8fb5d5a5-2706-4fcc-b331-e322498ac390
piaolaidelangman commented
AutoXGBoostRegressor.py
client command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostRegressor.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostRegressor.py \
--path /home/data/incd.csv
cluster command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostRegressor.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostRegressor.py \
--path /home/data/incd.csv
Client Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostRegressor.py", line 21, in <module>
from bigdl.orca.automl.xgboost import AutoXGBRegressor
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/__init__.py", line 16, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/auto_xgb.py", line 18, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/auto_estimator.py", line 16, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/__init__.py", line 18, in <module>
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/search/tensorboardlogger.py", line 44, in <module>
File "/opt/work/conda/envs/bigdl/lib/python3.6/site-packages/torch/__init__.py", line 189, in <module>
_load_global_deps()
File "/opt/work/conda/envs/bigdl/lib/python3.6/site-packages/torch/__init__.py", line 142, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/opt/work/conda/envs/bigdl/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen: cannot load any more object with static TLS
2021-10-19 08:20:34 INFO ShutdownHookManager:57 - Shutdown hook called
Cluster Exception
2021-10-19 08:21:10 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/automl/autoxgboost/AutoXGBoostRegressor.py", line 21, in <module>
from bigdl.orca.automl.xgboost import AutoXGBRegressor
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/__init__.py", line 16, in <module>
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/auto_xgb.py", line 17, in <module>
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/automl/xgboost/XGBoost.py", line 19, in <module>
ModuleNotFoundError: No module named 'xgboost'
2021-10-19 08:21:12 INFO ShutdownHookManager:57 - Shutdown hook called
2021-10-19 08:21:12 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-6bafc421-1a84-4071-813a-11728a997ef9
2021-10-19 08:21:12 INFO ShutdownHookManager:57 - Deleting directory /tmp/localPyFiles-aa526875-4444-4a62-9939-395548dacf36
ManfeiBai commented
keras_dataset.py
client command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/keras/keras_dataset.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/keras/keras_dataset.py
Cluster Command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/keras/keras_dataset.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/keras/keras_dataset.py
Client Exception
Downloading data from http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
122880/9912422 [..............................] - ETA: 46564s2021-10-19 09:03:54 WARN WatchConnectionManager:205 - Exec Failure
java.io.EOFException
at okio.RealBufferedSource.require(RealBufferedSource.java:61)
at okio.RealBufferedSource.readByte(RealBufferedSource.java:74)
at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:117)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Cluster Error
BigDLBasePickler registering: bigdl.dllib.utils.common JActivity
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/keras/keras_dataset.py", line 86, in <module>
main(max_epoch)
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/keras/keras_dataset.py", line 38, in main
training_rdd = get_data_rdd("train", sc)
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/keras/keras_dataset.py", line 25, in get_data_rdd
from bigdl.dataset import mnist
ModuleNotFoundError: No module named 'bigdl.dataset'
piaolaidelangman commented
transformer.py
client command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/attention/transformer.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/attention/transformer.py \
--cluster_mode local
Client Exception
Model Summary:
------------------------------------------------------------------------------------------------------------------------
Layer (type) Output Shape Param # Connected to
========================================================================================================================
Input6211ffde (Input) (None, 200) 0
________________________________________________________________________________________________________________________
Input5e52b20d (Input) (None, 200) 0
________________________________________________________________________________________________________________________
Modelb2501061 (Model) (None, 200, 128) (None, 1 4955776 Input5e52b20d
Input6211ffde
________________________________________________________________________________________________________________________
SelectTable45b65920 (SelectTable) (None, 200, 128) 0 Modelb2501061
________________________________________________________________________________________________________________________
GlobalAveragePooling1D3f41a62a (GlobalA (None, 128) 0 SelectTable45b65920
________________________________________________________________________________________________________________________
Dropout974d65da (Dropout) (None, 128) 0 GlobalAveragePooling1D3f41a62a
________________________________________________________________________________________________________________________
Densef0607e86 (Dense) (None, 2) 258 Dropout974d65da
________________________________________________________________________________________________________________________
Total params: 4,956,034
Trainable params: 4,956,034
Non-trainable params: 0
------------------------------------------------------------------------------------------------------------------------
Train...
creating: createZooKerasSparseCategoricalCrossEntropy
creating: createAdam
creating: createZooKerasAccuracy
creating: createSeqToTensor
creating: createTensorToSample
creating: createChainedPreprocessing
creating: createNNModel
creating: createSeqToTensor
creating: createSeqToTensor
creating: createFeatureLabelPreprocessing
creating: createNNEstimator
creating: createEstimator
creating: createMaxEpoch
2021-10-19 10:12:24 INFO DistriOptimizer$:824 - caching training rdd ...
2021-10-19 10:12:26 ERROR TorrentBroadcast:73 - Store broadcast broadcast_7 fail, remove all pieces of the broadcast
2021-10-19 10:12:26 ERROR TorrentBroadcast:73 - Store broadcast broadcast_7 fail, remove all pieces of the broadcast
2021-10-19 10:12:26 ERROR TorrentBroadcast:73 - Store broadcast broadcast_7 fail, remove all pieces of the broadcast
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/attention/transformer.py", line 98, in <module>
epochs=1)
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/bigdl/estimator.py", line 181, in fit
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/dllib/estimator/estimator.py", line 145, in train
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/dllib/utils/file_utils.py", line 164, in callZooFunc
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/dllib/utils/file_utils.py", line 158, in callZooFunc
File "/opt/work/spark-3.1.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/opt/work/spark-3.1.2/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o71.estimatorTrain.
: java.lang.StackOverflowError
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1841)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1534)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
......
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
Stopping orca context
Try to unpersist an uncached rdd
Try to unpersist an uncached rdd
Cluster Command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--conf spark.kubernetes.driverEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.driverEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.executorEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.executorEnv.https_proxy=${https_proxy} \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/attention/transformer.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/attention/transformer.py \
--cluster_mode local
Cluster Exception
2021-10-20 11:08:58 INFO DistriOptimizer$:824 - caching training rdd ...
2021-10-20 11:09:00 ERROR TorrentBroadcast:73 - Store broadcast broadcast_7 fail, remove all pieces of the broadcast
2021-10-20 11:09:00 ERROR TorrentBroadcast:73 - Store broadcast broadcast_7 fail, remove all pieces of the broadcast
2021-10-20 11:09:00 ERROR TorrentBroadcast:73 - Store broadcast broadcast_7 fail, remove all pieces of the broadcast
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/attention/transformer.py", line 98, in <module>
epochs=1)
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/bigdl/estimator.py", line 181, in fit
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/dllib/estimator/estimator.py", line 145, in train
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/dllib/utils/file_utils.py", line 164, in callZooFunc
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/dllib/utils/file_utils.py", line 158, in callZooFunc
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o83.estimatorTrain.
: java.lang.StackOverflowError
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
......
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
Stopping orca context
Exception ignored in: <bound method SparkXShards.__del__ of <bigdl.orca.data.shard.SparkXShards object at 0x7f0e51886198>>
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/data/shard.py", line 430, in __del__
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/data/shard.py", line 194, in uncache
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 315, in unpersist
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1296, in __call__
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1260, in _build_args
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1246, in _get_args
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 490, in can_convert
File "/usr/lib/python3.6/abc.py", line 189, in __instancecheck__
AttributeError: 'NoneType' object has no attribute '_abc_invalidation_counter'
Exception ignored in: <bound method SparkXShards.__del__ of <bigdl.orca.data.shard.SparkXShards object at 0x7f0e51881828>>
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/data/shard.py", line 430, in __del__
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/data/shard.py", line 194, in uncache
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 315, in unpersist
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1296, in __call__
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1260, in _build_args
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1246, in _get_args
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 490, in can_convert
File "/usr/lib/python3.6/abc.py", line 189, in __instancecheck__
AttributeError: 'NoneType' object has no attribute '_abc_invalidation_counter'
piaolaidelangman commented
imageInference.py
Client Command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/imageInference/imageInference.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/imageInference/imageInference.py \
-m /tmp/data2/analytics-zoo-models/bigdl_inception-v1_imagenet_0.4.0.model \
-f /tmp/data2/nnframes/samples \
--b 32
Client Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/imageInference/imageInference.py", line 83, in <module>
predictionDF = inference(image_path, model_path, batch_size, sc) \
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/imageInference/imageInference.py", line 36, in inference
model = Model.loadModel(model_path)
File "/opt/work/conda/envs/bigdl/lib/python3.6/site-packages/bigdl/nn/layer.py", line 776, in loadModel
jmodel = callBigDlFunc(bigdl_type, "loadBigDLModule", modelPath, weightPath)
File "/opt/work/conda/envs/bigdl/lib/python3.6/site-packages/bigdl/util/common.py", line 592, in callBigDlFunc
for jinvoker in JavaCreator.instance(bigdl_type, gateway).value:
File "/opt/work/conda/envs/bigdl/lib/python3.6/site-packages/bigdl/util/common.py", line 56, in instance
cls._instance = cls(bigdl_type, *args)
File "/opt/work/conda/envs/bigdl/lib/python3.6/site-packages/bigdl/util/common.py", line 96, in __init__
self.value.append(getattr(jclass, "ofFloat")())
TypeError: 'JavaPackage' object is not callable
Stopping orca context
Cluster Command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/imageInference/imageInference.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/imageInference/imageInference.py \
-m /tmp/data2/analytics-zoo-models/bigdl_inception-v1_imagenet_0.4.0.model \
-f /tmp/data2/nnframes/samples \
--b 32
Cluster Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/bigdl/imageInference/imageInference.py", line 19, in <module>
from bigdl.nn.layer import Model
ModuleNotFoundError: No module named 'bigdl.nn'
2021-10-20 01:41:01 INFO ShutdownHookManager:57 - Shutdown hook called
2021-10-20 01:41:01 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-a7cc8a62-166c-4e11-8e5c-26f02d941222
2021-10-20 01:41:01 INFO ShutdownHookManager:57 - Deleting directory /tmp/localPyFiles-756c842f-335c-46f8-a236-2fb9f779aa4b
piaolaidelangman commented
lenet_mnist.py
Cluster Command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/mxnet/lenet_mnist.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/mxnet/lenet_mnist.py \
-e 2
Cluster Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/mxnet/lenet_mnist.py", line 22, in <module>
from bigdl.orca.learn.mxnet import Estimator, create_config
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/mxnet/__init__.py", line 17, in <module>
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/mxnet/estimator.py", line 22, in <module>
ModuleNotFoundError: No module named 'dmlc_tracker'
2021-10-20 01:56:35 INFO ShutdownHookManager:57 - Shutdown hook called
2021-10-20 01:56:35 INFO ShutdownHookManager:57 - Deleting directory /tmp/localPyFiles-09eb1333-bc74-4b83-8d8d-c1ba7c8825ce
2021-10-20 01:56:35 INFO ShutdownHookManager:57 - Deleting directory /tmp/spark-5b880701-7902-4bec-bae6-cae810ddce26
Client command (Client没问题)
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/mxnet/lenet_mnist.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/mxnet/lenet_mnist.py \
-e 2
ManfeiBai commented
train.py
train.py's name need to be updated in the readme
Client Command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/tf_optimizer/train.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/tf_optimizer/train.py
Client Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/tf_optimizer/train.py", line 19, in <module>
from bigdl.optim.optimizer import *
ModuleNotFoundError: No module named 'bigdl.optim'
cluster command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/tf_optimizer/train.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/tf_optimizer/train.py
Cluster Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/tf_optimizer/train.py", line 19, in <module>
from bigdl.optim.optimizer import *
ModuleNotFoundError: No module named 'bigdl.optim'
ManfeiBai commented
gan_train_and_evaluate.py
Client Command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/gan/gan_train_and_evaluate.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/gan/gan_train_and_evaluate.py
Client Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/gan/gan_train_and_evaluate.py", line 16, in <module>
from bigdl.optim.optimizer import MaxIteration
ModuleNotFoundError: No module named 'bigdl.optim'
cluster command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/gan/gan_train_and_evaluate.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/gan/gan_train_and_evaluate.py
cluster Exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/tfpark/gan/gan_train_and_evaluate.py", line 16, in <module>
from bigdl.optim.optimizer import MaxIteration
ModuleNotFoundError: No module named 'bigdl.optim'
piaolaidelangman commented
yolov3.py
client command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py \
--data_dir /bigdl2.0/data/yolov3 \
--weights /bigdl2.0/data/yolov3/yolov3.weights \
--class_num 20 \
--name /bigdl2.0/data/yolov3/voc2012.names
client exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 694, in <module>
main()
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 680, in main
trainer = Estimator.from_keras(model_creator=model_creator)
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/tf2/estimator.py", line 69, in from_keras
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/tf2/estimator.py", line 132, in __init__
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/dl_cluster.py", line 111, in __init__
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
return func(*args, **kwargs)
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/worker.py", line 1456, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RecursionError): ray::Worker.disable_cpu_affinity() (pid=41100, ip=172.16.0.200)
File "python/ray/_raylet.pyx", line 439, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 442, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/serialization.py", line 245, in deserialize_objects
self._deserialize_object(data, metadata, object_ref))
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/serialization.py", line 192, in _deserialize_object
return self._deserialize_msgpack_data(data, metadata_fields)
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data
python_objects = self._deserialize_pickle5_data(pickle5_data)
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data
obj = pickle.loads(in_band, buffers=buffers)
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
module = self._load()
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
module = _importlib.import_module(self.__name__)
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
module = self._load()
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
module = _importlib.import_module(self.__name__)
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
module = self._load()
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
......
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
module = self._load()
File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
module = _importlib.import_module(self.__name__)
RecursionError: maximum recursion depth exceeded while calling a Python object
Stopping orca context
cluster command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name analytics-zoo-autoestimator \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--conf spark.kubernetes.driverEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.driverEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.executorEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.executorEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.executor.podTemplateFile=/opt/work/spark-executor-template.yaml \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py \
--data_dir /bigdl2.0/data/yolov3 \
--weights /bigdl2.0/data/yolov3/yolov3.weights \
--class_num 20 \
--name /bigdl2.0/data/yolov3/voc2012.names
cluster exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 694, in <module>
main()
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 606, in main
load_darknet_weights(yolo, options.weights)
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 192, in load_darknet_weights
wf = open(weights_file, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/bigdl2.0/data/yolov3/yolov3.weights'
Stopping orca context
The file '/bigdl2.0/data/yolov3/yolov3.weights' exists.
piaolaidelangman commented
fashion_mnist.py
client command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode client \
--conf spark.driver.host=172.16.0.200 \
--conf spark.driver.port=54321 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name bigdl2-fashion_mnist \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl2.0/data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl2.0/data \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--conf spark.kubernetes.driverEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.driverEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.executorEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.executorEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py
client exception
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py", line 190, in <module>
main()
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py", line 159, in main
backend="bigdl")
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/pytorch/estimator.py", line 103, in from_torch
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/pytorch/estimator.py", line 294, in __init__
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/torch/torch_model.py", line 61, in from_pytorch
AttributeError: module 'types' has no attribute 'ClassType'
Stopping orca context
cluster command
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name bigdl2-fashion_mnist \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl2.0/data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl2.0/data \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--conf spark.kubernetes.driverEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.driverEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.executorEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.executorEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--name bigdl2-fashion_mnist \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl2.0/data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/bigdl2.0/data \
--conf spark.kubernetes.driver.label.az=true \
--conf spark.kubernetes.executor.label.az=true \
--conf spark.kubernetes.node.selector.spark=true \
--conf spark.kubernetes.driverEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.driverEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.executorEnv.http_proxy=${http_proxy} \
--conf spark.kubernetes.executorEnv.https_proxy=${https_proxy} \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--executor-cores ${RUNTIME_EXECUTOR_CORES} \
--executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
--total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \
--driver-cores ${RUNTIME_DRIVER_CORES} \
--driver-memory ${RUNTIME_DRIVER_MEMORY} \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py \
--conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \
--conf spark.sql.catalogImplementation='in-memory' \
--conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \
local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py
cluster exception
creating: createTorchLoss
creating: createTorchOptim
creating: createZooKerasAccuracy
Traceback (most recent call last):
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py", line 190, in <module>
main()
File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/pytorch/fashion_mnist/fashion_mnist.py", line 159, in main
backend="bigdl")
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/pytorch/estimator.py", line 103, in from_torch
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/pytorch/estimator.py", line 294, in __init__
File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/torch/torch_model.py", line 61, in from_pytorch
AttributeError: module 'types' has no attribute 'ClassType'
Stopping orca context
torchmodel's mnist/main.py imagenet/main.py resnet_finetune.py share same error.
qiuxin has update some source code in python/orca/src/bigdl/orca/torch/torch_model.py
Le-Zheng commented
yolov3.py
client command
${SPARK_HOME}/bin/spark-submit \ --master ${RUNTIME_SPARK_MASTER} \ --deploy-mode client \ --conf spark.driver.host=172.16.0.200 \ --conf spark.driver.port=54321 \ --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \ --name analytics-zoo-autoestimator \ --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \ --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \ --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \ --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \ --conf spark.kubernetes.driver.label.az=true \ --conf spark.kubernetes.executor.label.az=true \ --conf spark.kubernetes.node.selector.spark=true \ --executor-cores ${RUNTIME_EXECUTOR_CORES} \ --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \ --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \ --driver-cores ${RUNTIME_DRIVER_CORES} \ --driver-memory ${RUNTIME_DRIVER_MEMORY} \ --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ --py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py \ --conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \ --conf spark.sql.catalogImplementation='in-memory' \ --conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \ --conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \ local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py \ --data_dir /bigdl2.0/data/yolov3 \ --weights /bigdl2.0/data/yolov3/yolov3.weights \ --class_num 20 \ --name /bigdl2.0/data/yolov3/voc2012.names
client exception
Traceback (most recent call last): File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 694, in <module> main() File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 680, in main trainer = Estimator.from_keras(model_creator=model_creator) File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/tf2/estimator.py", line 69, in from_keras File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/tf2/estimator.py", line 132, in __init__ File "/opt/bigdl-0.14.0-SNAPSHOT/python/bigdl-friesian-spark_3.1.2-0.14.0-SNAPSHOT-python-api.zip/bigdl/orca/learn/dl_cluster.py", line 111, in __init__ File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper return func(*args, **kwargs) File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/worker.py", line 1456, in get raise value.as_instanceof_cause() ray.exceptions.RayTaskError(RecursionError): ray::Worker.disable_cpu_affinity() (pid=41100, ip=172.16.0.200) File "python/ray/_raylet.pyx", line 439, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 442, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/serialization.py", line 245, in deserialize_objects self._deserialize_object(data, metadata, object_ref)) File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/serialization.py", line 192, in _deserialize_object return self._deserialize_msgpack_data(data, metadata_fields) File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/serialization.py", line 170, in _deserialize_msgpack_data python_objects = self._deserialize_pickle5_data(pickle5_data) File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/ray/serialization.py", line 158, in _deserialize_pickle5_data obj = pickle.loads(in_band, buffers=buffers) File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load module = _importlib.import_module(self.__name__) File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load module = _importlib.import_module(self.__name__) File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load ...... File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/opt/work/conda/envs/newenv/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load module = _importlib.import_module(self.__name__) RecursionError: maximum recursion depth exceeded while calling a Python object Stopping orca context
cluster command
${SPARK_HOME}/bin/spark-submit \ --master ${RUNTIME_SPARK_MASTER} \ --deploy-mode cluster \ --conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \ --name analytics-zoo-autoestimator \ --conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \ --conf spark.executor.instances=${RUNTIME_EXECUTOR_INSTANCES} \ --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \ --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.options.claimName=${RUNTIME_PERSISTENT_VOLUME_CLAIM} \ --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.${RUNTIME_PERSISTENT_VOLUME_CLAIM}.mount.path=/tmp \ --conf spark.kubernetes.driver.label.az=true \ --conf spark.kubernetes.executor.label.az=true \ --conf spark.kubernetes.node.selector.spark=true \ --conf spark.kubernetes.driverEnv.http_proxy=${http_proxy} \ --conf spark.kubernetes.driverEnv.https_proxy=${https_proxy} \ --conf spark.kubernetes.executorEnv.http_proxy=${http_proxy} \ --conf spark.kubernetes.executorEnv.https_proxy=${https_proxy} \ --conf spark.kubernetes.executor.podTemplateFile=/opt/work/spark-executor-template.yaml \ --executor-cores ${RUNTIME_EXECUTOR_CORES} \ --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \ --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES} \ --driver-cores ${RUNTIME_DRIVER_CORES} \ --driver-memory ${RUNTIME_DRIVER_MEMORY} \ --properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \ --py-files local://${BIGDL_HOME}/python/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-serving-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local://${BIGDL_HOME}/python/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py \ --conf spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp \ --conf spark.sql.catalogImplementation='in-memory' \ --conf spark.driver.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \ --conf spark.executor.extraClassPath=local://${BIGDL_HOME}/jars/bigdl-orca-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-dllib-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar:local://${BIGDL_HOME}/jars/bigdl-friesian-spark_${SPARK_VERSION}-${BIGDL_VERSION}-jar-with-dependencies.jar \ local:///opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py \ --data_dir /bigdl2.0/data/yolov3 \ --weights /bigdl2.0/data/yolov3/yolov3.weights \ --class_num 20 \ --name /bigdl2.0/data/yolov3/voc2012.names
cluster exception
Traceback (most recent call last): File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 694, in <module> main() File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 606, in main load_darknet_weights(yolo, options.weights) File "/opt/bigdl-0.14.0-SNAPSHOT/examples/orca/learn/tf2/yolov3/yoloV3.py", line 192, in load_darknet_weights wf = open(weights_file, 'rb') FileNotFoundError: [Errno 2] No such file or directory: '/bigdl2.0/data/yolov3/yolov3.weights' Stopping orca context
The file '/bigdl2.0/data/yolov3/yolov3.weights' exists.
Yolov3 client mode RecursionError: maximum recursion depth exceeded while calling a Python object
: it requires Tensorflow2, we may add a conda env for tf2
Yolov3 cluster mode /bigdl2.0/data/yolov3/yolov3.weights
: need to put this file in nfs path
piaolaidelangman commented
K8s client-mode test exception on new image in this issue: #23