ZooKeeper authentication issue
rajrohith opened this issue · 1 comment
Hi Kiran,
I have an issue while pushing data to Solr from PySpark. My Solr runs in cloud mode and is secured with a username and password. When I run in local mode I don't face any issue, but when I run in yarn mode the job fails with an authentication error. Please see the error below and guide me on how to fix it.
Below is my shell script:
/usr/hdp/current/spark2-client/bin/spark-submit --master=yarn --conf spark.executor.memory=20g --conf spark.driver.memory=15g --conf spark.executor.cores=3 --conf spark.executor.instances=5 --jars $EXTRAJARPATH/mysql.jar,$EXTRAJARPATH/elasticsearch-hadoop-5.5.0/dist/elasticsearch-hadoop-5.5.0.jar,$SOLR/spark-solr-3.5.8-shaded.jar --conf 'spark.driver.extraJavaOptions=-Dbasicauth=admin:admin' --py-files /data/bdr/scripts/bdr-ptabapi/parm.py $CODEPATH/public_doc_pushto_solr.py $CODEPATH $startdatetime
18/12/18 16:45:58 INFO ZooKeeper: Initiating client connection, connectString=10.201.13.18:2181, 10.201.13.54:2181, 10.201.13.32:2181 sessionTimeout=30000 watcher=org.apache.solr.common.cloud.SolrZkClient$1@76746be8
18/12/18 16:45:58 WARN StaticHostProvider: No IP address found for server: 10.201.13.54:2181
java.net.UnknownHostException: 10.201.13.54: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at org.apache.zookeeper.client.StaticHostProvider.resolveAndShuffle(StaticHostProvider.java:98)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:43)
at org.apache.solr.common.cloud.ZkClientConnectionStrategy.createSolrZooKeeper(ZkClientConnectionStrategy.java:105)
at org.apache.solr.common.cloud.DefaultConnectionStrategy.connect(DefaultConnectionStrategy.java:37)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:150)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:111)
at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:295)
at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:155)
at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:399)
at com.lucidworks.spark.util.SolrSupport$.getSolrCloudClient(SolrSupport.scala:221)
at com.lucidworks.spark.util.SolrSupport$.getNewSolrCloudClient(SolrSupport.scala:240)
at com.lucidworks.spark.util.CacheCloudSolrClient$$anon$1.load(SolrSupport.scala:38)
at com.lucidworks.spark.util.CacheCloudSolrClient$$anon$1.load(SolrSupport.scala:36)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at com.lucidworks.spark.util.SolrSupport$.getCachedCloudClient(SolrSupport.scala:244)
at com.lucidworks.spark.util.SolrSupport$.getSolrBaseUrl(SolrSupport.scala:248)
at com.lucidworks.spark.SolrRelation.insert(SolrRelation.scala:636)
at solr.DefaultSource.createRelation(DefaultSource.scala:27)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
18/12/18 16:45:58 WARN StaticHostProvider: No IP address found for server: 10.201.13.32:2181
java.net.UnknownHostException: 10.201.13.32: Name or service not known
(stack trace identical to the one above)
18/12/18 16:45:58 INFO ClientCnxn: Opening socket connection to server 10.201.13.18/10.201.13.18:2181. Will not attempt to authenticate using SASL (unknown error)
18/12/18 16:45:58 INFO ClientCnxn: Socket connection established to 10.201.13.18/10.201.13.18:2181, initiating session
18/12/18 16:45:58 INFO ClientCnxn: Session establishment complete on server 10.201.13.18/10.201.13.18:2181, sessionid = 0x20000047878000f, negotiated timeout = 30000
18/12/18 16:45:58 INFO ConnectionManager: zkClient has connected
18/12/18 16:45:58 INFO ZooKeeper: Session: 0x20000047878000f closed
18/12/18 16:45:58 INFO ClientCnxn: EventThread shut down
Traceback (most recent call last):
File "/data/bdr/scripts/bdr-ptabapi/public_doc_pushto_solr.py", line 72, in
solrDF.write.format("solr").option("zkhost","10.201.13.18:2181, 10.201.13.54:2181, 10.201.13.32:2181").option("collection","ptab-documents").option("batch_size", "10000").option("commit_within", "5000").mode("append").save()
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 548, in save
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in call
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o186.save.
: com.google.common.util.concurrent.UncheckedExecutionException: org.apache.solr.common.cloud.ZooKeeperException:
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2263)
at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at com.lucidworks.spark.util.SolrSupport$.getCachedCloudClient(SolrSupport.scala:244)
at com.lucidworks.spark.util.SolrSupport$.getSolrBaseUrl(SolrSupport.scala:248)
at com.lucidworks.spark.SolrRelation.insert(SolrRelation.scala:636)
at solr.DefaultSource.createRelation(DefaultSource.scala:27)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.cloud.ZooKeeperException:
at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:165)
at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:399)
at com.lucidworks.spark.util.SolrSupport$.getSolrCloudClient(SolrSupport.scala:221)
at com.lucidworks.spark.util.SolrSupport$.getNewSolrCloudClient(SolrSupport.scala:240)
at com.lucidworks.spark.util.CacheCloudSolrClient$$anon$1.load(SolrSupport.scala:38)
at com.lucidworks.spark.util.CacheCloudSolrClient$$anon$1.load(SolrSupport.scala:36)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
... 20 more
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /live_nodes
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.solr.common.cloud.SolrZkClient.lambda$getChildren$4(SolrZkClient.java:329)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:329)
at org.apache.solr.common.cloud.ZkStateReader.refreshLiveNodes(ZkStateReader.java:762)
at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:448)
at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:156)
... 29 more
18/12/21 10:40:11 INFO YarnScheduler: Stage 1 was cancelled
18/12/21 10:40:11 INFO DAGScheduler: ResultStage 1 (foreachPartition at SolrSupport.scala:316) failed in 20.520 s due to Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 18, bdr-itwp-hdfs-5.dev.uspto.gov, executor 1): scala.MatchError: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.201.10.41:8983/solr/ptab-documents: Expected mime type application/octet-stream but got text/html.
HTTP ERROR 401
Problem accessing /solr/ptab-documents/update. Reason:
require authentication(of class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException)
at com.lucidworks.spark.util.SolrSupport$.sendBatchToSolrWithRetry(SolrSupport.scala:352)
at com.lucidworks.spark.util.SolrSupport$$anonfun$indexDocs$1.apply(SolrSupport.scala:335)
at com.lucidworks.spark.util.SolrSupport$$anonfun$indexDocs$1.apply(SolrSupport.scala:316)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
18/12/21 10:40:11 INFO DAGScheduler: Job 1 failed: foreachPartition at SolrSupport.scala:316, took 20.662378 s
18/12/21 10:40:11 WARN TaskSetManager: Lost task 0.3 in stage 1.0 (TID 20, bdr-itwp-hdfs-5.dev.uspto.gov, executor 1): TaskKilled (killed intentionally)
18/12/21 10:40:12 INFO TaskSetManager: Lost task 5.2 in stage 1.0 (TID 16) on bdr-itwp-hdfs-3.dev.uspto.gov, executor 2: scala.MatchError (org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.201.10.41:8983/solr/ptab-documents: Expected mime type application/octet-stream but got text/html.
HTTP ERROR 401
Problem accessing /solr/ptab-documents/update. Reason:
require authentication(of class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException)) [duplicate 2]
Traceback (most recent call last):
  File "/data/bdr/scripts/bdr-ptabapi/public_doc_pushto_solr.py", line 72, in <module>
    solrDF.write.format("solr").option("zkhost","10.201.10.57:2181, 10.201.10.34:2181, 10.201.10.11:2181").option("collection","ptab-documents").option("batch_size", "10000").option("commit_within", "5000").mode("append").save()
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 548, in save
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o185.save.
: java.lang.RuntimeException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 18, bdr-itwp-hdfs-5.dev.uspto.gov, executor 1): scala.MatchError: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.201.10.41:8983/solr/ptab-documents: Expected mime type application/octet-stream but got text/html. <title>Error 401 require authentication</title>
HTTP ERROR 401
Problem accessing /solr/ptab-documents/update. Reason:
require authentication(of class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException)
at com.lucidworks.spark.util.SolrSupport$.sendBatchToSolrWithRetry(SolrSupport.scala:352)
at com.lucidworks.spark.util.SolrSupport$$anonfun$indexDocs$1.apply(SolrSupport.scala:335)
at com.lucidworks.spark.util.SolrSupport$$anonfun$indexDocs$1.apply(SolrSupport.scala:316)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at solr.DefaultSource.createRelation(DefaultSource.scala:31)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 18, bdr-itwp-hdfs-5.dev.uspto.gov, executor 1): scala.MatchError: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.201.10.41:8983/solr/ptab-documents: Expected mime type application/octet-stream but got text/html.
HTTP ERROR 401
Problem accessing /solr/ptab-documents/update. Reason:
require authentication(of class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException)
at com.lucidworks.spark.util.SolrSupport$.sendBatchToSolrWithRetry(SolrSupport.scala:352)
at com.lucidworks.spark.util.SolrSupport$$anonfun$indexDocs$1.apply(SolrSupport.scala:335)
at com.lucidworks.spark.util.SolrSupport$$anonfun$indexDocs$1.apply(SolrSupport.scala:316)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:924)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:924)
at com.lucidworks.spark.util.SolrSupport$.indexDocs(SolrSupport.scala:316)
at com.lucidworks.spark.SolrRelation.insert(SolrRelation.scala:719)
at solr.DefaultSource.createRelation(DefaultSource.scala:27)
... 13 more
Caused by: scala.MatchError: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.201.10.41:8983/solr/ptab-documents: Expected mime type application/octet-stream but got text/html.
HTTP ERROR 401
Problem accessing /solr/ptab-documents/update. Reason:
require authentication(of class org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException)
at com.lucidworks.spark.util.SolrSupport$.sendBatchToSolrWithRetry(SolrSupport.scala:352)
at com.lucidworks.spark.util.SolrSupport$$anonfun$indexDocs$1.apply(SolrSupport.scala:335)
at com.lucidworks.spark.util.SolrSupport$$anonfun$indexDocs$1.apply(SolrSupport.scala:316)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
18/12/21 10:40:12 INFO SparkContext: Invoking stop() from shutdown hook
18/12/21 10:40:12 INFO SparkUI: Stopped Spark web UI at http://10.64.66.71:4043
18/12/21 10:40:12 INFO ClientCnxn: EventThread shut down
18/12/21 10:40:12 INFO ZooKeeper: Session: 0x1000004ef4a005d closed
18/12/21 10:40:12 INFO YarnClientSchedulerBackend: Interrupting monitor thread
18/12/21 10:40:12 INFO YarnClientSchedulerBackend: Shutting down all executors
18/12/21 10:40:12 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/12/21 10:40:12 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
18/12/21 10:40:12 INFO YarnClientSchedulerBackend: Stopped
18/12/21 10:40:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/12/21 10:40:12 INFO MemoryStore: MemoryStore cleared
18/12/21 10:40:12 INFO BlockManager: BlockManager stopped
18/12/21 10:40:12 INFO BlockManagerMaster: BlockManagerMaster stopped
18/12/21 10:40:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/12/21 10:40:12 INFO SparkContext: Successfully stopped SparkContext
18/12/21 10:40:12 INFO ShutdownHookManager: Shutdown hook called
18/12/21 10:40:12 INFO ShutdownHookManager: Deleting directory /tmp/spark-8432b916-db24-4281-b541-4793f4a57d1b/pyspark-14241e2d-4b39-44ae-94e6-ee7733c4b01b
18/12/21 10:40:12 INFO ShutdownHookManager: Deleting directory /tmp/spark-8432b916-db24-4281-b541-4793f4a57d1b
Spark-submit Python script failed
Make sure you are setting -Dbasicauth on BOTH the driver AND the executors:
--conf 'spark.driver.extraJavaOptions=-Dbasicauth=admin:admin' --conf 'spark.executor.extraJavaOptions=-Dbasicauth=admin:admin'
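For example, here is the submit command from the question with the executor-side property added (a sketch; every other setting is kept exactly as in the original):

# same command as in the question, plus spark.executor.extraJavaOptions
/usr/hdp/current/spark2-client/bin/spark-submit --master=yarn \
  --conf spark.executor.memory=20g --conf spark.driver.memory=15g \
  --conf spark.executor.cores=3 --conf spark.executor.instances=5 \
  --jars $EXTRAJARPATH/mysql.jar,$EXTRAJARPATH/elasticsearch-hadoop-5.5.0/dist/elasticsearch-hadoop-5.5.0.jar,$SOLR/spark-solr-3.5.8-shaded.jar \
  --conf 'spark.driver.extraJavaOptions=-Dbasicauth=admin:admin' \
  --conf 'spark.executor.extraJavaOptions=-Dbasicauth=admin:admin' \
  --py-files /data/bdr/scripts/bdr-ptabapi/parm.py \
  $CODEPATH/public_doc_pushto_solr.py $CODEPATH $startdatetime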
The README example uses --master local, where the driver performs all of the work. When using --master yarn, the Spark executor nodes are the ones sending the update batches to Solr, so they need the executor --conf as well.
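If the 401 persists, it can help to confirm the credentials against Solr directly, outside of Spark (a sketch; the host, port, and collection name are taken from the error messages above, and this assumes reads are protected by the same basic-auth rules):

# query the collection with the same credentials the job uses
curl -u admin:admin "http://10.201.10.41:8983/solr/ptab-documents/select?q=*:*&rows=0"

A 200 here combined with a 401 from the yarn job points at the executors missing the -Dbasicauth property. Separately, the UnknownHostException warnings early in your log most likely come from the spaces after the commas in the zkhost string: ZooKeeper does not trim the connect string, so " 10.201.13.54" (with its leading space) fails hostname resolution. Writing it as "10.201.13.18:2181,10.201.13.54:2181,10.201.13.32:2181" avoids falling back to a single ZooKeeper node.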