palantir/k8s-spark-scheduler

The Spark job is always stuck in Pending

chia7712 opened this issue · 1 comment

I followed the README and ran kubectl apply -f examples/extender.yml, and the scheduler was created:

spark-scheduler-f575548bb-7fs5w   2/2     Running   0          52m
spark-scheduler-f575548bb-l2bdn   2/2     Running   0          52m
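
For reference, each scheduler pod runs two containers, so when checking its logs it is easiest to avoid guessing container names. A quick health check (a sketch; the pod name is taken from the listing above):

# List the container names inside one scheduler pod, then tail the
# logs of both containers at once.
kubectl get pod spark-scheduler-f575548bb-7fs5w -o jsonpath='{.spec.containers[*].name}'
kubectl logs spark-scheduler-f575548bb-7fs5w --all-containers=true | tail -n 50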

Then I created pod templates for the driver and the executor. The driver template (driver.yaml):

apiVersion: v1
kind: Pod
metadata:
  labels:
    spark-app-id: my-custom-id
  annotations:
    spark-driver-cpus: 1
    spark-driver-mem: 1g
    spark-executor-cpu: 2
    spark-executor-mem: 4g
    spark-executor-count: 8
spec:
  schedulerName: spark-scheduler

The executor template (executor.yaml):

apiVersion: v1
kind: Pod
metadata:
  labels:
    spark-app-id: my-custom-id
spec:
  schedulerName: spark-scheduler
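
One side note on the templates: Kubernetes stores annotations as a string-to-string map, so if these files were ever applied directly with kubectl, the bare numeric values would be rejected. A quoted variant of the driver annotations (a sketch, and not necessarily related to the pending state, since the describe output below shows the annotations did make it onto the pod):

metadata:
  labels:
    spark-app-id: my-custom-id
  annotations:
    spark-driver-cpus: "1"
    spark-driver-mem: "1g"
    spark-executor-cpu: "2"
    spark-executor-mem: "4g"
    spark-executor-count: "8"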

The command used to submit the Spark job is shown below.

./bin/spark-submit \
    --master k8s://https://spark10:6443 \
    --deploy-mode cluster \
    --name my-custom-id \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.container.image=chia7712/spark:latest \
    --conf spark.kubernetes.container.image.pullPolicy=Never \
    --conf spark.kubernetes.driver.podTemplateFile=/home/chia7712/driver.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/home/chia7712/executor.yaml \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
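
After submitting, the driver pod can be located via the spark-app-id label set in the template:

# Find and inspect the driver pod by the label from the pod template.
kubectl get pods -l spark-app-id=my-custom-id
kubectl describe pod -l spark-app-id=my-custom-id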

However, the job is always Pending. The spec of the driver pod is shown below.

Name:         my-custom-id-d759047a14c4c0ce-driver
Namespace:    default
Priority:     0
Node:         <none>
Labels:       spark-app-id=my-custom-id
              spark-app-selector=spark-8e9a9108444e47878de54a64a1849f46
              spark-role=driver
Annotations:  spark-driver-cpus: 1
              spark-driver-mem: 1g
              spark-executor-count: 8
              spark-executor-cpu: 2
              spark-executor-mem: 4g
Status:       Pending
IP:           
IPs:          <none>
Containers:
  spark-kubernetes-driver:
    Image:       chia7712/spark:latest
    Ports:       7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Args:
      driver
      --properties-file
      /opt/spark/conf/spark.properties
      --class
      org.apache.spark.examples.SparkPi
      local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
    Limits:
      memory:  1408Mi
    Requests:
      cpu:     1
      memory:  1408Mi
    Environment:
      SPARK_USER:                 chia7712
      SPARK_APPLICATION_ID:       spark-8e9a9108444e47878de54a64a1849f46
      SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
      SPARK_LOCAL_DIRS:           /var/data/spark-364d254a-6342-468e-8c65-74439134c645
      SPARK_CONF_DIR:             /opt/spark/conf
    Mounts:
      /opt/spark/conf from spark-conf-volume-driver (rw)
      /opt/spark/pod-template from pod-template-volume (rw)
      /var/data/spark-364d254a-6342-468e-8c65-74439134c645 from spark-local-dir-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9xk4g (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  pod-template-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-custom-id-d759047a14c4c0ce-driver-podspec-conf-map
    Optional:  false
  spark-local-dir-1:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  spark-conf-volume-driver:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      spark-drv-07a9b27a14c4c432-conf-map
    Optional:  false
  kube-api-access-9xk4g:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
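
Since Events is empty, describe gives no hint about why scheduling fails. The custom scheduler's own logs can be filtered for the stuck driver (a sketch, reusing the scheduler pod name from above); and if I read the project README correctly, the scheduler is also expected to create a ResourceReservation object per application, so its absence would point at the extender (the resource name is an assumption):

# Grep the scheduler logs for the stuck driver pod.
kubectl logs spark-scheduler-f575548bb-7fs5w --all-containers=true \
    | grep my-custom-id-d759047a14c4c0ce-driver

# Check whether a reservation was created (CRD name assumed from the README).
kubectl get resourcereservations -n default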

Did I miss any configuration?

It seems the driver.yaml ends up with some incorrect arguments (see #170).
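
One way to see exactly what the driver received is to dump the rendered pod-template ConfigMap referenced in the Volumes section of the describe output above:

kubectl get configmap my-custom-id-d759047a14c4c0ce-driver-podspec-conf-map -o yaml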