databricks/databricks-vscode

[BUG] unable to run spark code - BAD_REQUEST: Spark Connect (vsix from nightly run - 5455543994)

sauerchextern opened this issue · 3 comments

Hi,
first of all, your changes and adjustments look really promising! Thanks a lot.

Describe the bug
Whenever I run PySpark code in Visual Studio Code (Run Cell, Debug Cell, or Run Python File), I receive the following message:
"BAD_REQUEST: Spark Connect is enabled only on Unity Catalog enabled Shared and Single User Clusters."

To Reproduce
Steps to reproduce the behavior:

  1. Create Cluster
  2. Go to Visual Studio Code
  3. Install Artifact from this nightly run: https://github.com/databricks/databricks-vscode/actions/runs/5455543994
  4. Debug a cell with any Spark code, e.g. spark.sql("USE default"), or use the code below (see Additional context).

Additional context

import os
import sys
from datetime import date
from databricks.connect import DatabricksSession
from pyspark.sql.types import *

# COMMAND ----------
spark = DatabricksSession.builder.getOrCreate()

# COMMAND ----------

# Create a Spark DataFrame consisting of high and low temperatures
# by airport code and date.
schema = StructType([
    StructField('AirportCode', StringType(), False),
    StructField('Date', DateType(), False),
    StructField('TempHighF', IntegerType(), False),
    StructField('TempLowF', IntegerType(), False)
])

# COMMAND ----------

data = [
    ['BLI', date(2021, 4, 3), 52, 43],
    ['BLI', date(2021, 4, 2), 50, 38],
    ['BLI', date(2021, 4, 1), 52, 41],
    ['PDX', date(2021, 4, 3), 64, 45],
    ['PDX', date(2021, 4, 2), 61, 41],
    ['PDX', date(2021, 4, 1), 66, 39],
    ['SEA', date(2021, 4, 3), 57, 43],
    ['SEA', date(2021, 4, 2), 54, 39],
    ['SEA', date(2021, 4, 1), 56, 41]
]

temps = spark.createDataFrame(data, schema)

# COMMAND ----------

# Create a table on the Databricks cluster and then fill
# the table with the DataFrame's contents.
# If the table already exists from a previous run,
# delete it first.
spark.sql('USE default')
spark.sql('DROP TABLE IF EXISTS zzz_demo_temps_table')
temps.write.saveAsTable('zzz_demo_temps_table')

Hi @sauerchextern. I am not able to repro this. Can you check that the SPARK_REMOTE environment variable has the correct cluster id? You can check it with os.environ['SPARK_REMOTE'].

cc @nija-at do you know what could be the issue (assuming the configs from vscode are correct)?
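The check suggested above can be sketched as follows. Note that the x-databricks-cluster-id parameter name is an assumption about the Spark Connect connection-string format, not something confirmed in this thread:

```python
import os

# Print the Spark Connect connection string that the extension sets.
remote = os.environ.get("SPARK_REMOTE", "")
print(remote)

# Pull out the cluster id, assuming it appears as a
# `x-databricks-cluster-id=...` parameter in the semicolon-separated string.
cluster_id = next(
    (part.split("=", 1)[1]
     for part in remote.split(";")
     if part.startswith("x-databricks-cluster-id=")),
    None,
)
print("cluster id:", cluster_id)
```

Compare the printed cluster id against the id of the cluster you created in step 1.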

I would say that the cluster ID is set correctly:

Unity Catalog was not enabled in the workspace, and therefore not on the cluster either.