databrickslabs/dbx

DBX example (Python quickstart) coverage test won't run due to dependency issues


Expected Behavior

The code from https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/ runs successfully.

Current Behavior

Running pytest tests/unit --cov raises an exception: AttributeError: 'DataFrame' object has no attribute 'iteritems'
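
DataFrame.iteritems was removed in pandas 2.0, so a quick way to confirm that an environment is affected is to look at the installed pandas version. The following is only a minimal sketch: the helper names pandas_lacks_iteritems and installed_pandas_version are hypothetical and not part of dbx or the quickstart, and the check assumes a plain "major.minor.patch" version string.

```python
from importlib import metadata
from typing import Optional


def pandas_lacks_iteritems(version: str) -> bool:
    """Return True if this pandas version no longer has DataFrame.iteritems.

    Hypothetical helper: iteritems was removed in pandas 2.0, so only the
    major version number is inspected.
    """
    major = int(version.split(".")[0])
    return major >= 2


def installed_pandas_version() -> Optional[str]:
    """Return the installed pandas version, or None if pandas is missing."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pandas_version()
    if version is None:
        print("pandas is not installed")
    elif pandas_lacks_iteritems(version):
        print(f"pandas {version} has no DataFrame.iteritems; an older pyspark may fail")
    else:
        print(f"pandas {version} still provides DataFrame.iteritems")
```

If the check reports pandas 2.x, constraining pandas below 2 in the project requirements may also avoid the error, although that was not tried for this report.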

Steps to Reproduce (for bugs)

Follow the instructions and execute the code from https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/

Context

This happens because the example pins the pyspark version but leaves the pandas version unpinned. DataFrame.iteritems was deprecated in pandas 1.5 and removed in pandas 2.0, so the pinned pyspark, which still calls it, fails against a newer pandas.
Upgrading pyspark (and delta-spark) to the latest version fixes the issue, but I first had to fix another one:
Because the example pins the Python version to 3.9 and my environment has Python 3.11, I got the following error: Python in worker has different version 3.11 than that in driver 3.9, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

After setting the worker Python version to python3.9, it worked. (The guide should include a note that this version needs to be taken care of as well.)
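
One possible way to keep the driver and worker interpreters in sync is to point both environment variables named in the error message at the interpreter that runs the tests. This is a sketch under assumptions, not the documented dbx approach: the function name pin_pyspark_python is hypothetical, and it assumes the function is called before any SparkSession is created (for example from a conftest.py).

```python
import os
import sys


def pin_pyspark_python(executable: str = sys.executable) -> dict:
    """Point PySpark's driver and workers at the same Python interpreter.

    Hypothetical helper: it only sets the two environment variables that
    the PySpark error message asks to check. It must run before the
    SparkSession is created, otherwise the settings are not picked up.
    """
    os.environ["PYSPARK_PYTHON"] = executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = executable
    return {
        "PYSPARK_PYTHON": os.environ["PYSPARK_PYTHON"],
        "PYSPARK_DRIVER_PYTHON": os.environ["PYSPARK_DRIVER_PYTHON"],
    }


if __name__ == "__main__":
    # Show which interpreter both sides will use.
    for name, value in pin_pyspark_python().items():
        print(f"{name}={value}")
```

Using sys.executable means both variables follow whichever interpreter launched pytest, so a Python 3.9 virtual environment would give workers on 3.9 as well.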

Your Environment

platform darwin -- Python 3.9.17

  • dbx version used: 0.8.18
  • Databricks Runtime version: not applicable