[pyspark] assert_frame_equal_with_sort() broken for complex types
vlieven opened this issue · 1 comment
Describe the bug
The function assert_frame_equal_with_sort(), provided in /tests/common/spark.py in the PySpark template, cannot deal with complex types and throws an error related to the PySpark <-> pandas conversion. I noticed this while writing a test for a DataFrame containing an ArrayType column, but I suspect MapType and StructType will trigger the same kind of error.
To Reproduce
Steps to reproduce the behavior:
- Call assert_frame_equal_with_sort() on a DataFrame containing an ArrayType column.
Expected behavior
The assertion properly tests the equality of the two DataFrames instead of throwing an error.
Desktop (please complete the following information):
- OS: macOS Big Sur
- Datafy Version: 0.37
Additional context
I fixed my issue by replacing the call to assert_frame_equal_with_sort() with a call to a function defined by the chispa library. This seems to work, but I don't know whether the templates should be opinionated about which flavour of Spark test helpers to use.
I believe we provide this more as an example, but it might be interesting to help users find a test setup that keeps working once they are doing more real work :)
So feel free to add it.