MrPowers/chispa

ignore_schema in assert_df_equality removed in 9.3?

MathiasHolmstrom opened this issue · 6 comments

I used this parameter in 9.2 but it's no longer there in 9.3. Why was it removed, and does it mean I can no longer write unit tests that skip the type comparison?

Yea, we had to remove this because it was a bad addition to the library (it didn't make sense after I thought about it more deeply). Can you give me a better idea of what you're trying to accomplish, so I can see if it's possible with chispa or if the library should be modified? Thank you.

Say I am comparing two DataFrames and don't care about the types of the columns. In that case I want the assertion to pass even if the types are different. Is there another way of accomplishing this behavior?

@Hiderdk - yea this should work: chispa.assert_basic_rows_equality(df1.collect(), df2.collect()). Let me know if that works for you.
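For readers landing here, the workaround above can be wrapped in a small test helper. This is only a sketch: the helper name and the `spark` argument are hypothetical, and only `chispa.assert_basic_rows_equality` and `DataFrame.collect()` come from the comment above. It relies on the collected Row values comparing equal in Python, so an INT column and a BIGINT column holding the same numbers should pass, while the comparison stays sensitive to row order.

```python
def assert_rows_equal_ignoring_column_types(df1, df2):
    """Compare two Spark DataFrames by their collected row values only.

    Hypothetical helper name; it just wraps the workaround from this thread.
    """
    # Imported lazily so the module can be imported without chispa installed.
    from chispa import assert_basic_rows_equality

    # collect() returns Python Row objects, so the Spark schema
    # (IntegerType vs LongType, nullability, ...) is not compared.
    assert_basic_rows_equality(df1.collect(), df2.collect())


def example_int_vs_bigint(spark):
    # `spark` is assumed to be an existing SparkSession, e.g. a pytest fixture.
    actual = spark.createDataFrame([(1, "a"), (2, "b")], "id INT, label STRING")
    expected = spark.createDataFrame([(1, "a"), (2, "b")], "id BIGINT, label STRING")
    assert_rows_equal_ignoring_column_types(actual, expected)
```

Note that this drops the other options of assert_df_equality (for example ignoring row order), which is the limitation raised later in the thread.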

@MrPowers first of all, thanks for the wonderful library. Why did you decide to change the API of this assertion in the minor version bump of the package?

This caused our tests to break. The convention is that minor version bumps don't change the API, so package managers (like Poetry) update deps to the latest minor version automatically.

We used this option a lot because we don't really care whether the column is IntegerType or LongType, but we do want to compare Spark DataFrames and use the other comparator options of assert_df_equality. Without it, we will need to add boilerplate type casting to the test code to make the tests work again. It's a petty that you decided to remove it.

@ivanychev - yea, the workaround above should cover your use case.

> Why did you decide to change the API of this assertion in the minor version bump of the package?

We're using Semantic Versioning 2.0. Per the spec: "Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable."
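To make the versioning point concrete: the numbers in this thread presumably refer to chispa 0.9.2 and 0.9.3 (an assumption; the thread only says "9.2" and "9.3"). Poetry's caret constraint treats the leftmost non-zero component as the compatibility boundary, so ^0.9.2 still admits 0.9.3. The function names below are hypothetical, and this is a rough sketch of that rule for plain MAJOR.MINOR.PATCH versions, not Poetry's actual resolver.

```python
def parse(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: str) -> tuple:
    """Exclusive upper bound of a caret constraint such as ^0.9.2."""
    major, minor, patch = parse(base)
    if major > 0:
        return (major + 1, 0, 0)   # ^1.2.3 -> <2.0.0
    if minor > 0:
        return (0, minor + 1, 0)   # ^0.9.2 -> <0.10.0
    return (0, 0, patch + 1)       # ^0.0.3 -> <0.0.4


def caret_allows(base: str, candidate: str) -> bool:
    """True if `candidate` satisfies the caret constraint ^base."""
    return parse(base) <= parse(candidate) < caret_upper_bound(base)


print(caret_allows("0.9.2", "0.9.3"))   # True
print(caret_allows("0.9.2", "0.10.0"))  # False
print(caret_allows("1.2.3", "1.9.0"))   # True
```

Under that reading, a project that cannot absorb changes in a 0.y.z dependency would pin the exact version in its dependency file instead of using a caret range.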

> It's a petty that you decided to remove it.

No, this wasn't petty. This option was causing bugs and breaking workflows. We needed to remove it. I do my best to make all changes backwards compatible. This one absolutely needed to be removed because it was causing lots of issues.

> We used this option a lot because we don't really care whether the column is IntegerType or LongType, but we do want to compare Spark DataFrames and use the other comparator options of assert_df_equality.

Feel free to propose another abstraction that's not breaking, not buggy, and will be a good addition for the entire chispa community 🚀
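One possible shape for such an abstraction, kept outside chispa itself, is a test-side helper that casts the actual DataFrame to the expected column types and then delegates to assert_df_equality, so the remaining comparator options still apply. This is only a sketch under assumptions: both helper names are hypothetical, and it uses the standard PySpark calls `df[name]`, `Column.cast`, `Column.alias`, and `DataFrame.select`.

```python
def cast_to_types_of(df, reference):
    """Cast df's columns to the types used by same-named columns in `reference`.

    Hypothetical helper. Column order of `df` is preserved, and columns that
    do not exist in `reference` are left untouched so that a genuine schema
    mismatch is still reported by the later assertion.
    """
    target = {field.name: field.dataType for field in reference.schema.fields}
    columns = [
        df[name].cast(target[name]).alias(name) if name in target else df[name]
        for name in df.columns
    ]
    return df.select(*columns)


def assert_df_equality_ignoring_types(actual, expected, **kwargs):
    """Hypothetical wrapper: align column types, then defer to chispa."""
    # Imported lazily so the module can be imported without chispa installed.
    from chispa import assert_df_equality

    # Extra keyword arguments (e.g. ignore_row_order=True) are passed through.
    assert_df_equality(cast_to_types_of(actual, expected), expected, **kwargs)
```

A cast is not lossless in general, so this only suits cases like IntegerType versus LongType where the values survive the conversion. Nullability can still differ after the cast, so callers may also want to pass ignore_nullable=True.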