Results shown when running 1000 rules are not in order
canimus commented
It appears that results are displayed out of order when running a test with 1000 rules.
The test scenario uses the NYC Taxi dataset with 20M rows.
```python
from pyspark.sql import SparkSession
from cuallee import Check, CheckLevel

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("temp/data/*.parquet")

# Add 1000 rules, one threshold per integer in [0, 1000)
c = Check(CheckLevel.Warning, "NYC")
for i in range(1000):
    c.is_greater_than("fare_amount", i)

c.validate(spark, df).show(n=1000, truncate=False)
# Displayed dataframe contains rows in the wrong order.
# e.g. the row shown for rule 995 exposes the mismatch: the ~10% share
# of rows it reports certainly cannot belong to `fare_amount > 995`.
```
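
One threshold can be cross-checked against the data directly to confirm the mismatch (a minimal sketch using the same `df`; the 995 threshold comes from the comment above):

```python
from pyspark.sql import functions as F

# Independent cross-check of a single rule from the report.
# If the displayed row for rule 995 were correct, this ratio should
# match its pass rate; on this dataset almost no fares exceed 995.
total = df.count()
passing = df.filter(F.col("fare_amount") > 995).count()
print(f"actual pass rate for fare_amount > 995: {passing / total:.4f}")
```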
canimus commented
Fixed during the integration of split computation.
canimus commented
Done in the pre-release that splits the compute method.
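
Until that pre-release is picked up, one possible workaround is to impose a deterministic order on the report before showing it (a minimal sketch; the `id` column name is an assumption based on cuallee's result schema, where it holds the 1-based index of each rule):

```python
# Workaround sketch: sort the report explicitly by rule id instead of
# relying on the order rows come back from the distributed computation.
report = c.validate(spark, df)
report.orderBy("id").show(n=1000, truncate=False)
```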