julioasotodv/spark-df-profiling

Showing incorrect missing data in HTML report


After generating the HTML report with spark-df-profiling, it shows the percentage of missing data as 0%, even though the DataFrame has some missing data.

Could you give an example?

Is this fixed yet? Mine also wrongly shows missing data as 0%.

@harika1419 I think I found the issue. It's on line 397; change it to this:

# Drop rows where this column is null or NaN, then aggregate.
results_data = df.select(column).na.drop().agg(
    countDistinct(col(column)).alias("distinct_count"),
    count(col(column)).alias("count")).toPandas()
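The proposed line leans on two standard Spark behaviours: `count(col)` skips only nulls, while `DataFrame.na.drop()` also removes NaN values in floating-point columns (`col`, `count` and `countDistinct` come from `pyspark.sql.functions`). So if the report derives missing values as "rows minus `count`", a column whose gaps are NaN rather than null would show 0% missing unless those rows are dropped first. The sketch below models that arithmetic in plain Python rather than Spark, assuming `None` stands for SQL null and `float("nan")` for NaN; `profile_counts` is a hypothetical name, only the `count` side is modelled, and this is not the library's actual code.

```python
import math
from typing import Optional, Sequence


def is_nan(value: object) -> bool:
    """True for a float NaN, which Spark treats as a value, not as null."""
    return isinstance(value, float) and math.isnan(value)


def profile_counts(values: Sequence[Optional[float]], drop_nan: bool) -> dict:
    """Model the "count" aggregation for one column (hypothetical helper).

    drop_nan=False mimics count(col(c)) on the raw column: only nulls are skipped.
    drop_nan=True mimics df.select(c).na.drop() before aggregating:
    both nulls and NaNs are skipped.
    """
    n = len(values)
    present = [v for v in values if v is not None and not (drop_nan and is_nan(v))]
    count = len(present)
    # Assumed definition: missing = all rows minus counted values.
    p_missing = (n - count) / n if n else 0.0
    return {"count": count, "n_missing": n - count, "p_missing": p_missing}


if __name__ == "__main__":
    column = [1.0, None, float("nan"), 4.0]
    print(profile_counts(column, drop_nan=False))  # count 3, n_missing 1, p_missing 0.25
    print(profile_counts(column, drop_nan=True))   # count 2, n_missing 2, p_missing 0.5
```

In this model the NaN row is only treated as missing once it is dropped before counting, which is the effect the `.na.drop()` call is meant to have.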

@julioasotodv, you might want to look at this solution.

Hi,
The issue was fixed after upgrading Spark from 1.6 to 2.3.3.

Hi @harika1419,
Thanks for letting me know. I'm facing this issue on Spark 2.4.2, which is why I thought it wasn't fixed yet.

I'm on Spark 3.1.0, and the missing data is still shown incorrectly. The number of zeros is also wrong.
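When the report's numbers look off, counting them directly on a small sample can help separate a profiling bug from a data issue. In PySpark that is roughly `df.filter(col(c) == 0).count()` for zeros and `df.filter(col(c).isNull() | isnan(col(c))).count()` for missing values in a numeric column (`col` and `isnan` are in `pyspark.sql.functions`). The plain-Python sketch below applies the same rules to rows collected as dictionaries, for example from `df.limit(1000).collect()` converted with `row.asDict()`; `column_checks` is a hypothetical helper, and treating both null and NaN as "missing" is an assumption about what the report should count.

```python
import math
from typing import Any, Dict, List


def column_checks(rows: List[Dict[str, Any]], column: str) -> Dict[str, float]:
    """Count missing values (null or NaN) and zeros in one column of sample rows."""
    n = len(rows)
    n_missing = 0
    n_zeros = 0
    for row in rows:
        value = row.get(column)  # an absent key is treated like null
        if value is None or (isinstance(value, float) and math.isnan(value)):
            n_missing += 1
        elif isinstance(value, (int, float)) and not isinstance(value, bool) and value == 0:
            # bool is a subclass of int in Python, so False is excluded explicitly.
            n_zeros += 1
    return {
        "n": n,
        "n_missing": n_missing,
        "p_missing": n_missing / n if n else 0.0,
        "n_zeros": n_zeros,
    }


if __name__ == "__main__":
    sample = [{"x": 0}, {"x": 0.0}, {"x": None}, {"x": float("nan")}, {"x": 3.5}]
    print(column_checks(sample, "x"))  # n 5, n_missing 2, p_missing 0.4, n_zeros 2
```

If these direct counts disagree with the HTML report on the same sample, that points at the profiling code rather than the data.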