Showing incorrect missing data in HTML report
Opened this issue · 6 comments
harika1419 commented
After generating the HTML report using spark-df-profiling, it shows the percentage of missing data as 0%,
even though the dataframe has some missing data.
adutchengineer commented
Could you give an example?
shhanani commented
Is this fixed yet? Mine also wrongly shows missing data as 0%.
shhanani commented
@harika1419 I think I found the issue. It's in line 397.
Change to this:
results_data = df.select(column).na.drop().agg(
    countDistinct(col(column)).alias("distinct_count"),
    count(col(column)).alias("count")).toPandas()
@julioasotodv you might need to look at this solution
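One possible reason the change above helps, offered as an assumption and not confirmed anywhere in this thread: in Spark, count() skips nulls but still counts NaN as a value, while na.drop() removes both nulls and NaN. A column whose gaps are all NaN would then be reported as 0% missing. The sketch below models that difference in plain Python; it is not Spark code and not the library's code, and the function names are made up for illustration.

```python
import math


def is_missing(value):
    """Treat both None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_stats_null_only(values):
    """Count only None as missing, so NaN is treated as a present value."""
    total = len(values)
    n_missing = sum(1 for v in values if v is None)
    p_missing = n_missing / total if total else 0.0
    return n_missing, p_missing


def missing_stats(values):
    """Count both None and NaN as missing."""
    total = len(values)
    n_missing = sum(1 for v in values if is_missing(v))
    p_missing = n_missing / total if total else 0.0
    return n_missing, p_missing


# Hypothetical column: two real values and two NaN gaps.
column = [1.0, float("nan"), 3.0, float("nan")]

print(missing_stats_null_only(column))  # (0, 0.0)
print(missing_stats(column))            # (2, 0.5)
```

If the report shows 0% on a column like this, checking whether the gaps are null or NaN may help narrow down the cause.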
harika1419 commented
Hi,
That issue was fixed after upgrading Spark from 1.6 to 2.3.3.
shhanani commented
Hi @harika1419,
Thanks for letting me know. I'm facing this issue on Spark 2.4.2, which is why I thought it wasn't fixed yet.
Strauman commented
I'm on Spark 3.1.0, and it still shows the wrong value. The number of zeros is also wrong.