holgerbrandl/krangl

R alternative: Dataframe.drop_na()

TheMrCodes opened this issue · 6 comments

Is there a DataFrame equivalent of R's drop_na, i.e. a function for dropping rows that contain NA values?

https://www.rdocumentation.org/packages/tidyr/versions/0.8.3/topics/drop_na

Great suggestion. The immediate solution would be

df.filterByRow { !it.values.contains(null) }

but to allow providing a column selector I've just added filterNotNull (also see referenced commit for example in tests).

df.filterNotNull() 
df.filterNotNull({ startsWith("user") })
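To make the intended semantics concrete, here is a minimal plain-Kotlin sketch that does not use krangl at all. Rows are modelled as maps, and `Row` and `dropRowsWithNulls` are hypothetical names made up for this illustration; it is not how filterNotNull is implemented in the library, only a picture of "drop every row that has a null in the selected columns".

```kotlin
// Plain-Kotlin sketch, NOT krangl's implementation: a row is just a map of column name to value.
typealias Row = Map<String, Any?>

// Hypothetical helper: keep only rows without a null in the selected columns.
// By default every column is selected.
fun List<Row>.dropRowsWithNulls(select: (String) -> Boolean = { true }): List<Row> =
    filter { row -> row.filterKeys(select).values.none { it == null } }

fun main() {
    val rows: List<Row> = listOf(
        mapOf("user_id" to 1, "user_name" to "anna", "score" to 3.5),
        mapOf("user_id" to 2, "user_name" to null, "score" to 1.0),
        mapOf("user_id" to 3, "user_name" to "carl", "score" to null)
    )

    // All columns are checked: only the first row is complete.
    println(rows.dropRowsWithNulls().map { it["user_id"] })                            // [1]

    // Only columns starting with "user" are checked: the null score in row 3 is ignored.
    println(rows.dropRowsWithNulls { it.startsWith("user") }.map { it["user_id"] })    // [1, 3]
}
```

The second call mirrors the column-selector variant: nulls outside the selected columns do not cause a row to be dropped.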

I'm still uncertain about the correct naming here, see https://kotlinlang.slack.com/archives/C4W52CFEZ/p1611263648007500

I guess this was my crappiest commit to this repo in a long time. Functionally it was fine, as it contained the bits described above, but somehow a rebuilt API documentation and other unrelated changes slipped in as well. Sorry for the confusion.

Personally I would find filterNa more intuitive for someone coming from R, but my vote is for filterNotNull because it's more Kotlin-like.

On the Kotlin Slack it was argued that Double.POSITIVE_INFINITY is usually considered NA as well (also in R, afaik), but it would not and should not be covered by filterNotNull() (which is also still my preferred name here).
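The distinction can be seen with the Kotlin standard library alone. The sketch below uses a plain list rather than a krangl DataFrame, and `dropNonFinite` is a hypothetical helper name chosen for this example; it only illustrates that a null check and a "not finite" check are two separate filters.

```kotlin
// Hypothetical helper: drop nulls first, then drop NaN and +/- infinity explicitly.
fun dropNonFinite(values: List<Double?>): List<Double> =
    values.filterNotNull().filter { it.isFinite() }

fun main() {
    val values: List<Double?> = listOf(1.0, null, Double.NaN, Double.POSITIVE_INFINITY)

    // The stdlib filterNotNull() removes only the null entry; NaN and Infinity survive.
    println(values.filterNotNull())   // [1.0, NaN, Infinity]

    // Treating non-finite numbers as missing needs the additional isFinite() check.
    println(dropNonFinite(values))    // [1.0]
}
```

So a function named filterNotNull that only removes nulls would be consistent with what the same name does on Kotlin collections.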

Good. Does that mean that filterNotNull only filters out null values?

This would be no problem for my use case.
In my opinion krangl doesn't have to be an exact replica of R and Python functionality.