
"Smart" column add in `DynamicDataFrameBuilder`

AndreiKingsley opened this issue · 4 comments

I can't rewrite this code with DynamicDataFrameBuilder:
What I want is this: if the column I'm adding is already in the builder (same name and same elements), it shouldn't be added again.

Indeed, it might be worth adding a `contains(col): Boolean` function to `DynamicDataFrameBuilder`.

Could you try whether this works?

public operator fun DynamicDataFrameBuilder.contains(col: AnyCol): Boolean =
    toDataFrame().getColumnOrNull(col) == col

Column equality in DataFrame is checked by this function:

internal fun <T> BaseColumn<T>.checkEquals(other: Any?): Boolean {
    if (this === other) return true

    if (this !is AnyCol) return false
    if (other !is AnyCol) return false

    if (name != other.name) return false
    if (type != other.type) return false
    return values.equalsByElement(other.values)
}

So when you use `colA == colB`, it checks the name, type, and values by default.
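To make those equality semantics concrete, here is a minimal sketch that mirrors the same order of checks on a stand-in type. `SimpleColumn` and `simpleColumnOf` are hypothetical names invented for this illustration, not part of the DataFrame library, and plain `List` equality stands in for `equalsByElement`.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical stand-in for a DataFrame column; not the library's class.
class SimpleColumn<T>(val name: String, val type: KType, val values: List<T>) {
    // Same order of checks as checkEquals: identity, name, type, then values.
    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is SimpleColumn<*>) return false
        if (name != other.name) return false
        if (type != other.type) return false
        return values == other.values // List equality compares element by element
    }

    // Kept consistent with equals: combines the same three properties.
    override fun hashCode(): Int =
        31 * (31 * name.hashCode() + type.hashCode()) + values.hashCode()
}

// Hypothetical helper that captures the element type at the call site.
inline fun <reified T> simpleColumnOf(name: String, vararg values: T): SimpleColumn<T> =
    SimpleColumn(name, typeOf<T>(), values.toList())
```

With this sketch, `simpleColumnOf("a", 1, 2, 3) == simpleColumnOf("a", 1, 2, 3)` holds, while changing either the name or any element makes the comparison fail.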

I guess it should work, but it's not efficient in terms of performance. So yeah, contains will work I suppose.
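One way to avoid that cost would be to look the column up by name first and compare elements only on a name match, rather than materializing a data frame on every check. The sketch below illustrates the idea with stand-in types: `NamedColumn`, `SketchBuilder`, and `addIfAbsent` are hypothetical names, and the real `DynamicDataFrameBuilder` may store its columns differently.

```kotlin
// Hypothetical stand-in for a column: data class equality compares name and values.
data class NamedColumn(val name: String, val values: List<Any?>)

// Hypothetical stand-in for the builder; not the library's implementation.
class SketchBuilder {
    private val columnsByName = LinkedHashMap<String, NamedColumn>()

    // One map lookup, then an element-wise comparison only if the name matches.
    operator fun contains(col: NamedColumn): Boolean = columnsByName[col.name] == col

    // Adds the column unless an equal one is already present; returns true if it was added.
    fun addIfAbsent(col: NamedColumn): Boolean {
        if (col in this) return false
        // Simplification: a same-named column with different values is rejected here;
        // the real builder may handle name clashes differently.
        require(col.name !in columnsByName) {
            "Column '${col.name}' already exists with different values"
        }
        columnsByName[col.name] = col
        return true
    }

    fun columnNames(): List<String> = columnsByName.keys.toList()
}
```

Calling `addIfAbsent` twice with an equal column adds it only once, which is the "smart" add behavior requested at the top of this issue.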

@AndreiKingsley you can make a PR that adds the function to DynamicDataFrameBuilder if you like :)