Kotlin/dataframe

Describe breaks on `Number` column (and other statistics inconsistencies)

Jolanrensen opened this issue · 3 comments

This happens because the `Iterable<Number>.std()` function accepts `Number` values but doesn't convert them to `Double` (like `mean()` does).
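As a rough illustration of the conversion that seems to be missing, here is a minimal sketch that calls `toDouble()` on every element before computing. It is not the library's implementation: the name `stdOfNumbers` is hypothetical, and the choice of the sample standard deviation (divisor `n - 1`) is an assumption.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper, not part of the DataFrame API.
// Every element is converted to Double first, so a mixed List<Number> works.
fun Iterable<Number>.stdOfNumbers(): Double {
    val values = map { it.toDouble() }
    val n = values.size
    if (n < 2) return Double.NaN // sample std is undefined for fewer than 2 values
    val mean = values.sum() / n
    val sumOfSquares = values.sumOf { (it - mean) * (it - mean) }
    return sqrt(sumOfSquares / (n - 1)) // assumption: divisor n - 1
}

fun main() {
    val mixed: List<Number> = listOf(1, 2L, 3.0f, 4.0)
    println(mixed.stdOfNumbers()) // roughly 1.29
}
```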

Several other statistics functions have missing or inconsistent overloads as well:

  • cumSum
    • Misses Byte, Short
    • Has DataColumn overloads but not Iterable/Sequence
  • mean
    • Has Sequence<Double | Float> but not for other Number types
  • median
    • Misses Float, Byte, Short, Number (it only works on Comparable)
    • Needs to handle other types consistently
    • No Sequence overloads
    • Cannot skipNA (if applicable)
  • min and max
    • internal Iterable<T>.min and max are not used and can be removed. Stdlib functions for Comparable sequences and iterables are used instead.
    • Misses Number (it only works on Comparable)
  • std
    • Breaks if type is Number
    • Short and Byte are cast to Int which works but is a bit iffy
    • Iterable overloads missing for Number, Short, Byte
    • Sequence overloads missing
    • Nullable overloads missing for Iterable (and sequence)
  • varianceAndMean
    • Also provides a `std(ddof: Int)` function, as well as `count`, without documenting what `ddof` even means. Could have a better name. It can apparently also produce nulls? This screams for documentation.
    • variance functions are missing on DataColumns entirely (had to be added separately for Kandy)
    • Misses Short, Byte, Number, and nullable overloads
    • Misses Sequence overloads
  • sum
    • Has TODOs where types are amiss
    • Misses Float(!), Short, Byte, Number in various Iterable overloads.
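On the undocumented `ddof` parameter mentioned under `varianceAndMean`: in NumPy's convention the name stands for "delta degrees of freedom", and the sum of squared deviations is divided by `n - ddof`. Whether DataFrame follows exactly that convention is an assumption here. The sketch below uses a hypothetical `variance` helper to show the effect, and one plausible way a null result could arise.

```kotlin
// Hypothetical helper illustrating the usual meaning of ddof:
// the sum of squared deviations is divided by (n - ddof).
fun variance(values: List<Double>, ddof: Int): Double? {
    val n = values.size
    if (n - ddof <= 0) return null // too few values for this ddof
    val mean = values.average()
    return values.sumOf { (it - mean) * (it - mean) } / (n - ddof)
}

fun main() {
    val xs = listOf(1.0, 2.0, 3.0, 4.0)
    println(variance(xs, ddof = 0))          // population variance: 1.25
    println(variance(xs, ddof = 1))          // sample variance: 5/3
    println(variance(listOf(1.0), ddof = 1)) // null
}
```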

All of them also lack `BigInteger` support, even though `BigDecimal` is supported.
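One possible way to treat `BigInteger` alongside `BigDecimal`, sketched under assumptions: widen every `Number` to `BigDecimal` before aggregating. The names `widenToBigDecimal` and `sumAsBigDecimal` are hypothetical and not part of the DataFrame API.

```kotlin
import java.math.BigDecimal
import java.math.BigInteger

// Hypothetical helper: widen any Number to BigDecimal.
fun Number.widenToBigDecimal(): BigDecimal = when (this) {
    is BigDecimal -> this
    is BigInteger -> BigDecimal(this)
    is Byte, is Short, is Int, is Long -> BigDecimal.valueOf(this.toLong())
    else -> BigDecimal.valueOf(this.toDouble()) // Float, Double, other Number types
}

// Hypothetical helper: sum a mixed collection without losing BigInteger values.
fun Iterable<Number>.sumAsBigDecimal(): BigDecimal =
    fold(BigDecimal.ZERO) { acc, n -> acc.add(n.widenToBigDecimal()) }

fun main() {
    val values: List<Number> = listOf(BigInteger.TEN, BigDecimal("0.5"), 1)
    println(values.sumAsBigDecimal()) // 11.5
}
```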

#352 is probably the same problem.

As mentioned in #543, some functions like `median(ints)` might return an unexpectedly rounded `Int`. It might be better to let all functions return `Double` and handle `BigInteger` / `BigDecimal` separately for now, as they're Java-specific.
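To make the rounding concern concrete, here is a sketch of one way it can arise: taking the midpoint of the two middle values with `Int` division. Both helpers are hypothetical and do not reproduce DataFrame's actual `median` implementation.

```kotlin
// Hypothetical helpers, not the DataFrame API.

// Int median: for an even count, the midpoint is truncated by Int division.
fun medianAsInt(values: List<Int>): Int {
    require(values.isNotEmpty()) { "median of an empty list is undefined" }
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

// Double median: values are converted first, so the midpoint is preserved.
fun medianAsDouble(values: List<Number>): Double {
    require(values.isNotEmpty()) { "median of an empty list is undefined" }
    val sorted = values.map { it.toDouble() }.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}

fun main() {
    val ints = listOf(1, 2, 3, 4)
    println(medianAsInt(ints))    // 2
    println(medianAsDouble(ints)) // 2.5
}
```

Returning `Double` everywhere, as suggested above, would avoid this kind of silent truncation at the cost of a less specific return type.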