Describe breaks on `Number` column (and other statistics inconsistencies)

Question

Jolanrensen opened this issue 9 months ago · 3 comments

This happens because the `Iterable<Number>.std()` function accepts `Number` values but doesn't convert them to `Double` first (like `mean()` does).
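A minimal sketch of the conversion described above, assuming the fix is simply to widen every value with `toDouble()` before doing any arithmetic. The name `stdAsDouble` is hypothetical and is not part of the library; this only illustrates the idea, not how `std()` is actually implemented.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper, not the library's implementation: widen every Number
// to Double first, then compute the sample standard deviation (divisor n - 1).
fun Iterable<Number>.stdAsDouble(): Double {
    // Int, Long, Float, Short, Byte, ... are all widened to Double here.
    val values = map { it.toDouble() }
    // The sample standard deviation is undefined for fewer than two values.
    if (values.size < 2) return Double.NaN
    val mean = values.average()
    val sumOfSquares = values.sumOf { (it - mean) * (it - mean) }
    return sqrt(sumOfSquares / (values.size - 1))
}
```

With this shape, a mixed collection such as `listOf<Number>(1, 2.0f, 3L)` can be passed directly, because no step depends on the concrete subtype of `Number`.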

There are actually quite a few more gaps and inconsistencies:

cumSum
- Misses Byte, Short
- Has DataColumn overloads but not Iterable/Sequence
mean
- Has Sequence<Double | Float> but not for other Number types
median
- Misses Float, Byte, Short, Number (it only works on Comparable)
- Needs to handle other types consistently
- No Sequence overloads
- Cannot skipNA (if applicable)
min and max
- internal Iterable<T>.min and max are not used and can be removed. Stdlib functions for Comparable sequences and iterables are used instead.
- Misses Number (it only works on Comparable)
std
- Breaks if type is Number
- Short and Byte are cast to Int which works but is a bit iffy
- Iterable overloads missing for Number, Short, Byte
- Sequence overloads missing
- Nullable overloads missing for Iterable (and sequence)
varianceAndMean
- Also provides a std(ddof: Int) function, with no docs explaining what ddof even means, as well as count. Could have a better name. It can also produce nulls, apparently? This screams for documentation.
- variance functions are missing on DataColumns entirely (had to be added separately for Kandy)
- Misses Short, Byte, Number, and nullable overloads
- Misses Sequence overloads
sum
- Has TODOs where types are amiss
- Misses Float(!), Short, Byte, Number in various Iterable overloads.

All of them are also missing `BigInteger` overloads, even though we support `BigDecimal`.
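Since the list notes that `ddof` is undocumented, here is a small sketch of what the parameter usually means, assuming it follows the common "delta degrees of freedom" convention (the divisor is `n - ddof`). The function name `varianceWithDdof` is hypothetical; whether the library's own function behaves exactly like this is not confirmed here.

```kotlin
// Hypothetical sketch: `ddof` ("delta degrees of freedom") is subtracted from
// the element count in the variance divisor. This is an assumed convention,
// not a description of the library's actual code.
fun Iterable<Number>.varianceWithDdof(ddof: Int = 1): Double {
    val values = map { it.toDouble() }
    val n = values.size
    // The divisor would be zero or negative, so the result is undefined.
    if (n - ddof <= 0) return Double.NaN
    val mean = values.average()
    return values.sumOf { (it - mean) * (it - mean) } / (n - ddof)
}
```

Under this convention, `ddof = 0` gives the population variance and `ddof = 1` gives the sample variance (Bessel's correction). For `listOf<Number>(2, 4, 4, 4, 5, 5, 7, 9)` the sum of squared deviations is 32, so the two results are 32 / 8 and 32 / 7 respectively.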

Answer 1 · 2024-01-15T18:21:10.000Z

#352 is probably the same problem.

Answer 2 · 2024-01-18T11:31:41.000Z

As mentioned in #543, some functions like `median(ints)` might return an unexpectedly rounded `Int`. It might be better to let all functions return `Double` and handle `BigInteger` / `BigDecimal` separately for now, since they're Java-specific.
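To illustrate the rounding concern, here is a small sketch that assumes the median of an even-sized list is computed by averaging the two middle elements. Both function names are hypothetical and exist only for this comparison; they are not the library's API.

```kotlin
// Hypothetical illustration, not the library's code: averaging the two middle
// elements with Int arithmetic truncates the result.
fun medianAsInt(values: List<Int>): Int {
    require(values.isNotEmpty()) { "median of an empty list is undefined" }
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

// The same computation after widening every value to Double keeps the fraction.
fun medianAsDouble(values: List<Number>): Double {
    require(values.isNotEmpty()) { "median of an empty list is undefined" }
    val sorted = values.map { it.toDouble() }.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}
```

For `listOf(1, 2)`, `medianAsInt` returns 1 because `(1 + 2) / 2` is integer division, while `medianAsDouble` returns 1.5.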