scicloj/tablecloth.time

Interval adjustment: handle case where first element is missing

Opened this issue · 1 comment

In our current implementation of adjust-interval (see #14), we use the first item in a column to determine the target datatype of the time unit we are converting to. More specifically, adjust-interval takes a user-supplied ->new-time_converter fn, calls it on the first item in the targeted column to get a value in the new unit, and then uses tech.v3.datatype/elemwise-datatype to determine that unit's keyword.

This is all fine, but @cnuernber raised a good point that we overlooked:

There are some auto-detection routines for datatype that rely on converting the first element. All I might add to that is you may want to convert the first non-missing element; what if your first element is a missing/null value?

We should figure out a way to handle this case. It might also pay to generalize the process of determining the time datatype from the row if this is going to be a more common practice.
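
One possible shape for this, as a minimal sketch rather than a proposed implementation: skip leading missing values before calling the converter. It assumes missing values surface as nil when the column is read as a sequence. The names first-non-missing and converted-type are hypothetical, convert-fn stands in for the user-supplied converter, and plain clojure.core/class stands in for tech.v3.datatype/elemwise-datatype so the example runs without dependencies.

```clojure
(defn first-non-missing
  "Returns the first non-nil element of coll, or nil if every element
  is nil (or coll is empty). Hypothetical helper, not part of the library."
  [coll]
  ;; some? is true for everything except nil, so false values are kept.
  (first (filter some? coll)))

(defn converted-type
  "Applies convert-fn to the first non-missing element of coll and returns
  the class of the result, or nil when there is nothing to convert.
  `class` is a stand-in here for the datatype lookup."
  [convert-fn coll]
  (when-some [x (first-non-missing coll)]
    (class (convert-fn x))))

;; Usage: the leading nils are skipped before the converter is called.
(converted-type str [nil nil 42])
```

Returning nil when the column has no non-missing element leaves the caller to decide whether that is an error or a no-op.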

When we added the index structure to tech.ml.dataset (see techascent/tech.ml.dataset#214), we prevented a column from returning an index if the column has any missing values. So this issue may no longer be relevant: a column that has an index cannot have missing values, so there should always be an item in the first position.