JuliaAI/ScientificTypes.jl

Inconsistent behaviour regarding missing

Closed this issue · 0 comments

julia> using ScientificTypes, CategoricalArrays

julia> v = [1, 2, 3, missing]
4-element Array{Union{Missing, Int64},1}:
 1       
 2       
 3       
  missing

julia> v[1:3]
3-element Array{Union{Missing, Int64},1}:
 1
 2
 3

julia> scitype(ans)
AbstractArray{Count,1}

The same happens for float vectors. However:

julia> v = categorical([1, 2, missing])
3-element CategoricalArrays.CategoricalArray{Union{Missing, Int64},1,UInt32}:
 1      
 2      
 missing

julia> v[1:2]
2-element CategoricalArrays.CategoricalArray{Union{Missing, Int64},1,UInt32}:
 1
 2

julia> scitype(v[1:2])
AbstractArray{Union{Missing, Multiclass{2}},1}

The first behaviour is "correct" because the scitype of a vector v is, by definition, AbstractVector{U} where U = union_scitype(v), and the slice v[1:3] contains no missing values. However, scitype is slow here: it cannot deduce the answer from the wrapper type Array{Union{Missing, Int64},1} alone, so it falls back to computing the union of the element scitypes.

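The second behaviour is "incorrect": v[1:2] contains no missing values, so by the same definition its scitype should not include Missing. There is an "obvious" fix, but it would make scitype slow in every case where the wrapper has Missing in its type parameter.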
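To make the trade-off concrete, here is a minimal sketch in plain Julia. It is not the ScientificTypes.jl implementation: `CountSketch`, `toy_scitype`, `slow_union` and `fast_from_eltype` are hypothetical names invented for illustration, and the real package's internals may look quite different.

```julia
# Hypothetical stand-ins for illustration only; NOT the ScientificTypes.jl API.
struct CountSketch end

# Toy element-level scitype.
toy_scitype(::Integer) = CountSketch
toy_scitype(::Missing) = Missing

# Slow path: visit every element and take the union of the scitypes found.
# Cost is O(n), but the result reflects what the array actually contains.
slow_union(v) = mapreduce(toy_scitype, (S, T) -> Union{S, T}, v; init=Union{})

# Fast path: read the answer off the element type alone. Cost is O(1), but it
# reports Missing whenever the eltype allows it, even if no element is missing.
fast_from_eltype(::AbstractArray{<:Integer}) = CountSketch
fast_from_eltype(::AbstractArray{Union{Missing, T}}) where {T<:Integer} =
    Union{Missing, CountSketch}

v = [1, 2, 3, missing]
w = v[1:3]              # no missing values, but eltype is still Union{Missing, Int64}

slow_union(w)           # union over the elements actually present
fast_from_eltype(w)     # answer read from the eltype only
```

In this sketch the two paths disagree on `w`, which mirrors the inconsistency above: the plain-array case behaves like `slow_union`, while the categorical case behaves like `fast_from_eltype`. Making both consistent with the definition would mean taking the slow path whenever Missing appears in the type parameter.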

I will think about this more.