Inconsistent behaviour regarding missing
Closed this issue · 0 comments
ablaom commented
julia> v = [1, 2, 3, missing]
4-element Array{Union{Missing, Int64},1}:
1
2
3
missing
julia> v[1:3]
3-element Array{Union{Missing, Int64},1}:
1
2
3
julia> scitype(ans)
AbstractArray{Count,1}
The same holds for float vectors. However:
julia> v = categorical([1, 2, missing])
3-element CategoricalArrays.CategoricalArray{Union{Missing, Int64},1,UInt32}:
1
2
missing
julia> v[1:2]
2-element CategoricalArrays.CategoricalArray{Union{Missing, Int64},1,UInt32}:
1
2
julia> scitype(v[1:2])
AbstractArray{Union{Missing, Multiclass{2}},1}
The first behaviour is "correct" because, by definition, the scitype of a vector v is supposed to be AbstractVector{U}, where U = union_scitype(v). However, scitype is slow here: it cannot deduce the correct type from the wrapper, and so falls back to actually computing the union of the element scitypes.
The second behaviour is "incorrect". There is an "obvious" fix, but it would make scitype slow in every case where the wrapper has Missing in its type parameter.
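To make the trade-off concrete, here is a minimal, self-contained sketch of the two strategies. It is not the ScientificTypes implementation: `Count`, `Continuous`, `base_scitype`, `elscitype`, `slow_scitype` and `fast_scitype` are hypothetical stand-ins defined only for this illustration. The slow path inspects every element; the fast path trusts the container's `eltype`, which still mentions `Missing` after slicing the missing values away.

```julia
# Hypothetical stand-ins for scientific types (illustration only).
struct Count end
struct Continuous end

# Map a machine type to a (toy) scientific type.
base_scitype(::Type{<:Integer}) = Count
base_scitype(::Type{<:AbstractFloat}) = Continuous

# Scientific type of a single element.
elscitype(x) = base_scitype(typeof(x))
elscitype(::Missing) = Missing

# Slow path: O(n) scan, taking the union of the element scitypes.
function slow_scitype(v::AbstractVector)
    U = mapreduce(elscitype, (a, b) -> Union{a, b}, v; init=Union{})
    return AbstractVector{U}
end

# Fast path: O(1), deduced from the container's eltype alone.
function fast_scitype(v::AbstractVector)
    T = eltype(v)
    S = base_scitype(nonmissingtype(T))
    return Missing <: T ? AbstractVector{Union{Missing,S}} : AbstractVector{S}
end

v = [1, 2, 3, missing]
w = v[1:3]   # no missing values left, but eltype(w) is still Union{Missing, Int64}

slow_scitype(w) == AbstractVector{Count}                  # true
fast_scitype(w) == AbstractVector{Union{Missing,Count}}   # true
```

In this sketch the two paths disagree exactly when the eltype allows `Missing` but no element is actually missing. Making them agree in favour of the union definition means scanning the elements whenever `Missing <: eltype(v)`, which is the slowdown described above.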
I will think about this more.