JuliaAI/ScientificTypes.jl

Scitype inference on dataframes

Closed this issue · 1 comment

The current implementation ignores the column information that can be obtained easily by running eltype on columns of a dataframe.
https://github.com/alan-turing-institute/ScientificTypes.jl/blob/659268dd305444079472d459b5e90cc1df7c458d/src/tables.jl#L5-L12

It is also slow, since it iterates through every entry in each column, and so won't scale to large dataframes.

Hello @deepaksuresh, I'm not sure I follow what you're saying. The eltype is not what we want here (that's what we'd call the "machine type"); rather, we want to obtain the scientific type, i.e. the intended interpretation of the data.
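
To illustrate the distinction, here is a minimal sketch using only Base Julia (it makes no ScientificTypes calls). The column names and the interpretations given in the comments are hypothetical, chosen purely for illustration:

```julia
# Two hypothetical columns with identical storage but different intended meaning.
n_children   = [0, 2, 1, 3]   # meant as a count
satisfaction = [1, 3, 2, 3]   # meant as codes for ordered levels (low/medium/high)

# Both columns have the same machine type, so eltype alone cannot tell us
# which interpretation applies to which column.
same_machine_type = eltype(n_children) == eltype(satisfaction)
println(eltype(n_children), " ", eltype(satisfaction), " ", same_machine_type)
```

Both columns are stored as integers, yet one is a count and the other an ordered category. That difference in interpretation is what the scientific type is supposed to capture, and it is not recoverable from eltype.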

Also, I don't understand why you say that it "iterates through all entries and won't scale to large dataframes". It doesn't look at all entries.

I may have misunderstood your comment, though, so could you please

  1. clarify what you mean by saying we should use eltype when what we want to obtain is the scientific type
  2. substantiate the performance claim by giving us an MWE?

Thanks

PS: see also my comment on the other issue #28