tensorwerk/stockroom

[FEATURE REQUEST] Automatic creation of Columns

hhsecond opened this issue · 2 comments

Describe the feature
As part of making the APIs easier to use, it might be a good idea to infer the type of the data on first assignment and create the hangar column if it doesn't exist. A few thoughts that make this a difficult decision:

  • The implicit creation of columns might lead users to think they can add data with a different shape or a different dtype
  • Inferring whether the columns should be variable_shaped might not be possible.
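To make the two concerns concrete, here is a minimal sketch of first-assignment inference. It is not stockroom or hangar code: `ColumnSpec`, `infer_spec` and `check_sample` are hypothetical names, samples are assumed to be array-likes exposing `shape` and `dtype` (a NumPy array would qualify), and the rule that a variable-shaped column treats its shape as a per-dimension maximum is an assumption of this sketch.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Tuple


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical record of what can be read off the first sample."""
    shape: Tuple[int, ...]
    dtype: str
    # A single sample says nothing about whether later samples vary in
    # shape, so the sketch defaults to a fixed-shape column.
    variable_shape: bool = False


def infer_spec(first_sample: Any) -> ColumnSpec:
    """Infer shape and dtype from any object exposing both attributes."""
    return ColumnSpec(tuple(first_sample.shape), str(first_sample.dtype))


def check_sample(spec: ColumnSpec, sample: Any) -> None:
    """Reject later samples that do not match the inferred spec."""
    if str(sample.dtype) != spec.dtype:
        raise TypeError(f"expected dtype {spec.dtype}, got {sample.dtype}")
    shape = tuple(sample.shape)
    if spec.variable_shape:
        # Assumption: same rank, each dimension at most the declared size.
        fits = len(shape) == len(spec.shape) and all(
            dim <= limit for dim, limit in zip(shape, spec.shape)
        )
    else:
        fits = shape == spec.shape
    if not fits:
        raise ValueError(f"shape {shape} does not fit column shape {spec.shape}")


# Stand-ins for array samples, so the sketch needs only the standard library.
first = SimpleNamespace(shape=(28, 28), dtype="uint8")
spec = infer_spec(first)
check_sample(spec, SimpleNamespace(shape=(28, 28), dtype="uint8"))  # accepted
```

A second sample of shape `(32, 32)` would raise `ValueError` here, which is exactly the surprise the first bullet describes: nothing in a plain assignment tells the user that the first sample fixed the column's shape and dtype.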

Thoughts?

The variable_shape argument would be difficult to infer, but I think the bigger issue would be figuring out which column layout to use from data samples alone.

Also, does stockroom allow users to specify kwargs for the backend / backend_options parameters? Hangar's heuristics are really dumb. Automatic methods to optimize this are part of hangar enterprise, but even with a reasonable set of options for a dataset, the final choice requires you to know a bit about the user's environment and where they fall on the compression tradeoff (time vs. space) scale. Neither hangar nor stockroom can handle this in full right now.

If you are able to infer the basic info, though, you'll need to consider how a user would correct a column definition when the heuristics are wrong.

No, if the user is expert enough to configure backend_options, they can rely on hangar for that.

So the idea here is that the user always has the choice to use the hangar CLI to create the columns (and we'll make sure the user understands this through the documentation), but if they just need the stock to act as a dictionary and not worry about anything else, that's when the heuristics are going to help. Does that sound reasonable to you?
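A rough sketch of that behaviour, using a purely in-memory stand-in: `ToyStock`, its `declare` method and the `(column, key)` indexing are all hypothetical and are not stockroom's API. `declare` stands in for creating a column explicitly ahead of time, and the heuristic only runs when no definition exists yet.

```python
from types import SimpleNamespace
from typing import Any, Dict, Tuple

# Hypothetical minimal column definition: (shape, dtype).
Spec = Tuple[Tuple[int, ...], str]


class ToyStock:
    """In-memory stand-in for a dictionary-like store."""

    def __init__(self) -> None:
        self._specs: Dict[str, Spec] = {}
        self._data: Dict[str, Dict[Any, Any]] = {}

    def declare(self, column: str, shape: Tuple[int, ...], dtype: str) -> None:
        """Explicit definition, standing in for creating the column up front."""
        if column in self._specs:
            raise KeyError(f"column {column!r} already exists")
        self._specs[column] = (tuple(shape), dtype)
        self._data[column] = {}

    def spec(self, column: str) -> Spec:
        return self._specs[column]

    def __setitem__(self, key: Tuple[str, Any], sample: Any) -> None:
        column, sample_key = key
        seen = (tuple(sample.shape), str(sample.dtype))
        if column not in self._specs:
            # First assignment to an unknown column: fall back to the heuristic.
            self.declare(column, *seen)
        elif seen != self._specs[column]:
            raise ValueError(
                f"sample {seen} does not match column {column!r} "
                f"defined as {self._specs[column]}"
            )
        self._data[column][sample_key] = sample


stock = ToyStock()
# Implicit path: the column is created from the first sample.
stock["images", 0] = SimpleNamespace(shape=(2, 2), dtype="float32")
# Explicit path: the user defines the column and the heuristic never runs.
stock.declare("labels", (1,), "int64")
stock["labels", 0] = SimpleNamespace(shape=(1,), dtype="int64")
```

In this sketch a wrong guess can only be avoided by declaring the column before the first assignment. Correcting a column that was already created implicitly is left open.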

About inferring the layout: I think the currently available layouts are inferable. Isn't that true? About the time series layout (and any future layouts), I am not sure.
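As an illustration of what such inference could look like, the sketch below assumes the available layouts are a flat one (one value per key) and a nested one (a mapping of subsample keys to values per key). That reading, the layout names and the helpers `infer_layout` and `infer_kind` are assumptions of the sketch rather than anything confirmed in this thread.

```python
from collections.abc import Mapping
from typing import Any


def infer_layout(first_value: Any) -> str:
    """Guess a layout name from the first assigned value."""
    if isinstance(first_value, Mapping):
        if not first_value:
            raise ValueError("cannot infer a layout from an empty mapping")
        return "nested"
    return "flat"


def infer_kind(first_value: Any) -> str:
    """Guess the kind of data stored, looking inside a mapping if needed."""
    if isinstance(first_value, Mapping):
        if not first_value:
            raise ValueError("cannot infer a kind from an empty mapping")
        leaf = next(iter(first_value.values()))
    else:
        leaf = first_value
    if isinstance(leaf, str):
        return "str"
    if isinstance(leaf, (bytes, bytearray)):
        return "bytes"
    if hasattr(leaf, "shape") and hasattr(leaf, "dtype"):
        return "ndarray"
    raise TypeError(f"unsupported sample type: {type(leaf).__name__}")


print(infer_layout("a caption"), infer_kind("a caption"))
print(infer_layout({"frame0": b"\x00"}), infer_kind({"frame0": b"\x00"}))
```

A rule like this separates the two cases only because their first values differ in type. A layout whose first value looks the same as a flat or nested one, which may be the case for a time series layout, could not be told apart this way.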