Internal storage
gforge opened this issue · 8 comments
I think it would be beneficial to use Torch's tensors as internal storage instead of tables. This would:
- give us more efficient storage (the flexibility of tables probably comes at a cost)
- reduce the risk of conversion issues in to_tensor()
- separate float/double from integers, which would be beneficial in the output functions.
The API changes would probably mostly affect get_column(), where as_tensor should default to true. This isn't something that I plan to pursue at the moment, but I figured I'd add it here as it could be worth considering.
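A minimal sketch of how a default of true for as_tensor could be handled in Lua. The function body, the columns argument and the error message are hypothetical and not taken from the library; plain Lua arrays stand in for tensors so the snippet runs without Torch.

```lua
-- Hypothetical accessor: only the handling of the as_tensor default is the point.
-- `columns` maps a column name to its stored values (a plain array here,
-- standing in for a tensor).
local function get_column(columns, name, as_tensor)
  -- `as_tensor or true` would turn an explicit false into true,
  -- so test for nil instead.
  if as_tensor == nil then as_tensor = true end

  local column = columns[name]
  if column == nil then
    error("unknown column: " .. tostring(name))
  end

  if as_tensor then
    return column -- hand back the stored form untouched
  end

  -- Otherwise return a copy as a plain Lua table.
  local out = {}
  for i = 1, #column do out[i] = column[i] end
  return out
end

local columns = { age = { 31, 42, 27 } }
local stored = get_column(columns, "age")        -- as_tensor defaults to true
local copy   = get_column(columns, "age", false) -- explicit false is respected
```

The nil check matters because false is a legitimate argument here; the common `x = x or default` idiom only works when false is never passed.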
It is a good idea.
Do you mean we would keep strings and integers in a table, and doubles and floats in one (or multiple) tensors? Or do you suggest using CharStorage to store our strings in tensors?
I was thinking of using int or long for integers and float/double for floats (see types) while keeping strings in tables. I haven't looked at CharStorage, but it could be an interesting option.
We will look at CharStorage as a further enhancement. Right now your suggestion is great ;)
One important thing that I changed was _infer_schema. You previously checked a proportion of the rows to determine the column type. This is problematic with small datasets such as our test datasets, and checking 1000 rows should be rather cheap, so I changed the code to:
function Dataframe:_infer_schema(max_rows)
  rows_to_explore = math.min(max_rows or 1e3, self.n_rows)
With the integer functionality we should probably split 'number' into 'integer' and 'float'. The concept is that a column's schema type is promoted in the order integer --> float --> string.
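The promotion order can be sketched in plain Lua as follows. This is an illustration under assumptions, not the code from the feature branch: classify, infer_schema, TYPE_RANK and the row layout (an array of rows, each an array of cell values) are hypothetical names chosen for the example. Only the math.min(max_rows or 1e3, ...) cap is taken from the snippet above.

```lua
-- Position of each type on the promotion ladder.
local TYPE_RANK = { integer = 1, float = 2, string = 3 }

-- Classify a single cell value (hypothetical helper).
local function classify(value)
  local n = tonumber(value)
  if n == nil then return "string" end
  if n == math.floor(n) then return "integer" end
  return "float"
end

-- Infer one type per column from at most max_rows rows.
local function infer_schema(rows, n_columns, max_rows)
  local rows_to_explore = math.min(max_rows or 1e3, #rows)

  -- Every column starts at the lowest rank and can only move up.
  local schema = {}
  for col = 1, n_columns do schema[col] = "integer" end

  for r = 1, rows_to_explore do
    for col = 1, n_columns do
      local value = rows[r][col]
      if value ~= nil then -- missing cells do not affect the type
        local seen = classify(value)
        if TYPE_RANK[seen] > TYPE_RANK[schema[col]] then
          schema[col] = seen
        end
      end
    end
  end
  return schema
end

local rows = {
  { "1", "1",   "a" },
  { "2", "2.5", "3" },
  { "3", "4",   "5" },
}
local schema = infer_schema(rows, 3)
-- schema[1] == "integer", schema[2] == "float", schema[3] == "string"
```

Because a type is never demoted, a single non-numeric value is enough to make the whole column a string column, and a single fractional value makes an otherwise integer column a float column.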
Just got a hint that may solve the string storage issue: https://github.com/torch/tds
Looks great! Furthermore, it would allow more complex string operations in the future :)
Feature implemented and will be merged into develop once the doc script is updated & a working update to 1.6 is added. Until then it's in the feature branch internat_storage.
A few issues seem to remain:
- Non-luajit fails in Travis
- get_mode fails with categorical columns