HDFGroup/hsds

Variable Length Strings

ron-kuhn opened this issue · 4 comments

Can someone explain why variable length strings are not supported? Is there a workaround? Are there plans to support them in the future? Or should I just avoid using variable length strings (e.g. for performance reasons)?

They've been supported for ages. See: https://github.com/HDFGroup/hsds/blob/master/tests/integ/vlen_test.py for example. Do you have some code that you expected to work that doesn't?

Supported in HSDS; NOT supported in the REST VOL for HDF5. I added an issue for this to the REST VOL (HDFGroup/vol-rest#13).

You can close it here.

Thanks for opening the issue in vol-rest.
Regarding performance, I haven't seen many benchmarks for HDF5 with variable length types, but I expect variable length types to be somewhat slower than fixed length types in HSDS. With variable length types there's an extra step: the data has to be serialized on the client and then deserialized on the server.
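To illustrate where that extra step comes from, here is a minimal sketch of one common way to put variable length strings on the wire: each element gets a 4-byte length prefix followed by its UTF-8 bytes. This is an assumed, illustrative encoding, not necessarily the exact format HSDS uses, and `serialize_vlen` / `deserialize_vlen` are hypothetical helper names. The point is that the receiver has to walk the buffer element by element, whereas a fixed size array can be sent and read back as one contiguous block.

```python
import struct
from typing import List


def serialize_vlen(strings: List[str]) -> bytes:
    """Hypothetical client-side step: pack each string as a 4-byte
    little-endian length followed by its UTF-8 encoded bytes."""
    parts = []
    for s in strings:
        data = s.encode("utf-8")
        parts.append(struct.pack("<I", len(data)))  # length prefix
        parts.append(data)                          # payload
    return b"".join(parts)


def deserialize_vlen(buf: bytes) -> List[str]:
    """Hypothetical server-side step: walk the buffer, reading one
    length-prefixed element at a time (the per-element work that a
    fixed size type avoids)."""
    out = []
    offset = 0
    while offset < len(buf):
        (n,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        out.append(buf[offset:offset + n].decode("utf-8"))
        offset += n
    return out


values = ["a", "hello", "variable length"]
payload = serialize_vlen(values)
assert deserialize_vlen(payload) == values
```

Each element costs a length field plus a decode pass on both sides; for many small strings that bookkeeping adds up.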

If you know the maximum size of the strings, one alternative would be to use a fixed size type with compression. Compressors do a good job of squishing the zero padding bytes, so there won't be much storage overhead.
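A quick way to convince yourself of the storage claim is to zero-pad short strings to a fixed width and run them through deflate (the algorithm behind HDF5's gzip filter) using Python's standard `zlib` module. The 64-byte width and the `sample-N` values below are made up for illustration; in h5py you would get the equivalent layout with something like `dtype="S64"` and `compression="gzip"` on `create_dataset`.

```python
import zlib


def pack_fixed(strings, width):
    """Encode each string as UTF-8 and right-pad it with zero bytes to
    `width`, mimicking a fixed size, null-padded HDF5 string type."""
    out = bytearray()
    for s in strings:
        data = s.encode("utf-8")
        if len(data) > width:
            raise ValueError(f"{s!r} exceeds the fixed width of {width} bytes")
        out += data.ljust(width, b"\x00")
    return bytes(out)


# Hypothetical data: 1000 short strings stored in 64-byte slots.
names = [f"sample-{i}" for i in range(1000)]
raw = pack_fixed(names, 64)
compressed = zlib.compress(raw, 6)

print(f"raw: {len(raw)} bytes, deflated: {len(compressed)} bytes")
```

Most of each 64-byte slot is zero padding, which deflate encodes very compactly, so the compressed size ends up far below the 64000 raw bytes. The trade-off is that you must pick a width up front and reject (or truncate) anything longer.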