pmem/pmemkv-nodejs

Check and enforce string encoding

robfromboulder opened this issue · 3 comments

Currently KVTree.get assumes that all persisted strings are UTF-8, leaving several obvious gaps:

  • Libraries or applications that use other string encodings are not handled
  • KVTree.put does not validate the encoding of incoming strings
  • There is no configuration for which encoding KVTree should require

This should follow the same guidelines as pmem/pmemkv-ruby#3
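
To make the put-side gap concrete, here is a minimal sketch of what validating an incoming string could look like. `encodeStrict` and its `encoding` parameter are hypothetical names for illustration, not part of the pmemkv-nodejs API. The sketch uses only Node's `Buffer`: it encodes the string, decodes it again, and rejects the input if the round trip changes it.

```javascript
// Hypothetical helper, NOT part of pmemkv-nodejs: encode a JS string to bytes
// and refuse strings that the chosen encoding cannot represent losslessly.
function encodeStrict(value, encoding = 'utf8') {
  if (typeof value !== 'string') {
    throw new TypeError('expected a string');
  }
  if (!Buffer.isEncoding(encoding)) {
    throw new RangeError(`unsupported encoding: ${encoding}`);
  }
  const bytes = Buffer.from(value, encoding);
  // Buffer silently substitutes or truncates characters it cannot encode
  // (for example a lone surrogate in UTF-8), so compare the round trip.
  if (bytes.toString(encoding) !== value) {
    throw new RangeError(`string is not representable in ${encoding}`);
  }
  return bytes;
}

// Usage: the returned bytes are what a put call would persist.
const stored = encodeStrict('héllo');
console.log(stored.length); // byte length, not character count
```

The `encoding` argument stands in for the missing configuration option; whether it belongs on the constructor or on each call is left open here.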

Not sure we should handle string encoding at all. Perhaps KVTree should not care about the encoding and should treat keys as binary data (not strings)?

Hi @krzycz, the issue is that the JS bindings are based on JS strings, which are binary-safe, but still require use of the correct encoding when converting between raw bytes and interned string objects. Most of the time (especially with ES6) these JS strings will be UTF-8 encoded, but they don't always have to be, and we probably have to handle those other cases gracefully. (BTW, this is the same issue for the Ruby and Java bindings when converting between raw bytes and real String objects.)
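
As a rough illustration of the bytes-to-string direction, the sketch below decodes persisted bytes with the standard `TextDecoder` in `fatal` mode, so that malformed input raises an error instead of being silently replaced with U+FFFD. `decodeStrict` is a hypothetical name and not something the bindings provide; it also assumes a Node build with ICU support, which `fatal` mode requires.

```javascript
// Hypothetical helper, NOT part of pmemkv-nodejs: decode raw bytes into a
// JS string, failing loudly when the bytes are not valid in the encoding.
function decodeStrict(bytes, encoding = 'utf-8') {
  // fatal: true     -> throw a TypeError on malformed byte sequences
  // ignoreBOM: true -> keep a leading byte order mark instead of dropping it
  const decoder = new TextDecoder(encoding, { fatal: true, ignoreBOM: true });
  return decoder.decode(bytes);
}

// Usage: valid UTF-8 decodes normally, invalid bytes are rejected.
console.log(decodeStrict(Buffer.from('héllo', 'utf8')));
try {
  decodeStrict(Buffer.from([0xff, 0xfe]));
} catch (err) {
  console.log('rejected invalid bytes:', err.name);
}
```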

Whether or not we allow binary-safe keys at the pmemkv layer is a slightly different issue -- JS strings are binary-safe regardless of what pmemkv does.

Anyway, hope this helps better frame the problem!
RobD

Closing, now obsolete