bytewise key-encoding on sub-level causes not found on other level

Question

bytewise key-encoding on sub-level causes not found on other level

MeirionHughes opened this issue 5 years ago · 20 comments

I'm getting odd behavior when using bytewise encoding on a sublevel. The intention is have bytewise encode a number to lex-sortable key, and then have that key appended on the end of the sub-level. It works if I write one value, but the moment I write a second, I can't read keys from other nested subs. Any idea where I'm going wrong? Docs says a sub-level must encode to a buffer and bytewise should be doing just that.

var sub = require('subleveldown')
var memdown = require('memdown')
var levelup = require('levelup');
var encoding = require('encoding-down');
var bytewise = require('bytewise');
var msgpack = require('msgpack-lite');
var {streamToRx} = require('rxjs-stream');

var db = levelup(encoding(memdown()));

var test1= sub(db, "logs", {valueEncoding: "json" });
var test2 = sub(db, "data");
var nested1 = sub(test2, '1234', { keyEncoding: bytewise, valueEncoding: msgpack })

async function main(){
  await test1.put("1234", "FOO");

  console.log("Got: " + await test1.get("1234"));
  console.log("put one..")
  await nested1.put(10,  10);

  console.log("Got: " + await test1.get("1234"));

  await dumpKeys(db);

  await nested1.put(20,  20);
  console.log("put another..")

  await dumpKeys(db);

  console.log(await test1.get("1234"));

  await dumpKeys(db);
}

async function dumpKeys(db){
  console.log("DUMP:")
  await streamToRx(db.createKeyStream()).forEach(x=>console.log(" " + x.toString()));
}

main().catch(console.log);

Console output:

Got: FOO
put one..
Got: FOO
DUMP:
 !logs!1234
 !data!!1234!B@$
put another..
DUMP:
 !logs!1234
 !data!!1234!B@$
 !data!!1234!B@4
NotFoundError: Key not found in database [1234]
    at D:\Code\geo\node_modules\levelup\lib\levelup.js:160:15
    at D:\Code\geo\node_modules\encoding-down\index.js:50:21
    at Immediate.callNext (D:\Code\geo\node_modules\memdown\memdown.js:162:7)
    at runCallback (timers.js:694:18)
    at tryOnImmediate (timers.js:665:5)
    at processImmediate (timers.js:647:5)

deps:

    "bytewise": "^1.1.0",
    "encoding-down": "^6.0.2",
    "leveldown": "^5.1.0",
    "levelup": "^4.0.2",
    "msgpack-lite": "^0.1.26",
    "rxjs": "^6.5.1",
    "rxjs-stream": "^3.0.2",
    "subleveldown": "^4.0.0",

vweevers commented 5 years ago

Yes :)

Answer 1 · 2019-06-15T23:55:16.000Z

No issue if I switch to charwise - so I suspect the mix of string + buffer and its screwing with the ! delimiter checking.

Answer 2 · 2019-06-16T05:45:49.000Z

You're likely hitting a known issue with memdown, when mixing buffers and strings in the same db, see Level/level-ttl#68 (comment) (different context, same underlying issue). Might be fixable by reverting Level/codec#23.

Answer 3 · 2019-06-16T05:48:03.000Z

To confirm it's that, could you try replacing memdown with leveldown?

Answer 4 · 2019-06-16T06:03:48.000Z

Yes - leveldown works.

Answer 5 · 2019-06-16T06:09:22.000Z

Great. So, reverting Level/codec#23 should fix this, the level-ttl issue and a multileveldown issue (Level/community#71). I can't think of any reasons not to revert it. Feel up to making a PR?

Answer 6 · 2019-06-16T06:11:51.000Z

So, you want to revert #23, which was a revert itself?

Answer 7 · 2019-06-16T06:15:58.000Z

okaydoky I'll test if that fixes my unit-test locally then do the pr.

Answer 8 · 2019-06-16T07:03:59.000Z

okay I've cherry-picked the original commit and resolved the linting errors: https://github.com/MeirionHughes/codec/commit/9f6686a390b96200a93103e6672d96569a388a2a

doesn't seem to fix it unfortunately.

I yarn link and yarn link level-codec locally, and added a console.log during the export just to make sure it was being used.

Answer 9 · 2019-06-16T07:10:07.000Z

I'll see if I can make a basic failing test on here

Answer 10 · 2019-06-16T08:19:50.000Z

Okay I've managed to get it to trigger the race condition. At least for me and its consistent: https://github.com/Level/subleveldown/compare/tests/buffer-race-issue

it very odd because you can change subdb(db, 'logs') to subdb(db, 'AAAA') and it works...

Answer 11 · 2019-06-16T08:22:36.000Z

{ keyEncoding: 'buffer' } should be { keyEncoding: 'binary' }

Answer 12 · 2019-06-16T08:31:57.000Z

fixed - still failing. I've also made sure that my local version of level-codec with your suggested change (Level/codec@master...MeirionHughes:master) is defiantly being loaded.

... anyway I can use charwise and keep all my keys as strings for now.

Answer 13 · 2019-06-16T14:30:20.000Z

I had a closer look. Thanks for providing a clean subleveldown test! The problem is that memdown, when comparing two keys, and one key is a buffer while the other key is a string, does not compare them bytewise. Essentially it compares buf[index] to str[index].

Answer 14 · 2019-06-16T14:47:10.000Z

We have a few options:

Support comparing buffers to strings in memdown (bringing its behavior closer to leveldown) (but it's not compatible with IDB-style sort order, which sorts by type first and then content - i.e. string data is never equal to binary data) (bytewise is also IDB-style but sorts string and binary the other way around...)
Have memdown sort by type, then content
Have subleveldown always use buffers internally

I hate encodings right now :)

Answer 15 · 2019-06-16T15:00:22.000Z

I think I prefer 2. Because although it is not the same behavior as leveldown, it is the least surprising and it's fairly reasonable to expect a consumer to be consistent in their usage of key/value types - i.e. don't mix buffers and strings in the same (sub)db.

@ralphtheninja WDYT?

Edit: I forgot I already wrote code for 2! To deal with other surprising cases, all arising from the fact that memdown does not compare by type. For example it considers 'a' and 0 to be equal (because JavaScript attempts to convert the left- and right-hand sides of greater-than and less-than operators to numbers, unless both sides are strings. In the case of 'a' and 0, 'a' becomes NaN, which is never greater or less than a number.)

Answer 16 · 2019-06-16T15:49:08.000Z

Thanks for investigating, on a Sunday no-less! - Perhaps no.1 via an option so that it behaves just like leveldown? Non-breaking, but allows memdown to behave like leveldown during unit tests. I guess if you're code for no.2 already solves it then that is the way to go.

Answer 17 · 2019-06-16T16:10:39.000Z

Perhaps no.1 via an option so that it behaves just like leveldown?

Although it's easy to add a comparator option to memdown to make it sort however you want, IMO the default behavior should cover common use cases like these, and not require configuration.

Somewhere this week, I'll try to clean up and push the code that I have. It will make memdown sort the same as IDB and level-js, and almost the same as bytewise.

Side note: are you using memdown purely to speed up tests? You might like level-test which defaults to leveldown in node, but is quite fast because it uses temporary directories without cleaning them up (leaving that to the OS).

Answer 18 · 2019-06-21T22:42:05.000Z

Opened a PR for level-codec (Level/codec#51, thanks!); memdown discussion is to be continued in Level/memdown#186.

Keeping this open though, as a reminder to test everything together once ready.

Answer 19 · 2019-08-18T10:37:06.000Z

Fixed in memdown@5.0.0.