bytewise key-encoding on sub-level causes not found on other level
MeirionHughes opened this issue · 20 comments
I'm getting odd behavior when using bytewise encoding on a sublevel. The intention is have bytewise encode a number to lex-sortable key, and then have that key appended on the end of the sub-level. It works if I write one value, but the moment I write a second, I can't read keys from other nested subs. Any idea where I'm going wrong? Docs says a sub-level must encode to a buffer and bytewise
should be doing just that.
var sub = require('subleveldown')
var memdown = require('memdown')
var levelup = require('levelup');
var encoding = require('encoding-down');
var bytewise = require('bytewise');
var msgpack = require('msgpack-lite');
var {streamToRx} = require('rxjs-stream');
var db = levelup(encoding(memdown()));
var test1= sub(db, "logs", {valueEncoding: "json" });
var test2 = sub(db, "data");
var nested1 = sub(test2, '1234', { keyEncoding: bytewise, valueEncoding: msgpack })
async function main(){
await test1.put("1234", "FOO");
console.log("Got: " + await test1.get("1234"));
console.log("put one..")
await nested1.put(10, 10);
console.log("Got: " + await test1.get("1234"));
await dumpKeys(db);
await nested1.put(20, 20);
console.log("put another..")
await dumpKeys(db);
console.log(await test1.get("1234"));
await dumpKeys(db);
}
async function dumpKeys(db){
console.log("DUMP:")
await streamToRx(db.createKeyStream()).forEach(x=>console.log(" " + x.toString()));
}
main().catch(console.log);
Console output:
Got: FOO
put one..
Got: FOO
DUMP:
!logs!1234
!data!!1234!B@$
put another..
DUMP:
!logs!1234
!data!!1234!B@$
!data!!1234!B@4
NotFoundError: Key not found in database [1234]
at D:\Code\geo\node_modules\levelup\lib\levelup.js:160:15
at D:\Code\geo\node_modules\encoding-down\index.js:50:21
at Immediate.callNext (D:\Code\geo\node_modules\memdown\memdown.js:162:7)
at runCallback (timers.js:694:18)
at tryOnImmediate (timers.js:665:5)
at processImmediate (timers.js:647:5)
deps:
"bytewise": "^1.1.0",
"encoding-down": "^6.0.2",
"leveldown": "^5.1.0",
"levelup": "^4.0.2",
"msgpack-lite": "^0.1.26",
"rxjs": "^6.5.1",
"rxjs-stream": "^3.0.2",
"subleveldown": "^4.0.0",
No issue if I switch to charwise
- so I suspect the mix of string + buffer and its screwing with the ! delimiter checking.
You're likely hitting a known issue with memdown
, when mixing buffers and strings in the same db, see Level/level-ttl#68 (comment) (different context, same underlying issue). Might be fixable by reverting Level/codec#23.
To confirm it's that, could you try replacing memdown
with leveldown
?
Yes - leveldown works.
Great. So, reverting Level/codec#23 should fix this, the level-ttl
issue and a multileveldown
issue (Level/community#71). I can't think of any reasons not to revert it. Feel up to making a PR?
So, you want to revert #23, which was a revert itself?
Yes :)
okaydoky I'll test if that fixes my unit-test locally then do the pr.
okay I've cherry-picked the original commit and resolved the linting errors: https://github.com/MeirionHughes/codec/commit/9f6686a390b96200a93103e6672d96569a388a2a
doesn't seem to fix it unfortunately.
I yarn link
and yarn link level-codec
locally, and added a console.log during the export just to make sure it was being used.
I'll see if I can make a basic failing test on here
Okay I've managed to get it to trigger the race condition. At least for me and its consistent: https://github.com/Level/subleveldown/compare/tests/buffer-race-issue
it very odd because you can change subdb(db, 'logs')
to subdb(db, 'AAAA')
and it works...
{ keyEncoding: 'buffer' }
should be { keyEncoding: 'binary' }
fixed - still failing. I've also made sure that my local version of level-codec with your suggested change (Level/codec@master...MeirionHughes:master) is defiantly being loaded.
... anyway I can use charwise and keep all my keys as strings for now.
I had a closer look. Thanks for providing a clean subleveldown
test! The problem is that memdown
, when comparing two keys, and one key is a buffer while the other key is a string, does not compare them bytewise. Essentially it compares buf[index]
to str[index]
.
We have a few options:
- Support comparing buffers to strings in
memdown
(bringing its behavior closer toleveldown
) (but it's not compatible with IDB-style sort order, which sorts by type first and then content - i.e. string data is never equal to binary data) (bytewise
is also IDB-style but sorts string and binary the other way around...) - Have
memdown
sort by type, then content - Have
subleveldown
always use buffers internally
I hate encodings right now :)
I think I prefer 2. Because although it is not the same behavior as leveldown
, it is the least surprising and it's fairly reasonable to expect a consumer to be consistent in their usage of key/value types - i.e. don't mix buffers and strings in the same (sub)db.
@ralphtheninja WDYT?
Edit: I forgot I already wrote code for 2! To deal with other surprising cases, all arising from the fact that memdown
does not compare by type. For example it considers 'a'
and 0
to be equal (because JavaScript attempts to convert the left- and right-hand sides of greater-than and less-than operators to numbers, unless both sides are strings. In the case of 'a'
and 0
, 'a'
becomes NaN
, which is never greater or less than a number.)
Thanks for investigating, on a Sunday no-less! - Perhaps no.1 via an option so that it behaves just like leveldown? Non-breaking, but allows memdown to behave like leveldown during unit tests. I guess if you're code for no.2 already solves it then that is the way to go.
Perhaps no.1 via an option so that it behaves just like leveldown?
Although it's easy to add a comparator
option to memdown
to make it sort however you want, IMO the default behavior should cover common use cases like these, and not require configuration.
Somewhere this week, I'll try to clean up and push the code that I have. It will make memdown
sort the same as IDB and level-js
, and almost the same as bytewise
.
Side note: are you using memdown
purely to speed up tests? You might like level-test
which defaults to leveldown
in node, but is quite fast because it uses temporary directories without cleaning them up (leaving that to the OS).
Opened a PR for level-codec
(Level/codec#51, thanks!); memdown
discussion is to be continued in Level/memdown#186.
Keeping this open though, as a reminder to test everything together once ready.
Fixed in memdown@5.0.0
.