ethereumjs/ethereumjs-blockchain

getHead() on reopened block-containing leveldown blockchain DB returns genesis block

holgerd77 opened this issue · 10 comments

When I manipulate the test code a bit, using a leveldown instance in a created db folder und running this code twice, the console.log(head) in line 27 on both runs returns the genesis head (so both with <Buffer d7 f8 97 4f b5 ac 78 d9 ac 09 9b 9a d5 01 8b ed c2 ce 0a 72 da d1 82 7a 17 09 da 30 58 0f 05 44> for the stateRoot.

@vpulim Do you know what is going on here? Is this a bug? Could you have a look and test this?

@holgerd77 The way this module worked before 3.0.0 was that getHead() returned the “vm” head by default. This “vm” head is only updated when the iterator() function is called. putBlock() updates the headHeader, which is different from the “vm” head. Although I prefer that getHead() return the headHeader or blockHeader by default, I kept it this way to make sure I didn’t break existing code in other modules.

So this isn’t a bug and is working as expected. You can test this by trying your test code against a pre-3.0.0 version of the module.

Hmm, irritating.

So how do I get the current head block atm? Even...

blockchain._initLock.await(function () {
  console.log(blockchain._headHeader)
})

returns the same genesis hash on both times.

@holgerd77 Now, THAT is a bug! Good catch. It turns out I wasn't saving headHeader/headBlock updates to the db after calls to putBlock(). This will explain why _headHeader is always the genesis hash in the _initLock.await() call. I'll submit a new PR today to fix this with some additional test cases.

NOTE: this will still not affect the return value from getHead(), which still uses the "vm" head value that will always remain at genesisHead until iterator() is called.

Since we are introducing new functionality, should we add two new functions to access the heads of the current headerchain and blockchain?

Maybe getHeadHeader() and getHeadBlock()?

In geth, these represent and heads of the latest light sync and latest full sync.

I have to admit that haven't really gotten this concept of this 'vm' heads naming. Is this by definition always the canonical chain?

Then we should really consider to change this "only on iterator call update" semantics. I can't imagine that people rely on this behavior since it doesn't really make that much sense (or does it and I am not seeing the point?), so I think we wouldn't that much risk of breaking something.

Otherwise if we have that old getHead functionality and the new two functions named as proposed this would be confusing if things behave so differently. Generally I would be very much in favor to add something like that.

If 'vm' == 'canonical' (javascript equal operator semantics, hehe), then we might also just rename this to 'canonical', 'vm' seems to not make that much sense here (and is probably also not really in use in the new version since we just released v3.0.0).

i think it's more accurate to think of a "head" as a cursor along the canonical chain.

geth only tracks three heads: the headerchain head (used for light syncs), blockchain head (used for full syncs), and a fast sync head (used to track the latest synced block during fast syncs).

However, the original ethereumjs-blockchain implementaion allowed for the tracking of an unlimited number of heads, each assigned its own name ("vm" is the default name). This functionality is used by ethereumjs-vm to track where it is in the canonical chain during calls to iterator(). Since each time a new VM iterates through the chain, it starts from the genesis block. That's why we maintain a separate field called _headHeader and _headBlock so that they aren't overwritten as VMs iterate through the chain from the beginning.

I agree. This is VERY confusing!

It would be nice to just have the VM maintain its own cursor internally, instead of having the blockchain maintain it. This way, getHead() can be modified to return the actual _headHeader (or _headBlock... whichever makes more sense)

Ok, thanks for the explanation, didn't have the whole picture on that. Then maybe let's do a you proposed with the two new functions and otherwise keep the current functionality until the next breaking release. We can/should additionally make this clear in the (API) docs.

Created a PR to address the issues discussed above: #52