Use tests in `BlockchainTests`
Closed this issue ยท 14 comments
We currently use the tests in the GeneralStateTests folder of the Ethereum tests repo.
Each such test seems to be replicated in the BlockchainTests/GeneralStateTests folder, in a different format. This format includes useful information like the roots of the receipts and transactions tries, the Bloom filter, and the withdrawals root.
It would be helpful to modify the runner to use these tests instead, so that we can verify these data. An immediate application would be to verify the receipts trie root produced by 0xPolygonZero/plonky2#1097.
Hmm yeah I agree. The tests under BlockchainTests
are the expanded versions of the root GeneralStateTests
. I remember we deliberated a bit on which to parse when we wrote this initially, but I don't remember why we chose to parse this format instead. As far as I can tell, no information is lost through the expansion, so we should be fine to parse this instead.
I think one difference is the absence of variant cases on the BlockchainTests side. For example comparing the Shanghai version for BlockchainTests/Call1MB1024Calldepth.json vs GeneralStateTests/Call1MB1024Calldepth.json, there's only 1 txn vs 2 (the second one is the same).
Hmm yeah. I think what actually ends up happening during the test expansion is each of these variants gets expanded into separate "test sections" (for lack of a better word). So like in this case, the two txns are there, but they have their own sections Call1MB1024Calldepth_d0g0v0_Shanghai
& "Call1MB1024Calldepth_d0g1v0_Berlin"
, where g*
corresponds to which gas price the entry is using from here:
"Shanghai" : [
{
"hash" : "0x5de8ccc907a2cf4b21c0ee18f6ccd171f7b6a517b149cc7fb5398ec40d619e0f",
"indexes" : {
"data" : 0,
"gas" : 1,
"value" : 0
},
"logs" : "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
"txbytes" : "0xf861800a840ee6b28094bbbf5374fce5edbc8e2a8697c15331677e6ebf0b0a801ba0462186579a4be0ad8a63224059a11693b4c0684b9939f6c2394d1fbe045275f2a059d73f99e037295a5f8c0e656acdb5c8b9acd28ec73c320c277df61f2e2d54f9"
},
{
"hash" : "0x485947362d7a26bfe59efa9a2581036f48cbdb09f9b704130ecd0dc2bf969ec7",
"indexes" : {
"data" : 0,
"gas" : 0,
"value" : 0
},
"logs" : "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
"txbytes" : "0xf860800a830249f094bbbf5374fce5edbc8e2a8697c15331677e6ebf0b0a801ba06843bbbe573744c46cf47f1c126c11a53cffa1bcc3eeb0e0d328e6a73ecc5447a02062c2fbb34bfea606f3cf268b5a94a81d68e748d4d896c514061c03e8acb774"
}
]
Ah good point, I had missed this! All good then ๐
In addition to the added support for the receipt trie, the latest PRs have also added a few fields to the block_metadata
, namely
txn_number_before
gas_used_before
gas_used_after
block_bloom_before
block_bloom_after
some of which seem kind of cumbersome to parse ahead of proving
Thanks for the headsup @Nashtare.
Sorry that I haven't finished this yet. I have something in-progress, but I keep getting pulled to work on other tasks. I'm aiming to have this PR done by the end of tomorrow.
And actually another topic regarding recent updates on plonky2 side. Now that we observe public values in the challenger, we can't process tests that have a block_gaslimit
not fitting in 32 bits (before we could, as long as they were not spawning a new context which would have caused an overflow when exiting the kernel).
I know that @wborgeaud wasn't inclined in supporting 2-limb gaslimit as kinda needless in practice, though that means there would be now many tests that aren't supported natively (we could manually alter their block_gaslimit
but for most of those it seems setting it to u32::MAX
would result in an OOG error), and this isn't ideal to ensure proper coverage of all edge-cases.
Do you have any opinion on the matter?
Hmm... So I'm a bit naive when it comes to a lot of the plonky2 internals, but maybe could we clamp the gas limit in these cases to u32::MAX
? I don't know if this ends up making some tests sort of miss the point of what they are trying to test, but if most of them never hit the gas limit (and clamping the gas limit doesn't affect any other output like hashes or whatever), then maybe this is an option.
Hmm, we could test on a case-by-case basis but for the few of which I actually tried setting this to u32::MAX
, it seemed they actually needed to go over that limit, as I wrote above
we could manually alter their block_gaslimit but for most of those it seems setting it to u32::MAX would result in an OOG error
hence I couldn't find a really satisfactory solution.
@wborgeaud I was wondering if your opinion had changed on the matter:
Now that we observe public values in the challenger, we can't process tests that have a block_gaslimit not fitting in 32 bits (before we could, as long as they were not spawning a new context which would have caused an overflow when exiting the kernel).
I know that @wborgeaud wasn't inclined in supporting 2-limb gaslimit as kinda needless in practice, though that means there would be now many tests that aren't supported natively.
I am currently working on updating the runner to latest plonky2, but the fact that so many tests are uncheckable (altering block_gaslimit leads to failure anyway in most cases) kind of concerns me, especially with all the heavy changes that happened on the zkEVM the past few months.
AFAIK, the overhead for having a 2-limb gaslimit would be light (we'd only have an additional Target
to connect to the PublicValues
but this is fairly negligible). Did you have other reserves about this?
Yeah sounds good, I agree the performance overhead is pretty negligible.
So let's change it to 2 limbs with a comment mentioning this is necessary for testing.
Should we change block_gas_used
also?
Out of the 18567 test I can parse so far (some are still TODO), I'm having:
- 3940 unprovable because of
gas_used
overflowingu32
, - out of the 14627 remaining ones, 3442 unprovable because of
gaslimit
.
Yes sounds good too. Just one question, do you have an idea if most of these tests will run in a reasonable time? Like I just want to make sure these tests are not stupidly large and will make the runner run out of memory anyways.
Most of them are runnable on non-beefy machines yeah. They are already run with the current instance, we just "Ignore" them if the prover fails with GasLimit error.