Ethereum Core Devs Meeting 30 Agenda
Souptacular opened this issue · 43 comments
Meeting Date/Time: Friday 12/15/17 at 14:00 UTC
Meeting Duration 1.5 hours
YouTube Live Stream Link
Agenda
- Testing Updates.
- Digital cats caused network congestion this month. Meow.
a. Why did this happen and what solutions are available to prevent future network congestion? See comments below for some ideas.
b. Stateless Clients proposal.
c. Would having minimum system requirements to set up an optimal client/full node help?
d. Is the bottleneck not just disk bandwidth, but specifically sequential disk bandwidth?
e. Vitalik has some ideas around gas cost changes and scalability-relevant client optimizations.
- Plans on Quantum-resistant cryptography and any plans to include it in the next update?
- Introduction to K-EVM team (Everett H.)
- Does it remain the case that the Yellow Paper is intended to be Ethereum's formal specification?
Time permitting:
6. Parity stuck ether proposals.
7. POA Testnet unification [Update]
8. Core team updates.
Please provide comments to add or correct agenda topics.
Shall we talk about transaction backlogs?
Just some random thoughts.
- Shouldn't the block gas limit go up at this consistently high transaction load?
- Is there anything short-term we can do? Like recommending higher gas limits? Is it even safe to recommend higher block gas limits? If yes, what would be a reasonable limit?
- Is there anything we can do to improve applications like crypto-kitties to use less gas, or anything else to relax the situation? Did anyone look into options yet?
Is there anything short-term we can do?
Just a mid-term raw idea (not perfect, I know):
We could limit gas usage (or increase the minimum gas price specifically for heavy contracts) per contract group (contracts with the same codebase) if the network becomes overloaded. A contract deployer can't easily get around this restriction by deploying many slightly altered contracts with a different codebase, because such a bunch of different contracts would not be trusted and accepted as easily as a single one. Such "loadbalancing" comes at the cost of acceptance.
With Cryptokitties making up >10% of all tx's currently (https://ethgasstation.info/gasguzzlers.php), the best medium-term solution may be helping them implement a payment channel mechanism. Uncles / total blocks per day is around 21%, which is not disastrous, but is only aggravated by a gas limit increase. If the gas limit is increased, we'd have to tell everyone to wait for more block confirmations per tx to make sure they get on the right chain.
Crazy idea, but at this point it may be worth looking at increasing the target time per block from ~15s. Users will have to wait for multiple confirmations anyway with the current increasing uncle rate. At least with a higher block time interval we can increase the gas limit with a reduced effect on the uncle rate.
@dip239 while I see the first part of the idea, the second part "contracts must be audited" is easily circumvented by adding harmless nonsense functions. And it would lead to batteries of loadbalancer contracts anyway.
I would like to bring up the Stateless Clients proposal, as I described here: https://medium.com/@akhounov/how-to-speed-up-ethereum-in-the-face-of-crypto-kitties-7a9c901d98e9
I am collecting more data now about how much impact it can make and what the overhead is; hopefully I can present something very briefly.
part "contracts must be audited" is easily circumvented by adding harmless nonsense functions
My idea is not perfect, I agree; it is more a way of thinking about the problem: I am just trying to punish excessive gas consumption by target contracts instead of gas provisioning by transactions.
Nevertheless my point was that "loadbalancing" will not work "for free": a careful user needs to trust N contracts instead of a single one if their codebase is not identical. Personally I wouldn't trust a bunch of loadbalancing contracts with "almost" the same code: too much to check every single one.
But CryptoKitty players possibly do not care about contracts they trust at all.
Whatever solution we adopt, we can all agree that this is an emergency situation that must be solved short term. With the 'accidental' success of CryptoKitties, we can assume there are a bunch of developers coding Ethereum dApps right now as I write this message, so this transaction backlog will only get worse from now.
Thought more:
... there should be some "central contract" behind the "loadbalancer", coordinating the whole application. We could sum all gas burned in all transactions going through this "central contract" in some time frame (TxGasBurningRate for this contract). If the network is currently overloaded AND some contract is involved in excessive gas burning, all transactions going through it should be disincentivized by a higher gas price.
further discussion is moved to ethereum/research
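The TxGasBurningRate idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `GasBurnTracker`, the window length, and the `overload_at`, `heavy_share`, and `surcharge` thresholds are all hypothetical and do not come from the proposal, which leaves those parameters open.

```python
from collections import deque


class GasBurnTracker:
    """Hypothetical tracker of gas burned per contract over a sliding window of blocks."""

    def __init__(self, window_blocks, block_gas_limit):
        self.block_gas_limit = block_gas_limit
        # Each entry maps contract -> gas burned through it in one block.
        self.blocks = deque(maxlen=window_blocks)

    def record_block(self, gas_by_contract):
        self.blocks.append(dict(gas_by_contract))

    def burn_rate(self, contract):
        """Average gas per block burned through `contract` over the window."""
        if not self.blocks:
            return 0.0
        total = sum(block.get(contract, 0) for block in self.blocks)
        return total / len(self.blocks)

    def network_load(self):
        """Fraction of the available block gas actually used over the window."""
        if not self.blocks:
            return 0.0
        used = sum(sum(block.values()) for block in self.blocks)
        return used / (len(self.blocks) * self.block_gas_limit)

    def min_gas_price(self, contract, base_price,
                      overload_at=0.9, heavy_share=0.1, surcharge=2.0):
        """Raise the price only if the network is overloaded AND the contract is heavy."""
        overloaded = self.network_load() >= overload_at
        heavy = self.burn_rate(contract) >= heavy_share * self.block_gas_limit
        return base_price * surcharge if overloaded and heavy else base_price


# Example: three nearly full blocks in which one contract dominates.
tracker = GasBurnTracker(window_blocks=3, block_gas_limit=8_000_000)
for _ in range(3):
    tracker.record_block({"kitties": 2_000_000, "token": 100_000, "misc": 5_500_000})
```

In this toy setup only contracts above the assumed share threshold pay the surcharge, and only while the window looks overloaded; light contracts keep the base price.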
Might be missing something obvious here: Why do we have a static blocktime target, variable gas limits, and (a more abstract) acceptable uncle rate (which is actually variable)? Why isn't the blocktime target also variable in order to target a more well defined/specified uncle rate target? (or uncle/time rate to keep it fair for miners)
Why isn't the blocktime target also variable in order to target a more well defined/specified uncle rate target?
The blocktime target is flexible as of Byzantium, to keep total rewards roughly constant. See it rising slightly here: https://etherscan.io/chart/blocktime
I personally oppose further blocktime increases. The contribution of the fast blocktime to the total uncle rate is relatively small, and furthermore it's ADDITIVE, not multiplicative, with contribution to uncle rate from capacity. That is:
uncle_rate ~= k1 / blocktime + k2 * gas_per_sec
This is confirmed with bitcoin in Decker and Wattenhofer's 2013 paper, and experience suggests the same is true with ethereum. Right now it's the second term in the sum that is the problem, not the first.
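To make the "additive, not multiplicative" point concrete, here is a minimal sketch of the model above. The constants `K1` and `K2` are made-up illustrative values, not fitted to any chain data, and `capacity_penalty` is a hypothetical helper name.

```python
def uncle_rate(blocktime_s, gas_per_sec, k1, k2):
    """Additive model from the comment above: k1 / blocktime + k2 * gas_per_sec."""
    return k1 / blocktime_s + k2 * gas_per_sec


# Illustrative constants only (assumed, not measured).
K1 = 0.75   # weight of the block-time (latency) term
K2 = 1e-7   # weight of the throughput (capacity) term


def capacity_penalty(blocktime_s, gas_low, gas_high):
    """Extra uncle rate caused by raising throughput at a fixed block time."""
    return (uncle_rate(blocktime_s, gas_high, K1, K2)
            - uncle_rate(blocktime_s, gas_low, K1, K2))


# Under this model the penalty for extra throughput does not depend on block time.
penalty_at_15s = capacity_penalty(15, 400_000, 500_000)
penalty_at_30s = capacity_penalty(30, 400_000, 500_000)
```

Because the two terms are summed, a longer block time shrinks only the first term; under this model it does nothing to reduce the cost of extra gas per second.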
IMO we should consider a few optimizations:
- Do another round of increasing gas costs on account-accessing opcodes (BALANCE, EXTCODESIZE, etc), and SLOAD, as that's still our major weak point from the PoV of DoS resistance. I'd recommend SLOAD -> 320, BALANCE -> 800, EXTCODESIZE/CALL/CALLCODE/DELEGATECALL/.... -> 1200. But we should add an exception, that self-calls and calls to precompiles cost only 100.
- Some variant of ethereum/EIPs#168 and ethereum/EIPs#169 to alleviate state size growth
- Increase the cost of sending a tx by 30000 if it goes to a currently empty account
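The proposed numbers can be written down as a small lookup, shown here as a sketch only. The function names are hypothetical, the precompile address range 0x1 to 0x8 is an assumption based on Byzantium, and the opcodes hidden behind the "...." in the list are deliberately left out. The 21000 base transaction cost is the existing intrinsic gas, to which the proposed 30000 would be added.

```python
# Proposed costs from the list above (only the opcodes it names explicitly).
PROPOSED_GAS = {
    "SLOAD": 320,
    "BALANCE": 800,
    "EXTCODESIZE": 1200,
    "CALL": 1200,
    "CALLCODE": 1200,
    "DELEGATECALL": 1200,
}
CALL_LIKE = {"CALL", "CALLCODE", "DELEGATECALL"}
CHEAP_CALL_GAS = 100                # proposed cost for self-calls and precompile calls
PRECOMPILES = frozenset(range(1, 9))  # assumption: Byzantium precompiles at 0x1..0x8

BASE_TX_GAS = 21000                 # existing intrinsic cost of a transaction
EMPTY_ACCOUNT_SURCHARGE = 30000     # proposed extra cost for an empty recipient


def access_gas(opcode, caller=None, target=None):
    """Gas for an account-accessing opcode under the proposal (hypothetical helper)."""
    if opcode in CALL_LIKE and target is not None:
        if target == caller or target in PRECOMPILES:
            return CHEAP_CALL_GAS
    return PROPOSED_GAS[opcode]


def tx_base_gas(recipient_is_empty):
    """Base cost of sending a transaction under the proposal."""
    return BASE_TX_GAS + (EMPTY_ACCOUNT_SURCHARGE if recipient_is_empty else 0)
```

For example, `access_gas("CALL", caller=0xabc, target=0xabc)` takes the cheap path, while a call to any other non-precompile address pays the full proposed cost.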
I also totally support the idea of stateless clients. Right now it actually already is possible to implement without any core protocol changes, as long as miners are stateful. There's also the possibility of a "stateless partially full node" - be a light node by default, but fully (statelessly) verify specific blocks if a trusted server tells you that they're invalid. This gives the security model that you won't accept an invalid block unless BOTH (i) there is an active 51% attack, and (ii) all trusted servers you're connecting to are colluding.
Also, it would make sense to have a much more coordinated benchmarking effort, so we can see what opcodes are currently the slowest, and what can be done to improve their execution speed.
Finally, we should have a poll on where we are at for key scalability-relevant client optimizations. This includes:
- Garbage collection
- On-disk state caching
- State tree pruning
- Network compression
- Database optimization
I would like to hear about the Ethereum team's plans on Quantum-resistant cryptography and any plans to include it in the next update?
Hello, I would like the foundation to recommend the minimum system requirements to set up an optimal client/full node. This is probably a basic step to mitigate the uncle rate problem a bit; it seems that the hard drive is one of the most important bottlenecks, given the high number of I/O calls to the database.
https://medium.com/@akhounov/how-to-speed-up-ethereum-in-the-face-of-crypto-kitties-7a9c901d98e9
I would like to hear about the Ethereum team's plans on Quantum-resistant cryptography and any plans to include it in the next update?
Properly incorporating this requires account abstraction, which is going into the sharding spec; I don't think there is yet consensus on how/when it's going into the main chain. Abstraction will also be available for Casper validators.
See my comment on stateless client numbers here:
I do have a question that I'd like to hear answered as well as possible.
It seems to me that the bottleneck is not just disk bandwidth, but specifically sequential disk bandwidth. That is, for example, if we somehow magically knew ahead of time what state tree nodes need to be accessed, and we could make the accesses happen in parallel, then processing speed could be increased greatly.
First, is this true? That is, is it the case that loading 1000 specific state trie keys from the DB in parallel is much faster than doing it sequentially? Second, if so, how much faster?
If there are substantial gains to be made, then there are clever things we can do, like requiring miners to provide a witness specifying what accounts and storage keys get accessed in the block, and additionally it means that there are potential great scalability gains in EIP 648.
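A rough way to frame the experiment being asked for is sketched below. `SlowStore` is a hypothetical stand-in for the database: it charges a fixed sleep per read, which models an idealized device that serves concurrent requests independently. A real LevelDB instance on an HDD or SSD may behave quite differently, so this only shows the shape of the measurement, not an answer to the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SlowStore:
    """Stand-in for a disk-backed key-value DB: every read pays a fixed latency."""

    def __init__(self, data, latency_s=0.001):
        self._data = data
        self._latency_s = latency_s

    def get(self, key):
        time.sleep(self._latency_s)  # simulated seek/read latency (assumption)
        return self._data.get(key)


def load_sequential(store, keys):
    """Read keys one after another, as a trie walk effectively does today."""
    return {key: store.get(key) for key in keys}


def load_parallel(store, keys, workers=32):
    """Read a known list of keys concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(keys, pool.map(store.get, keys)))


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


data = {i.to_bytes(32, "big"): i for i in range(200)}
keys = list(data)
store = SlowStore(data)
seq_result, seq_seconds = timed(load_sequential, store, keys)
par_result, par_seconds = timed(load_parallel, store, keys)
```

Swapping `SlowStore` for a thin wrapper around the real database, and the integer keys for actual state trie keys, would give the comparison asked about above.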
@vbuterin Thanks a lot for the answers! I am still trying to do the full mode sync of geth, and now I have hit a roadblock because my SSD is only 500 GB and doing it on HDD is simply too slow, so I am currently stuck around block 4.5m - 9th of November 2017 :). That is why I am trying to optimise geth a bit. But I have managed to compute the sizes of the witness for the blocks around the DoS attacks in September 2016. Very often, the witness would be around 37 MB. I have not analysed yet why.
Regarding your second question about parallel reads from the DB, I also thought about it and I looked at how exactly geth (and parity too) organises the accounts and their storage - I will prepare a blog post on that, because it also explains how I calculated the witness size.
I also looked at LevelDB implementation that geth uses to see if there is any gain from concurrent reads. I doubt there is. Because of the way the data is stored, there is no locality, and data even from the nearby trie branches are randomly scattered across the whole database. So reading them in parallel would require loading more LevelDB blocks into memory and seeking them.
@Souptacular About KEVM, Everett and some of his colleagues will be joining the call. So give me an agenda item: "introduction KEVM (Everett)". It would fit nicely before the YP discussion.
@vbuterin Actually I take it back - I think there will be an improvement from trying to access trie nodes in parallel, because currently lots of time is spent navigating down the trie, reading the lower level only after the higher one. And that exacerbates the high latency of HDD/SSD. I will definitely try that.
Another thing we could do is only include parts of the keys in the "witness hint", lets say, only first 8 bytes instead of all 32, and use non-exact seek operation to read from DB. I will look into that too.
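The "witness hint" idea can be illustrated with a sorted in-memory key list standing in for the database's ordered keyspace. This is a sketch under assumptions: `make_hint` and `seek_prefix` are hypothetical names, and `bisect` plays the role of the DB's non-exact seek (position at the first key greater than or equal to the prefix, then scan forward).

```python
import bisect
import hashlib


def make_hint(accessed_keys, prefix_len=8):
    """Shrink a list of 32-byte keys to sorted, de-duplicated prefixes."""
    return sorted({key[:prefix_len] for key in accessed_keys})


def seek_prefix(sorted_keys, prefix):
    """Non-exact seek: find the first key >= prefix, then collect all matches."""
    index = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    while index < len(sorted_keys) and sorted_keys[index].startswith(prefix):
        matches.append(sorted_keys[index])
        index += 1
    return matches


# Toy keyspace: 50 hashed keys, of which every tenth one is "accessed in the block".
db_keys = sorted(hashlib.sha256(bytes([i])).digest() for i in range(50))
accessed = db_keys[::10]
hint = make_hint(accessed)
found = [key for prefix in hint for key in seek_prefix(db_keys, prefix)]
hint_bytes = sum(len(prefix) for prefix in hint)
full_bytes = sum(len(key) for key in accessed)
```

The hint is a quarter of the size of the full key list here, at the price that a prefix can in principle match more than one key, so the reader may occasionally prefetch entries it does not need.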
@pirapira That is great! I have read KEVM paper after DevCon3 and will be curious to hear the discussion
I'm just following the discussion regarding data storage - I highly recommend the embedded db https://github.com/dgraph-io/badger which is a RocksDB implementation in pure Go. It's very robust, tested, and supports concurrent reads, ACID transactions, batching and snapshots. The original RocksDB btw is a fork of LevelDB by Facebook with more concurrency features/tuning - so I expect the work necessary to replace geth's existing use of github.com/syndtr/goleveldb/leveldb to badger will be quite minimal. The benefits: more performance, no more CGO for the db (leaks? call penalty?), and maybe disk space too depending on whether there is any data compaction in geth's db (to release old unused space from deleted/changed entries), or opportunities for compression.
@pkieltyka I have encountered BadgerDB yesterday and it looks interesting. Another thing to try, thanks!
Hey @pkieltyka just FYI, we have massive issues with RocksDB and are currently in the process of replacing it in Parity. openethereum/parity-ethereum#6280
@5chdn I wasn't suggesting to use RocksDB, I suggested to evaluate Badger, an alternative implementation of an LSM in pure Go, inspired by RocksDB. I don't think that issue applies here.
I just synced geth, parity and harmony over the last few days to see how they are handling the load.
Here is my feedback. I ran this on Ubuntu 17.10, with a 512GB SSD with 16 GB RAM; in all three clients I used the appropriate setting to set the cache size to 6 GB.
- Parity - the warp sync feature failed outright (never even once downloaded a single chunk), and the client did a full sync. This finished after ~2.5 days (not constantly online, there were a few offline periods). The processing speed was sometimes ~25-40 mgas/s, and sometimes ~5 mgas/s (see openethereum/parity-ethereum#7258). Storage size is 41 GB.
- Geth - the client randomly crashed the first couple of times I ran it, and then the third time it managed to download all the block receipts/headers and concentrated on downloading the state, and that time it worked. Took ~8 hours, with a total of ~50 million state objects. When processing blocks, the speed is sometimes ~20-30 mgas/s, and sometimes ~3-6 mgas/s. Storage size is 47 GB.
- Harmony - the client successfully did the fast sync, in ~8 hours, with a total of ~60 million state objects (maybe harmony counts contract code as a state object and geth doesn't, or something similar? not sure what is causing the disagreement; both times it synced around block 4.7m). When processing blocks, the speed is sometimes ~20-30 mgas/s, and sometimes ~3-8 mgas/s. Storage size is 25 GB.
Thoughts at first glance:
- We should really look into DB optimization
- All clients should bump up the default cache sizes
- We need to make fast sync work more reliably, and particularly make it not lose progress if the user closes the client halfway through the fast sync process
Yes, I managed to do the fast sync too. But not the "full" sync mode. Never mind, I have now ordered 4TB SSD, should arrive in a couple of days :)
@vbuterin yes, the warp issue is a well-known annoyance. openethereum/parity-ethereum#6372
I have now ordered 4TB SSD
@AlexeyAkhunov
Oh damn yeah. Need that too. Any model recommendation?
@vbuterin I have not tried Harmony, but I have similar experience with geth and parity.
One other thing that would be really really nice, but probably quite difficult to achieve, is to make it possible to do a sync in an HDD. I have tried to do mainnet syncs in HDDs many times. Fast/warp works fine (after many many retries), but after finishing it an HDD just can't keep up with the network with either parity or geth.
Any model recommendation?
I chose Samsung 850 EVO, but cannot recommend it until I have used it myself :)
make it possible to do a sync in an HDD
I am trying to hack together a version of geth that can do that. That is what I have spent most of my time on over the last few days...
Otherwise we would have lost the ability to run full nodes without SSD
EIP648 (Easy parallelizability) was brought up on reddit and there was some hope there might be some discussion of it during the dev meeting. Where/If it fits into the roadmap would be great to hear.
I would like to add some more observations :
1- It seems to me that the uncle rate is partially related to reaching the block gas limit and to the growth of the mempool size. So carefully raising the block gas limit could probably lower the uncle rate a bit (in the short term).
Question is: how much does mempool management cost in terms of computational stress (time to manage and broadcast a block) when the gas limit is reached? Is there something to do in this specific area?
2- The current uncle rate is high (about 26%) but lower than the 33% we reached a couple of weeks ago when the gas limit was 6.7 mil. (now it is about 8 mil.).
@pkieltyka yes, we have been looking into badger, and done some experiments. Originally, I think a major blocker was that badger did panic on every fault, instead of surfacing errors. IIUC, that's been changed now, and we've done some more experiments. @fjl knows more, here's the first experiment from May this year: https://github.com/fjl/go-ethereum/tree/badger-exp
Badger works, but it's not a lot faster than leveldb. The other thing to keep in mind is that badger's approach (keeping keyspace separate from value space) is only beneficial on SSDs.
@fjl Badger has iterated a lot since 7 months ago when you made your badger-exp branch. It’s prob worth upgrading the dep and trying again. True it is optimized for SSDs but worth benching as well on an HDD if that’s an important requirement.
Just to leave it here. There are 3 things I am trying to do with geth to create an optimised version (that will also hopefully work with HDD):
- Disable background miner unless the miner is enabled (currently it is still running)
- REMOVED: While processing blocks, not to write state to disk in the middle of the block, even for pre-Byzantium blocks. Currently, whenever state.IntermediateRoot() is called, it forces disk write of the trie. At the end of each block, there is a batched write to the disk, only that one should be performed.
- Write/read state to/from disk as key-value pairs directly as well as trie-structure, which will require order of magnitude fewer reads
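The read-count argument behind the last item can be shown with a toy model. This is not geth's Merkle Patricia trie: it is an uncompressed trie over fixed-length hex keys, stored node by node in a hypothetical `CountingDB`, next to a flat copy of the same data. The names and the key length are assumptions for illustration only.

```python
class CountingDB:
    """In-memory key-value store that counts reads (stand-in for disk reads)."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


def put(db, key_hex, value):
    """Store the value both as a trie path and as a flat key-value pair."""
    for depth in range(len(key_hex)):
        # Internal node for each proper prefix: the set of child nibbles.
        db.data.setdefault(("node", key_hex[:depth]), set()).add(key_hex[depth])
    db.data[("node", key_hex)] = value   # leaf at the end of the path
    db.data[("flat", key_hex)] = value   # direct key-value copy


def trie_get(db, key_hex):
    """Walk from the root: one read per level, each depending on the previous one."""
    for depth in range(len(key_hex)):
        children = db.get(("node", key_hex[:depth]))
        if children is None or key_hex[depth] not in children:
            return None
    return db.get(("node", key_hex))


def flat_get(db, key_hex):
    """Single read against the flat key-value copy."""
    return db.get(("flat", key_hex))


db = CountingDB()
put(db, "a1b2c3d4", "account-1")
put(db, "a1b2ffff", "account-2")

db.reads = 0
trie_value = trie_get(db, "a1b2c3d4")
trie_reads = db.reads

db.reads = 0
flat_value = flat_get(db, "a1b2c3d4")
flat_reads = db.reads
```

In this toy the trie lookup costs one read per nibble plus the leaf, while the flat lookup costs one. A real trie is shallower thanks to path compression, so the actual ratio would have to be measured.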
Geth does not write to disk during transaction processing, it keeps the trie in memory. Only statedb.Commit writes to disk, called once per block.
@karalabe Yes, you are right, of course. I am removing number 2
But we should add an exception, that self-calls and calls to precompiles cost only 100.
I like the idea of different cost for self-calling, because there is no cost for loading the code again, but the rate compared to the "regular" call should be determined more carefully since it still needs to handle state changes (applying or rolling back depending on call outcome).
I'd be against subsidising precompiles even more.
@Souptacular @vbuterin I can do experiments with parallel SSD reads if you want (point 2e)
@AlexeyAkhunov That would be great! Thanks.
Notes from this call are here https://www.reddit.com/r/ethereum/comments/7khro1/notes_from_ethereum_core_devs_meeting_29_120117/.
Can anyone give 1 reason why block gas limits need to be so low? Shouldn't "mining" be 90% useful and only minimal wastage of electricity. I mean gigahashing is the dumbest way to heat this already scorching planet to oblivion ever invented (bless you Satoshi), so a higher block gas limit must surely only help reduce power usage at the expense of miners (screw them anyhow).
I understand that running the transactions likely takes between 0.1 and 2 seconds out of the 17 second block time for Ethereum mainnet. Block gas limit should be targeting somewhere between 50% and 75% time utilisation (via reduced difficulty in the protocol, and via higher block gas limits, and via lower block rewards for miners etc). THAT would be mining.
Uncle rate.
@tomachinz ethereum is actually having trouble scaling on chain because of the time needed to validate a new block. The problem seems to be the heavy I/O load on the HDD/SSD. Geth 1.8.0 could eventually mitigate this issue with a far better DB setup, and there are other projects on their way such as TurboGeth ( @AlexeyAkhunov ).
We will see (when there is another tx demand peak) whether geth 1.8.0 (and/or other clients) allow a higher block gas limit.
Anyway sooner or later ethereum will hit some bandwidth limit (to maintain decentralization) that imho is much harder to solve (and there is probably some bandwidth problem right now).
Uncle rate is a good measure of the difficulty of ethereum to remain decentralized with increasing use of network resources.
"Block gas limit should be targeting somewhere between 50% and 75% time utilization (via reduced difficulty in the protocol, and via higher block gas limits, and via lower block rewards for miners etc). "
So you're proposing that miners perform more work for less revenue? Are you personally okay with doing more work for less money?
It is very good for Ethereum to be heavily used. But on top of bringing viable answers, the solutions to scale up also need to be decentralized. It is much easier to bring fast and simple solutions that pave the way for a centralized system. A decent ETH node already requires a 32GB machine and 512GB SSD. So for the scaling, please be cautious about decentralization. The fastest path is to build a centralized database controlled and run by a few. This is not what is expected from this project.