ethereum/EIPs

Clique PoA protocol & Rinkeby PoA testnet

karalabe opened this issue · 121 comments

Changelog:

  • Apr 4, 2017:
    • Mention the cascading proposal-execution corner case and its avoidance.
  • Mar 14, 2017:
    • Expanded the Clique block authorization section, added a strategy proposal.
    • Expanded the Clique signer voting section, added a strategy proposal.
  • Mar 13, 2017:
    • Polished up the constants in the Clique consensus protocol spec.
    • Added the two difficulty values and described in-turn/out-of-turn signing.
  • Mar 11, 2017:
    • Added initial technical specs for the Clique PoA consensus protocol.
    • Added checkpointing to reset votes and embed the list of signers into epoch headers.
    • Reintroduced authorized signer vanity extra-data as a fixed 32 byte allowance.
  • Mar 6, 2017
    • First proposal of the Rinkeby testnet and its PoA implementation ideas.

Clique proof-of-authority consensus protocol

Note, for the background and rationale behind the proposed proof-of-authority consensus protocol, please read the sections after this technical specification. I've placed this on top as an easy-to-find reference for implementers, so they don't have to dig through the discussions.

We define the following constants:

  • EPOCH_LENGTH: Number of blocks after which to checkpoint and reset the pending votes.
    • Suggested 30000 for the testnet to remain analogous to the mainnet ethash epoch.
  • BLOCK_PERIOD: Minimum difference between two consecutive blocks' timestamps.
    • Suggested 15s for the testnet to remain analogous to the mainnet ethash target.
  • EXTRA_VANITY: Fixed number of extra-data prefix bytes reserved for signer vanity.
    • Suggested 32 bytes to retain the current extra-data allowance and/or use.
  • EXTRA_SEAL: Fixed number of extra-data suffix bytes reserved for signer seal.
    • 65 bytes fixed as signatures are based on the standard secp256k1 curve.
  • NONCE_AUTH: Magic nonce number 0xffffffffffffffff to vote on adding a new signer.
  • NONCE_DROP: Magic nonce number 0x0000000000000000 to vote on removing a signer.
  • UNCLE_HASH: Always Keccak256(RLP([])) as uncles are meaningless outside of PoW.
  • DIFF_NOTURN: Block score (difficulty) for blocks containing out-of-turn signatures.
    • Suggested 1 since it just needs to be an arbitrary baseline constant.
  • DIFF_INTURN: Block score (difficulty) for blocks containing in-turn signatures.
    • Suggested 2 to show a slight preference over out-of-turn signatures.

We also define the following per-block constants:

  • BLOCK_NUMBER: Block height in the chain, where the height of the genesis is block 0.
  • SIGNER_COUNT: Number of authorized signers valid at a particular instance in the chain.
  • SIGNER_INDEX: Index of the block signer in the sorted list of current authorized signers.
  • SIGNER_LIMIT: Number of consecutive blocks out of which a signer may only sign one.
    • Must be floor(SIGNER_COUNT / 2) + 1 to enforce majority consensus on a chain.
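The majority threshold above can be sketched in a couple of lines (an illustrative snippet, not client code; the function name is my own):

```python
def signer_limit(signer_count: int) -> int:
    """Number of consecutive blocks out of which a signer may sign at most one.

    floor(SIGNER_COUNT / 2) + 1 is the smallest strict majority, so a chain
    can only progress if more than half the signers keep sealing blocks.
    """
    return signer_count // 2 + 1
```

For example, with 7 authorized signers the limit is 4, so any signer must wait for at least 3 other signers' blocks before sealing again.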

We repurpose the ethash header fields as follows:

  • beneficiary: Address to propose modifying the list of authorized signers with.
    • Should be filled with zeroes normally, modified only while voting.
    • Arbitrary values are permitted nonetheless (even meaningless ones such as voting out non signers) to avoid extra complexity in implementations around voting mechanics.
    • Must be filled with zeroes on checkpoint (i.e. epoch transition) blocks.
  • nonce: Signer proposal regarding the account defined by the beneficiary field.
    • Should be NONCE_DROP to propose deauthorizing beneficiary as an existing signer.
    • Should be NONCE_AUTH to propose authorizing beneficiary as a new signer.
    • Must be filled with zeroes on checkpoint (i.e. epoch transition) blocks.
    • Must not take up any other value apart from the two above (for now).
  • extraData: Combined field for signer vanity, checkpointing and signer signatures.
    • First EXTRA_VANITY bytes (fixed) may contain arbitrary signer vanity data.
    • Last EXTRA_SEAL bytes (fixed) is the signer's signature sealing the header.
    • Checkpoint blocks must contain a list of signers (N*20 bytes) in between, omitted otherwise.
    • The list of signers in checkpoint block extra-data sections must be sorted in ascending order.
  • mixHash: Reserved for fork protection logic, similar to the extra-data during the DAO.
    • Must be filled with zeroes during normal operation.
  • ommersHash: Must be UNCLE_HASH as uncles are meaningless outside of PoW.
  • timestamp: Must be at least the parent timestamp + BLOCK_PERIOD.
  • difficulty: Contains the standalone score of the block to derive the quality of a chain.
    • Must be DIFF_NOTURN if BLOCK_NUMBER % SIGNER_COUNT != SIGNER_INDEX
    • Must be DIFF_INTURN if BLOCK_NUMBER % SIGNER_COUNT == SIGNER_INDEX
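The difficulty rule above is a pure function of the block height and the signer's position in the sorted signer list; a minimal sketch (function name my own):

```python
DIFF_NOTURN = 1  # block score for out-of-turn signatures
DIFF_INTURN = 2  # block score for in-turn signatures

def block_difficulty(block_number: int, signer_index: int, signer_count: int) -> int:
    """Standalone score of a block, per the in-turn/out-of-turn rule."""
    if block_number % signer_count == signer_index:
        return DIFF_INTURN
    return DIFF_NOTURN
```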

Authorizing a block

To authorize a block for the network, the signer needs to sign the block's hash containing everything except the signature itself. This means that the hash contains every field of the header (nonce and mixDigest included), and also the extraData with the exception of the 65 byte signature suffix. The fields are hashed in the order of their definition in the yellow paper.

This hash is signed using the standard secp256k1 curve, and the resulting 65 byte signature (R, S, V, where V is 0 or 1) is embedded into the extraData as the trailing 65 byte suffix.
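Given the fixed EXTRA_VANITY prefix and EXTRA_SEAL suffix, a client can split any header's extra-data without ambiguity. A hedged sketch of that split (function name my own; real clients operate on RLP-decoded headers):

```python
EXTRA_VANITY = 32  # fixed vanity prefix bytes
EXTRA_SEAL = 65    # fixed secp256k1 signature suffix bytes (R, S, V)

def split_extra_data(extra: bytes):
    """Split a header's extra-data into (vanity, signer list, seal).

    On checkpoint blocks the middle section holds N*20 bytes of sorted
    signer addresses; on all other blocks it must be empty.
    """
    if len(extra) < EXTRA_VANITY + EXTRA_SEAL:
        raise ValueError("extra-data too short")
    vanity = extra[:EXTRA_VANITY]
    seal = extra[-EXTRA_SEAL:]
    middle = extra[EXTRA_VANITY:-EXTRA_SEAL]
    if len(middle) % 20 != 0:
        raise ValueError("signer list must be a multiple of 20 bytes")
    signers = [middle[i:i + 20] for i in range(0, len(middle), 20)]
    return vanity, signers, seal
```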

To ensure malicious signers (loss of signing key) cannot wreak havoc in the network, each signer is allowed to sign at most one out of SIGNER_LIMIT consecutive blocks. The order is not fixed, but in-turn signing weighs more (DIFF_INTURN) than out-of-turn signing (DIFF_NOTURN).

Authorization strategies

As long as signers conform to the above specs, they can authorize and distribute blocks as they see fit. The following strategy is nonetheless suggested, as it reduces network traffic and small forks:

  • If a signer is allowed to sign a block (is on the authorized list and didn't sign recently).
    • Calculate the optimal signing time of the next block (parent + BLOCK_PERIOD).
    • If the signer is in-turn, wait for the exact time to arrive, sign and broadcast immediately.
    • If the signer is out-of-turn, delay signing by rand(SIGNER_COUNT * 500ms).

This small strategy ensures that the in-turn signer (whose block weighs more) has a slight advantage to sign and propagate versus the out-of-turn signers. The scheme also scales somewhat as the number of signers increases.
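The suggested wiggle can be sketched as follows (a hedged illustration; function and parameter names are my own, and the 500ms constant comes from the strategy above):

```python
import random

BLOCK_PERIOD = 15  # seconds, suggested testnet minimum

def signing_delay(parent_timestamp: float, now: float, in_turn: bool,
                  signer_count: int, rng=random.random) -> float:
    """Seconds to wait before sealing the next block, per the strategy.

    In-turn signers seal exactly at parent + BLOCK_PERIOD; out-of-turn
    signers add a random wiggle of up to SIGNER_COUNT * 500ms.
    """
    target = parent_timestamp + BLOCK_PERIOD
    delay = max(0.0, target - now)
    if not in_turn:
        delay += rng() * signer_count * 0.5  # rand(SIGNER_COUNT * 500ms)
    return delay
```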

Voting on signers

Every epoch transition (genesis block included) acts as a stateless checkpoint, from which capable clients should be able to sync without requiring any previous state. This means epoch headers must not contain votes, all non-settled votes are discarded, and tallying starts from scratch.

For all non-epoch transition blocks:

  • Signers may cast one vote per own block to propose a change to the authorization list.
  • Only the latest proposal per target beneficiary is kept from a single signer.
  • Votes are tallied live as the chain progresses (concurrent proposals allowed).
  • Proposals reaching majority consensus (SIGNER_LIMIT votes) come into effect immediately.
  • Invalid proposals are not to be penalized for client implementation simplicity.

A proposal coming into effect entails discarding all pending votes for that proposal (both for and against) and starting with a clean slate.
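The tallying rules above can be condensed into a small sketch (class and method names are my own; a real client would key votes by address bytes and replay them per block):

```python
class VoteTally:
    """Track pending proposals and apply ones reaching majority consensus."""

    def __init__(self, signers):
        self.signers = set(signers)   # currently authorized signers
        self.votes = {}               # (signer, beneficiary) -> authorize?

    def cast(self, signer, beneficiary, authorize):
        if signer not in self.signers:
            return  # only authorized signers may vote
        # Only the latest proposal per beneficiary from a single signer counts.
        self.votes[(signer, beneficiary)] = authorize
        limit = len(self.signers) // 2 + 1  # SIGNER_LIMIT majority threshold
        tally = sum(1 for (s, b), a in self.votes.items()
                    if b == beneficiary and a == authorize)
        if tally >= limit:
            # Apply the change to this beneficiary only (no cascading effects).
            (self.signers.add if authorize else self.signers.discard)(beneficiary)
            # Discard all pending votes for this beneficiary, for and against.
            self.votes = {k: v for k, v in self.votes.items()
                          if k[1] != beneficiary}
```

Note how only the current header's beneficiary is ever added or dropped, which is exactly the cascade avoidance described in the next section.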

Cascading votes

A complex corner case may arise during signer deauthorization. When a previously authorized signer is dropped, the number of signers required to approve a proposal might decrease by one. This might cause one or more pending proposals to reach majority consensus, the execution of which might further cascade into new proposals passing.

Handling this scenario is non-obvious when multiple conflicting proposals pass simultaneously (e.g. adding a new signer vs. dropping an existing one), where the evaluation order might drastically change the final authorization list. Since signers may invert their own votes in every block they mint, it's not obvious which proposal would be "first".

To avoid the pitfalls cascading executions would entail, the Clique proposal explicitly forbids cascading effects. In other words: Only the beneficiary of the current header/vote may be added to/dropped from the authorization list. If that causes other proposals to reach consensus, those will be executed when their respective beneficiaries are "touched" again (given that majority consensus still holds at that point).

Voting strategies

Since the blockchain can have small reorgs, a naive voting mechanism of "cast-and-forget" may not be optimal, since a block containing a singleton vote may not end up on the final chain.

A simplistic but working strategy is to allow users to configure "proposals" on the signers (e.g. "add 0x...", "drop 0x..."). The signing code can then pick a random proposal for every block it signs and inject it. This ensures that multiple concurrent proposals as well as reorgs get eventually noted on the chain.
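A minimal sketch of this random-injection strategy (names are my own; proposals are assumed to be kept as a map from beneficiary address to an authorize/drop flag, mapped onto the repurposed beneficiary and nonce header fields):

```python
import random

NONCE_AUTH = b"\xff" * 8  # magic nonce to vote in a new signer
NONCE_DROP = b"\x00" * 8  # magic nonce to vote out a signer

def inject_vote(proposals, rng=random.choice):
    """Pick a random configured proposal for the next sealed block.

    Returns (beneficiary, nonce); both are zeroed when no proposals
    are configured, matching a regular non-voting block.
    """
    if not proposals:
        return b"\x00" * 20, NONCE_DROP
    beneficiary, authorize = rng(sorted(proposals.items()))
    return beneficiary, NONCE_AUTH if authorize else NONCE_DROP
```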

This list may be expired after a certain number of blocks / epochs, but it's important to realize that "seeing" a proposal pass doesn't mean it won't get reorged, so it should not be immediately dropped when the proposal passes.

Background

Ethereum's first official testnet was Morden. It ran from July 2015 to about November 2016, when due to the accumulated junk and some testnet consensus issues between Geth and Parity, it was finally laid to rest in favor of a testnet reboot.

Ropsten was thus born, clearing out all the junk and starting with a clean slate. This ran well until the end of February 2017, when malicious actors decided to abuse the low PoW and gradually inflate the block gas limits to 9 billion (from the normal 4.7 million), at which point gigantic transactions could be sent in, crippling the entire network. Even before that, attackers attempted multiple extremely long reorgs, causing network splits between different clients, and even different versions.

The root cause of these attacks is that a PoW network is only as secure as the computing capacity placed behind it. Restarting a new testnet from zero wouldn't solve anything, since the attacker can mount the same attack over and over again. The Parity team decided to go with an emergency solution of rolling back a significant number of blocks, and enacting a soft-fork rule that disallows gas limits above a certain threshold.

While this solution may work in the short term:

  • It's not elegant: Ethereum is supposed to have dynamic block limits
  • It's not portable: other clients need to implement new fork logic themselves
  • It's not compatible with sync modes: fast and light clients are both out of luck
  • It's just prolonging the attacks: junk can still be steadily pushed in ad infinitum

Parity's solution, although not perfect, is nonetheless workable. I'd like to propose a longer term alternative solution, which is more involved, yet should be simple enough to allow rolling out in a reasonable amount of time.

Standardized proof-of-authority

As reasoned above, proof-of-work cannot work securely in a network with no value. Ethereum has its long term goal of proof-of-stake based on Casper, but that is heavy research, so we cannot rely on it any time soon to fix today's problems. One solution however is easy enough to implement, yet effective enough to fix the testnet properly, namely a proof-of-authority scheme.

Note, Parity does have an implementation of PoA, though it seems more complex than needed, and without much documentation on the protocol it's hard to see how it could play along with other clients. I welcome feedback from them on this proposal based on their experience.

The main design goals of the PoA protocol described here are that it should be very simple to implement and embed into any existing Ethereum client, while at the same time allowing existing sync technologies (fast, light, warp) to be used without needing client developers to add custom logic to critical software.

Proof-of-authority 101

For those not aware of how PoA works, it's a very simplistic protocol, where instead of miners racing to find a solution to a difficult problem, authorized signers can at any time at their own discretion create new blocks.

The challenges revolve around how to control minting frequency, how to distribute minting load (and opportunity) between the various signers and how to dynamically adapt the list of signers. The next section defines a proposed protocol to handle all these scenarios.

Rinkeby proof-of-authority

There are two approaches to syncing a blockchain in general:

  • The classical approach is to take the genesis block and crunch through all the transactions one by one. This is tried and proven, but in networks of Ethereum's complexity it quickly turns out to be very costly computationally.
  • The other is to only download the chain of block headers and verify their validity, after which point an arbitrary recent state may be downloaded from the network and checked against recent headers.

A PoA scheme is based on the idea that blocks may only be minted by trusted signers. As such, every block (or header) that a client sees can be matched against the list of trusted signers. The challenge is how to maintain a list of authorized signers that can change over time. The obvious answer (store it in an Ethereum contract) is also the wrong answer: fast, light and warp sync don't have access to the state during syncing.

The protocol of maintaining the list of authorized signers must be fully contained in the block headers.

The next obvious idea would be to change the structure of the block headers so it drops the notions of PoW, and introduces new fields to cater for voting mechanisms. This is also the wrong answer: changing such a core data structure in multiple implementations would be a nightmare development, maintenance and security wise.

The protocol of maintaining the list of authorized signers must fit fully into the current data models.

So, according to the above, we can't use the EVM for voting, rather have to resort to headers. And we can't change header fields, rather have to resort to the currently available ones. Not much wiggle room.

Repurposing header fields for signing and voting

The most obvious field that is currently used solely as fun metadata is the 32 byte extra-data section in block headers. Miners usually place their client and version in there, but some fill it with alternative "messages". The protocol would extend this field with 65 extra bytes serving as a secp256k1 miner signature. This would allow anyone obtaining a block to verify it against a list of authorized signers. It also makes the miner section in block headers obsolete (since the address can be derived from the signature).

Note, changing the length of a header field is a non-invasive operation, as all code (such as RLP encoding and hashing) is agnostic to it, so clients wouldn't need custom logic.

The above is enough to validate a chain, but how can we update a dynamic list of signers? The answer is that we can repurpose the now-obsolete miner field and the PoA-obsoleted nonce field to create a voting protocol:

  • During regular blocks, both of these fields would be set to zero.
  • If a signer wishes to enact a change to the list of authorized signers, it will:
    • Set the miner to the signer it wishes to vote about
    • Set the nonce to 0xff...f or 0 to vote in favor of adding or kicking out, respectively

Any clients syncing the chain can "tally" up the votes during block processing, and maintain a dynamically changing list of authorized signers by popular vote. The initial set of signers can be given as genesis chain parameters (to avoid the complexity of deploying an "initial voters list" contract in the genesis state).

To avoid having an infinite window to tally up votes in, and also to allow periodically flushing stale proposals, we can reuse the concept of an epoch from ethash, where every epoch transition flushes all pending votes. Furthermore, these epoch transitions can also act as stateless checkpoints containing the list of current authorized signers within the header extra-data. This permits clients to sync up based only on a checkpoint hash without having to replay all the voting that was done on the chain up to that point. It also allows the genesis header to fully define the chain, containing the list of initial signers.
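Assembling the extra-data of such an epoch-transition checkpoint follows directly from the field layout defined earlier; a hedged sketch (function name my own, the seal is zeroed here because it's only filled in during signing):

```python
EXTRA_VANITY = 32  # fixed vanity prefix bytes
EXTRA_SEAL = 65    # fixed signature suffix bytes

def checkpoint_extra_data(vanity: bytes, signers: list) -> bytes:
    """Assemble checkpoint extra-data: vanity | sorted signers | seal space."""
    assert len(vanity) == EXTRA_VANITY
    assert all(len(s) == 20 for s in signers)
    body = b"".join(sorted(signers))  # ascending order, per the spec
    return vanity + body + b"\x00" * EXTRA_SEAL
```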

Attack vector: Malicious signer

It may happen that a malicious user gets added to the list of signers, or that a signer key/machine is compromised. In such a scenario the protocol needs to be able to defend itself against reorganizations and spamming. The proposed solution is that given a list of N authorized signers, any signer may only mint 1 block out of every K. This ensures that damage is limited, and the remainder of the miners can vote out the malicious user.

Attack vector: Censoring signer

Another interesting attack vector is if a signer (or group of signers) attempts to censor out blocks that vote on removing them from the authorization list. To work around this, we restrict the allowed minting frequency of signers to 1 out of N/2. This ensures that malicious signers need to control at least 51% of signing accounts, in which case it's game over anyway.

Attack vector: Spamming signer

A final small attack vector is that of malicious signers injecting new vote proposals inside every block they mint. Since nodes need to tally up all votes to create the actual list of authorized signers, they need to track all votes through time. Without placing a limit on the vote window, this could grow slowly, yet unbounded. The solution is to place a moving window of W blocks after which votes are considered stale; a sane choice might be one or two ethash epochs. We'll call this window an epoch.

Attack vector: Concurrent blocks

If the number of authorized signers is N, and we allow each signer to mint 1 block out of K, then at any point in time N-K+1 signers are allowed to mint. To avoid these racing for blocks, every signer would add a small random "offset" to the time it releases a new block. This ensures that small forks are rare, but occasionally still happen (as on the main net). If a signer is caught abusing its authority and causing chaos, it can be voted out.

Notes

Does this suggest we use a censored testnet?

More or less. The proposal suggests that, given the malicious nature of certain actors and the weakness of the PoW scheme in a "monopoly money" network, it is better to have a network with a bit of spam filtering enabled that developers can rely on to test their programs than a wild wild west chain that dies due to its uselessness.

Why standardize proof-of-authority?

Different clients are better at different scenarios. Go may be awesome in capable server side environments, but CPP may be better suited to run on an RPi Zero. The possibility of mixing clients in private environments would be a net win for the ecosystem, and being able to participate in a single spamless testnet would be a win for everyone at large.

Doesn't manual voting get messy?

This is an implementation detail, but signers may implement a contract based voting strategy leveraging the full capabilities of the EVM, only pushing the results into the headers for average nodes to verify.

Clarifications and feedback

  • This proposal does not rule out clients running a PoW based testnet side by side, whether Ropsten or a new one based on it. The ideal scenario would be that clients provide a way to attach to both PoW as well as PoA based test networks (#225 (comment)).
  • Although the protocol parameters can be made configurable at client implementers' discretion, the Rinkeby network should be as close to the main network as possible. That includes dynamic gas limits, variable block times around 15 seconds, gas prices and such (#225 (comment)).
  • The scheme requires that at least K signers are online at any time, since that is the minimum number required to ensure "minting" diversity. This means that if more than K drop off, the network stalls. This should be solved by ensuring the signers are high-uptime machines and failing ones should be voted out in a timely fashion before too many failures occur (#225 (comment)).
  • The proposal does not address "legitimate" spam, as in an attacker validly spending testnet ether to create junk, however without PoW mining, an attacker may not be able to obtain infinite ether to mount the attack in the first place. One possibility would be to have a faucet giving out ether based on GitHub (or whatever else) accounts in a limited fashion (e.g. 10 / day) (#225 (comment)).
  • A suggestion was made to create checkpoint blocks for every epoch that contains a list of authorized signers at that point in time. This would allow light clients at a later point to say "sync from here" without needing to start from the genesis. This could be added to the extradata field as a prefix before the signature (#225 (comment)).

As the word censored is in italics, I'd like to point out that while this proposal proposes a new public testnet with less decentralized characteristics, it's possible for anyone to run their own PoW testnet. You then bear the infrastructure cost of doing so, and the proposal does not limit your ability to do this in any way. This has been true from Ethereum day zero, as Ethereum clients have been very user friendly for running your own private testnet.

Just to add on that, the proposal also does not restrict clients to run this exclusively. The proposal can run side-by-side with the current testnet, so users would be free to choose between the PoW Ropsten or the PoA Rinkeby.

We greatly support this approach! As a DApp developer, we urgently need a public, safe and reliable testnet, which obviously cannot be secured by PoW. DApps are beginning to interact heavily - only to mention status.im, metamask, uport, or other wallets - and only on a broadly accepted public testnet will all projects be present and able to test dependencies on others. For similar reasons, the new testnet should be as similar as possible to the mainnet - only then can it serve as a valid reference for development. I'd prefer:

  • similar gas limit
  • similar block time
  • similar gas price
  • and for each parameter, a similar statistical distribution

Only then can you consider an application which runs on the testnet as "tested". I appreciate the Parity solution with Kovan, because it gives some short term relief, but I would like to encourage all involved parties to work together on a shared solution.

@christoph2806 Definitely, added to the proposal's clarification section.

With time some signers can go offline. Couldn't it be the case that at some block all of the (N-K) signers who can mint the next block are stale and the network gets stuck?

In my proposal the network operators should ensure that stale signers are removed/replaced in a timely fashion. For testnet purposes this would probably be only a handful of signers whose uptime we can guarantee.

How will the ether be distributed? It is important since a spammer can try to get as much ether as possible from various sources and then use it to spam the network.

@hrishikeshio The issue with Ropsten was that the attacker minted tens of thousands of blocks, producing huge reorgs and pushing the gas limit up to 9B. These two scenarios could be avoided since only signers can mint blocks, so they could also retain some sanity limits.

The proposal does not specify any means for spam filtering for individual transactions as that is a new can of worms. I'll have to think a bit how best to solve that issue (around miner strategies), but limiting ether availability on a testnet is imho a bad idea. We want to be as inclusive as possible.

One possible solution would be to have a faucet that grants X ether / Y time (e.g. 10 / day) but is bound to some OAuth protocol that has proper protection against mass account creation (e.g. GitHub account, email address, etc).

Snippet to claim ownership of a GitHub user name for an Ethereum address

contract GitHubOracle is usingOraclize {
    //constant for oraclize commits callbacks
    uint8 constant CLAIM_USER = 0;
    //temporary storage enumerating oraclize calls
    mapping (bytes32 => uint8) claimType;
    //temporary storage for oraclize user register queries
    mapping (bytes32 => UserClaim) userClaim;
    //permanent storage of sha3(login) of github users
    mapping (bytes32 => address) users;
    //events
    event UserSet(string githubLogin, address account);
    //stores temporary data for oraclize user register request
    struct UserClaim {
        address sender;
        bytes32 githubid;
        string login;
    }

    //register or change a github user ethereum address
    function register(string _github_user, string _gistid)
     payable {
        bytes32 ocid = oraclize_query("URL", strConcat("https://gist.githubusercontent.com/",_github_user,"/",_gistid,"/raw/"));
        claimType[ocid] = CLAIM_USER;
        userClaim[ocid] = UserClaim({sender: msg.sender, githubid: sha3(_github_user), login: _github_user});
    }
    //oraclize response callback
    function __callback(bytes32 _ocid, string _result) {
        if (msg.sender != oraclize_cbAddress()) throw;
        uint8 callback_type = claimType[_ocid];
        if(callback_type==CLAIM_USER){
            if(strCompare(_result,"404: Not Found") != 0){    
                address githubowner = parseAddr(_result);
                if(userClaim[_ocid].sender == githubowner){
                    _register(userClaim[_ocid].githubid,userClaim[_ocid].login,githubowner);
                }
            }
            delete userClaim[_ocid]; //should always be deleted
        }
        delete claimType[_ocid]; //should always be deleted
    }
    function _register(bytes32 githubid, string login, address githubowner) 
     internal {
        users[githubid] = githubowner;
        UserSet(login, githubowner);
    }
}

The user creates a gist with their public address and calls register, passing _github_user + _gistid.

From https://github.com/ethereans/github-token/blob/master/contracts/GitHubToken.sol

There could be a lightweight proof of stake system where (like the GitHub oraclize above) people need 5 ETH locked to a live-net contract address that then allows them to be on the testnet. Misbehave, and the Ethereum Foundation (or whoever runs it) confiscates your eth.

Yeah, side chains are an interesting idea but those are a whole new can of worms :)

Two thoughts:

Last week, INFURA launched a (private but publicly available) chain called INFURAnet (with INFURA running all the authorities) to provide a usable test network in the face of the Ropsten issues. It was obviously based on Parity but we would feel better if PoA was a standard and compatible feature across all clients. Therefore, we support this EIP.

Additionally, if Ropsten is replaced with a PoA network, we would be happy to run one of the authorities.

What about still using PoW on the testnet, but with slightly modified parameters:

  1. Block Reward = 0
  2. Gas price is fixed to certain value
  3. There is a hard cap on the gas limit in a block
  4. Faucet gives testnet Ether only to accounts that have Ether in the same account on the main net, and that Ether is at least 24 hours old. Each account only receives test Ether once. Or some other limitation of this sort, which will allow faucet to be automatic, but will limit sybil attacks.

Hopefully, implementation could be much easier than Proof Of Authority

EDIT: Another idea - can Block Reward be negative? Meaning that mining actually cost Test Ether. That allows implementing sort of "Proof Of Authority" trivially, by simply distributing large amounts of test Ether. It also means that if Test Ether is dished out periodically, the maintainers of the test net can disallow abusive miners by not giving them the next tranche of test Ether

The issue with your modified PoW scheme is that it still permits creating huge reorgs by mining lots of blocks, even if without reward.

The second proposal doesn't solve this issue either as a malicious user might accumulate a lot of ether first, then create many many parallel chains. All will be valid since he does have the funds, and there's no way to take it away. Arguably more stable than the first proposal, but doing negative rewards might break clients unexpectedly as I don't think most codebases catered for this possibility.

Btw, the zero block reward is a nice idea for PoA too, as it prevents a rogue signer / leaked key from ruining the chain with accumulated funds.

@karalabe Thanks! What I meant with the negative rewards - the maintainer of the network gives out enough Test Eth to current miner authorities to mine, lets say, for a week. After the week, the maintainer looks who needs a top-up, and only gives a top up to miners who behaved well. For those who did not behave well, the payouts simply stop.

@karalabe Ah, I got your point about the parallel chains now. In that case, there needs to be some kind of regular expiration of Test Eth :)

Here's GoEthereum on Tendermint.

https://github.com/tendermint/ethermint

The goal is to keep as much of GoEthereum as compatible as possible.

Come to #ethermint on the Tendermint slack for discussions.

We have some upstream patches that would make Ethermint much cleaner. See the bottom of https://github.com/tendermint/ethermint/pull/42/files

We're pushing GoEthereum to high tx limits and uncovering some issues.

Just to mention a proposal by @frozeman and @fjl of adding the set of signers to the extra-data field of every Xth block to act as a checkpoint. This wouldn't be useful now, but it would permit anyone to trivially add logic to "sync from H(X)", where H(X) is the hash of a checkpoint block.

The added benefit is that this would allow the genesis block to store the initial set of signers and we wouldn't need extra chain configuration parameters.

Here's a suggested protocol change: https://gist.github.com/holiman/5e021b24a7bfec95c8cc84b97e44e45a

It was a bit too long for fitting in a comment.

@holiman To react a bit to the proposal here too, I see one problem that's easy-ish to solve, another that's hard:

Your scheme must also ensure that blocks cannot be minted like crazy, otherwise the difficulty becomes irrelevant. This can be done with the same "min 15 seconds apart" guarantee that the original proposal had.

The harder part is that with no guarantee on signer ordering/frequency (only relying on the difficulty for chain quality/validation), malicious signers can mine very long chains that aren't difficult enough to beat the canonical chain, yet the nodes cannot know this before processing them. And since creating these chains is mostly free in a PoA world, malicious signers can keep spamming with little effort.

The original proposal had a guarantee that the majority of the signers agreed at some point that a chain is valid (even if it was reorged afterwards), so minority malicious miners can only feed made up chains of N/2 blocks.

The difficulty idea is elegant btw, just not sure how yet to make use of it :)

keorn commented

If you do not mind somewhat relying on UNIX time and longer block times when validators are down, then Aura (in Parity) uses something like that:

  • time is divided into steps, the current step is t / step_duration
  • the primary for step is step % length(validators)
  • the header seal is a list of two values: step and signature (step is redundant and can be removed in a future version)
  • the total difficulty or as we refer to it "chain score" is set to be (using appropriate differencing to obtain block difficulty): U128_max * height - step

Validation: block at a given step can be only signed by the primary, only first block for a given step is accepted (if a second is received, a vote to remove the authority should be issued), block can arrive at most 1 step ahead.
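The Aura turn assignment and chain score described above can be sketched as follows (an illustration of keorn's description, not Parity code; function names are my own):

```python
U128_MAX = 2**128 - 1

def aura_primary(unix_time: int, step_duration: int, validators: list):
    """Return (step, primary) for a given UNIX time under Aura.

    Time is divided into steps; the primary for a step is
    step % length(validators).
    """
    step = unix_time // step_duration
    return step, validators[step % len(validators)]

def aura_chain_score(height: int, step: int) -> int:
    """Total chain score: U128_max * height - step.

    Longer chains always win; among equal heights, earlier steps score higher.
    """
    return U128_MAX * height - step
```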

Validator set can be altered in the way @karalabe proposed.
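Under stated assumptions (an illustrative step_duration and hypothetical function names), the Aura turn assignment described above can be sketched as:

```go
package main

import "fmt"

const stepDuration = 5 // seconds per step (assumed value)

// currentStep divides UNIX time into fixed-length steps.
func currentStep(unixTime uint64) uint64 {
	return unixTime / stepDuration
}

// primaryIndex picks the one validator allowed to seal at a given step.
func primaryIndex(step uint64, validatorCount int) int {
	return int(step % uint64(validatorCount))
}

func main() {
	step := currentStep(1000) // 1000 / 5 = step 200
	fmt.Println(step, primaryIndex(step, 3))
}
```

Note how the mapping depends only on wall-clock time, which is exactly the synchrony assumption debated below.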

Either way we will attempt to implement whichever solution is elected.

I'm not too fond of relying on time. Using @holiman's proposal of calculating "your turn" based only on block height seems a bit better in that respect, as nodes don't have to be clock-synced.

Any particular reason for having the chain difficulty calculated like that instead of just the height of the chain for example? What does this more complex formula gain you?

The issue I see with Aura's turn-based scheme is that if a few signers drop off (which is only natural in an internet-scale system), the chain dynamics would become quite irregular, with "gaps" in the minting time; versus my proposal, where multiple signers can fill in for those that dropped.

If I understand correctly, the idea in the difficulty algorithm is to score those chains higher that have the most signers signing at the correct turn. So chains that skip blocks are scored less vs. those that include all signers.

What happens in scenarios where blocks are minted in step, but propagated later after the step ends? Or if some signers receive the next block in time, while some signers receive it a bit later after the step ended?

I've updated the proposal with a tech spec section describing the proposed PoA protocol itself. It's still missing a few details around signing (notably the 1-out-of-K block constraint), and I've yet to figure out the difficulty calculation.

Also I split off the PoA protocol from the testnet itself naming wise as I'd like to keep the two concepts separated to avoid confusion. Using metro station names for the testnets is fine, but for a reusable PoA scheme I wanted something a bit more "mundane" and/or obvious.

The names are still up for finalization. The Clique name for the PoA scheme (best until now) was suggested by @holiman .

I'd recommend using the Ethermint or Eris DB permissioning native contract, or both. They've both been tested extensively and neither would require reinventing the wheel. Furthermore, we're all friends here and have done the heavy leg work already, so... why not?

It's hard to evaluate such a proposal without any details. I personally am not familiar with how either of them works, so I cannot comment on their feasibility.

My main design goals here are to be easy to add to any client and support current techs (fast, light, warp sync) without invasive changes.

Can those consensus engines be plugged into all clients? Can they run on mobile and embedded devices? Are they fully self-contained without external dependencies? Can they achieve consensus from headers only? Are they compatible licensing-wise with all clients? These are all essential requirements I've tried to meet.

I'm happy to consider them, but you need to provide a lot more detail to evaluate based upon.

Absolutely.

So both use Tendermint proof-of-stake consensus, which is detailed here:

https://github.com/tendermint/tendermint/wiki/Byzantine-Consensus-Algorithm

As for the pluggability of the algorithm, it's been proven to be quite doable, in fact, Parity has already done it:

https://github.com/ethcore/parity/blob/ade5a13f5bad745b4200ececde42aa219ad768ae/json/src/spec/engine.rs

And ethermint already implements this through geth in a way (I wouldn't be the one to give the details, that would be something for @jaekwon or @ebuchman to explain)

https://github.com/tendermint/ethermint

As for Eris-DB and your attempt at permissioning by way of Proof of Authority, we simply utilize the above BFT consensus algorithm, and on top of that utilize a native contract (not dissimilar to the current cryptographic precompiles at fixed addresses, such as SHA256, RIPEMD-160, etc.) to implement a permissioning scheme amongst the validators.

While we have our own version of the EVM that is much more stripped down than Geth's, I don't think it would be difficult to make it a modular Go package for ease of implementation (CC @silasdavis):

https://github.com/eris-ltd/eris-db/blob/master/manager/eris-mint/evm/snative.go#L73

The above could be implemented through Geth via some tinkering with this function:

https://github.com/ethereum/go-ethereum/blob/master/core/vm/contracts.go#L33

Both solutions are written in Go, so there is surely a way to make them somewhat compatible. Again, trying to find a way to work together so y'all can keep your focus ;)

Maybe instead of all these fancy ideas, just ask Bitcoin how they manage to have a functional PoW testnet?
Hint: the block size (i.e., gas limit) is bounded.

But of course we cannot allow the testnet to behave differently from mainnet.
So let's use PoA instead. Exactly as on mainnet.

We could have a bounded-limit PoW network as well. Let's have several options.

Could the PoA testnet be started from a state snapshot taken from the PoW testnet (perhaps from the Ropsten bounded-gas-limit soft-fork block)? And if the PoA configuration uses the same EIP155 CHAIN_ID=3 as Ropsten, then transactions can be replayed on both the PoA chain and the PoW chain. Replaying transactions on both testnets might be convenient for deploying contracts etc.

I'm not convinced that's a good idea.

  • Starting from a huge snapshot would require that all clients implement snapshotting, or at least whatever's needed to load it in the genesis. Geth afaik can do it, but I'm not sure whether the others support it or not.
  • One of the spam protection feature planned (and AFAIK also present in Kovan) was to try and limit the supply of ether so malicious actors cannot stockpile too large amounts. This would be utterly broken if we loaded up a Ropsten snapshot where I'm assuming our original attacker had a huge pile already (or others for that matter).
  • Ropsten is probably also dead and I can imagine it would get a reboot too. Still though, relaying transactions between these two networks (irrelevant whether reboot or not) would just be messy since there would be a ton of funds on ropsten for mining, so there would be a constant influx of ropsten transactions that can't be executed anyway.

Imho it's nicer to start with a clean slate.

@cdetrio does not suggest a snapshot feature (as far as I understand).
Just have the same network id and replay all Ropsten txs until the attack.

I don't understand why everyone keeps claiming the amount of ether the attacker had was the problem.
IMO it was his (relatively) huge mining power.
If the gas limit had stayed at 4.7M he couldn't have spammed as much.

PoA doesn't have mining rewards and the block miners would be different, so the transactions couldn't be replayed as is, since the accounts wouldn't have the funds.

No one claimed the ether was the problem. We highlighted that with infinite ether, you can reproduce the same problem in a PoA network too, without much mining power, if blocks are not limited.

Technically you could make a fresh account with a lot of ether, and after every Ropsten block is mined, add a tx that gives the miner the block reward and fees.
I am not suggesting we do it. Just wondering if this is what @cdetrio had in mind.
I am not suggesting to do it. Just wondering if this is what @cdetrio had in mind.

If the ether amount is not a problem (given the block size is bounded), why do you insist on verifying an identity before giving away ether?

I personally don't want to place a limit on the block size. Looking at bitcoin, they have huge problems because of that limit. Even though this is a testnet, I'd like to retain the core concepts of Ethereum (yes, I know PoA isn't mainnet, but Ethereum never wanted to settle on PoW anyway, so I see no issue with pushing towards dropping PoW).

Do you think it wise not to have PoW testnet at all, while mainnet is still PoW?

Personally, I have an agenda here. I am part of the smartpool.io team. And it will be hard to deploy it on mainnet before we can show people it works on testnet (we have our own private network but it is not the same).

I don't know how many other people need a PoW testnet.
I think Metropolis has some changes to the uncle mechanism. How can they be tested without a PoW testnet?

It's fine to have a PoW testnet too beside a PoA one to test out forks. We can go down the block limiting route on that.

Just wanted to ping the thread that I've finished writing up the proposal. We also have a prototype implementation in go-ethereum ethereum/go-ethereum#3753, in the consensus/clique package (I didn't link the commit because occasionally I force push the PR during development).

I'll spend the next few days trying to put together a small beta-test network and also to write up some tests to validate that everything works correctly (mostly around voting and dynamic signer updates).

@VoR0220 I'm still uncertain whether I understand your two proposals, but I did notice a few things that made me uncertain whether they would be appropriate.

Tendermint seems to rely on a complex cross node interaction to reach consensus on the final block, which inherently means added network complexity. Eris DB seems to be based on a slimmed down EVM, which inherently means that stateless syncs (fast, light) cannot verify the chain. Did I misunderstand something?

All in all though, to support my proposal or any of yours, clients need some baseline support for pluggable consensus engines, so either approach requires work from core devs. I'm not sure about the other proposals, but at least after implementing mine I can guarantee that both the PoA and the previous PoW can be supported without too invasive rewrites, although non-trivial ones, truth be told.

Here's an alternative idea:
Keep the list in the contract for flexibility. The contract emits events when the list changes. Light/fast sync can examine event blooms and transaction receipts and downloads proofs for the changes. The proofs are also added to the warp snapshot.

The idea is not bad per se, but it blows up the complexity of the proposal significantly:

Light clients don't have access to receipts during sync, so every time the event bloom looks like there's something there, the light client needs to retrieve it. This means that sync code and consensus code all of a sudden get tied together, since sync needs to occasionally pull in extra data. This is quite a large can of worms to open up, especially since there might be much stricter resource constraints on light clients for network traffic, as well as serving nodes may throttle them on large downloads.

The scheme is susceptible to attacks that fake "consensus updates" in the log bloom. E.g. I, as an attacker, can issue a transaction per block that emits logs mapping to the same bloom bits as the consensus contract events. This means that light clients will end up needing to pull in all the receipts and a ton of state just to figure out it's a false alarm.

But perhaps most importantly, one of the core requirements of the original proposal was that it should be trivial to embed into other clients. Of course they do need to support some consensus-engine pluggability, but based on the code in geth, the entire Clique consensus engine can be done (extensively commented) in 500-750 lines of code, fully self-contained in two files. (My PR contains a lot of general cleanup and also reworks ethash in the meantime.) The entire proposal depends on implementing a "header check", a "header preparer" and a "sign block" method, which are analogous to those needed by ethash. All else works just as is. Imho this is a very strong benefit that should not be discarded lightly.
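To illustrate how small that surface area is, here's a hypothetical, heavily simplified engine interface capturing just the three hooks mentioned; go-ethereum's real consensus.Engine has more methods and different signatures, so treat this purely as a sketch:

```go
package main

import "fmt"

// Header is a stand-in for a block header; the fields are illustrative.
type Header struct {
	Number uint64
	Extra  []byte // in Clique this would hold the vanity, signer list and seal
}

// Engine is a hypothetical pluggable-consensus interface with the three
// hooks described above: a "header check", a "header preparer" and a
// "sign block" method.
type Engine interface {
	VerifyHeader(h *Header) error // header check
	Prepare(h *Header) error      // header preparer
	Seal(h *Header) error         // sign block
}

// nopEngine accepts everything; a real Clique engine would recover the
// signer from the seal in Extra and enforce the voting rules instead.
type nopEngine struct{}

func (nopEngine) VerifyHeader(h *Header) error { return nil }
func (nopEngine) Prepare(h *Header) error      { return nil }
func (nopEngine) Seal(h *Header) error         { return nil }

func main() {
	var e Engine = nopEngine{}
	fmt.Println(e.VerifyHeader(&Header{Number: 1}) == nil)
}
```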

Light clients don't have access to receipts during sync

LES protocol supports GetReceipts message.

The scheme is susceptible to attacks that fake "consensus updates" in the log bloom. E.g. I, as an attacker, can issue a transaction per block that emits logs mapping to the same bloom bits as the consensus contract events. This means that light clients will end up needing to pull in all the receipts and a ton of state just to figure out it's a false alarm.

This would require reversing a Keccak hash, wouldn't it?

As for traffic increase, list modifications are expected to be rare enough for it to be negligible.

Does not look much harder to implement to me. Trivial for clients that don't support fast sync or light client protocol. And it does not impose a hard-coded governance scheme.

LES protocol supports GetReceipts message.

That's a significant overhead to call during syncing.

This would require reversing a Keccak hash, wouldn't it?

The blooms don't use the full hash, only a few bytes from it, so it should be significantly easier to brute force. Given that the consensus contract's address wouldn't change, it shouldn't be too much of an effort to try and break it.
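For intuition on why the bloom offers so little protection: each logged address/topic contributes only three 11-bit indices derived from its hash. A sketch of the extraction (SHA-256 stands in for Keccak-256 here to keep it stdlib-only; the byte-pair-mod-2048 shape mirrors the mainnet bloom construction):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// bloomBits returns the three 2048-bit bloom indices derived from a hash of
// the input: byte pairs (0,1), (2,3) and (4,5), each taken mod 2048.
// SHA-256 is used as a stand-in for Keccak-256 to keep the sketch stdlib-only.
func bloomBits(data []byte) [3]uint {
	h := sha256.Sum256(data)
	var bits [3]uint
	for i := 0; i < 3; i++ {
		bits[i] = uint(binary.BigEndian.Uint16(h[2*i:2*i+2]) % 2048)
	}
	return bits
}

func main() {
	// Only 3 x 11 bits of the hash output survive, so finding an input that
	// lights the same bits as a known target is a brute-force search, not a
	// matter of inverting the full hash.
	fmt.Println(bloomBits([]byte("consensus-contract-event")))
}
```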

As for traffic increase, list modifications are expected to be rare enough for it to be negligible.

Not if I can attack it.

Trivial for clients that don't support fast sync or light client protocol.

Given that CPP is just working on adding fast sync and I assume light is next for many client implementations, that's just taking a shortcut now that will bite us hard in the long run.

And it does not impose a hard-coded governance scheme.

That hard coded governance is PoA by majority consensus. I don't see a reason to make it more flexible than this.

fjl commented

I can see both sides for this:

The annoying part of scalable-ish PoA is managing authorised signers. Doing it with a contract is easier because the logic can be shared solidity code and arbitrary new signer management policies can be implemented later.

But implementing it as a contract also adds non-trivial development overhead now because blockchain syncing gets more complicated. @arkpar, I guess you could answer these:

  1. How much work would it be to add clique PoA (as proposed here) to Parity?
  2. How much work would it be to add contract-based PoA (as suggested by you) to Parity?

Contract-based PoA is already implemented in Parity, just without conveniences for light clients. Clique/Rinkeby probably wouldn't be a whole lot to implement, but @keorn can answer better.

I would favor a middle ground. A generic validators contract has one method: getValidators() -> [Address]. We can include the signed sha3(getValidators()) as part of the seal for any given block. Light clients can simply fetch fraud proofs when this changes. In the event that some malicious validators don't update the field even when getValidators() would be different, a mandate to follow the longest chain and an honest majority assumption is enough to ensure that the correct chain is synchronized to.

This will work most efficiently with infrequent changes in the validator contract. If they are epoch-based at around once per day, the overhead imposed on light clients synchronizing would not be very high, although there is a stronger availability requirement on the network to continue to store getValidators() state proofs for ancient transitions.

Hi, I have some questions on this:

  1. If a signer is 'in-turn', he can do whatever he wants, including signing an invalid block (e.g. a block with no transactions). How the protocol manages this is not clear.
  2. How do signers collect transactions? Can it happen that two signers broadcast blocks containing the same transactions at different times? I mean, are client transactions broadcast to all signers or to just one of them?

Thanks

  1. Being signed doesn't by itself make a block valid. All the yellow paper rules still apply; the signature is just one more requirement. Empty blocks are not invalid; a signer is free not to include any transactions.

  2. Broadcasting and mining are the same as for all consensus engines. Transactions propagate all over the network; signers aggregate them and include them in blocks when it's their turn (or possibly out of turn too, at a lower difficulty).
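The in-turn/out-of-turn distinction mentioned here boils down to a two-valued block difficulty. A sketch using the DIFF_INTURN/DIFF_NOTURN values from the spec (the sorted-signer-index convention is an assumption on my part):

```go
package main

import "fmt"

const (
	diffInTurn = 2 // difficulty of in-turn signatures
	diffNoTurn = 1 // difficulty of out-of-turn signatures
)

// calcDifficulty mirrors Clique's rule: the signer whose index in the sorted
// signer list equals blockNumber % signerCount is "in turn" and seals at
// difficulty 2, everyone else at 1, so chains with more in-turn blocks
// accumulate a higher total difficulty and win reorgs.
func calcDifficulty(blockNumber uint64, signerIndex, signerCount int) int {
	if blockNumber%uint64(signerCount) == uint64(signerIndex) {
		return diffInTurn
	}
	return diffNoTurn
}

func main() {
	fmt.Println(calcDifficulty(8, 2, 3)) // 8 % 3 == 2: in turn
	fmt.Println(calcDifficulty(9, 2, 3)) // 9 % 3 == 0: out of turn
}
```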

Ethereum has its long term goal of proof-of-stake based on Casper, but that is heavy research so we cannot rely on that any time soon to fix today's problems.

Apparently that's no longer true?

https://blog.ethereum.org/2017/04/01/ethereum-dev-roundup-q1/

After three years of trying to find solutions to the "nothing at stake" and "stake grinding" attacks, we have decided that the problem is too hard, and secure proof of stake is almost certainly unachievable. Instead, we are now planning to transition the Ethereum mainnet to proof of authority in 2018.

@jamesray1 You are referring to April Fools' joke :)

Rightio ;). I guess that makes sense, especially given some of the other statements, e.g. PoTcoin, Proof of Vitalik, etc., and why he had a "boring edition". I can't believe I fell for this the first time I quickly read some of it, although I did remember being perplexed when reading the bullet point for PoV.

Is PoA only planned for achieving testnet security? Or will the general public Ethereum network be adopting this protocol as well?... What is the scope of this scheme?

Just for testnet, hackathon networks, private development networks etc. Maybe cross company networks can work too. Global? No.

Is the network asynchronous, synchronous or eventually synchronous?

Hi there. @deanstef I guess the network is eventually synchronous, but I'm not sure and I have a concern about it.

Specifically, could anyone explain to me whether there is any difference between the network models of Clique and Aura? With reference to the CAP theorem, I suppose that Aura is a CA system (consistency and availability, but not partition tolerance, as it requires a synchronous network). Instead, I guess that Clique should be PA (partition tolerance and availability, no consistency, as I can have forks).

If my claim is correct, Clique could work in an eventually synchronous system. Am I right?

If an EIP based on this were accepted, should it mean:

  1. Updating the main Ethereum spec to remove the constraint of 32 bytes maximum length on extraData, or...
  2. Would 32 bytes remain the maximum of the main spec, and only this PoA sub-specification relax that constraint, or...
  3. Some third option, like the data will end up somewhere else than extraData?

If the answer is number 2, it seems important that nodes can communicate to clients that they are using the PoA spec. I'm not sure of the best way for a node to accomplish this, but one option is to return a different protocol version from eth_protocolVersion. (This is important in web3.py, for example, because web3.py currently rejects extraData longer than 32 bytes.)

Can POA really be vulnerable to one malicious node?

@zillerium not at all. PoA is more in the field of classical consensus algorithms where you have a simple honest majority assumption. Depending on the kind of network faults you'll also tolerate, you can either tolerate <1/2 malicious or <1/3. In some situations you can tolerate all but one node being malicious but just not be able to finalize anything. It all depends on the algorithm you use and your assumptions.
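As a quick numeric illustration of the two thresholds mentioned (the function names are mine):

```go
package main

import "fmt"

// honestMajority returns the largest number of malicious signers a network
// of n can tolerate under a simple honest-majority assumption (< 1/2).
func honestMajority(n int) int { return (n - 1) / 2 }

// byzantine returns the largest number tolerable under the classical
// Byzantine fault assumption (< 1/3).
func byzantine(n int) int { return (n - 1) / 3 }

func main() {
	// Of 7 signers: up to 3 malicious with honest majority, up to 2 with BFT.
	fmt.Println(honestMajority(7), byzantine(7))
}
```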

In a private network with PoA, is it possible to prevent nodes from deploying smart contracts (especially creating new tokens)?

@AminOroji The Clique engine proposed here and implemented by Geth only limits the block creation, but doesn't enforce (nor support) any other limitations.

Guys, for those interested, here you can find a nice analysis of PoA algorithms:

https://eprints.soton.ac.uk/415083/

Specifically, we have compared Clique and Aura (Parity) against PBFT, with reference to the CAP theorem.

Interesting read, thank you for sharing that!

Though I think it's important to mention that Clique was never meant as a full blown universal permissioned consensus engine, rather its target use case was to have stable test/private networks and its internal design was made in such a way as to minimize differences between Ethash and the general operation of existing nodes.

@karalabe Thank you very much for valuable information.

@deanstef Very interesting. Thanks!

hadv commented

Hi @karalabe

In my private PoA network, I tried to add a block reward similar to PoW, but the nodes fail to sync and report a bad block. Can you help me clarify the root cause of this problem when we modify state in the Finalize() function of consensus.Engine?

Thank you so much!

You need to find the counterpart which verifies the header fields and modify that too to accept your modified block reward.

@karalabe

As you mentioned above, the nonce has been repurposed in PoA. May I know what the role of the nonce is in Clique?

hadv commented

@karalabe Okay, I found that it was a problem with the modified code; it made consensus differ between the nodes, so bad blocks were reported when importing the chain. Thank you!

hadv commented

@AminOroji AFAIK, the coinbase and nonce in Clique are now used for proposing a new signer.

@hadv Do you know how to solve the "bad block" problem when adding block rewards in Clique? Thanks in advance.

hadv commented

@manxiaqu Is your chain brand new, or does it already have mined blocks?

@hadv Sorry for the delay; our chain has only just started testing, it's totally new. Do you think modifying the Finalize method to the code below is a good idea / reasonable solution?

func (c *Clique) Finalize(chain consensus.ChainReader, header *types.Header, state *state.StateDB, txs []*types.Transaction, uncles []*types.Header, receipts []*types.Receipt) (*types.Block, error) {
	// Accumulate any block rewards and commit the final state root.
	// Try to recover the block signer from the header; otherwise fall back
	// to the local clique signer (when mining).
	signer, err := ecrecover(header, c.signatures)
	if err != nil {
		signer = c.signer
	}
	// Credit the block reward to the signer
	accumulateRewards(chain.Config(), state, header, signer)
	header.Root = state.IntermediateRoot(chain.Config().IsEIP158(header.Number))
	header.UncleHash = types.CalcUncleHash(nil)

	// Assemble and return the final block for sealing
	return types.NewBlock(header, txs, nil, receipts), nil
}

hadv commented

@manxiaqu I think it's fine, but if your chain already mined some blocks before you changed the consensus in the Finalize() method, then it effectively becomes a hard fork. You should set a block number from which the new consensus rules apply, to avoid bad blocks.
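The activation-height idea can be sketched like this (the fork block and reward values are placeholders, not suggestions):

```go
package main

import "fmt"

// rewardForkBlock is an illustrative activation height: blocks below it keep
// the old (no-reward) rules, blocks at or above it pay the signer, so nodes
// that upgrade before the fork height stay on one chain instead of splitting
// on a "bad block".
const rewardForkBlock = 1_000_000

func blockReward(number uint64) uint64 {
	if number < rewardForkBlock {
		return 0 // pre-fork: Clique's original behaviour, no reward
	}
	return 2 // post-fork: some fixed reward (placeholder units)
}

func main() {
	fmt.Println(blockReward(999_999), blockReward(1_000_000))
}
```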

@hadv Got it, Thank you very much!

@karalabe you mentioned handing the PoA voting process over to a smart-contract-based approach.

I'm currently trying to implement a basic proof of concept for doing just that, but I've hit a couple of hurdles.

Can anyone give some pointers on how this would be done correctly? So far I've tried adding some logic in the apply() function to add a random address from a valid unauthorised node to the signers array, but this usually ends up with 2 errors (invalid chain hash / issues with the block difficulty). I then thought I could programmatically add addresses to the clique.propose set with the value true and set NONCE_AUTH, but still no luck: my network dies a harsh death and no new signers appear.

Why is wiggleTime 500ms? Why not 200ms or 100ms? What are the special considerations here?

It generally takes a few hundred milliseconds to propagate a block through the entire network, even to fast and well connected peers. That means that you want your in-turn signer to have enough time to get their blocks in before everyone else starts spamming.

@karalabe Thank you very much!

@karalabe In the "Attack vector: Concurrent blocks" section, it is mentioned that "If the number of authorized signers are N, and we allow each signer to mint 1 block out of K, then at any point in time N-K miners are allowed to mint." Why is it not N-K+1 miners that are allowed to mint at any point in time?

You are correct, it should be N-K+1.

Thank you!

@karalabe Why do we need the nonce field for validator proposal?

I understand that if nonce == 0x0000000000000000 the vote is to remove a validator, else if nonce == 0xffffffffffffffff the validator is to be added.

However, since we can only add addresses that are not already listed as validators, and remove addresses that are, I don't see the use.

I can see that it potentially makes it easier to tell initially whether the vote should go on the remove or add list, but a check will nonetheless have to be made as to whether the address is already a validator or not...

That's a fair point. I guess when I designed the thing I thought this was cleaner or more explicit. It could have been done without. Too late to change it now though.

Thanks was worried I was missing something crucial :)
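For reference, the two magic nonce values and their decoding can be sketched as follows (the helper name is mine; the constants are the ones from the spec):

```go
package main

import "fmt"

// The two magic nonce values from the spec; any other nonce is invalid.
const (
	nonceAuth = 0xffffffffffffffff // vote to add the beneficiary as a signer
	nonceDrop = 0x0000000000000000 // vote to remove the beneficiary
)

// voteKind decodes a header nonce into a vote, illustrating the point above:
// membership of the beneficiary in the signer set already implies the only
// legal direction, but the nonce makes the intent explicit and lets a header
// be rejected outright on a malformed value.
func voteKind(nonce uint64) (string, bool) {
	switch nonce {
	case nonceAuth:
		return "authorize", true
	case nonceDrop:
		return "deauthorize", true
	}
	return "", false // anything else marks the header invalid
}

func main() {
	kind, ok := voteKind(nonceAuth)
	fmt.Println(kind, ok)
}
```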

This is just fantastic, I love you guys. Finally I can figure out how Clique works.

@karalabe In the "Attack vector: Censoring signer" section it is mentioned that K = floor(SIGNER_COUNT / 2) + 1 prevents a signer from censoring votes. Would K > floor(SIGNER_COUNT / 2) + 1 offer better security? Let's say K = SIGNER_COUNT: doesn't that mean an attacker would need 100% of the signers to launch an attack? Also, since it is PoA, doesn't it make sense to just have 1 signer proposing blocks instead of N-K? That would make Clique deterministic instead of probabilistic. Could someone please explain the advantage/reasoning behind K = floor(SIGNER_COUNT / 2) + 1?

Can someone please clarify what is meant by "Only the latest proposal per target beneficiary is kept from a single signer"? Does it mean that a signer can vote for 2 or more different addresses to be added or removed while proposing blocks, but a single signer cannot vote twice for the same address to be added or removed across their block proposals?

@aneequesafdar It's a tradeoff between security and usability. Making K == SIGNER_COUNT would indeed be the most secure option, but then if one signer goes offline, your chain gets stuck because only that single signer is allowed to sign the next block. The reason I chose 50% is because that's what PoW guarantees too.

Can someone please clarify what it means by "Only the latest proposal per target beneficiary is kept from a single signer"?

If a signer proposes to add someone, and afterwards proposes to not add them (i.e. goes back on the proposal), then only the vote against counts, not the vote for.
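The K-limit discussed above is just a distance check on the signer's last sealed block; a sketch (the function name is mine):

```go
package main

import "fmt"

// maySign checks Clique's spam limit: a signer may only seal again once
// floor(signerCount/2)+1 blocks have passed since their last sealed block.
func maySign(signerCount int, lastSigned, current uint64) bool {
	limit := uint64(signerCount/2 + 1)
	return current >= lastSigned+limit
}

func main() {
	// With 5 signers, the limit is 3: having sealed block 10, a signer must
	// wait until block 13. A colluding minority of 2 can therefore never
	// seal two consecutive blocks, which is what blunts censorship.
	fmt.Println(maySign(5, 10, 12), maySign(5, 10, 13))
}
```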

@karalabe Does the extraData have a size limit?

If so, doesn't it mean we are limiting the number of validators in the network?

If there is no limit on extraData, then I believe the block size will increase for a large number (> 100) of signers on the network. What impact would this have on the whole network?

There is no limit on the extra-data. However, only epoch blocks (every 30K by default) contain the full signer list; intermediate blocks calculate it based on the votes within the epoch. So for 1000 signers, that would still only mean about 20KB of extra data every week or so.
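The epoch-header size is easy to compute from the spec's constants plus the 20-byte address length:

```go
package main

import "fmt"

const (
	extraVanity = 32 // EXTRA_VANITY bytes reserved for signer vanity
	extraSeal   = 65 // EXTRA_SEAL bytes for the secp256k1 signature
	addressLen  = 20 // bytes per signer address
)

// epochExtraSize computes the extra-data size of a checkpoint (epoch) header,
// which embeds the full sorted signer list between the vanity and the seal.
func epochExtraSize(signers int) int {
	return extraVanity + signers*addressLen + extraSeal
}

func main() {
	fmt.Println(epochExtraSize(1000)) // 32 + 20000 + 65 = 20097 bytes, ~20KB
}
```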

@karalabe Is there a limit on how many transactions we can fit in a block? I am running a private PoA network with a ridiculous targetgaslimit of 800 million and a 10s period, but I only ever see a max of 2300 transactions in a block, with gas used of 50,337,000. What is stopping my blocks from containing more than 2300 transactions? Thanks.

axic commented

@karalabe would it be possible to turn this into an EIP and merge it? Then this issue can still be used as a discussion url.

The benefit of having it merged as an EIP is that changes are tracked properly.

@karalabe I like this algorithm very much! It's still my favourite one for PoA scenarios.

I have two questions:

1/ Can transactions in Clique be considered 100% final if we have enough confirmation blocks from 50%+1 different signers? Or how else would you define a transaction as final?

2/ From a remote client using web3 via the JSON-RPC port, how do I get the number of signers in the network? Do I have to count entries in the block's extraData at epoch blocks? Or can I somehow access the clique objects via JSON-RPC?

@ivica7

  1. There is a small chance of soft forks in Clique; thus, the same heuristics as PoW can be used, i.e. 5-6 block confirmations or more.

  2. Use the JSON RPC to connect to the client and you can use the following clique functions:

clique
{
  proposals: {},
  discard: function(),
  getProposals: function(callback),
  getSigners: function(),
  getSignersAtHash: function(),
  getSnapshot: function(),
  getSnapshotAtHash: function(),
  propose: function()
}

@karalabe can we set a period longer than 15s for clique?

@karalabe can we set a period longer than 15s for clique?

@nahuseyoum Yeah, just set it when you're making the genesis. Use puppeth to generate it.

@karalabe: can you explain the part about the 'malicious signer' attack vector? If I am an authorised sealer and I enter some malicious transactions into a block and mine it, how will my block be invalidated? I know that in Aura there is back-and-forth voting on the validity of every block, but as far as I understand, in Clique the sealer just mines the block. Correct me if I am wrong.

The malicious signer scenario is more for the case where a signer is mining side forks all over the place, or including junk into the chain. In that case the network can try to vote it out.

Minting an invalid block is not possible, because all other nodes (and signers) in the network still validate and execute each block, so if you create a bad one, it will just get discarded.

Hi all, sorry to bring up a whole new topic, but I got to wondering today what would happen when the ether in all the accounts eventually runs out...

I mean, because no ether is given out for minting blocks (actually no ether is EVER given out), the accounts on a PoA network will only spend ether. Even though I allocated a lot of ether to the accounts upon creation and I transfer ether to new joiners, one day (even if after 1,000 years) the ether will run out for all the accounts, and when that happens, how will new transactions be issued?

Is there a way to continuously give away ether to all accounts, or something like that?

I run a demo private network where two nodes are authorities and one is only information poster...

Thanks in advance,