ethereum/EIPs

Yet another cool checksum address encoding

vbuterin opened this issue Β· 77 comments

EDITOR UPDATE (2017-08-24): This EIP is now located at https://eips.ethereum.org/EIPS/eip-55. Please go there for the correct specification. The text below may be incorrect or outdated, and is not maintained.

Code:

def checksum_encode(addr): # Takes a 20-byte binary address as input
    o = ''
    v = utils.big_endian_to_int(utils.sha3(addr))
    for i, c in enumerate(addr.encode('hex')):
        if c in '0123456789':
            o += c
        else:
            o += c.upper() if (v & (2**(255 - i))) else c.lower()
    return '0x'+o

In English, convert the address to hex, but if the ith digit is a letter (ie. it's one of abcdef) print it in uppercase if the ith bit of the hash of the address (in binary form) is 1 otherwise print it in lowercase.

Benefits:

  • Backwards compatible with many hex parsers that accept mixed case, allowing it to be easily introduced over time
  • Keeps the length at 40 characters
  • The average address will have 60 check bits, and less than 1 in 1 million addresses will have less than 32 check bits; this is stronger performance than nearly all other check schemes. Note that the very tiny chance that a given address will have very few check bits is dwarfed by the chance in any scheme that a bad address will randomly pass a check

UPDATE: I was actually wrong in my math above. I forgot that the check bits are per-hex-character, not per-bit (facepalm). On average there will be 15 check bits per address, and the net probability that a randomly generated address if mistyped will accidentally pass a check is 0.0247%. This is a ~50x improvement over ICAP, but not as good as a 4-byte check code.

Examples:

  • 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 (the "cow" address)
  • 0x9Ca0e998dF92c5351cEcbBb6Dba82Ac2266f7e0C
  • 0xcB16D0E54450Cdd2368476E762B09D147972b637

This is very nice idea.

I wander about the 0x prefix. Is it mandatory or preferred?

Hmm, I'm fine either way, though I definitely see the rationale for standardizing one way or the other.

I saw comments on the TurboEthereum guide that suggested that we were moving away from raw hex keys into ICAP keys:

ICAP: XE472EVKU3CGMJF2YQ0J9RO1Y90BC0LDFZ
Raw hex: 0092e965928626f8880629cec353d3fd7ca5974f

"Notice the last two lines there. One is the ICAP address, the other is the raw hexadecimal address. The latter is an older representation of address that you'll sometimes see and is being phased out in favour of the shorter ICAP address which also includes a checksum to avoid problems with mistyping. All normal (aka direct) ICAP addresses begin with XE so you should be able to recognise them easily."

My concern is that if there was a previous decision to start moving to ICAP, I'm not sure if this will add confusion. However, if this helps give raw hex addresses a checksum I guess that can only be beneficial, even if everyone wants to move to ICAP eventually.

My preference is that a checksum-enabled Ethereum address is immediately recognizable as such.

The proposed solution is not immediately recognizable as being distinct from a standard Ethereum address and could be confused for being a strangely-cased version of non-checksummed addresses. Although it offers superior backwards compatibility, I believe will only cause additional confusion to the end-user.

Since the change in format serves to make the address less error prone through checksums, I posit they should also be immediately recognizable through a fixed prefix or otherwise obvious identifier. One reason why I prefer ICAP over this proposed solution is that it signals to the user clearly that this is an Ethereum address and cannot be confused with a transaction/block hash.

Just saw this proposal now.

I disagree @tgerring that it will cause confusion: to a layman, it will be indistinguishable from a normal address. This approach is very easy to implement in the client side and doesnt require much. I would say this could be adopted as a great intermediary before ICAP – also would be a good alternative if ICAPs don't catch on.

I did a rudimentary implementation on javascript in the web3 object:

var isAddress = function (address) {
    if (!/^(0x)?[0-9a-f]{40}$/i.test(address)) {
        // check if it has the basic requirements of an address
        return false;
    } else if (/^(0x)?[0-9a-f]{40}$/.test(address) || /^(0x)?[0-9A-F]{40}$/.test(address)) {
        // If it's all small caps or all all caps, return true
        return true;
    } else {
        // Otherwise check each case
        address = address.replace('0x','');

        // creates the case map using the binary form of the hash of the address
        var caseMap = parseInt(web3.sha3('0x'+address.toLowerCase()),16).toString(2).substring(0, 40);

        for (var i = 0; i < 40; i++ ) { 
            // the nth letter should be uppercase if the nth digit of casemap is 1
            if ((caseMap[i] == '1' && address[i].toUpperCase() != address[i])|| (caseMap[i] == '0' && address[i].toLowerCase() != address[i])) {
                return false;
            }
        }
        return true;
    }
};


/**
 * Makes a checksum address
 *
 * @method toChecksumAddress
 * @param {String} address the given HEX adress
 * @return {String}
*/
var toChecksumAddress = function (address) {

    var checksumAddress = '0x';
    address = address.toLowerCase().replace('0x','');

    // creates the case map using the binary form of the hash of the address
    var caseMap = parseInt(web3.sha3('0x'+address),16).toString(2).substring(0, 40);

    for (var i = 0; i < address.length; i++ ) {  
        if (caseMap[i] == '1') {
          checksumAddress += address[i].toUpperCase();
        } else {
            checksumAddress += address[i];
        }
    }

    console.log('create: ', address, caseMap, checksumAddress)
    return checksumAddress;
};

It works internally and it's almost invisible to the user. I don't really see a good reason not to implement it.
My results don't match yours, @vbuterin it might be interesting to figure out why. Here are my results omitting the '0x' before hashing the address:

  • 0xCD2a3d9F938e13cd947Ec05AbC7fE734dF8dd826
  • 0x9CA0E998df92C5351CeCBBb6DBa82Ac2266F7E0c
  • 0xCb16d0e54450cDd2368476E762b09D147972B637

And here the results including 0x on the hash of the address:

  • 0xCd2A3D9f938e13CD947ec05abC7Fe734dF8dd826
  • 0x9Ca0e998DF92c5351CECbBb6DbA82ac2266F7e0C
  • 0xCB16D0e54450cDD2368476e762b09d147972B637

Just pushed an experimental branch to web3.js and the wallet.

I would love feedback from anyone on those.

web3.sha3('0x'+address)

You're hashing the hex and not the binary.

Good catch, I switched to the sha3 of the binary but the results still won't match. I'm a bit confused on what you meant by # Takes a 20-byte binary address as input. Ethereum addresses are 160 bits..

For example:

  • address: 0xcd2a3d9f938e13cd947ec05abc7fe734df8dd826 (why cow? What joke did I miss?)
  • To binary: 1100110100101010001111011001111110010011100011100001001111001101100101000111111011000000010110101011110001111111111001110011010011011111100011011101100000100110
  • First 40 binary digits of the sha3 of the binary: 1110011101011010000000000110010011011010
  • That means it should start with three uppercase letters, followed by 2 lowercases, followed by 3 upper etc: 0xCD2a3D9F938E13Cd947ec05abC7fe734DF8DD826 not 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 in your example.

I suppose I am misunderstanding what you are using as input..

PS: you can probably simplify your example by not checking for letters: you can do uppercase conversions on numbers and although there is such a thing as a lowercase digits they are represented the same

>>> from ethereum import utils
>>> base_addr = utils.privtoaddr(utils.sha3('cow'))
>>> base_addr
'\xcd*=\x9f\x93\x8e\x13\xcd\x94~\xc0Z\xbc\x7f\xe74\xdf\x8d\xd8&'
>>> utils.sha3(base_addr)
'\xa2\x86)\xe4\x18A\xcc^p(\x99"z\x10\xd8\xfd}\xeb\xed\x9c\xe8\x7fG\xa9]\xcc;\xed\xd9\xa8\xa4\xef'

By "binary" I meant "just the raw bytes, not any kind of encoded representation". There's also the special chars ¹²³⁴⁡⁢⁷⁸⁹⁰ I suppose, but that's not backwards-compatible anymore.

I initially like this quite a bit. All of the cons that I see are extreme edge cases and I think that it's pretty trivial for library authors to handle gracefully. I like the backwards compatibility, the compatibility with existing hex parsing utilities.

base_addr '\xcd*=\x9f\x93\x8e\x13\xcd\x94~\xc0Z\xbc\x7f\xe74\xdf\x8d\xd8&'

I'm not sure if the web3.js coverts to bytes. Also, pure javascript only supports binary conversion up to a hard limit, any larger and I had to use the BigNumber library. Wouldn't it be simpler just to use sha3(address)?

Wouldn't it be simpler just to use sha3(address)?

Mathematically speaking it would be a bit ugly imo.

I'm not sure if the web3.js coverts to bytes

Yeah, I had this problem; for one of my example gambling dapps where I was using a hash-commit-reveal protocol I took an existing sha3 impl; you could do the same: https://github.com/ethereum/dapp-bin/blob/master/serpent_gamble/scripts/sha3.min.js

I see some problems with ICAP's variable length and low checksum bitsize:

"XE7338O073KYGTWWZN0F2WZ0R8PX5ZPPZS": This is a 30 charaters address, IBAN compatible, based on the "Direct approach" from https://github.com/ethereum/wiki/wiki/ICAP:-Inter-exchange-Client-Address-Protocol

Now, If you enter such an address, and accidentally add another character somewhere, you have created a "Basic" (incompatible, but allowed and valid in ethereum ICAP implementation). The problem is that naively, without knowing all properties of the checksum algorithm, there is a 1% chance this will pass validation, and consequently you are sending money into a black hole.

On the topic of checksums in hex addresses:
0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826

I agree that there should be some easy identification mechanism to separate it from an unchecked address. Alternatives might include:
XxCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826
ExCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826
#Cd2a3d9f938e13Cd947eC05ABC7fe734df8DD826

This makes it not completely backwards compatilble, but increadably easy to edit to satisfy a legacy system without any checksums.

@simenfd there should be some easy identification mechanism to separate it from an uncheck address.

I disagree. I think the whole point of this scheme is that it's completely backwards compatible. There's no point in separating them. In my implementation, if the address is all caps or all small caps then it assumes to be a unchecksummed address. In a 40 char address, there will be in average 15 letters, the chances of all of them being the same case is 1:16384 so I guess it's strong enough.

the chances of all of them being the same case is 1:16384

That was exactly my line of thinking as well. It's safe enough to assume that all caps or all lower addresses are not checksummed.

The backwards compatibility is nice but IMO presents a clear danger: If the user believes that the address has a checksum she might be willing to input an address by hand. If she then happens to use an old version of transaction handling that just parses the hex ignoring the case then her funds are lost in the case of a typo.

For this reason my feeling is that I prefer a scheme that would make a normal hex parser throw an error, rather than a user thinking she's protected by a checksum when in fact she is not.

@christianlundkvist that's a good point, which can be solved with UI: show red when it fails, show yellow when it's not checksummed.

@christianlundkvist Exactly my point: False security might be more dangerous than no security. E.g. when I enter a bitcoin address by hand (yeah, quite rarely), I am quite confident that the system will capture an error with the 32bit checksum that is universally implemented there; I wish I will get this confidence in ethereum as well.

For fun, I tried to make some ICAP addresses, using the functions in the go-ethereum implementation. The first two in bold are the original addresses, and the ICAP, the remaining are all ICAP mutation-addresses that validate, but of course, are different addresses.
0x11c5496aee77c1ba1f0854206a26dda82a81d6d8 == XE1222Q908LN1QBBU6XUQSO1OHWJIOS46OO

XE1222Q908LN1QBBU6XUQSO1OHWJIOS4603
XE1222Q908LN1QBBU6XUQSO1OHWJIOS4700
XE1222Q908LN1QBBU6XUQSO1OHWJIOS48IO
XE1222Q908LN1QBBU6XUQSO1OHWJIOS49FO
XE1222Q908LN1QBBU6XUQSO1OHWJIOS5AO5

@alexvandesande: My main point was that backwards compatibility allows you to use the address in a dapp that was created before this EIP. So the UI in this case wouldn't know anything about checksummed addresses and wouldn't give the user any specific warning. If the user receives an address like 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 they would think "sweet, it's checksummed!" and type it by hand into an app which hasn't been updated, and lose Ether when they make a typo.

I'd like to challenge the idea that we should pay much attention to the "type it in by hand" use cases. If the ecosystem matures then we'll have good tooling around QR-code based transmission of addresses or something else that's even better UX.

If the user receives an address like 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 they would think "sweet, it's checksummed!" and type it by hand into an app which hasn't been updated, and lose Ether

The only way to avoid this situation is to have checksummed addresses be backwards incompatible. I'm of the opinion that backwards incompatibility is worse than cases where someone burns ether using an app that doesn't implement checksumming using an address that "looks" like it's checksummed. I think this situation is likely to be rare and to largely apply to using old software from before the checksum days, or poorly written software.

@pipermerriam:

I'd like to challenge the idea that we should pay much attention to the "type it in by hand" use cases.

In that case do you think we should not worry about checksumming at all? Are there other scenarios where checksums are used?

The only way to avoid this situation is to have checksummed addresses be backwards incompatible.

I feel like this would be preferred.

I think this situation is likely to be rare and to largely apply to using old software from before the checksum days, or poorly written software.

My view is that the moment the checksum is introduced a majority of software becomes old software, and people are notoriously slow at updating too...

In that case do you think we should not worry about checksumming at all? Are there other scenarios where checksums are used?

My point was that I believe the type-by-hand use case is a small corner case where the user is potentially already doing something questionable. We can still apply checksums to these, but I am of the opinion that we don't need to cater to this use case.

As for the other stuff, I don't have very strong opinions on the matter. Backwards compatibility seems nice but I see the validity in the idea that a breaking change is also a way to achieve a level of security in the area since it removes ambiguity.

I don't believe we can expect any users to realize the difference between a check summed address and a normal one (most people don't realize this even for bank accounts when the last digit is separated like12345-7), this is not the point of the checksum.

The point of backwards of compatibility is that transactions between checksum enabled wallets are safer. If you make a typo in a non checksum enabled wallet you'll lose your ether, just like you do now, and it's that particular wallet's developer job to make that client more secure.

Also, I don't think copying by hand is the main situation here, if we were trying to optimize that then we should be talking about pseudo-word seeds and name registries. Checksums are just extra securities against accidental typos, letters that were cut out by copying the wrong digit and are an extra assurance to the user that the address is still intact, just like the icon is.

I don't really see any disadvantage of adding these are they were very simple to implement to web3.js

Although I still haven't matched the initial implementation, probably because basic primitives on Python are very different than what JavaScript comes up with. Since a lot of implementations will be JavaScript I still think it makes more sense to use the sha of the hex, since that's how it comes to the library..

On Feb 19, 2016, at 18:08, Piper Merriam notifications@github.com wrote:

In that case do you think we should not worry about checksumming at all? Are there other scenarios where checksums are used?

My point was that I believe the type-by-hand use case is a small corner case where the user is potentially already doing something questionable. We can still apply checksums to these, but I am of the opinion that we don't need to cater to this use case.

As for the other stuff, I don't have very strong opinions on the matter. Backwards compatibility seems nice but I see the validity in the idea that a breaking change is also a way to achieve a level of security in the area since it removes ambiguity.

β€”
Reply to this email directly or view it on GitHub.

I don't really feel very strongly either way TBH and the design of this particular checksum scheme is actually super cool. 😊
Thinking about my own interactions it's the need to always tell people to NEVER EVER type in an address by hand that gets annoying. But you are right @alexvandesande that as long as I update my own tools to use checksums I don't have to give people this advice anymore when advising them on using the tools that I build. 😊

Any reason we don't use good old base 58?

Jonathan: This would break backwards compatibility. We already have a proposed standard without backwards compatibility that adopts more characters it's called IBAN

Sent from my iPhone

On Feb 20, 2016, at 03:08, Jonathan Brown notifications@github.com wrote:

Any reason we don't use good old base 58?

β€”
Reply to this email directly or view it on GitHub.

Just chiming in as a web2 dev mostly being an observer (of your work and of end users discussions): If you look at the Ethereum subreddit these days there are a ton of new adopters with no tech experience at all trying to find out how to use Ethereum. In short, I believe anything including typing addresses by hand should be expected. I remember seeing twitter pinned tweets in 2014 with images (not text) of dogecoin addresses for charities etc. A lot of adopters may barely know their way around a computer at all, and I think if you accomplish retaining them you are a raging success and have what is needed for mass adoption.

Agree. And adding a case sensitive checksum increases security for those cases, while being invisible for implementations that don't support it

On Feb 20, 2016, at 12:26, Michael Kilday notifications@github.com wrote:

Just chiming in as a web2 dev mostly being an observer (of your work and of end users discussions): If you look at the Ethereum subreddit these days there are a ton of new adopters with no tech experience at all trying to find out how to use Ethereum. In short, I believe anything including typing addresses by hand should be expected. I remember seeing twitter pinned tweets in 2014 with images (not text) of dogecoin addresses for charities etc. A lot of adopters may barely know their way around a computer at all, and I think if you accomplish retaining them you are a raging success and have what is needed for mass adoption.

β€”
Reply to this email directly or view it on GitHub.

Jonathan: This would break backwards compatibility. We already have a proposed standard without backwards compatibility that adopts more characters it's called IBAN

The IBAN / ICAP proposal is pretty bad though for those of us who want to use any wallet w/ HD capabilities. If you want HD, you can't have compatible IBAN/ICAP addresses in which case, you might as well just pick something else. Base58-check encoding is a good compromise that's familiar and easy to implement.

Jonathan, this tread is not about IBAN. There are substantial enough roadblocks and criticism of IBAN that it may be hard to make it the new standard and these can be addresses elsewhere but suffice to say this is a good argument to have a backwards compatible checksum right now.

On Feb 20, 2016, at 20:52, JP Richardson notifications@github.com wrote:

Jonathan: This would break backwards compatibility. We already have a proposed standard without backwards compatibility that adopts more characters it's called IBAN

The IBAN / ICAP proposal is pretty bad though for those of us who want to use any wallet w/ HD capabilities. If you want HD, you can't have compatible IBAN/ICAP addresses in which case, you might as well just pick something else. Base58-check encoding is a good compromise that's familiar and easy to implement.

β€”
Reply to this email directly or view it on GitHub.

Using binary byte conversion on javascript would be a complicated burden and add a lot of unnecessary complexity on the code IMHO. Instead I was able to simplify the code a lot by simply using the sha3(address.toLowerCase()) and the checking each nth letter of the hash. If it's 9 or upper (including a-f) then it should be uppercase, otherwise, it should be lowercase.

// Make a checksum address
var toChecksumAddress = function (address) {    
    address = address.toLowerCase().replace('0x','');
    var addressHash = web3.sha3(address);
    var checksumAddress = '0x';

    for (var i = 0; i < address.length; i++ ) { 
        // If ith character is 9 to f then make it uppercase 
        if (parseInt(addressHash[i], 16) > 8) {
          checksumAddress += address[i].toUpperCase();
        } else {
            checksumAddress += address[i];
        }
    }
    return checksumAddress;
};

//Check if address is checksum
var isAddress = function (address) {
    if (!/^(0x)?[0-9a-f]{40}$/i.test(address)) {
        // check if it has the basic requirements of an address
        return false;
    } else if (/^(0x)?[0-9a-f]{40}$/.test(address) || /^(0x)?[0-9A-F]{40}$/.test(address)) {
        // If it's all small caps or all all caps, return true
        return true;
    } else {
        // Otherwise check each case
        address = address.replace('0x','');
        var addressHash = web3.sha3(address.toLowerCase());

        for (var i = 0; i < 40; i++ ) { 
            // the nth letter should be uppercase if the nth digit of casemap is 1
            if ((parseInt(addressHash[i], 16) > 8 && address[i].toUpperCase() != address[i]) || (parseInt(addressHash[i], 16) <= 8 && address[i].toLowerCase() != address[i])) {
                return false;
            }
        }
        return true;
    }
};

It's a very simple code, very low impact and I don't see a good reason not to push it on the next release..

@alexvandesande

If it's 9 or upper (including a-f) then it should be uppercase, otherwise, it should be lowercase.

Shouldn't this be "8 or upper" to make the split 8/8 instead of 9/7 for upper/lowercase? Was this intentional or an off-by one error or something else?

Also, can you post some example checksums so that I can test a python implementation against your implementation?

Also, can you post some example checksums so that I can test a python implementation against your implementation?

Yes, I think we're gonna do this as well, so test vectors would be appreciated!

here is a gist with a python implementation.

https://gist.github.com/pipermerriam/f7633dc2657b1292860c

It differs from @alexvandesande 's js implementation in using 0-7 to mean lowercase and 8-f to mean uppercase. This results in approximately 1:1100 addresses being either all uppercase or lowercase in their checksum format.

Using 0-8/9-f for uppercase/lowercase split changes the collision rate to about 1:700

@pipermerriam is completely right, I missed by one, should be 8 or higher.

  1. Lowercase and remove 0x from address
  2. sha3 address
  3. change nth letter of address according to this from the nth letter of the hash:
  • 0,1,2,3,4,5,6,7 β†’ Lowercase
  • 8, 9, a, b, c, d, e, f β†’ Uppercase

Here ares some examples:

  • 0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed
  • 0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359
  • 0xdbF03B407c01E7cD3CBea99509d93f8DDDC8C6FB
  • 0xD1220A0cf47c7B9Be7A2E6BA89F429762e7b9aDb

Would love to see if it matches anyone's else implementation.

UPDATE: I was actually wrong in my math above. I forgot that the check bits are per-hex-character, not per-bit (facepalm). On average there will be 15 check bits per address, and the net probability that a randomly generated address if mistyped will accidentally pass a check is 0.0247%. This is a ~50x improvement over ICAP, but not as good as a 4-byte check code.

@avsa our code agrees on those checksums. Added 4 more test vectors for all upper and all lower checksums. Heres the updated list.

# All caps
0x52908400098527886E0F7030069857D2E4169EE7
0x8617E340B3D01FA5F11F306F4090FD50E238070D
# All Lower
0xde709f2102306220921060314715629080e2fb77
0x27b1fdb04752bbc536007a920d24acb045561c26
# Normal
0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed
0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359
0xdbF03B407c01E7cD3CBea99509d93f8DDDC8C6FB
0xD1220A0cf47c7B9Be7A2E6BA89F429762e7b9aDb

Also, I had the idea of using the x in 0x... as another bit of checksum data by using either an upper or lowercase X/x. Thoughts? Worth the complexity?

@pipermerriam Tested all of them and our implementation matches. Consensus reached! Your 0x27b1fdb04752bbc536007a920d24acb045561c26 address has a super nice icon that looks like an old man with a long blue beard.

Not sure how I feel about using 0Xdeadbeef vs 0xdeadbeef as a part of it. I think it's more noticeable in some ways, not as invisible as the others.. No strong opinion on it yet though..

@pipermerriam My main concern with adding a checksum to the "0x" part would be for maximal compatibility reasons, since that seems to be the direction of this encoding. Personally, I'm happy without affecting the case of the "0x" prefix

@tgerring I had the same thought. I already do enough futzing to handle both 0x prefixed and non-prefixed addresses. This would add yet another case to handle.

The 0x is optional and shouldn't be case checked at all.
Though i personally lean to recommend people using the 0x prefix consistently as it makes a clear distinction between numbers (234567), strings (fad1) and HEX encoded data (0xfad1)

8go commented

Hi, I am a noob. So be tolerant with me.

I thought I could assist in a very tiny way and port the above "toChecksumAddress" JS function to Go and then place it into the Go common package and call it appropriately in account_manager.go.

I found the Sha3() function in crypto.go, but I am puzzled. When passed a 20-byte (40 hex-digit) public address crypto.Sha3() returns a 32-byte hash. That is a 64-digit hex representation. Something does not match here. There is something that I do not understand. Can someone point me in the right direction please. Thank you.

That sounds like the expected behavior. Now all you need to do is the capitalization of the alpha characters.

8go commented

OK, thanks, here is the working Go code. Running it prints "Passed 12 tests and failed 0 tests out of 12.". So far so good. Next I will learn git and place the appropriate change into the geth source code.

package main

import (
"fmt"
"strings"
"strconv"
"encoding/hex"
"github.com/ethereum/go-ethereum/crypto"
)

// for input read: #55
// modelled after the JavaScript function toChecksumAddress()
// Convert address into checksum address
func ChecksumAddress (address string) string {
address = strings.Replace(strings.ToLower(address),"0x","",1)
addressHash := hex.EncodeToString(crypto.Sha3([]byte(address)))
checksumAddress := "0x"
for i := 0; i < len(address); i++ {
// If ith character is 8 to f then make it uppercase
l, _ := strconv.ParseInt(string(addressHash[i]), 16, 16)
if (l > 7) {
checksumAddress += strings.ToUpper(string(address[i]))
} else {
checksumAddress += string(address[i])
}
}
return checksumAddress
}

func main() {
// 4 groups of tests:
// All caps
// All lower
// Normal = mixed
// Taken from myetherwallet.com
testcases := []string{
"0x52908400098527886E0F7030069857D2E4169EE7",
"0x8617E340B3D01FA5F11F306F4090FD50E238070D",

    "0xde709f2102306220921060314715629080e2fb77",
    "0x27b1fdb04752bbc536007a920d24acb045561c26",

    "0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed",
    "0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359",
    "0xdbF03B407c01E7cD3CBea99509d93f8DDDC8C6FB",
    "0xD1220A0cf47c7B9Be7A2E6BA89F429762e7b9aDb",

    "0x5A4EAB120fB44eb6684E5e32785702FF45ea344D",
    "0x5be4BDC48CeF65dbCbCaD5218B1A7D37F58A0741",
    "0xa7dD84573f5ffF821baf2205745f768F8edCDD58",
    "0x027a49d11d118c0060746F1990273FcB8c2fC196",
}

failed, passed := 0, 0
for i := 0; i < len(testcases); i++ {
    if testcases[i] ==  ChecksumAddress(testcases[i]) {
        passed++
    } else {
        failed++
        fmt.Println("Failed address: \n\tExpected=", testcases[i], "\n\tReceived=", ChecksumAddress(testcases[i]))
    }
}
fmt.Printf("Passed %d tests and failed %d tests out of %d.\n", passed, failed, passed+failed)

}

8go commented

Sorry, that cut-and-paste did not work right, noob mistake, here is it correctly:

package main

import (
    "fmt"
    "strings"
    "strconv"
    "encoding/hex"
    "github.com/ethereum/go-ethereum/crypto"
)

// for input read: https://github.com/ethereum/EIPs/issues/55
// modelled after the JavaScript function toChecksumAddress()
// Convert address into checksum address 
func ChecksumAddress (address string) string {    
    address = strings.Replace(strings.ToLower(address),"0x","",1)
    addressHash := hex.EncodeToString(crypto.Sha3([]byte(address)))
    checksumAddress := "0x"
    for i := 0; i < len(address); i++ { 
        // If ith character is 8 to f then make it uppercase 
        l, _ := strconv.ParseInt(string(addressHash[i]), 16, 16)
        if (l > 7) {
            checksumAddress += strings.ToUpper(string(address[i]))
        } else {
            checksumAddress += string(address[i])
        }
        }
    return checksumAddress
}

func main() {
    // 4 groups of tests:
    // All caps
    // All lower
    // Normal = mixed
    // Taken from myetherwallet.com
    testcases := []string{
        "0x52908400098527886E0F7030069857D2E4169EE7",
        "0x8617E340B3D01FA5F11F306F4090FD50E238070D",

        "0xde709f2102306220921060314715629080e2fb77",
        "0x27b1fdb04752bbc536007a920d24acb045561c26",

        "0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed",
        "0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359",
        "0xdbF03B407c01E7cD3CBea99509d93f8DDDC8C6FB",
        "0xD1220A0cf47c7B9Be7A2E6BA89F429762e7b9aDb",

        "0x5A4EAB120fB44eb6684E5e32785702FF45ea344D",
        "0x5be4BDC48CeF65dbCbCaD5218B1A7D37F58A0741",
        "0xa7dD84573f5ffF821baf2205745f768F8edCDD58",
        "0x027a49d11d118c0060746F1990273FcB8c2fC196",
    }

    failed, passed := 0, 0
    for i := 0; i < len(testcases); i++ {
        if testcases[i] ==  ChecksumAddress(testcases[i]) {
            passed++
        } else {
            failed++
            fmt.Println("Failed address: \n\tExpected=", testcases[i], "\n\tReceived=", ChecksumAddress(testcases[i]))
        }
    }
    fmt.Printf("Passed %d tests and failed %d tests out of %d.\n", passed, failed, passed+failed)
}

thanks @8go!

On Thu, Mar 10, 2016 at 9:03 AM, 8go notifications@github.com wrote:

OK, thanks, here is the working Go code. Running it prints "Passed 12
tests and failed 0 tests out of 12.". So far so good. Next I will learn git
and place the appropriate change into the geth source code.

package main

import (
"fmt"
"strings"
"strconv"
"encoding/hex"
"github.com/ethereum/go-ethereum/crypto"
)

// for input read: #55 #55
// modelled after the JavaScript function toChecksumAddress()
// Convert address into checksum address
func ChecksumAddress (address string) string {

address = strings.Replace(strings.ToLower(address),"0x","",1)
addressHash := hex.EncodeToString(crypto.Sha3([]byte(address)))
checksumAddress := "0x"
for i := 0; i < len(address); i++ {
// If ith character is 8 to f then make it uppercase
l, _ := strconv.ParseInt(string(addressHash[i]), 16, 16)
if (l > 7) {
checksumAddress += strings.ToUpper(string(address[i]))
} else {
checksumAddress += string(address[i])
}
}
return checksumAddress
}

func main() {
// 4 groups of tests:
// All caps
// All lower
// Normal = mixed
// Taken from myetherwallet.com
testcases := []string{
"0x52908400098527886E0F7030069857D2E4169EE7",
"0x8617E340B3D01FA5F11F306F4090FD50E238070D",

"0xde709f2102306220921060314715629080e2fb77",
"0x27b1fdb04752bbc536007a920d24acb045561c26",

"0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed",
"0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359",
"0xdbF03B407c01E7cD3CBea99509d93f8DDDC8C6FB",
"0xD1220A0cf47c7B9Be7A2E6BA89F429762e7b9aDb",

"0x5A4EAB120fB44eb6684E5e32785702FF45ea344D",
"0x5be4BDC48CeF65dbCbCaD5218B1A7D37F58A0741",
"0xa7dD84573f5ffF821baf2205745f768F8edCDD58",
"0x027a49d11d118c0060746F1990273FcB8c2fC196",

}

failed, passed := 0, 0
for i := 0; i < len(testcases); i++ {
if testcases[i] == ChecksumAddress(testcases[i]) {
passed++
} else {
failed++
fmt.Println("Failed address: \n\tExpected=", testcases[i], "\n\tReceived=", ChecksumAddress(testcases[i]))
}
}
fmt.Printf("Passed %d tests and failed %d tests out of %d.\n", passed, failed, passed+failed)

}

β€”
Reply to this email directly or view it on GitHub
#55 (comment).

Alex Van de Sande
UX Designer

8go commented

I just created the Pull Request ethereum/go-ethereum#2323. Since it is the first time I did this in the ethereum project, it would be nice if someone could tell me if I have created the pull request correctly following the std dev procedure of the go-ethereum team (commit on develop branch, etc.). Thanks for any feedback.

Looks like somebody just successfully mistyped by hand a hex address wrong in Mist recently:

https://www.reddit.com/r/ethereum/comments/49x2be/lost_ether_any_ideas/

@tauteh1221 This was bound to happen. I am afraid we can se even larger errors in the future between ICAP Direct and ICAP Basic. The two formats differ by a single character, and there is a ~1% chance the checksum will say "go ahead".

@simenfd Agreed, I even predicted it further up in this thread:
#55 (comment)

The good news is there are surely non-technical folks using Ethereum now, just have to balance security and features with ease-of-use and failsafes going forward to keep them around.

Since this is implemented in several places now, and since the actual implementation doesn't match that described in the initial post, would it be possible to write this up as a proper EIP and submit it, so one doesn't have to read the whole bug thread to determine what's in actual use?

@Arachnid you're right

Maybe it's a little bit offtopic, but i've tested some SHA3 implementations for php (https://github.com/strawbrary/php-sha3 , https://github.com/0xbb/php-sha3 , https://notabug.org/desktopd/PHP-SHA3-Streamable) and other js-libraries (https://github.com/emn178/js-sha3 , https://github.com/Caligatio/jsSHA/releases/tag/v2.2.0).

Hashing example string: qwerty
Hash output variant / length: 224

All of them results in the following hashed value: 13783bdfa4a63b202d9aa1992eccdd68a9fa5e44539273d8c2b797cd
Comparing it to the output of the Crypto-JS SHA3 implementation the hashed value completely differs: d7a12ecec4442f1b31eea5f7d5470f0ca6169463e09d91a147c3b8e8

Someone mentioned this issue already at stackoverflow: http://stackoverflow.com/questions/36657354/cryptojs-sha3-and-php-sha3

So checksumAddresses only works "correctly" with Crypto-JS, with other libraries it's failing, because of calculating wrong uppercase and lowercase signs.

What you are seeing is SHA3-224 vs Keccak-224. Check for yourself at: https://emn178.github.io/online-tools/keccak_224.html

What you want is SHA-3, that is the "standard", and most compatible with other libraries.

Remember also that Ethereum is using Keccak, not SHA3.

Short summary because it seems that implementations have evolved and I chased the correct implementation.

janx commented

Python's implementation:

axic commented

@chevdor the main tree is at https://github.com/ethereumjs/ethereumjs-util/ and passes the tests listed in #55 (comment)

8-f is 50/50.

On Thu, Oct 20, 2016, 2:24 PM Chevdor notifications@github.com wrote:

Short summary because it seems that implementations have evolved and I
chased the correct implementation.

β€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#55 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAyTgkbzOVgKgo023hX_YeKWcyp_gR3Aks5q184MgaJpZM4HEtnF
.

Capitalising for >= 8 or >= a should be identical as 8 and 9 cannot be
capitalised anyway.

This isn't correct. You capitalize based on the digit in the sha3 of the
lowcased 40 character (20 byte) hexidecimal representation of the address.
The capitalization is done to the actual characters of the address itself
so there is a difference between >=8 and >=9. >=8 is the correct
implementation.

Another python implementation here:
https://github.com/pipermerriam/web3.py/blob/master/web3/utils/address.py#L45

On Fri, Oct 21, 2016 at 3:37 AM Alex Beregszaszi notifications@github.com
wrote:

@chevdor https://github.com/chevdor the main tree is at
https://github.com/ethereumjs/ethereumjs-util/ and passes the tests
listed in this EIP.

Capitalising for >= 8 or >= a should be identical as 8 and 9 cannot be
capitalised anyway.

β€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#55 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAyTgk486tOB8GMnuXw2RKBFQYpq30Alks5q2IftgaJpZM4HEtnF
.

@pipermerriam I think you are commenting old comments. I discussed with @axic and the topic is clear.
I do agree with your comment about >=8 not being the same than >=9 since it is based on the hash.

@chevdor not sure what happened there. I must have been looking at really old email notifications or something. πŸ˜„ carry on.. nothing to see here...

axic commented

@pipermerriam I've commented that without reading the implementation from months ago πŸ˜ƒ

recmo commented

Initially, @vbuterin suggested to capitalise whenever the hash character is a..f

No. The original proposal capitalizes the n-th hex-digit whenever the n-th bit in the hash of the address is set. So the first 40 bits of the 224 bit hash are used.

The current implementation modifies this by taking the hash of the lowercase hexadecimal encoding of the address and then it uses every fourth bit for capitalization (so 1st bit, 5th bit, etc.). The main reason for this extra complexity is that Javascript or it's libraries are bad at handling binary data, and this is somehow easier.

Here is @vbuterin original implementation updated with these changes. It passes @alexvandesande's test vectors:

from ethereum import utils

def checksum_encode2(addr): # Takes a 20-byte binary address as input
    o = ''
    v = utils.big_endian_to_int(utils.sha3(addr.hex()))
    for i, c in enumerate(addr.hex()):
        if c in '0123456789':
            o += c
        else:
            o += c.upper() if (v & (2**(255 - 4*i))) else c.lower()
    return '0x'+o

def test(addrstr):
    assert(addrstr == checksum_encode2(bytes.fromhex(addrstr[2:])))

test('0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed')
test('0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359')
test('0xdbF03B407c01E7cD3CBea99509d93f8DDDC8C6FB')
test('0xD1220A0cf47c7B9Be7A2E6BA89F429762e7b9aDb')

Is there a valid, latest go implementation of this that you could recommend?

Could someone please finally specify the Hash algorithm used to hash the address and get the bits from?

There are at least 3 different hashes mentioned and even used in various imlementations.

My understanding is that the correct hash is supposed to be SHA3-256, but it seems some implementations are using SHA3-224 and others use Keccak-256 and Keccak-224

I am curious what java implementation of this is ?

@almindor

You'll find the correct specification and example implementations at the file here: https://github.com/ethereum/EIPs/blob/master/EIPS/eip-55.md. The file also includes an adoption table to help track the adoption of EIP-55 checksums in the ecosystem.

We're going to close this issue now. If any corrections need to be made (or to update the adoption table), please open a PR on the file.

You should edit the example code and test vectors in the first post. It is wrong and someone who does not read the whole conversation will use the incorrect implementation.

This EIP is now located at https://github.com/ethereum/EIPs/blob/master/EIPS/eip-55.md. Please go there for the correct specification. The text in this issue may be incorrect or outdated, and is not maintained.

axic commented

@cdetrio can you push the "official test suite" into the EIP?

I believe it is this one: #55 (comment)

voron commented

Current python3 eth-utils implementation

python3 -c "from eth_utils import address; import sys; print(address.to_checksum_address(sys.argv[1]));" 0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed

Output is

0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed

Thanks

EDITOR UPDATE (2017-08-24): This EIP is now located at https://eips.ethereum.org/EIPS/eip-55. Please go there for the correct specification. The text below may be incorrect or outdated, and is not maintained.

Code:

def checksum_encode(addr): # Takes a 20-byte binary address as input
    o = ''
    v = utils.big_endian_to_int(utils.sha3(addr))
    for i, c in enumerate(addr.encode('hex')):
        if c in '0123456789':
            o += c
        else:
            o += c.upper() if (v & (2**(255 - i))) else c.lower()
    return '0x'+o

In English, convert the address to hex, but if the ith digit is a letter (ie. it's one of abcdef) print it in uppercase if the ith bit of the hash of the address (in binary form) is 1 otherwise print it in lowercase.

Benefits:

  • Backwards compatible with many hex parsers that accept mixed case, allowing it to be easily introduced over time
  • Keeps the length at 40 characters
  • The average address will have 60 check bits, and less than 1 in 1 million addresses will have less than 32 check bits; this is stronger performance than nearly all other check schemes. Note that the very tiny chance that a given address will have very few check bits is dwarfed by the chance in any scheme that a bad address will randomly pass a check

UPDATE: I was actually wrong in my math above. I forgot that the check bits are per-hex-character, not per-bit (facepalm). On average there will be 15 check bits per address, and the net probability that a randomly generated address if mistyped will accidentally pass a check is 0.0247%. This is a ~50x improvement over ICAP, but not as good as a 4-byte check code.

Examples:

  • 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 (the "cow" address)
  • 0x9Ca0e998dF92c5351cEcbBb6Dba82Ac2266f7e0C
  • 0xcB16D0E54450Cdd2368476E762B09D147972b637