Implement new format
hgmnz opened this issue · 24 comments
All fields are fixed length except for the message itself, so separators are superfluous if fixed-length fields are at the beginning of the token, which now looks like:
base64_encode(HMAC (32 bytes) + UNIX timestamp (8 bytes) + Random IV (16 bytes) + encrypted_message)
@ryandotsmith it is. With a test suite: https://gist.github.com/kr/9b2e828244013e7d308a
@kr, in go-fernet, did you handle the case where the user does not encrypt the payload?
@tmaher, is there a real disadvantage in dropping unencrypted payload support in fernet?
As it stands, go-fernet appears to enforce encryption, whereas original rb-fernet allowed for encryption optional.
The problem is, as we move toward the new format as described in this issue, we are assuming fixed length fields in the fernet token. Without encryption, there's no IV, so this system breaks.
We could modify the token format once more, to support encryption or not, maybe via the first bit alone, but before we go there, does it make sense to sign but not encrypt messages? There's a performance implication to encrypting, so it's possible a user decides to not encrypt at all (as in the original fernet). Any thoughts?
@hgmnz you're right, I didn't support unencrypted payload.
It seemed unnecessary and as you note would've complicated
the message format and the implementation.
I wouldn't worry at all about the performance impact of encryption.
Generating a fernet token (encrypted, of course) on my laptop takes
11us (that's over 90,000 gen/s on a single core).
If we decide we still need this feature I'm happy to add it to go-fernet.
@kr, cool.
If we decide we still need this feature I'm happy to add it to go-fernet.
Well, it's more about the fact that if we do add it, we need to think through the actual format.
I'm going to assume encryption is not optional for the time being, as I migrate to the new format. @tmaher, feel free to object if need be, but seems to be the sensible way forward.
Another potential problem with this new format is the fact that these fields are not base64 encoded and as such are not URL safe. Any real concern with this? So far these tokens could have been used as GET params and this change would break that ability.
The whole token is base 64 encoded.
The whole token is base 64 encoded.
yay.
Encrypting everything sounds awesome. It's one less decision for the developer to make, and you don't have to play the guessing game of whether it's safe to leave in plaintext or not.
Base64 itself isn't URL-safe. If necessary, URL-safety can be achieved by using a base64 variant here: http://tools.ietf.org/html/rfc4648#section-5 . The equals sign used for padding is generally omitted. Ruby supports this with Base64.urlsafe_encode64
and .urlsafe_decode64
. The transliteration should be pretty trivial to add to Go. More broadly, I'd discourage the use of tokens in GETs - they should either go in POST body or in an Authorization header. I can imagine it being useful in a few situations, though, so I can deal with making it easier for GETs.
I think we should consider a protocol version number. I can imagine changing to from CBC+HMAC to Galois Counter Mode (aka "GCM", added to openssl in v1.0.1, Cedar has 0.9.8), or going to AES-256, or adding PKI, or adding some metadata later on to help with key rotation. Any of those would necessitate format changes, as well as a major version bump for the gem.
The format would then be...
VERSION(1 byte) + HMAC (32b) + TIMESTAMP (8b) + IV (16b) + ciphertext
The version number needs to go first first, since the size of HMAC could change, but the version number is also something the signature should be computed over. HMAC computation should include the version number. So like hmac = hmac-sha256(signing_key, version + timestamp + iv + ciphertext)
. If we get near version=255, we can just declare that in version=255, you have to look at the next N bytes to determine the real version number.
Haha, actually go-fernet already uses the URL-safe variant of base 64.
I should've documented that, sorry.
A version number makes a lot of sense. Will the key also need to
indicate what version it's meant for?
go-fernet already uses the URL-safe variant of base 64
this has also always used that variant of base 64, so glad that's there.
Will the key also need to indicate what version it's meant for?
I don't think the signing key needs that. It would then require you to regenerate a new key when bumping versions of fernet, but I can't think of why that'd be important at the moment.
@kr, I think we should start with version 2
, where version 1
was the first, somewhat broken version of the format. Sound good?
On further consideration and re-reading @fdr 's favorite file header at http://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html , I'm going to recommend starting with 0x80 . That byte is a parse error in both ASCII (high bit is set) and UTF-8. It stops being a parse error in UTF-8 at 0xC0, which gives us 64 versions. It may or may not be a parse error in UTF-16, depending on what the first byte of the HMAC is. I appreciate PNG's arguments for using extensible feature-adding rather than a version number, but they're not as applicable here, and way harder to foresee in crypto-land.
If this feels "too clever", I'm fine with version=0x02.
@tmaher I like 0x80 and if we ever get to 0xBF (which would surprise me)
we can start adding more bytes.
@hgmnz it's not unlikely you'd have to regenerate the key anyway when
bumping versions of fernet, since future versions might require a different
size key. Even if the key is the same size, the algorithm might be different
and it feels weird to me to use the same key in two different crypto schemes.
But I don't have any concrete reason to say why that would necessarily be
wrong.
With more than one key, what sould the verify api look like?
List of keys to try? Table mapping version numbers to keys?
I think we should only support multiple keys for the same token version via as you say, a list of secrets/keys.
for example
Fernet.verifier([key1, key2], token)
and fernet tries them both. This is useful when rolling credentials in general, keys being part of that.
I agree with just using 0x80
. I'm creating a new issue for including version on the token format.
Closing this issue, new format implementation is on master.
Depending on the specifics, rotating keys may or may not be necessary. If we go to AES128-GCM, or make any currently-foreseeable change to the MAC, e.g., ditch HMAC-SHA256 for SHA-3-anything, there's no cryptographic issue using the same keys as the old versions. Switching to AES256, regardless of block mode, would require a re-key
@hgmnz I think I want to re-open the format discussion, or spawn another issue. As the recent non-ASCII config var corruption on core showed us, Fernet doesn't preserve the String#encoding metadata. I see two options for the new version:
- Have the encryptor see if the plaintext has an encoding other than ASCII_8BIT. If so, it can be transcoded into UTF-8. On decryption, see if the plaintext passes through the UTF-8 parser without throwing an exception. If so, it must be a string, so mark it as a UTF-8 string. If it can't, assume it must be ASCII_8BIT and use that instead. This will handle all the common cases, but if someone wants to send actual real binary data through Fernet that just-so-happens to be a valid sequence of UTF-8 bytes, then the data object handed back to the user will be inappropriately marked as a String with UTF-8 encoding. That feels really ugly.
- Change the format again and create an encoding field.
VERSION(1 byte) + HMAC (32b) + ENCODING (1b) + TIMESTAMP (8b) + IV (16b) + ciphertext
2 bytes for encoding would be reasonable as well. We'd need an agreed-upon numbering of various encodings (ASCII => 1, UTF-8 => 2, ISO-8859-Latin-2 => 3, ....).
@kr I have approximately zero idea how strings work in Go, and how encoding is represented. However, the situation in Ruby is we're currently losing metadata on the round-trip. This really should be round-trip safe for non-English text, without requiring the developer to do anything special.
http://www.iana.org/assignments/character-sets/character-sets.xml
IANA to the rescue! We can fit everything (including the wack-ass EBCDIC variants) in two bytes, as-is. If we drop a few of the more obscure IBM sets and renumber everything that's currently above 119 on that list, we could fit it in one byte.
I think metadata belongs in the payload. Fernet should not have to care about it at all,
except maybe to provide it as a convenience to users on top of the basic function.
It should not be baked into the format.
To be concrete, here's a strawman api to illustrate:
tok = Fernet.generate_raw("some bytes") # payload is "some bytes" unaltered
tok = Fernet.generate_text("Hello, 世界") # calls generate_raw with encoding_id+text
Another question, does the original encoding actually matter?
Or do we just want characters to get through safely? We could
encourage users to always converting text input to UTF-8
before generating the token (and have the fernet library raise
an error if its input isn't either binary or utf8), then the verifier
only has know whether to expect text or binary data, which it
presumably already knows. In neither case does fernet have
to alter the data it's given.
Also known as the "FFS, JUST USE UTF8" approach.