shadowsocks/shadowsocks-org

SIP004 - Support for AEADs implemented by large libraries

Mygod opened this issue · 280 comments

Mygod commented

Use AEADs to replace stream cipher + OTA. Previous discussion: #29.

Proposed AEAD algorithms:

  • ChaCha20-Poly1305 (see also: xSocks)
  • XChaCha20-Poly1305
  • Salsa20-Poly1305
  • AES-256-GCM (faster but not low-end-box-friendly)

Update: The following shows an example of TCP stream in chacha20-ietf-poly1305 mode (original idea by @breakwa11 and @Noisyfox). Other AEAD should follow the similar format.

Cipher: chacha20-ietf-poly1305

TCP request (after encryption, *ciphertext*)
+--------+----------------+--------------+--------------+---------------+
| NONCE  | PayloadLen_TAG | *PayloadLen* | Payload_TAG  |   *Payload*   |
+--------+----------------+--------------+--------------+---------------+
|  12    |       16       |       2      |     16       |    Variable   |
+--------+----------------+--------------+--------------+---------------+

TCP Chunk (after encryption, *ciphertext*)
+--------------+------------+-----------+----------+
| DATA_LEN_TAG | *DATA_LEN* |  DATA_TAG |  *DATA*  |
+--------------+------------+-----------+----------+
|      16      |     2      |     16    | Variable |
+--------------+------------+-----------+----------+

My plan is forking ourselves, dropping all the support of stream ciphers and OTA.

After that the we can provide a new version of the protocol (v2?).

I want to clarify AES-GCM is the fastest on machines with particular instruction sets, for example, server and client are both running on modern Intel CPU.

I agree with #30 (comment)

Mygod commented

I don't see the reason to drop compatibility for stream ciphers for now as @madeye has said:

However, I don't think any known attacks to shadowsocks protocol is realistic. And please don't forget that most of the traffic over shadowsocks is TLS...

People who has higher security standards could migrate to using these algorithms. And if benchmark shows its influence to performance is negligible we should start to encourage everyone to migrate to AEADs.

And I don't think adding these additional algorithms would be a lot of work (we need at most 1 more abstraction layer).

They can coexist as different projects, it is easy to maintain IMO.

Mygod commented

Considering SIP003, I think it's easier to maintain if implemented in one single project.

To support AEADs, first we need to define chunks for the TCP stream. Here is a proposal:

+----------+----------+----------+----------+
| DATA.LEN |   NONCE  |    TAG   |   DATA   | ...
+----------+----------+----------+----------+
|     2    |    8     |    16    | Variable | ...
+----------+----------+----------+----------| 

Assuming average data length equals to a typical MTU - 26 (1492 - 26 = 1466), then the maximum transfer efficiency is 1466 / 1492 = 98%.

So which part of the chunk is encrypted? DATA? Or the whole chunk? Or both of them?

@Noisyfox DATA.LEN, NONCE and TAG should not be encrypted. Only DATA is encrypted. TAG is generated from encrypted DATA. For more details: https://github.com/lparam/xSocks/blob/master/src/crypto.c#L25

That's basically the pattern of TLS record if I did understand correctly.

@Noisyfox Yes, exactly.

Mygod commented

Is DATA.LEN necessary?

@wongsyrone I don't think we need AEAD actually?

@Mygod Yes, I think so. It's required to split TCP stream to chunks.

AE:

  • Encrypts a message with a key and a nonce to keep it confidential
  • Computes an authentication tag. This tag is used to make sure that the message hasn't been tampered with before decrypting it.

AEAD:

  • Encrypts a message with a key and a nonce to keep it confidential
  • Computes an authentication tag. This tag is used to make sure that the message, as well as optional, non-confidential (non-encrypted) data, haven't been tampered with.

AEAD can protect plain text as well.

Mygod commented

@madeye What about letting it equal to packet length - 24?

@wongsyrone I don't think we have non-confidential data to protect however.

@Mygod In a TCP stream, there is not packet length...

Mygod commented

Oh sorry I got confused because you mentioned MTU. 😅

MTU here is only to demonstrate how large a chunk could be in the real world... In our implementation, we should also limit the DATA.LEN, make it smaller than the socket buffer size, e.g. 2048 in shadowsocks-libev.

NONCE is too short.

Note: the currently included AEAD cipher based on the draft only uses an eight byte nonce, which is too short to be safely generated randomly. A later version of libsodium will include the updated version with a longer nonce.

https://crypto.stackexchange.com/questions/29311/how-do-i-tack-on-additional-data-to-libsodiums-public-key-authenticated-encry

Then what about this:

+----------+----------+----------+----------+----------|
| DATA.LEN |   SEQ    |   NONCE  |    TAG   |   DATA   | ...
+----------+----------+----------+----------+----------+
|     2    |    8     |    8     |    16    | Variable | ...
+----------+----------+----------+----------+----------+

SEQ is a 64-bit sequence number. The real NONCE is SEQ+ NONCE. DATA.LEN + SEQ is also the AD of the AEAD.

(Why shouldn't I use TLS directly? 😅 )

[draft] main idea from @breakwa11 and @Noisyfox
The purpose of Length tag is to keep length as-is and not tempered by the third party since this is the key to separate general ciphertext and auth tag.

Please note that this is not compatible with current OTA and original protocol.

/*
 * The main difference between OTA designed by madeye and this one is MtE vs EtM.
 *
 * The first nonce is either from client or server side, it is generated via randombytes_buf()
 * in libsodium, and the sequent nonces are incremented via sodium_increment() in libsodium.
 * Please note that sodium_increment() considers the number to be encoded in little-endian format.
 * IMPORTANT: nonce should be used only once, let us running on the right track.
 *
 * Data.Len is used to separate general ciphertext and Auth tag. We can start decryption
 * if and only if the verification is passed. 
 * Firstly, we do length check, then decrypt it, separate ciphertext and attached data tag
 * based on the verified length, verify data tag and decrypt the corresponding data.
 * Finally, do what you supposed to do, e.g. forward user data.
 *
 * For UDP, nonces are generated randomly without the incrementation.
 *
 * TCP request (before encryption)
 * +------+---------------------+------------------+
 * | ATYP | Destination Address | Destination Port |
 * +------+---------------------+------------------+
 * |  1   |       Variable      |         2        |
 * +------+---------------------+------------------+
 *
 * TCP request (after encryption)
 * +--------+-----------+---------------+----------+---------------+
 * | NONCE  |PayloadLen |PayloadLen_TAG | Payload  |  Payload_TAG  |
 * +--------+-----------+---------------+----------+---------------+
 * | Fixed  |     2     |     Fixed     | Variable |     Fixed     |
 * +--------+-----------+---------------+----------+---------------+
 * 
 * Payload input: atyp + dst.addr + dst.port
 * PayloadLen is length(atyp + dst.addr + dst.port)
 * Payload_TAG and PayloadLen_TAG are in plaintext
 *
 * TCP Chunk (before encryption)
 * +----------+
 * |  DATA    |
 * +----------+
 * | Variable |
 * +----------+
 *
 * Data.Len is a 16-bit big-endian integer indicating the length of DATA.
 *
 * TCP Chunk (after encryption)
 * +------------+---------+-----------+----------+
 * | Data.Len*  | Len_TAG |  DATA_TAG |  DATA*   |
 * +------------+---------+-----------+----------+
 * |      2     | Fixed   |  Fixed    | Variable |
 * +------------+---------+-----------+----------+
 *
 * Len_TAG and DATA_TAG have the same length, they are in plaintext.
 * After encryption, DATA -> DATA*
 *
 * UDP (before encryption)
 * +------+---------------------+------------------+----------+
 * | ATYP | Destination Address | Destination Port |   DATA   |
 * +------+---------------------+------------------+----------+
 * |  1   |       Variable      |         2        | Variable |
 * +------+---------------------+------------------+----------+
 *
 * UDP (after encryption)
 * +--------+----------+-----------+
 * | NONCE  |  DATA*   |  DATA_TAG |
 * +--------+----------+-----------+
 * | Fixed  | Variable |  Fixed    |
 * +--------+----------+-----------+
 *
 * DATA* is Encrypt(atyp + dst.addr + dst.port + DATA)
 * RSV and FRAG are dropped
 * Since UDP packet is either received completely or missed,
 * we don't have to keep a field to track its length.
 *
 */

I don't see the point of "length-tag". AEAD support additional data to be authenticated as plaintext. You can use length as the "additional data" for temper proofing.

@v2ray Agreed. The DATA.LEN can be protected by AEAD as plaintext.

Mygod commented

+1. But this reminds me that we need random noise indistinguishability more than authentication (i.e. IND-CCA2) since the original purpose of this repo is to circumvent GFW and it's much harder to perform a realistic CCA2 distinction. Let's review our additions against random noise indistinguishability:

  1. TAG: Check. This should be guaranteed by the security of the hash algorithm we used.
  2. DATA.LEN: Oh shit this is not random at all. Adversary can distinguish them very easily.
  3. NONCE: No problem as long as they are randomly generated. (which isn't the case in TCP)
  4. SEQ: Terrible idea. It makes distinction much more easier.

OTA also has this problem. So what do you think?

Mygod commented

removes some of my useless comments

Okay I think I got this sorted out. What we need to do is just to put data.len in the encryption data; send nonce ONLY in the first data and it auto increments for EVERY data. And voilà!

v3aqb commented
Mygod commented

@v3aqb

better encrypted

I beg to differ. I took a quick glance and you implemented what shadowsocks was designed to prevent.

@Mygod Then the only difference from our current OTA is that the new proposal is Encrypt-Then-MAC... 😄

This is just an example of improving content security (aka CCA-proof), while it undermines protocol security (aka censorship-bypassing-detection-proof).

Before any progress, we should examine it carefully and make the right choice.

Mygod commented

@madeye No. There are two significant differences:

  1. Data.len is transmitted in ciphertext instead of plain text. This implies that it's easy for GFW to detect shadowsocks connections with OTA enabled.
  2. AE algorithms are chosen and implemented by large library and tested/attempted attack by more people.

@wongsyrone Yes. We need to know which one is more important in this context. I think my approach should be safest afaic.

@Mygod if data.len is encrypted, how to figure out the length of chunk for decryption (in the context of AEAD)?

The noise data is another approach to obfuscating packet length pattern, which may help.

Without data.len authentication, you cannot assume it hasn't been touched. That's why we decide to add a tag for it.

Encryption never implies authentication.

In fact, the AD part is never used in the proposal; I planned to make it work like another pre-shared info and verify it before decryption. It looks like a kind of overkill.

@v2ray You get the point man.
That's why we encrypt length as a 'chunk' with it's own tag, which means we could first verify and decrypt the length which has a fixed size and then read the remain data according to the verified length.

Mygod commented

@v2ray Just decrypt the first 2 bytes for stream cipher or first block for block cipher. I don't see a problem.

@wongsyrone All encryption I mentioned are authenticated encryption. data.len will be included when calculating hash as well.

@Mygod If some one increase the length by one byte, then the server will wait for one more byte instead of close the connection immediately.

Well you see, AEAD's main point is never decrypt any data before the cipher text is been verified.

Mygod commented

@Noisyfox Good point. So maybe we should authenticate data.len and data separately instead? Like TAG, encrypted_data.len, TAG, encrypted_data?

Mygod commented

@Noisyfox Hmm on second thought actually that may not be a big issue. If a man-in-the-middle removes the last byte from the original stream, the server is also going to wait for that last byte to arrive instead of closing the connection immediately. A malformed data.len isn't a big issue IMO. What do you think?

The proposal is supposed to encrypt length and corresponding data separately.
The length verification laies on first class. The second is data verification.

@Mygod But if modify one byte in cipher text will lead to server's blocking, then the attacker could find out that one specific byte means length. Well I think it's better to hide our protocol too.

Mygod commented

@wongsyrone Oh you're right. I didn't see that there was an asteroid after data.len which means encrypted. Sorry about that y'all. 😛

@Noisyfox Yes. That's a better attack! Now I'm convinced that we should take your proposal.

There is a simpler way for 'encrypting' data.len. Only the first 'data.len' matters. If the attacker can't figure out where is (or what is) the first 'data.len', he/she can't crack the following data.len as well. To simplify the design (and also increase traffic throughput), you can just encrypt the data.len for the first chunk of payload.

Mygod commented

You ignored timing. TCP stream can pause for some time and then resumed. Also we should never put plain text in the traffic unless it's indistinguishable with random noise.

Great! I think we're very close to the new protocol.

@Mygod Could you add @Noisyfox's design to your proposal?

Updated draft to describe the procedure.

Mygod commented

Maybe put tag before encrypted data.len since "never decrypt any data before the cipher text is been verified"?

In general, there are three parts after encryption:

  1. Nonce
  2. data.len and its tag
  3. data and its tag

The sequence of tag and ciphertext in each part doesn't matter since tags are sharing the same length.
The difference is how to compute the offset.

Mygod commented

I meant TAG, data.len, TAG, data. It just looks nicer.

I see, no objection from my side.

Just come up with another idea, How about dropping data tag, use only

data.len_TAG, data.len, data

Update: Okay, you can ignore this comment.

Mygod commented

Then this is almost no different with the current protocol without OTA.

At least, it ensures datalen is not compromised. It's just a hint.

Another problem is about data.len_TAG. Should we use the full length of AEAD for two bytes of data.len?

Mygod commented

@wongsyrone Also your approach is also vulnerable to the detect method proposed by @breakwa11 as long as you are using stream cipher.

@madeye 2 bytes may be too short. Only 65,536 combinations for attacker to try.

The sad truth is crypto library expects a full-length tag.

Mygod commented

...and attaching full length tag isn't a big overhead IMO.

Another problem is key derivation, I proposed to use crypto_generichash() in libsodium, which uses Blake2b internally. Or some real password hashing algorithms like Argon2 in libsodium.

2 bytes may be too short. Only 65,536 combinations for attacker to try.

When computing the tag, we can add user input as AD. No matter how many times you tried, you don't know AD; then it will result in auth failure.
After auth failure, close connection instantly or add them to a list with a random timeout, close conn when timeout exceeds.

Mygod commented

@wongsyrone Good points except I don't see the point of using AD. I assume your user input is some kind of pre-shared message. In that case we already have a secret key.

Yes, it is pre-shared message work like password we already have, we can reuse it or add a similar entry.

Mygod commented

I still don't think it's necessary.

@wongsyrone As we don't have any plaintext now (except the nonce and tag), I think there is no need to use AD here.

I'd like to summarize the ideas above. Now, we have a proposal like this:

Cipher: chacha20-poly1305

TCP request (after encryption, *ciphertext*)
+--------+----------------+--------------+--------------+---------------+
| NONCE  | PayloadLen_TAG | *PayloadLen* | Payload_TAG  |   *Payload*   |
+--------+----------------+--------------+--------------+---------------+
|  12    |       32       |       2      |     32       |    Variable   |
+--------+----------------+--------------+--------------+---------------+

TCP Chunk (after encryption, *ciphertext*)
+--------------+------------+-----------+----------+
| DATA_LEN_TAG | *DATA_LEN* |  DATA_TAG |  *DATA*  |
+--------------+------------+-----------+----------+
|      32      |     2      |     32    | Variable |
+--------------+------------+-----------+----------+

It should be:

Cipher: chacha20-ietf-poly1305

TCP request (after encryption, *ciphertext*)
+--------+----------------+--------------+--------------+---------------+
| NONCE  | PayloadLen_TAG | *PayloadLen* | Payload_TAG  |   *Payload*   |
+--------+----------------+--------------+--------------+---------------+
|  12    |       16       |       2      |     16       |    Variable   |
+--------+----------------+--------------+--------------+---------------+

TCP Chunk (after encryption, *ciphertext*)
+--------------+------------+-----------+----------+
| DATA_LEN_TAG | *DATA_LEN* |  DATA_TAG |  *DATA*  |
+--------------+------------+-----------+----------+
|      16      |     2      |     16    | Variable |
+--------------+------------+-----------+----------+

Non-ietf variant, change Nonce from 12 to 8.
For xchacha20-ietf-poly1305, change Nonce from 12 to 24.

Non-IETF versions of crypto_aead_xchacha20poly1305 is already removed: jedisct1/libsodium@1686da3

BTW, xchacha20-ietf-poly1305 is not released yet.

For shadowsocks-libev:

#ifdef crypto_aead_xchacha20poly1305_ietf_ABYTES
#define HAVE_XCHACHA20IETF
#endif

static const char *supported_ciphers[CIPHER_NUM] = {
    "aes-128-gcm",
    "aes-192-gcm",
    "aes-256-gcm",
    "chacha20-poly1305",
    "chacha20-poly1305-ietf",
#ifdef HAVE_XCHACHA20IETF
    "xchacha20-poly1305-ietf"
#endif
};

/*
 * use mbed TLS cipher wrapper to unify handling
 */
#ifdef USE_CRYPTO_MBEDTLS
static const char *supported_ciphers_mbedtls[CIPHER_NUM] = {
    "AES-128-GCM",
    "AES-192-GCM",
    "AES-256-GCM",
    CIPHER_UNSUPPORTED,
    CIPHER_UNSUPPORTED,
#ifdef HAVE_XCHACHA20IETF
    CIPHER_UNSUPPORTED
#endif
};
#endif

static const int supported_ciphers_nonce_size[CIPHER_NUM] = {
    12, 12, 12, 8, 12,
#ifdef HAVE_XCHACHA20IETF
    24
#endif
};

static const int supported_ciphers_key_size[CIPHER_NUM] = {
    16, 24, 32, 32, 32,
#ifdef HAVE_XCHACHA20IETF
    32
#endif
};

static const int supported_ciphers_tag_size[CIPHER_NUM] = {
    16, 16, 16, 16, 16,
#ifdef HAVE_XCHACHA20IETF
    16
#endif
};
Mygod commented

@madeye I think plugins are basically settled for now (unless one's starting to port PT to our codebase). Let's start implementing this because I think it's pretty urgent. A man-in-the-middle should be able to easily identify OTA-enabled trafficc since there is only 255 possibilities to try. I was trying to write a plugin based on simple-obfs to serve as a emulated man-in-the-middle to illustrate my attack but I haven't finished debugging it yet. After implementing this we should deprecate OTA.

@Mygod Could you update the final frame design in your proposal? I'm OK with the design of @wongsyrone. So let's just follow it #30 (comment).

@Mygod where does the number 255 come from? The address type? If so, are you saying the new GCM method also applies to the header part?

Mygod commented

@madeye Done. (btw I believe you can edit my comment directly)

@v2ray Address length varies from 2 bytes to 256 bytes inclusive. The most obvious problem of the old approach is that its data.len is transmitted in plaintext.

You'd better mention my name if you choose to follow this design which idea was came from me actually

@breakwa11 Yep, I think @Mygod has already added your name to his proposal. #30 (comment)

Please reverse the order of TAG and PAYLOAD (so it's PAYLOAD then TAG), as it's easier to use the normal combined mode AEAD encrypt/decrypt functions instead of the detached mode (e.g. in libsodium). Similarly, Go's standard library has a fixed AEAD signature assuming the normal combined mode order (https://golang.org/pkg/crypto/cipher/#AEAD).

Since the whole point of using proven AEAD is to prevent developers from screwing up crypto, it is preferable to follow conventions to reduce complexity.

Mygod commented

LGTM. Although I don't think the order here really matters, golang indeed implements GCM in that way: https://golang.org/src/crypto/cipher/gcm.go#147

It doesn't matter.

Updated nonce incrementation:

Please note that sodium_increment() considers the number to be encoded in little-endian format.

For key derivation, libsodium's crypto_generichash is marginally better than MD5/SHA1.

It's safer to use proven password hashes such as PBKDF2. Argon2 is fine, but I worry about lack of mature library support due to its novelty. Additionally, we have to choose the right parameters (number of iterations, salt, etc) for password hashes since most of them require knobs…

Here is the experimental branch for SIP004 https://github.com/shadowsocks/shadowsocks-libev/tree/sip004

Please help to run tests and refine the document by adding comments here.

The password hashing looks overkill, I prefer to generic hashing like blake2b.

@wongsyrone Let's keep using previous key driviation (OpenSSL way). Not every shadowsocks port is able to use blake2b hashing.

I'd suggest at least upgrade to PBKDF2 for key derivation.

Just wondering which port? According to its official website, there are many implementations available, and many libsodium binding wrappers.

@wongsyrone qt5, golang, nodejs, chromeapp, rust. Also, if we are really care about password hashing, I prefer argon2 here, as it's a memory-hard approach.

I think we should get rid of MD5 at least. Argon2 is good alternative as well.

If we go for Argon2, we need to choose which variant of Argon2 as there are three: Argon2i, Argon2d, and Argon2id. According to https://github.com/P-H-C/phc-winner-argon2, Argon2i seems to be a better fit in our case

Argon2 has three variants: Argon2i, Argon2d, and Argon2id. Argon2d is faster and uses data-depending memory access, which makes it highly resistant against GPU cracking attacks and suitable for applications with no threats from side-channel timing attacks (eg. cryptocurrencies). Argon2i instead uses data-independent memory access, which is preferred for password hashing and password-based key derivation, but it is slower as it makes more passes over the memory to protect from tradeoff attacks. Argon2id is a hybrid of Argon2i and Argon2d, using a combination of data-depending and data-independent memory accesses, which gives some of Argon2i's resistance to side-channel cache timing attacks and much of Argon2d's resistance to GPU cracking attacks.

Alternatively, we could leave the decision of which KDF to use to deployment and require a strong key instead.

It seems things are getting complicated now. Since we are not passing it directly, does it really matter to use real password hashing algorithm?

Since we are not passing it directly, does it really matter to use real password hashing algorithm?

Yes if you want the server to be resistant to password guessing attacks.

No if we move it out of the scope of this project and require long random keys.

The reason why I prefer to blake2b is it claims as secure as SHA3 but slightly faster than MD5 and SHA1. Blake2b is also used in Argon2.

blake2b, SHA, and any fast hashing in general are not suitable for password hashing.

You need slower, tunable password hashing like PBKDF2, bcrypt, scrypt, or Argon2 (each has its own crypto tradeoffs).

Ultimately, though, nothing beats long random keys from /dev/urand ^_^

Okay, I leave this to madeye to balance.

Furthermore, isn't it necessary a configuration field for hashing salt as well?

Mygod commented

For maximum security, we can also get rid of key derivation by using a random generated key file (or base64 encoded string) like PGP. This also guarantees compatibility on every platform ever.

Let's use Argon2 as our default key derivation for AEAD (nothing changed for old stream ciphers).

We follow the predefined settings from libsodium and take "shadowsocks hash" as the default salt.

        const unsigned char salt[crypto_pwhash_SALTBYTES] = {
            's', 'h', 'a', 'd', 'o', 'w', 's', 'o',
            'c', 'k', 's', ' ', 'h', 'a', 's', 'h'
        };
        int err = crypto_pwhash (key, nkey, (char*)pass, strlen(pass), salt,
                crypto_pwhash_OPSLIMIT_INTERACTIVE, crypto_pwhash_MEMLIMIT_INTERACTIVE,
                crypto_pwhash_ALG_DEFAULT);

More details:

  1. shadowsocks/shadowsocks-libev@e18f71b
  2. https://download.libsodium.org/doc/password_hashing/the_argon2i_function.html

The above Argon2 settings require 4MB memory to compute keys. Not sure if it's gonna be ok on low end routers.

@riobard 4MB memory is not a problem for current low end routers. (32MB DDR2, MT7620A, $10).

Great!