SIP004 - Support for AEADs implemented by large libraries

Question

SIP004 - Support for AEADs implemented by large libraries

Mygod opened this issue 7 years ago · 280 comments

Use AEADs to replace stream cipher + OTA. Previous discussion: #29.

Proposed AEAD algorithms:

ChaCha20-Poly1305 (see also: xSocks)
XChaCha20-Poly1305
Salsa20-Poly1305
AES-256-GCM (faster but not low-end-box-friendly)

Update: The following shows an example of TCP stream in chacha20-ietf-poly1305 mode (original idea by @breakwa11 and @Noisyfox). Other AEAD should follow the similar format.

Cipher: chacha20-ietf-poly1305

TCP request (after encryption, *ciphertext*)
+--------+----------------+--------------+--------------+---------------+
| NONCE  | PayloadLen_TAG | *PayloadLen* | Payload_TAG  |   *Payload*   |
+--------+----------------+--------------+--------------+---------------+
|  12    |       16       |       2      |     16       |    Variable   |
+--------+----------------+--------------+--------------+---------------+

TCP Chunk (after encryption, *ciphertext*)
+--------------+------------+-----------+----------+
| DATA_LEN_TAG | *DATA_LEN* |  DATA_TAG |  *DATA*  |
+--------------+------------+-----------+----------+
|      16      |     2      |     16    | Variable |
+--------------+------------+-----------+----------+

riobard commented 7 years ago

Great!

Answer 1 · 2017-01-12T06:38:59.000Z

My plan is forking ourselves, dropping all the support of stream ciphers and OTA.

After that the we can provide a new version of the protocol (v2?).

Answer 2 · 2017-01-12T06:39:03.000Z

I want to clarify AES-GCM is the fastest on machines with particular instruction sets, for example, server and client are both running on modern Intel CPU.

Answer 3 · 2017-01-12T06:39:28.000Z

I agree with #30 (comment)

Answer 4 · 2017-01-12T06:49:00.000Z

I don't see the reason to drop compatibility for stream ciphers for now as @madeye has said:

However, I don't think any known attacks to shadowsocks protocol is realistic. And please don't forget that most of the traffic over shadowsocks is TLS...

People who has higher security standards could migrate to using these algorithms. And if benchmark shows its influence to performance is negligible we should start to encourage everyone to migrate to AEADs.

And I don't think adding these additional algorithms would be a lot of work (we need at most 1 more abstraction layer).

Answer 5 · 2017-01-12T06:51:16.000Z

They can coexist as different projects, it is easy to maintain IMO.

Answer 6 · 2017-01-12T06:54:05.000Z

Considering SIP003, I think it's easier to maintain if implemented in one single project.

Answer 7 · 2017-01-12T07:13:34.000Z

To support AEADs, first we need to define chunks for the TCP stream. Here is a proposal:

+----------+----------+----------+----------+
| DATA.LEN |   NONCE  |    TAG   |   DATA   | ...
+----------+----------+----------+----------+
|     2    |    8     |    16    | Variable | ...
+----------+----------+----------+----------|

Assuming average data length equals to a typical MTU - 26 (1492 - 26 = 1466), then the maximum transfer efficiency is 1466 / 1492 = 98%.

Answer 8 · 2017-01-12T07:20:40.000Z

So which part of the chunk is encrypted? DATA? Or the whole chunk? Or both of them?

Answer 9 · 2017-01-12T07:28:09.000Z

@Noisyfox DATA.LEN, NONCE and TAG should not be encrypted. Only DATA is encrypted. TAG is generated from encrypted DATA. For more details: https://github.com/lparam/xSocks/blob/master/src/crypto.c#L25

Answer 10 · 2017-01-12T07:31:25.000Z

That's basically the pattern of TLS record if I did understand correctly.

Answer 11 · 2017-01-12T07:32:00.000Z

@Noisyfox Yes, exactly.

Answer 12 · 2017-01-12T07:37:56.000Z

xsocks' looks like https://github.com/jedisct1/libsodium/blob/master/src/libsodium/crypto_secretbox/xsalsa20poly1305/ref/secretbox_xsalsa20poly1305.c

It is AE not AEAD.

Answer 13 · 2017-01-12T07:37:57.000Z

Is DATA.LEN necessary?

@wongsyrone I don't think we need AEAD actually?

Answer 14 · 2017-01-12T07:43:35.000Z

@Mygod Yes, I think so. It's required to split TCP stream to chunks.

Answer 15 · 2017-01-12T07:44:11.000Z

AE:

Encrypts a message with a key and a nonce to keep it confidential
Computes an authentication tag. This tag is used to make sure that the message hasn't been tampered with before decrypting it.

AEAD:

Encrypts a message with a key and a nonce to keep it confidential
Computes an authentication tag. This tag is used to make sure that the message, as well as optional, non-confidential (non-encrypted) data, haven't been tampered with.

AEAD can protect plain text as well.

Answer 16 · 2017-01-12T07:47:44.000Z

@madeye What about letting it equal to packet length - 24?

@wongsyrone I don't think we have non-confidential data to protect however.

Answer 17 · 2017-01-12T07:49:14.000Z

@Mygod In a TCP stream, there is not packet length...

Answer 18 · 2017-01-12T07:52:15.000Z

Oh sorry I got confused because you mentioned MTU. 😅

Answer 19 · 2017-01-12T07:57:37.000Z

MTU here is only to demonstrate how large a chunk could be in the real world... In our implementation, we should also limit the DATA.LEN, make it smaller than the socket buffer size, e.g. 2048 in shadowsocks-libev.

Answer 20 · 2017-01-12T08:00:30.000Z

NONCE is too short.

Note: the currently included AEAD cipher based on the draft only uses an eight byte nonce, which is too short to be safely generated randomly. A later version of libsodium will include the updated version with a longer nonce.

https://crypto.stackexchange.com/questions/29311/how-do-i-tack-on-additional-data-to-libsodiums-public-key-authenticated-encry

Answer 21 · 2017-01-12T08:19:26.000Z

Then what about this:

+----------+----------+----------+----------+----------|
| DATA.LEN |   SEQ    |   NONCE  |    TAG   |   DATA   | ...
+----------+----------+----------+----------+----------+
|     2    |    8     |    8     |    16    | Variable | ...
+----------+----------+----------+----------+----------+

SEQ is a 64-bit sequence number. The real NONCE is SEQ+ NONCE. DATA.LEN + SEQ is also the AD of the AEAD.

(Why shouldn't I use TLS directly? 😅 )

Answer 22 · 2017-01-18T08:53:04.000Z

[draft] main idea from @breakwa11 and @Noisyfox
The purpose of Length tag is to keep length as-is and not tempered by the third party since this is the key to separate general ciphertext and auth tag.

Please note that this is not compatible with current OTA and original protocol.

/*
 * The main difference between OTA designed by madeye and this one is MtE vs EtM.
 *
 * The first nonce is either from client or server side, it is generated via randombytes_buf()
 * in libsodium, and the sequent nonces are incremented via sodium_increment() in libsodium.
 * Please note that sodium_increment() considers the number to be encoded in little-endian format.
 * IMPORTANT: nonce should be used only once, let us running on the right track.
 *
 * Data.Len is used to separate general ciphertext and Auth tag. We can start decryption
 * if and only if the verification is passed. 
 * Firstly, we do length check, then decrypt it, separate ciphertext and attached data tag
 * based on the verified length, verify data tag and decrypt the corresponding data.
 * Finally, do what you supposed to do, e.g. forward user data.
 *
 * For UDP, nonces are generated randomly without the incrementation.
 *
 * TCP request (before encryption)
 * +------+---------------------+------------------+
 * | ATYP | Destination Address | Destination Port |
 * +------+---------------------+------------------+
 * |  1   |       Variable      |         2        |
 * +------+---------------------+------------------+
 *
 * TCP request (after encryption)
 * +--------+-----------+---------------+----------+---------------+
 * | NONCE  |PayloadLen |PayloadLen_TAG | Payload  |  Payload_TAG  |
 * +--------+-----------+---------------+----------+---------------+
 * | Fixed  |     2     |     Fixed     | Variable |     Fixed     |
 * +--------+-----------+---------------+----------+---------------+
 * 
 * Payload input: atyp + dst.addr + dst.port
 * PayloadLen is length(atyp + dst.addr + dst.port)
 * Payload_TAG and PayloadLen_TAG are in plaintext
 *
 * TCP Chunk (before encryption)
 * +----------+
 * |  DATA    |
 * +----------+
 * | Variable |
 * +----------+
 *
 * Data.Len is a 16-bit big-endian integer indicating the length of DATA.
 *
 * TCP Chunk (after encryption)
 * +------------+---------+-----------+----------+
 * | Data.Len*  | Len_TAG |  DATA_TAG |  DATA*   |
 * +------------+---------+-----------+----------+
 * |      2     | Fixed   |  Fixed    | Variable |
 * +------------+---------+-----------+----------+
 *
 * Len_TAG and DATA_TAG have the same length, they are in plaintext.
 * After encryption, DATA -> DATA*
 *
 * UDP (before encryption)
 * +------+---------------------+------------------+----------+
 * | ATYP | Destination Address | Destination Port |   DATA   |
 * +------+---------------------+------------------+----------+
 * |  1   |       Variable      |         2        | Variable |
 * +------+---------------------+------------------+----------+
 *
 * UDP (after encryption)
 * +--------+----------+-----------+
 * | NONCE  |  DATA*   |  DATA_TAG |
 * +--------+----------+-----------+
 * | Fixed  | Variable |  Fixed    |
 * +--------+----------+-----------+
 *
 * DATA* is Encrypt(atyp + dst.addr + dst.port + DATA)
 * RSV and FRAG are dropped
 * Since UDP packet is either received completely or missed,
 * we don't have to keep a field to track its length.
 *
 */

Answer 23 · 2017-01-18T09:02:50.000Z

I don't see the point of "length-tag". AEAD support additional data to be authenticated as plaintext. You can use length as the "additional data" for temper proofing.

Answer 24 · 2017-01-18T09:05:39.000Z

@v2ray Agreed. The DATA.LEN can be protected by AEAD as plaintext.

Answer 25 · 2017-01-18T13:27:47.000Z

+1. But this reminds me that we need random noise indistinguishability more than authentication (i.e. IND-CCA2) since the original purpose of this repo is to circumvent GFW and it's much harder to perform a realistic CCA2 distinction. Let's review our additions against random noise indistinguishability:

TAG: Check. This should be guaranteed by the security of the hash algorithm we used.
DATA.LEN: Oh shit this is not random at all. Adversary can distinguish them very easily.
NONCE: No problem as long as they are randomly generated. (which isn't the case in TCP)
SEQ: Terrible idea. It makes distinction much more easier.

OTA also has this problem. So what do you think?

Answer 26 · 2017-01-18T14:52:08.000Z

removes some of my useless comments

Okay I think I got this sorted out. What we need to do is just to put data.len in the encryption data; send nonce ONLY in the first data and it auto increments for EVERY data. And voilà!

Answer 27 · 2017-01-18T15:10:32.000Z

FYI: http://github.com/v3aqb/hxsocks Mygod <notifications@github.com>于2017年1月18日周三 22:52写道：

…

*removes some of my useless comments* Okay I think I got this sorted out. What we need to do is just to put data.len in the encryption data; send nonce ONLY in the first data and it auto increments for EVERY data. And voilà! — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#30 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADJX912UsW4SkbvZFxqimnrr4D1YHvz4ks5rTicZgaJpZM4Lhapt> .

Answer 28 · 2017-01-18T15:24:23.000Z

@v3aqb

better encrypted

I beg to differ. I took a quick glance and you implemented what shadowsocks was designed to prevent.

Answer 29 · 2017-01-18T15:26:22.000Z

@Mygod Then the only difference from our current OTA is that the new proposal is Encrypt-Then-MAC... 😄

Answer 30 · 2017-01-18T15:27:43.000Z

This is just an example of improving content security (aka CCA-proof), while it undermines protocol security (aka censorship-bypassing-detection-proof).

Before any progress, we should examine it carefully and make the right choice.

Answer 31 · 2017-01-18T15:33:18.000Z

@madeye No. There are two significant differences:

Data.len is transmitted in ciphertext instead of plain text. This implies that it's easy for GFW to detect shadowsocks connections with OTA enabled.
AE algorithms are chosen and implemented by large library and tested/attempted attack by more people.

@wongsyrone Yes. We need to know which one is more important in this context. I think my approach should be safest afaic.

Answer 32 · 2017-01-18T15:41:32.000Z

@Mygod if data.len is encrypted, how to figure out the length of chunk for decryption (in the context of AEAD)?

Answer 33 · 2017-01-18T15:46:34.000Z

The noise data is another approach to obfuscating packet length pattern, which may help.

Without data.len authentication, you cannot assume it hasn't been touched. That's why we decide to add a tag for it.

Encryption never implies authentication.

In fact, the AD part is never used in the proposal; I planned to make it work like another pre-shared info and verify it before decryption. It looks like a kind of overkill.

Answer 34 · 2017-01-18T15:48:55.000Z

@v2ray You get the point man.
That's why we encrypt length as a 'chunk' with it's own tag, which means we could first verify and decrypt the length which has a fixed size and then read the remain data according to the verified length.

Answer 35 · 2017-01-18T15:52:14.000Z

@v2ray Just decrypt the first 2 bytes for stream cipher or first block for block cipher. I don't see a problem.

@wongsyrone All encryption I mentioned are authenticated encryption. data.len will be included when calculating hash as well.

Answer 36 · 2017-01-18T15:56:03.000Z

@Mygod If some one increase the length by one byte, then the server will wait for one more byte instead of close the connection immediately.

Answer 37 · 2017-01-18T15:57:50.000Z

Well you see, AEAD's main point is never decrypt any data before the cipher text is been verified.

Answer 38 · 2017-01-18T16:00:35.000Z

@Noisyfox Good point. So maybe we should authenticate data.len and data separately instead? Like TAG, encrypted_data.len, TAG, encrypted_data?

Answer 39 · 2017-01-18T16:02:39.000Z

@Noisyfox Hmm on second thought actually that may not be a big issue. If a man-in-the-middle removes the last byte from the original stream, the server is also going to wait for that last byte to arrive instead of closing the connection immediately. A malformed data.len isn't a big issue IMO. What do you think?

Answer 40 · 2017-01-18T16:05:14.000Z

The proposal is supposed to encrypt length and corresponding data separately.
The length verification laies on first class. The second is data verification.

Answer 41 · 2017-01-18T16:06:07.000Z

@Mygod But if modify one byte in cipher text will lead to server's blocking, then the attacker could find out that one specific byte means length. Well I think it's better to hide our protocol too.

Answer 42 · 2017-01-18T16:10:43.000Z

@wongsyrone Oh you're right. I didn't see that there was an asteroid after data.len which means encrypted. Sorry about that y'all. 😛

@Noisyfox Yes. That's a better attack! Now I'm convinced that we should take your proposal.

Answer 43 · 2017-01-18T16:49:01.000Z

There is a simpler way for 'encrypting' data.len. Only the first 'data.len' matters. If the attacker can't figure out where is (or what is) the first 'data.len', he/she can't crack the following data.len as well. To simplify the design (and also increase traffic throughput), you can just encrypt the data.len for the first chunk of payload.

Answer 44 · 2017-01-18T16:51:45.000Z

You ignored timing. TCP stream can pause for some time and then resumed. Also we should never put plain text in the traffic unless it's indistinguishable with random noise.

Answer 45 · 2017-01-19T00:21:31.000Z

Just reference, we may borrow some ideas from https://github.com/WeMobileDev/article/blob/master/%E5%9F%BA%E4%BA%8ETLS1.3%E7%9A%84%E5%BE%AE%E4%BF%A1%E5%AE%89%E5%85%A8%E9%80%9A%E4%BF%A1%E5%8D%8F%E8%AE%AEmmtls%E4%BB%8B%E7%BB%8D.md

Answer 46 · 2017-01-19T00:57:44.000Z

Great! I think we're very close to the new protocol.

@Mygod Could you add @Noisyfox's design to your proposal?

Answer 47 · 2017-01-19T01:14:59.000Z

Updated draft to describe the procedure.

Answer 48 · 2017-01-19T02:19:20.000Z

Maybe put tag before encrypted data.len since "never decrypt any data before the cipher text is been verified"?

Answer 49 · 2017-01-19T02:26:59.000Z

In general, there are three parts after encryption:

Nonce
data.len and its tag
data and its tag

The sequence of tag and ciphertext in each part doesn't matter since tags are sharing the same length.
The difference is how to compute the offset.

Answer 50 · 2017-01-19T02:30:29.000Z

I meant TAG, data.len, TAG, data. It just looks nicer.

Answer 51 · 2017-01-19T02:31:14.000Z

I see, no objection from my side.

Answer 52 · 2017-01-19T02:44:36.000Z

Just come up with another idea, How about dropping data tag, use only

data.len_TAG, data.len, data

Update: Okay, you can ignore this comment.

Answer 53 · 2017-01-19T02:46:54.000Z

Then this is almost no different with the current protocol without OTA.

Answer 54 · 2017-01-19T02:50:44.000Z

At least, it ensures datalen is not compromised. It's just a hint.

Answer 55 · 2017-01-19T02:53:15.000Z

Another problem is about data.len_TAG. Should we use the full length of AEAD for two bytes of data.len?

Answer 56 · 2017-01-19T02:55:03.000Z

@wongsyrone Also your approach is also vulnerable to the detect method proposed by @breakwa11 as long as you are using stream cipher.

@madeye 2 bytes may be too short. Only 65,536 combinations for attacker to try.

Answer 57 · 2017-01-19T02:55:21.000Z

The sad truth is crypto library expects a full-length tag.

Answer 58 · 2017-01-19T02:56:36.000Z

...and attaching full length tag isn't a big overhead IMO.

Answer 59 · 2017-01-19T03:00:05.000Z

Another problem is key derivation, I proposed to use crypto_generichash() in libsodium, which uses Blake2b internally. Or some real password hashing algorithms like Argon2 in libsodium.

Answer 60 · 2017-01-19T03:13:04.000Z

2 bytes may be too short. Only 65,536 combinations for attacker to try.

When computing the tag, we can add user input as AD. No matter how many times you tried, you don't know AD; then it will result in auth failure.
After auth failure, close connection instantly or add them to a list with a random timeout, close conn when timeout exceeds.

Answer 61 · 2017-01-19T03:17:54.000Z

@wongsyrone Good points except I don't see the point of using AD. I assume your user input is some kind of pre-shared message. In that case we already have a secret key.

Answer 62 · 2017-01-19T03:41:17.000Z

Yes, it is pre-shared message work like password we already have, we can reuse it or add a similar entry.

Answer 63 · 2017-01-19T03:44:26.000Z

I still don't think it's necessary.

Answer 64 · 2017-01-19T05:24:17.000Z

@wongsyrone As we don't have any plaintext now (except the nonce and tag), I think there is no need to use AD here.

I'd like to summarize the ideas above. Now, we have a proposal like this:

Cipher: chacha20-poly1305

TCP request (after encryption, *ciphertext*)
+--------+----------------+--------------+--------------+---------------+
| NONCE  | PayloadLen_TAG | *PayloadLen* | Payload_TAG  |   *Payload*   |
+--------+----------------+--------------+--------------+---------------+
|  12    |       32       |       2      |     32       |    Variable   |
+--------+----------------+--------------+--------------+---------------+

TCP Chunk (after encryption, *ciphertext*)
+--------------+------------+-----------+----------+
| DATA_LEN_TAG | *DATA_LEN* |  DATA_TAG |  *DATA*  |
+--------------+------------+-----------+----------+
|      32      |     2      |     32    | Variable |
+--------------+------------+-----------+----------+

Answer 65 · 2017-01-19T06:28:32.000Z

It should be:

Cipher: chacha20-ietf-poly1305

TCP request (after encryption, *ciphertext*)
+--------+----------------+--------------+--------------+---------------+
| NONCE  | PayloadLen_TAG | *PayloadLen* | Payload_TAG  |   *Payload*   |
+--------+----------------+--------------+--------------+---------------+
|  12    |       16       |       2      |     16       |    Variable   |
+--------+----------------+--------------+--------------+---------------+

TCP Chunk (after encryption, *ciphertext*)
+--------------+------------+-----------+----------+
| DATA_LEN_TAG | *DATA_LEN* |  DATA_TAG |  *DATA*  |
+--------------+------------+-----------+----------+
|      16      |     2      |     16    | Variable |
+--------------+------------+-----------+----------+

Answer 66 · 2017-01-19T06:30:44.000Z

Non-ietf variant, change Nonce from 12 to 8.
For xchacha20-ietf-poly1305, change Nonce from 12 to 24.

Answer 67 · 2017-01-19T06:37:23.000Z

Non-IETF versions of crypto_aead_xchacha20poly1305 is already removed: jedisct1/libsodium@1686da3

BTW, xchacha20-ietf-poly1305 is not released yet.

For shadowsocks-libev:

#ifdef crypto_aead_xchacha20poly1305_ietf_ABYTES
#define HAVE_XCHACHA20IETF
#endif

static const char *supported_ciphers[CIPHER_NUM] = {
    "aes-128-gcm",
    "aes-192-gcm",
    "aes-256-gcm",
    "chacha20-poly1305",
    "chacha20-poly1305-ietf",
#ifdef HAVE_XCHACHA20IETF
    "xchacha20-poly1305-ietf"
#endif
};

/*
 * use mbed TLS cipher wrapper to unify handling
 */
#ifdef USE_CRYPTO_MBEDTLS
static const char *supported_ciphers_mbedtls[CIPHER_NUM] = {
    "AES-128-GCM",
    "AES-192-GCM",
    "AES-256-GCM",
    CIPHER_UNSUPPORTED,
    CIPHER_UNSUPPORTED,
#ifdef HAVE_XCHACHA20IETF
    CIPHER_UNSUPPORTED
#endif
};
#endif

static const int supported_ciphers_nonce_size[CIPHER_NUM] = {
    12, 12, 12, 8, 12,
#ifdef HAVE_XCHACHA20IETF
    24
#endif
};

static const int supported_ciphers_key_size[CIPHER_NUM] = {
    16, 24, 32, 32, 32,
#ifdef HAVE_XCHACHA20IETF
    32
#endif
};

static const int supported_ciphers_tag_size[CIPHER_NUM] = {
    16, 16, 16, 16, 16,
#ifdef HAVE_XCHACHA20IETF
    16
#endif
};

Answer 68 · 2017-01-24T15:51:52.000Z

@madeye I think plugins are basically settled for now (unless one's starting to port PT to our codebase). Let's start implementing this because I think it's pretty urgent. A man-in-the-middle should be able to easily identify OTA-enabled trafficc since there is only 255 possibilities to try. I was trying to write a plugin based on simple-obfs to serve as a emulated man-in-the-middle to illustrate my attack but I haven't finished debugging it yet. After implementing this we should deprecate OTA.

Answer 69 · 2017-01-25T00:51:26.000Z

@Mygod Could you update the final frame design in your proposal? I'm OK with the design of @wongsyrone. So let's just follow it #30 (comment).

Answer 70 · 2017-01-25T02:51:06.000Z

@Mygod where does the number 255 come from? The address type? If so, are you saying the new GCM method also applies to the header part?

Answer 71 · 2017-01-25T02:59:28.000Z

@madeye Done. (btw I believe you can edit my comment directly)

@v2ray Address length varies from 2 bytes to 256 bytes inclusive. The most obvious problem of the old approach is that its data.len is transmitted in plaintext.

Answer 72 · 2017-01-25T04:46:16.000Z

You'd better mention my name if you choose to follow this design which idea was came from me actually

Answer 73 · 2017-01-25T04:54:23.000Z

@breakwa11 Yep, I think @Mygod has already added your name to his proposal. #30 (comment)

Answer 74 · 2017-01-26T15:27:48.000Z

Please reverse the order of TAG and PAYLOAD (so it's PAYLOAD then TAG), as it's easier to use the normal combined mode AEAD encrypt/decrypt functions instead of the detached mode (e.g. in libsodium). Similarly, Go's standard library has a fixed AEAD signature assuming the normal combined mode order (https://golang.org/pkg/crypto/cipher/#AEAD).

Since the whole point of using proven AEAD is to prevent developers from screwing up crypto, it is preferable to follow conventions to reduce complexity.

Answer 75 · 2017-01-26T15:50:21.000Z

Sounds good to me.

Answer 76 · 2017-01-26T23:59:47.000Z

LGTM. Although I don't think the order here really matters, golang indeed implements GCM in that way: https://golang.org/src/crypto/cipher/gcm.go#147

Answer 77 · 2017-01-27T00:09:57.000Z

It doesn't matter.

Answer 78 · 2017-01-28T11:21:41.000Z

Updated nonce incrementation:

Please note that sodium_increment() considers the number to be encoded in little-endian format.

Answer 79 · 2017-01-28T15:59:56.000Z

For key derivation, libsodium's crypto_generichash is marginally better than MD5/SHA1.

It's safer to use proven password hashes such as PBKDF2. Argon2 is fine, but I worry about lack of mature library support due to its novelty. Additionally, we have to choose the right parameters (number of iterations, salt, etc) for password hashes since most of them require knobs…

Answer 80 · 2017-01-29T04:08:58.000Z

Here is the experimental branch for SIP004 https://github.com/shadowsocks/shadowsocks-libev/tree/sip004

Please help to run tests and refine the document by adding comments here.

Answer 81 · 2017-01-29T04:44:14.000Z

The password hashing looks overkill, I prefer to generic hashing like blake2b.

Answer 82 · 2017-01-29T05:31:55.000Z

@wongsyrone Let's keep using previous key driviation (OpenSSL way). Not every shadowsocks port is able to use blake2b hashing.

Answer 83 · 2017-01-29T05:49:12.000Z

I'd suggest at least upgrade to PBKDF2 for key derivation.

Answer 84 · 2017-01-29T06:24:47.000Z

Just wondering which port? According to its official website, there are many implementations available, and many libsodium binding wrappers.

Answer 85 · 2017-01-29T06:46:33.000Z

@wongsyrone qt5, golang, nodejs, chromeapp, rust. Also, if we are really care about password hashing, I prefer argon2 here, as it's a memory-hard approach.

Answer 86 · 2017-01-29T06:51:46.000Z

I think we should get rid of MD5 at least. Argon2 is good alternative as well.

Answer 87 · 2017-01-29T07:45:02.000Z

If we go for Argon2, we need to choose which variant of Argon2 as there are three: Argon2i, Argon2d, and Argon2id. According to https://github.com/P-H-C/phc-winner-argon2, Argon2i seems to be a better fit in our case

Argon2 has three variants: Argon2i, Argon2d, and Argon2id. Argon2d is faster and uses data-depending memory access, which makes it highly resistant against GPU cracking attacks and suitable for applications with no threats from side-channel timing attacks (eg. cryptocurrencies). Argon2i instead uses data-independent memory access, which is preferred for password hashing and password-based key derivation, but it is slower as it makes more passes over the memory to protect from tradeoff attacks. Argon2id is a hybrid of Argon2i and Argon2d, using a combination of data-depending and data-independent memory accesses, which gives some of Argon2i's resistance to side-channel cache timing attacks and much of Argon2d's resistance to GPU cracking attacks.

Answer 88 · 2017-01-29T07:49:24.000Z

Alternatively, we could leave the decision of which KDF to use to deployment and require a strong key instead.

Answer 89 · 2017-01-29T07:49:27.000Z

It seems things are getting complicated now. Since we are not passing it directly, does it really matter to use real password hashing algorithm?

Answer 90 · 2017-01-29T07:51:58.000Z

Since we are not passing it directly, does it really matter to use real password hashing algorithm?

Yes if you want the server to be resistant to password guessing attacks.

No if we move it out of the scope of this project and require long random keys.

Answer 91 · 2017-01-29T07:58:11.000Z

The reason why I prefer to blake2b is it claims as secure as SHA3 but slightly faster than MD5 and SHA1. Blake2b is also used in Argon2.

Answer 92 · 2017-01-29T08:03:02.000Z

blake2b, SHA, and any fast hashing in general are not suitable for password hashing.

You need slower, tunable password hashing like PBKDF2, bcrypt, scrypt, or Argon2 (each has its own crypto tradeoffs).

Answer 93 · 2017-01-29T08:04:54.000Z

Ultimately, though, nothing beats long random keys from /dev/urand ^_^

Answer 94 · 2017-01-29T08:07:41.000Z

Okay, I leave this to madeye to balance.

Answer 95 · 2017-01-29T08:21:25.000Z

Furthermore, isn't it necessary a configuration field for hashing salt as well?

Answer 96 · 2017-01-29T09:09:02.000Z

For maximum security, we can also get rid of key derivation by using a random generated key file (or base64 encoded string) like PGP. This also guarantees compatibility on every platform ever.

Answer 97 · 2017-01-29T09:47:34.000Z

Let's use Argon2 as our default key derivation for AEAD (nothing changed for old stream ciphers).

We follow the predefined settings from libsodium and take "shadowsocks hash" as the default salt.

        const unsigned char salt[crypto_pwhash_SALTBYTES] = {
            's', 'h', 'a', 'd', 'o', 'w', 's', 'o',
            'c', 'k', 's', ' ', 'h', 'a', 's', 'h'
        };
        int err = crypto_pwhash (key, nkey, (char*)pass, strlen(pass), salt,
                crypto_pwhash_OPSLIMIT_INTERACTIVE, crypto_pwhash_MEMLIMIT_INTERACTIVE,
                crypto_pwhash_ALG_DEFAULT);

More details:

Answer 98 · 2017-01-29T10:21:49.000Z

The above Argon2 settings require 4MB memory to compute keys. Not sure if it's gonna be ok on low end routers.

Answer 99 · 2017-01-29T10:25:59.000Z

@riobard 4MB memory is not a problem for current low end routers. (32MB DDR2, MT7620A, $10).