nodejs/node

Happy Eyeballs support (address IPv6 issues in Node 17)

andreialecu opened this issue · 24 comments

What is the problem this feature will solve?

It appears that node 17 has merged this PR: #39987 which may result in IPv6 addresses being sorted first.

As per an answer in this reddit thread from @treysis: https://www.reddit.com/r/ipv6/comments/qbr8jc/comment/hhmg7uz/?context=3 it appears that Happy Eyeballs is not implemented however.

The lack of Happy Eyeballs support can result in connections failing. In particular, there are a lot of data points related to similar support lacking from the OkHttp library which is used heavily on Android and results in connectivity issues for a relatively big number of users. Ref: square/okhttp#506 (comment)

The problem is that consumer routers sometimes glitch out on IPv6.

A router I manage in one particular location, which is a consumer grade TP-Link router, will routinely hang on IPv6 connections about once per month, until it is restarted. More on this here: square/okhttp#6954 (comment)

I have noticed that Node 17 will then pretty much hang on outgoing connections for up to one minute before eventually being able to connect. Restarting the router resolves the issue.

What is the feature you are proposing to solve the problem?

Implement the Happy Eyeballs algorithm.

What alternatives have you considered?

There is an user-land patch here: https://www.npmjs.com/package/happy-eyeballs by @zwhitchcox which might help as a workaround.

I agree Node.js would benefit having this built-in. Is this something you'd be interested in working on @andreialecu?

I am not sure what subsystem this would fit into. It seems that Happy Eyeballs not only applies to HTTP (http, http2, https), but also to other IPv4/IPv6 sockets, i.e., any TCP (net) or UDP socket (dgram).

There's a writeup on how this is implemented in OSX here by an Apple engineer:

https://mailarchive.ietf.org/arch/msg/v6ops/DYiI9v_O66RNbMJsx0NsatFkubQ

The updated implementation performs the following:

  • Query the DNS resolver for A and AAAA.
    If the DNS records are not in the cache, the requests are sent back to back on the wire, AAAA first.
  • If the first reply we get is AAAA, we send out the v6 SYN immediately
  • If the first reply we get is A and we're expecting a AAAA, we start a 25ms timer
    • If the timer fires, we send out the v4 SYN
    • If we get the AAAA during that 25ms window, we move on to address selection
  • When we have a list of IP addresses (either from the DNS cache or by receiving them close together with v4 before v6),
    we perform our own address selection algorithm to sort them. This algorithm uses historical RTT data to prefer addresses
    that have lower latency - but has a 25ms leeway: if the historical RTT of two compared address are within 25ms of each
    other, we use RFC3484 to pick the best one.
  • Once the list is sorted, we send out the SYN for the first address and start timers based on average and variance of the
    historical TCP RTT. Roughly speaking, we start the second address around the same time we send out a SYN retransmission
    for the first address.
  • The first address to reply with a SYN-ACK wins the race, we then cancel the other TCP connection attempts.

There's also an official RFC at: https://datatracker.ietf.org/doc/html/rfc8305

Since it deals with SYN packets, I'm not sure if this can be implemented in JS land or it needs to touch the C++ networking layer. I'm not familiar at all with nodejs networking internals.

Yes, I believe it should go into net. Doesn't the http module make use of net in the end as well?

Happy Eyeballs with UDP being connectionless is a bit more complicated.

Doesn't the http module make use of net in the end as well?

Mostly, yes, at least by default.

As of 2017, IPv6 brokenness is now generally regarded as a non-problem. Wikipedia

I am wondering how much complexity we can justify just to work around broken IPv6 stacks. The algorithms seem useful even beyond IPv6 compatibility, but so far I have no idea how we could implement this. We probably cannot do this for all TCP connections, at least not by default. We could maybe add an option to socket.connect() to enable this behavior. However, an implementation would probably require changes within libuv first, which is responsible for establishing TCP connections.

We probably cannot do this for all TCP connections, at least not by default.

Why not? If I am not wrong this is the default for all Happy Eyeballs implementations. Though it uses some kind of cache.

As of 2017, IPv6 brokenness is now generally regarded as a non-problem.

That statement is probably not factual. At least not at scale. Even if 1% of users are affected, depending on the number of users, that can add up quite a bit.

Reddit has recently disabled IPv6 due to issues they were having:
https://reddit.com/r/ipv6/comments/r047ah/redditcom_has_deleted_aaaa_records_and_is_back_to

Also their Android app does not implement Happy Eyeballs, and here's a report specifically pinpointing this to be the problem:
https://reddit.com/r/ipv6/comments/qy70hv/new_phone_failing_to_load_any_ipv6_websites_on/hlebckp/?utm_source=reddit&utm_medium=web2x&context=3

There's also a huge thread here: aws-amplify/amplify-js#5539 about an Android only issue that I was also running into, and which was specifically about Happy Eyeballs.

Additionally, there are many other threads in other JS ecosystems related to Happy Eyeballs: facebook/react-native#29608 (comment)

There are tons of reports still popping up. And it goes unreported most of the time, because it's very random and pretty much unreproducible since it's hard to actually pin it down to IPv6 being the culprit.

@andreialecu Don't get me wrong, I am not saying there are no issues. I frequently run into some myself. That was just me quoting Wikipedia. IPv6 is older than myself and I really wish we didn't have to worry about broken IPv6 stacks at this point 😄

Either way, as I said in my last comment, someone would probably have to implement this in libuv before we could consider adding it in Node.js.

Either way, as I said in my last comment, someone would probably have to implement this in libuv before we could consider adding it in Node.js.

Perhaps this might help with that: https://github.com/shtrom/happy-eyeballs-c

someone would probably have to implement this in libuv before we could consider adding it in Node.js

Not necessary, just call uv_tcp_connect() on two uv_tcp_t handles and keep the winner.

someone would probably have to implement this in libuv before we could consider adding it in Node.js

Not necessary, just call uv_tcp_connect() on two uv_tcp_t handles and keep the winner.

My understanding was that libuv might establish multiple connections in that case, whereas the algorithm above only sends SYNs until one receives a reply. Maybe that's wrong, or maybe establishing connections isn't that bad.

You uv_close() the loser. Do that before the ACK is received and the connection was never established.

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

Reddit has recently disabled IPv6 due to issues they were having: https://reddit.com/r/ipv6/comments/r047ah/redditcom_has_deleted_aaaa_records_and_is_back_to

Also their Android app does not implement Happy Eyeballs, and here's a report specifically pinpointing this to be the problem: https://reddit.com/r/ipv6/comments/qy70hv/new_phone_failing_to_load_any_ipv6_websites_on/hlebckp/?utm_source=reddit&utm_medium=web2x&context=3

While Reddit is still in a staged rollout of IPv6, they've already published about implementing Happy Eyeballs on Android: https://www.reddit.com/r/RedditEng/comments/v1upr8/ipv6_support_on_android/

cc @nodejs/tsc this might be something we should look to implement before v18 hits LTS.

bnb commented

+1 this seems to be directly affecting undici / undici-fetch. It would be nice to not have that baked into v18 LTS 😊

we should look to implement before v18 hits LTS.

My team and I have been bitten by this issue (specifically with regard to localhost). Big +1 for fixing it and backporting the fix.

I believe 3rd party package maintainers will realize they have to listen on ::1 as well.

How did you solve this? GitHub runners do not support IPv6 and I cannot use fetch (native fetch i.e. nodejs/undici)

I tried the following but it did not solve it:

import dns from 'dns';
dns.setDefaultResultOrder('ipv4first');

@undergroundwires which NodeJS version are you using.

Thank you @treysis for showing interest.

First I tested with latest 18 (18.19.1), then latest 20 (20.11.1), I send requests using (fetch) on GitHub runner.

It fails specifically on wikipedia.org requests.

Related issue for fetch: nodejs/undici#1531
Related issue for GitHub runners not being able to do IPv6: actions/runner-images#668, actions/runner#3138

@undergroundwires does your host support IPv4? Or is it running on IPv6-only AWS?

I created a repository to reproduce this in as minimal way as possible:

https://github.com/undergroundwires/node-fetch-ipv6

I run fetch on using vite test runner on node and without any test runner. I get same results.

I realize that the first two requests are successful, I start getting the error after the second or third one.

See the test: test file

Second test (multiple fetch in order) fails, and all run fine locally. I run tests on default GitHub runners using both latest node 20 and 18. I get same errors:

stdout | vite-test.spec.ts > batch fetch in order
Fetching:  https://web.archive.org/web/20221029145712/https://kb.mozillazine.org/Downloads.rdf

stdout | vite-test.spec.ts > batch fetch in order
Fetching:  https://web.archive.org/web/20240120213614/https://techcommunity.microsoft.com/t5/windows-it-pro-blog/group-configuration-search-highlights-in-windows/ba-p/3263989

 ❯ vite-test.spec.ts  (2 tests | 1 failed) 24519ms
   ❯ vite-test.spec.ts > batch fetch in order
     → fetch failed

⎯⎯⎯⎯⎯⎯⎯ Failed Tests 1 ⎯⎯⎯⎯⎯⎯⎯

 FAIL  vite-test.spec.ts > batch fetch in order
TypeError: fetch failed
 ❯ vite-test.spec.ts:26:22
     24|   for (const url of urls) {
     25|     console.log('Fetching: ', url);
     26|     const response = await fetch(url, { method: 'HEAD'});
       |                      ^
     27|     await sleep(10000);
     28|     expect(response.status).to.equal(200);

Caused by: Caused by: ConnectTimeoutError: Connect Timeout Error
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Serialized Error: { code: 'UND_ERR_CONNECT_TIMEOUT' }

Error: TypeError: fetch failed
 ❯ vite-test.spec.ts:26:22

Caused by: ConnectTimeoutError: Connect Timeout Error
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Serialized Error: { code: 'UND_ERR_CONNECT_TIMEOUT' }

Here's my workaround (open-source and documented) that I hope that can help you too:

After days of research and trial/error, this is how I got this working:

  1. Create a script called force-ipv4.sh, that configures system to prefer IPv4 over IPv6, call it to configure the machine. It was not easy to find a reliable cross-platform solution and I went Cloudflare WARP for DNS resolution along with some system configurations.
  2. To easily use the script in GitHub workflows, create GitHub action called force-ipv4 and call it in GitHub runners.
  3. Fixes the IPv6 request issues, and you can happily run e.g. fetch API from Node.

Related commit introducing this fix: undergroundwires/privacy.sexy@52fadcd