Adaptive port binding for the shuffle service
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 WARN Utils [main]:69 - Service 'sparkWorker' could not bind on port 27001. Attempting port 27002.
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 WARN Utils [main]:69 - Service 'sparkWorker' could not bind on port 27002. Attempting port 27003.
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO Utils [main]:57 - Successfully started service 'sparkWorker' on port 27003.
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO Worker [main]:57 - Worker decommissioning not enabled.
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 WARN ExternalShuffleService [main]:69 - 'spark.local.dir' should be set first when we use db in ExternalShuffleService. Note that this only affects standalone mode.
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO Worker [dispatcher-event-loop-1]:57 - Starting Spark worker eu-north1<...>.net:27003 with 23 cores, 256.0 GiB RAM
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO Worker [dispatcher-event-loop-1]:57 - Running Spark version 3.2.2
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO Worker [dispatcher-event-loop-1]:57 - Spark home: /slot/sandbox/./tmpfs/spark
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO ExternalShuffleService [dispatcher-event-loop-1]:57 - Starting shuffle service on port 27000 (auth enabled = false)
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 ERROR Worker [dispatcher-event-loop-1]:94 - Failed to start external shuffle service
2024-08-13 11:52:04 INFO WorkerLauncher$ [Thread-3]:197 - java.net.BindException: Address already in use
While some of the ports are chosen adaptively, the port for the shuffle service seems to be chosen statically. We'd better make it adaptive as well, to prevent too many port conflicts on multi-tenant YT clusters.
It is also a good idea to check that the rest of the required ports are chosen adaptively.
BTW, do you remember why we are not using the user-port functionality provided by YT? E.g. the environment variables YT_PORT_0, YT_PORT_1, ..., for which YT takes responsibility to ensure that nothing else is bound to them?
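For illustration, consuming YT-provided user ports could look roughly like the sketch below. The YT_PORT_0/YT_PORT_1 variable names come from the question above; which index maps to which service, and the helper object itself, are assumptions made only for the example.

```scala
import scala.util.Try

// Hypothetical sketch: read ports reserved by YT from the environment
// (YT_PORT_0, YT_PORT_1, ...) instead of probing for a free port ourselves.
// The mapping of indices to services below is an assumption for illustration.
object YtUserPorts {
  // Returns the i-th YT-reserved port, if the launcher exported it.
  def port(index: Int): Option[Int] =
    sys.env.get(s"YT_PORT_$index").flatMap(s => Try(s.toInt).toOption)

  def main(args: Array[String]): Unit = {
    println(s"worker port from YT:          ${port(0)}")
    println(s"shuffle service port from YT: ${port(1)}")
  }
}
```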
First of all, the port for the shuffle service must be fixed. That's because an executor concatenates worker_host and shuffle.service.port to get the shuffle service address of another worker. So we need to select the port before cluster startup, when the Spark configuration is shared between nodes.
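Roughly, the lookup works as in the simplified sketch below (not Spark's actual code): the host comes from the remote worker, but the port comes from the executor's own spark.shuffle.service.port value, which is why it must be identical on every node. The hostname is made up for the example.

```scala
import org.apache.spark.SparkConf

// Simplified illustration: the shuffle service address of a remote worker is
// built from that worker's host plus the *local* value of
// spark.shuffle.service.port, so the port must match on every node.
object ShuffleAddressSketch {
  def shuffleServiceAddress(conf: SparkConf, workerHost: String): (String, Int) = {
    val port = conf.getInt("spark.shuffle.service.port", 7337) // 7337 is Spark's default
    (workerHost, port)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().set("spark.shuffle.service.port", "27000")
    // Hypothetical hostname, for illustration only.
    println(shuffleServiceAddress(conf, "worker-1.example.net")) // (worker-1.example.net,27000)
  }
}
```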
We agreed that it's a good point to randomize the shuffle service port inside spark-launch-yt. Then clusters will have enough diversity of ports. It will be solved in the next release.
> First of all, the port for the shuffle service must be fixed. That's because an executor concatenates worker_host and shuffle.service.port to get the shuffle service address of another worker.
Can't this be changed? We do not have the same issue with other kinds of ports.
How is the set of ports to choose from going to be configured? You do understand that the range should be configurable (ideally, by a cluster-wide configuration), right?
As we can see, Spark's developers report that the shuffle service port is fixed.
Also, this function is used for any external shuffle service interaction, so every node must have the same port and it cannot be selected dynamically.
In my opinion, this happened because the shuffle service is the only channel for direct worker-to-worker interaction. Other ports (REST/UI/etc.) are accessed through the master, which knows about all workers.
Ok, I see. And what's the answer to these questions?
> How is the set of ports to choose from going to be configured? You do understand that the range should be configurable (ideally, by a cluster-wide configuration), right?
We decided to do the following:
- Remove the default spark.shuffle.service.port from the config.
- Introduce two configs, spark.shuffle.service.port.interval.start (default=27050) and spark.shuffle.service.port.interval.size (default=50), describing an interval of available ports. These configs may be changed globally in //home/spark/conf.
- On every SPYT cluster startup a random port will be chosen from the specified range (see the sketch below).
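A minimal sketch of that selection step, assuming the two interval configs above; chooseShufflePort is a hypothetical helper for illustration, not the actual spark-launch-yt code:

```scala
import scala.util.Random

// Sketch: pick the shuffle service port once per cluster startup, either from
// an explicitly set spark.shuffle.service.port or at random from the interval.
// Defaults mirror the values quoted above; the helper itself is hypothetical.
object ShufflePortChooser {
  def chooseShufflePort(conf: Map[String, String]): Int =
    conf.get("spark.shuffle.service.port").map(_.toInt).getOrElse {
      val start = conf.get("spark.shuffle.service.port.interval.start").map(_.toInt).getOrElse(27050)
      val size  = conf.get("spark.shuffle.service.port.interval.size").map(_.toInt).getOrElse(50)
      start + Random.nextInt(size)
    }

  def main(args: Array[String]): Unit = {
    println(chooseShufflePort(Map.empty))                                    // somewhere in [27050, 27100)
    println(chooseShufflePort(Map("spark.shuffle.service.port" -> "27123"))) // 27123
  }
}
```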
When you start a SPYT cluster you can also specify a fixed port or change the interval parameters:
spark-launch-yt ... --params {'spark_conf'={'spark.shuffle.service.port'="27123"}}
spark-launch-yt ... --params {'spark_conf'={'spark.shuffle.service.port.interval.start'="19400"}}
It's merged. The new port binding will be available in the next release.