scylladb/seastar

TCP client based on native-network-stack does not work properly on multi-core processors

sqddk opened this issue · 9 comments

I invoke seastar::connect() on the native network stack to connect to a TCP server, but unfortunately, whether the connection is established successfully is a matter of chance. I suspect this is related to sharding: the ACK from the server is not delivered to the shard that initiated the connection. For example, we may send the SYN on core 0 while the ACK is delivered to core 1.
To work around this, I had to limit SMP to 1. Is there a good way to let the client still benefit from sharding? I believe this requires a change in Seastar itself.
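
For reference, here is a minimal sketch of the kind of client I am running (the server address, port, and run flags are placeholders, and the header names are from memory, so treat this as an approximation rather than a ready-to-build example):

#include <seastar/core/app-template.hh>
#include <seastar/core/seastar.hh>
#include <seastar/net/api.hh>
#include <iostream>

int main(int argc, char** argv) {
    seastar::app_template app;
    // Run with e.g. --network-stack native --smp 2: the connect() below only
    // succeeds some of the time; with --smp 1 it succeeds every time.
    return app.run(argc, argv, [] {
        return seastar::connect(seastar::make_ipv4_address({"192.168.1.100", 9000}))
            .then([] (seastar::connected_socket s) {
                // a real client would now use s.output()/s.input()
                std::cout << "connected\n";
            });
    });
}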

I think I have identified the root cause of the issue. In the tcp::connect(socket_address sa) method in include/net/tcp.hh, there is a loop that randomizes the local port (src_port):

do {
    // pick a random local port and build the 4-tuple connection id
    src_port = _port_dist(_e);
    id = connid{src_ip, dst_ip, src_port, dst_port};
    // retry until the RSS hash of the id maps back to this shard and the id
    // is not already in use -- but only if there is more than one HW queue
} while (_inet._inet.netif()->hw_queues_count() > 1 &&
         (_inet._inet.netif()->hash2cpu(id.hash(_inet._inet.netif()->rss_key())) != this_shard_id()
          || _tcbs.find(id) != _tcbs.end()));

The intent of this loop is to obtain a src_port such that hash2cpu(id.hash(rss_key)) == this_shard_id(), i.e. a port whose connection id hashes back to the shard issuing the connect. Unfortunately, because the whole check is guarded by hw_queues_count() > 1, when the device reports a single hardware queue the loop exits after the first iteration, and the randomly selected src_port does not necessarily satisfy that condition.
In DPDK-supported mode, the value of hw_queues_count() is ultimately determined by the following code:

std::min({_num_queues, _dev_info.max_rx_queues, _dev_info.max_tx_queues});

The values of max_rx_queues and max_tx_queues are determined by the specific driver implementation in the DPDK project. In my case I am using an Intel 82753 network interface, whose driver is drivers/net/e1000/em_ethdev.c, and there both parameters are hard-coded to 1. I am using DPDK version 19.05; in the latest release, 23.07, they are set to 2 instead. In other words, had I been using a newer version of DPDK, I would never have hit this bug.
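
For completeness, the relevant driver code (paraphrased from memory, so the exact function name and surrounding context may differ between DPDK versions) looks roughly like this in drivers/net/e1000/em_ethdev.c:

static void
eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
{
    /* ... */
    /* the em driver exposes only a single RX/TX queue pair, so Seastar's
     * hw_queues_count() ends up as 1 even when multiple shards are running */
    dev_info->max_rx_queues = 1;
    dev_info->max_tx_queues = 1;
    /* ... */
}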

I can roughly understand the purpose of the hw_queues_count() check here: it is meant to indicate that the process is effectively running on a single core, in which case the placement check can be skipped. Unfortunately, that assumption does not hold in practice, since a single hardware queue does not imply a single shard, and I think a better strategy would be to use smp::count for the check instead.
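
Concretely, the change I have in mind would look roughly like this (an untested sketch against the loop quoted above, keeping everything else unchanged):

do {
    src_port = _port_dist(_e);
    id = connid{src_ip, dst_ip, src_port, dst_port};
    // gate the placement check on the number of shards rather than on the
    // number of hardware queues reported by the NIC
} while (smp::count > 1 &&
         (_inet._inet.netif()->hash2cpu(id.hash(_inet._inet.netif()->rss_key())) != this_shard_id()
          || _tcbs.find(id) != _tcbs.end()));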

Seastar has moved to using a newer DPDK, I believe (#1832).

@mykaul However, I still believe that Seastar should not rely on the DPDK version to address this issue. Wouldn't it be a better choice to use smp::count for the check here?

I did not look into this - just mentioned the fact that we have moved to a newer version of DPDK. I wonder what the motivation for the change in DPDK was.

It's just that when I first started working with Seastar, I thought the 22.11 branch was the latest LTS, and its corresponding DPDK version is 19.05.

@sqddk I don't think Seastar has a so-called LTS at the time of writing.

Perhaps my phrasing was a bit unclear; I just meant that I considered that version to be reliable and stable.

You could just send a pull request if you've root-caused it.