ssh community cannot work normally with the rsocket
ling0329 opened this issue · 7 comments
I have tried to use the FreeFlow open source project to test application performance. I noticed that you have tested rsocket with FreeFlow. But I ran into a problem when I tried to test a big data app: ssh communication cannot work normally with rsocket. The prompt message is as follows:
Have you ever had a similar problem? I am wondering if you can give me some advice on solving it, or even just a few names you think we should talk to. Thank you very much!
Why do you want to run ssh with rsocket? rsocket has compatibility issues with many applications. For example, rsocket does not support epoll(), so any application using epoll() won't work. It's possible that ssh uses something that rsocket does not support.
Anyway, if you really want to run ssh with rsocket, and you are sure it can run, you need to check whether the ssh server is started with rsocket. The ssh server is usually started by a Linux service and may not carry the correct environment variables (like LD_PRELOAD).
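One way to check this on Linux is to read the environment the running server was actually started with. The sketch below parses `/proc/<pid>/environ`, which holds the NUL-separated environment a process received at exec time. The helper names `parse_environ` and `preload_of` are made up for illustration, and reading another user's process (such as sshd) normally requires root.

```python
import sys
from typing import Dict, Optional


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated KEY=VALUE block found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip the empty trailing entry or malformed items
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def preload_of(pid: int) -> Optional[str]:
    """Return the LD_PRELOAD value process `pid` was started with, or None."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return parse_environ(f.read()).get("LD_PRELOAD")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: sudo python3 check_preload.py <pid of sshd>
    pid = int(sys.argv[1])
    print(f"pid {pid}: LD_PRELOAD={preload_of(pid)!r}")
```

If this prints `None` for the sshd process, the server was started without the preload library, whatever is set in your login shell.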
Thank you for your reply. The reason we want to run ssh with rsocket is that we want to run big data apps based on rsocket, like Hadoop and Spark, on FreeFlow. We have configured the env LD_PRELOAD as
So I think the critical point is whether or not ssh uses epoll. If ssh uses epoll, we can conclude that big data apps cannot run on rsocket.
Running a big data (or whatever) app over rsocket does not mean you need to run ssh over rsocket. ssh is usually only used for the control channel, while the actual data channel (where the heavy data goes) usually does not go through ssh. For example, the MPI control channel can go through TCP-based ssh, while the actual MPI APIs go through the RDMA network.
You should read my reply again, carefully -- a process started by a Linux service may not respect what you set in the environment. A quick Google search turns up, for example, https://unix.stackexchange.com/questions/44370/how-to-make-unix-service-see-environment-variables
We attempted to capture packets while the ssh connection was being established. The following was captured without configuring env LD_PRELOAD
But when we configured env LD_PRELOAD, we got this
An RST, ACK exception occurs. Our preliminary inference is that configuring env LD_PRELOAD influences the ssh connection, though ssh still goes through the normal TCP socket.
Do you have any more suggestions?
If LD_PRELOAD really lets rsocket hijack the TCP socket's connect(), send(), etc., you should not be able to capture any TCP handshake or SSH handshake packets. How did you infer that LD_PRELOAD has taken effect?
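A more direct check than a packet capture is to see whether the preload library is actually mapped into the process. The sketch below scans the text of `/proc/<pid>/maps` for a mapped file whose name contains a given string. It assumes the rsocket preload library is named `librspreload.so`; the path in the example and the helper names `mapped_files` and `uses_library` are illustrative only.

```python
import os
from typing import Set


def mapped_files(maps_text: str) -> Set[str]:
    """Collect the file paths mapped into a process from /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if path.startswith("/"):  # ignore [stack], [heap], [vdso], ...
                paths.add(path)
    return paths


def uses_library(maps_text: str, name: str) -> bool:
    """True if any mapped file's basename contains `name`."""
    return any(name in os.path.basename(p) for p in mapped_files(maps_text))


def process_uses_library(pid: int, name: str = "librspreload") -> bool:
    """Read /proc/<pid>/maps (Linux only; may need root) and look for `name`."""
    with open(f"/proc/{pid}/maps") as f:
        return uses_library(f.read(), name)
```

Calling `process_uses_library(pid)` for both the ssh client and the sshd process shows which side, if any, has the library loaded.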
Honestly, I don't think this is related to Freeflow. In addition, in general, I suggest you not use rsocket at all in production.
Thank you for your suggestion. After discussion, we decided to accept it and not use rsocket. Turning back to the FreeFlow paper, we can see that there are two approaches for transforming a TCP socket to RDMA, rsocket and SDP, but rsocket has compatibility problems. So can we replace rsocket with SDP, or are there other ways?
There is basically no mature way to convert TCP sockets to RDMA, at least no public one. Also, the performance of a converted socket would be far from optimal, since the socket interface requires at least one copy.
If you are serious about using RDMA to accelerate things, you can 1) re-implement everything using RDMA verbs, 2) switch to an RPC framework that has an RDMA option, like https://github.com/accelio/accelio used by Ceph (there might be other RPC options, too), or 3) find an RDMA version of the app, if one exists. Check whether this page has something you want: http://hibd.cse.ohio-state.edu/
To be clear, the paper talks about rsocket just to demonstrate the capability of Freeflow, not to overcome rsocket's own limitations. Converting sockets to RDMA is out of the scope of the paper.