GiantVM/Linux-DSM

Using TCP/IP for DSM

moharaka opened this issue · 4 comments

Hi,

I am trying to build with TCP/IP as the network transport. However, the build fails with this error:

$ make
...
arch/x86/kvm/krdma.c: In function ‘krdma_connect_single’:
arch/x86/kvm/krdma.c:325:2: warning: ignoring return value of ‘kstrtol’, declared with attribute warn_unused_result [-Wunused-result]
  kstrtol(port, 10, &portdec);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kvm/krdma.c: In function ‘krdma_listen’:
arch/x86/kvm/krdma.c:550:2: warning: ignoring return value of ‘kstrtol’, declared with attribute warn_unused_result [-Wunused-result]
  kstrtol(port, 10, &portdec);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  CC      arch/x86/kvm/dsm.o
arch/x86/kvm/dsm.c: In function ‘kvm_dsm_init’:
arch/x86/kvm/dsm.c:573:19: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
  network_ops.send = ktcp_send;
                   ^
arch/x86/kvm/dsm.c:574:22: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
  network_ops.receive = ktcp_receive;
                      ^
arch/x86/kvm/dsm.c:553:2: warning: ignoring return value of ‘copy_from_user’, declared with attribute warn_unused_result [-Wunused-result]
  copy_from_user(user_cluster_iplist, params->cluster_iplist, sizeof(void *) *
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    params->cluster_iplist_len);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kvm/dsm.c:561:3: warning: ignoring return value of ‘strncpy_from_user’, declared with attribute warn_unused_result [-Wunused-result]
   strncpy_from_user(kvm->arch.cluster_iplist[i], user_cluster_iplist[i], 20);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
scripts/Makefile.build:293: recipe for target 'arch/x86/kvm/dsm.o' failed
make[2]: *** [arch/x86/kvm/dsm.o] Error 1
scripts/Makefile.build:544: recipe for target 'arch/x86/kvm' failed
make[1]: *** [arch/x86/kvm] Error 2
Makefile:995: recipe for target 'arch/x86' failed
make: *** [arch/x86] Error 2

Any idea on the issue?

Hi,

Unfortunately, we have not completed this part of the code. If you do want to use TCP, here's some advice:

Multiple vCPU threads share a single communication channel (backed by TCP or RDMA) to each remote node. When a thread sends a request over the channel, you need to make sure that the response it receives actually belongs to that thread. Protecting the whole send->receive sequence with a mutex is not recommended, unless you want to drown in a swamp of deadlocks. What we do for RDMA is associate each send->receive pair with a transaction id (tx_add->txid). The code in ivy.c guarantees that whenever the DSM software issues a network transmission, a txid is generated in send, and the DSM software then tries to retrieve the response from receive with this txid. For TCP you may need to manage a buffer in ktcp.c. Consider how TCP itself handles out-of-order packets.
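To make the buffer idea concrete, here is a minimal user-space sketch of parking responses under their txid so that each requester claims only its own reply, even when replies arrive in a different order than the requests were sent. It is not the GiantVM code: `resp_park`, `resp_claim`, `PENDING_SLOTS`, and `RESP_MAX` are hypothetical names, the 16-bit txid width is an assumption, and synchronization is left out on purpose. A kernel version in ktcp.c would presumably need a short-lived lock around the table only, plus some way for the claiming thread to sleep until its response shows up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PENDING_SLOTS 16   /* hypothetical cap on buffered responses */
#define RESP_MAX      64   /* hypothetical maximum payload size */

/* One buffered response, tagged with the txid of the request it answers. */
struct pending_resp {
    int      in_use;
    uint16_t txid;         /* assumed width; the real txid type may differ */
    size_t   len;
    char     data[RESP_MAX];
};

static struct pending_resp pending[PENDING_SLOTS];

/*
 * Called by whoever reads the socket: store a response under its txid.
 * Returns 0 on success, -1 if the payload is too large or the table is full.
 */
static int resp_park(uint16_t txid, const void *data, size_t len)
{
    assert(data != NULL);
    if (len > RESP_MAX)
        return -1;
    for (int i = 0; i < PENDING_SLOTS; i++) {
        if (!pending[i].in_use) {
            pending[i].in_use = 1;
            pending[i].txid = txid;
            pending[i].len = len;
            memcpy(pending[i].data, data, len);
            return 0;
        }
    }
    return -1;
}

/*
 * Called by the requesting thread: claim the response for this txid.
 * Returns the payload length, or -1 if no such response is buffered yet
 * (or if buf is too small to hold it).
 */
static long resp_claim(uint16_t txid, void *buf, size_t buflen)
{
    for (int i = 0; i < PENDING_SLOTS; i++) {
        if (pending[i].in_use && pending[i].txid == txid) {
            if (pending[i].len > buflen)
                return -1;
            memcpy(buf, pending[i].data, pending[i].len);
            pending[i].in_use = 0;   /* free the slot for reuse */
            return (long)pending[i].len;
        }
    }
    return -1;
}
```

With this shape, a thread that issued txid 1 can call `resp_claim(1, ...)` and never sees the reply to txid 2, even if that reply was read off the socket first. The lookup is a linear scan to keep the sketch short; a hash keyed on txid would be the obvious replacement if many requests are in flight.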

In addition, you may be disappointed to find that TCP is too slow to boot a vanilla Linux distribution such as Ubuntu. (Lightweight experimental OSes such as sv6 and Barrelfish are okay.) Booting the swap device may time out, soft lockups may be triggered, and so on. You probably know the reason why few people research DSM in the 21st century.

TCP shouldn't be that much slower... is this just a current implementation limitation?

Well, for a single packet delivery, TCP is ~10 times slower than RDMA. The end-to-end results might be even worse (think about queuing theory). Some time-sensitive operations in Linux (e.g., waiting for certain devices) may fail unless you hack the guest.
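As a rough illustration of the queuing point, assume the channel behaves like an M/M/1 queue. That is a modeling assumption, and the numbers below are made up for the example, not measurements. With mean service time $s$ per request and arrival rate $\lambda$, the utilization is $\rho = \lambda s$ and the mean time a request spends in the system is

$$W = \frac{s}{1 - \rho}, \qquad \rho = \lambda s < 1.$$

Take $\lambda = 5000$ requests/s. With $s = 10\,\mu\text{s}$, $\rho = 0.05$ and $W \approx 10.5\,\mu\text{s}$. With a 10 times slower transport, $s = 100\,\mu\text{s}$, $\rho = 0.5$ and $W = 200\,\mu\text{s}$, which is about 19 times worse rather than 10 times. As $\rho$ approaches 1 the gap grows without bound.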

I assume you are referring to latency, then?