SJTU-IPADS/wukong

Failed to modify RC to RTR state, No such device or address

lionty opened this issue · 6 comments

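Hello, I set up two Ubuntu 16.04 virtual machines and installed wukong following the steps in INSTALL.md. When I run it, I get the problem below. Could you help me with it? Thanks.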
root@server:/code/wukong/scripts# ./run.sh 2
[server:06477] Warning: could not find environment variable "CLASSPATH"
INFO: TOPO: 1nodes
INFO: node 0 cores: 0
INFO: #0: has 1 cores.
INFO: TOPO: 1nodes
INFO: node 0 cores: 0
INFO: #1: has 1 cores.
INFO: #0: allocate 1.04688GB memory
INFO: #1: allocate 1.04688GB memory
[librdma] : listener binding: tcp://:19344
[librdma] : listener binding: tcp://:19344
[librdma] qp: Failed to modify RC to RTR state, No such device or address
[librdma] : recv thread exit!

This is the configuration file:
cat config
global_num_proxies 1
global_num_engines 2
global_input_folder /code/wukong/datasets/id_lubm_2
global_data_port_base 5500
global_ctrl_port_base 9576
global_memstore_size_gb 1
global_rdma_buf_size_mb 8
global_rdma_rbf_size_mb 4
global_use_rdma 1
global_rdma_threshold 300
global_mt_threshold 2
global_enable_caching 0
global_enable_workstealing 0
global_silent 0
global_enable_planner 1
global_generate_statistics 1
global_enable_vattr 1

cat mpd.hosts
10.211.55.11
10.211.55.10

cat core.bind
0 1 2

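RDMA is most likely not supported inside a virtual machine. You should change the option `option (USE_RDMA "enable RDMA support" ON)` in CMakeLists.txt to OFF so that wukong uses TCP for communication.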

@lionty, I think your machines have no RDMA-enabled IB NIC, as mentioned by @StrikeW.
Thus, you should compile wukong without RDMA support.
For example,
$ ./build.sh -DUSE_RDMA=OFF
or
$ cmake .. -DUSE_RDMA=OFF
(no need to change CMakeLists.txt directly)

Please also check docs/INSTALL.md, which provides more details about options for building wukong.
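
If it is unclear whether a machine has an RDMA-capable NIC, one rough check is whether the kernel has registered any RDMA device. The sketch below assumes a Linux host, where such devices appear under /sys/class/infiniband; the helper name `rdma_flag` is hypothetical and not part of wukong. It only prints a suggested value for the `USE_RDMA` build option mentioned above.

```shell
#!/bin/sh
# rdma_flag is a hypothetical helper, not part of wukong.
# It prints ON if the given directory (default: /sys/class/infiniband)
# exists and contains at least one entry, and OFF otherwise.
rdma_flag() {
    dir="${1:-/sys/class/infiniband}"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo ON
    else
        echo OFF
    fi
}

# Print the build command this check suggests for the current machine.
echo "suggested: ./build.sh -DUSE_RDMA=$(rdma_flag)"
```

On a virtual machine without a passed-through RDMA NIC, the directory is usually missing or empty, so the suggestion would be `-DUSE_RDMA=OFF`. A device being listed does not guarantee that its link is up, so treat an `ON` result as a hint only.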

Thank you very much!

When I ran sparql -f sparql_query/lubm/lubm_q1, I had some trouble.

wukong> sparql -f sparql_query/lubm/lubm_q1
INFO: Parsing a SPARQL query is done.
INFO: Parsing time: 804 usec
ERROR: Unsupported triple pattern [UNKNOWN|KNOWN|??]
ERROR: Assertion: /code/wukong/core/engine.hpp(execute_one_pattern:1064): 'false' failed
[ubuntu:10654] *** Process received signal ***
[ubuntu:10654] Signal: Aborted (6)
[ubuntu:10654] Signal code: (-6)
[ubuntu:10654] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f01297e5390]
[ubuntu:10654] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7f012943f428]
[ubuntu:10654] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f012944102a]
[ubuntu:10654] [ 3] ../build/wukong(_ZN6Engine19execute_one_patternER11SPARQLQuery+0x353) [0x4dc063]
[ubuntu:10654] [ 4] ../build/wukong(_ZN6Engine16execute_patternsER11SPARQLQuery+0x13e) [0x4fbe6e]
[ubuntu:10654] [ 5] ../build/wukong(_ZN6Engine20execute_sparql_queryER11SPARQLQueryPS_+0x6e3) [0x4fc993]
[ubuntu:10654] [ 6] ../build/wukong(_ZN6Engine3runEv+0x118b) [0x4ff0bb]
[ubuntu:10654] [ 7] ../build/wukong(_Z13engine_threadPv+0x80) [0x49b860]
[ubuntu:10654] [ 8] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f01297db6ba]
[ubuntu:10654] [ 9] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f012951141d]
[ubuntu:10654] *** End of error message ***

mpiexec noticed that process rank 1 with PID 10654 on node 10.211.55.11 exited on signal 6 (Aborted).

@lionty please create a new issue.

@lionty please create a new issue with more detailed information, including the script/config, the contents of sparql_query/lubm/lubm_q1, the compile options, and any commands you ran beforehand. Judging from the output, the problem appears to be an inconsistent configuration for SPARQL query parsing.
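
A minimal sketch of gathering the requested files into one pasteable report. The helper name `collect_report` is hypothetical and not part of wukong, and the file names in the usage comment are the ones that appear in this thread, assumed to be relative to the wukong scripts directory. Compile options and previous commands still need to be added by hand.

```shell
#!/bin/sh
# collect_report is a hypothetical helper, not part of wukong.
# It prints each given file under a header line so the combined
# output can be pasted into a new issue.
collect_report() {
    for f in "$@"; do
        echo "==== $f ===="
        if [ -r "$f" ]; then
            cat "$f"
        else
            echo "(missing or unreadable)"
        fi
    done
}

# Example usage (paths assumed from this thread):
# collect_report config mpd.hosts core.bind sparql_query/lubm/lubm_q1 > report.txt
```

Missing files are flagged rather than skipped, so the report shows which file was not found.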