pubsub_zmq aborts when running within a container
rlenferink opened this issue · 2 comments
The pubsub_zmq tests fail (SEGV) when running within a container. This happens because the user in the container may be the root user (uid = 0), which makes this check succeed:
celix/bundles/pubsub/pubsub_admin_zmq/src/pubsub_zmq_topic_receiver.c
Lines 643 to 649 in e7aee12
The gotPermission flag is later used to determine whether the scheduling priority can be set:
When this is called as the user root within a container (uid 0), while the user outside the container is a rootless user, the tests segfault (the call to pthread_setschedparam fails).
This is the line where libzmq eventually crashes:
https://github.com/zeromq/libzmq/blob/4097855ddaaa65ed7b5e8cb86d143842a594eebd/src/thread.cpp#L345
libzmq doesn't handle this too nicely and I am not sure whether this can be solved.
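One direction that might be worth exploring, instead of relying on the uid check alone, is to probe whether the scheduling call is actually permitted before asking libzmq to do it. The following is only a minimal sketch under assumptions: can_set_thread_sched is a hypothetical helper (it does not exist in Celix or libzmq), and it assumes that a probe on the calling thread gives the same answer libzmq would get on its own I/O threads.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of Celix or libzmq): returns true only if
 * the calling thread could actually switch to `policy` with `priority`.
 * The previous scheduling settings are restored before returning.
 */
static bool can_set_thread_sched(int policy, int priority) {
    int old_policy = 0;
    struct sched_param old_param;
    if (pthread_getschedparam(pthread_self(), &old_policy, &old_param) != 0) {
        return false;
    }

    struct sched_param probe = { .sched_priority = priority };
    /* pthread functions return the error number directly (e.g. EPERM, EINVAL). */
    if (pthread_setschedparam(pthread_self(), policy, &probe) != 0) {
        return false;
    }

    /* The probe succeeded: put the thread back on its original settings. */
    (void) pthread_setschedparam(pthread_self(), old_policy, &old_param);
    return true;
}
```

The result could then be combined with the existing check, for example by only treating permission as granted when can_set_thread_sched(SCHED_FIFO, sched_get_priority_min(SCHED_FIFO)) returns true. The policy and priority in that call are placeholders, not the values the bundle uses.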
I tried the suggested libcap, and after that simply fell back to using the capsh command, but according to capsh the cap_sys_nice capability can be set:
root@fedora:/home/rlenferink/workspace/asf/celix/celix-container# capsh --print
Current: =ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
Any suggestions to solve this?
I would like to drop support for PubSub bundles for Apache Celix 3.0.0 and if we do that, IMO this does not need to be solved.
If we would like to keep the PubSub bundles, I think the best solution is to only set ZMQ_THREAD_PRIORITY or ZMQ_THREAD_SCHED_POLICY if this is explicitly enabled through a config property.
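A minimal sketch of that opt-in idea, assuming the property value arrives as an optional string: the thread priority is only applied when the property is present and parses as an integer. The type and function names here are hypothetical and are not the existing Celix configuration API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type: `apply` stays false unless the property was set. */
typedef struct {
    bool apply;
    long priority;
} thread_sched_opts_t;

/* Parses an optional integer property; absent, empty or malformed means "not set". */
static bool parse_long_property(const char *value, long *out) {
    if (value == NULL || *value == '\0') {
        return false;
    }
    char *end = NULL;
    errno = 0;
    long parsed = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0') {
        return false;
    }
    *out = parsed;
    return true;
}

/* Builds the options from the (possibly NULL) property value. */
static thread_sched_opts_t thread_sched_opts_from(const char *priority_property) {
    thread_sched_opts_t opts = { .apply = false, .priority = 0 };
    opts.apply = parse_long_property(priority_property, &opts.priority);
    return opts;
}
```

The caller would then guard the libzmq call with the flag, along the lines of: if (opts.apply) { zmq_ctx_set(ctx, ZMQ_THREAD_PRIORITY, (int) opts.priority); }, so that a default run inside a container never touches the scheduler settings.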
According to the documentation, the host machine's kernel must be configured properly (CONFIG_RT_GROUP_SCHED): https://docs.docker.com/config/containers/resource_constraints/#configure-the-realtime-scheduler
My local Ubuntu does not support this.
PubSub already provides configuration options for this. It seems to me to be purely a test configuration issue: an additional CMake option like RUN_IN_CONTAINER (and a corresponding Conan option) should be enough to make these tests use another set of *.properties.