Start with root/systemd unit fails
artemklevtsov opened this issue · 14 comments
Describe the bug
NVIDIA control fails when fancon is run as the root user.
# LANG=C fancon
No protocol specified
20/05/06 20:27 [109117] <warning> $XAUTHORITY env variable(s) not set, set to enable NVIDIA support
20/05/06 20:27 [109117] <fatal> X11Display must be connected
terminate called after throwing an instance of 'std::exception'
what(): std::exception
But I can't run it as a normal user either:
$ fancon
20/05/06 20:32 [109236] <fatal> Must be run as root
It seems that a service which needs a running Xorg should be run by a non-root user.
Steps to Reproduce
- build fancon with NVIDIA support
- run fancon as root or with the systemd unit file
Expected behavior
Run without errors with systemd unit file.
Additional context
Config:
config {
update_interval: 1000
dynamic: true
smoothing_intervals: 3
top_stickiness_intervals: 2
temp_averaging_intervals: 3
}
devices {
fan {
type: SYS
label: "hwmon1/fan2"
sensor: "hwmon0/Tdie"
temp_to_rpm: "40: 0%, 50: 20%, 60: 50%, 70: 80%, 80: 100%"
rpm_to_pwm: "0: 30, 175: 34, 181: 36, 203: 38, 212: 40, 219: 42, 239: 44, 249: 46, 261: 48, 262: 50, 266: 52, 283: 54, 297: 56, 310: 58, 320: 60, 336: 62, 349: 64, 352: 66, 370: 68, 377: 70, 398: 72, 406: 74, 418: 76, 424: 78, 451: 80, 466: 82, 476: 84, 496: 86, 498: 88, 499: 90, 518: 92, 533: 94, 543: 96, 547: 98, 569: 100, 573: 102, 584: 104, 597: 106, 606: 108, 625: 110, 633: 112, 639: 114, 648: 116, 656: 118, 669: 120, 685: 122, 695: 124, 699: 126, 717: 128, 722: 130, 727: 132, 734: 134, 754: 136, 758: 138, 779: 140, 781: 142, 790: 144, 791: 146, 810: 148, 821: 150, 841: 152, 849: 154, 856: 156, 861: 158, 867: 160, 882: 162, 889: 164, 903: 166, 916: 168, 950: 170, 954: 172, 964: 174, 968: 176, 976: 178, 982: 180, 984: 182, 1000: 184, 1021: 186, 1038: 188, 1040: 190, 1042: 192, 1044: 194, 1051: 196, 1065: 198, 1094: 200, 1102: 202, 1109: 204, 1116: 206, 1128: 208, 1129: 210, 1147: 212, 1149: 214, 1159: 216, 1171: 218, 1185: 220, 1190: 222, 1198: 224, 1217: 226, 1220: 228, 1236: 230, 1250: 232, 1256: 234, 1261: 236, 1274: 238, 1275: 242, 1299: 244, 1305: 246, 1309: 250, 1323: 252, 1337: 254, 1356: 255"
pwm_path: "/sys/class/hwmon/hwmon1/pwm2"
rpm_path: "/sys/class/hwmon/hwmon1/fan2_input"
enable_path: "/sys/class/hwmon/hwmon1/pwm2_enable"
driver_flag: 5
}
fan {
type: NVIDIA
label: "1660_SUPER"
sensor: "1660_SUPER_temp"
temp_to_rpm: "60: 0%, 70: 30%, 80: 50%, 90: 100%"
rpm_to_pwm: "1315: 102, 2454: 188, 2646: 204, 3303: 255"
start_pwm: 102
}
sensor {
type: NVIDIA
label: "1660_SUPER_temp"
}
sensor {
label: "hwmon0/Tdie"
input_path: "/sys/class/hwmon/hwmon0/temp1_input"
max_path: "/sys/class/hwmon/hwmon0/temp1_max"
}
}
Try the latest commit. It should fix the crash but may result in NVIDIA control being disabled.
Please run sudo fancon log-lvl=debug
Then try:
fancon requires X11 access due to LibNVCtrl (NVIDIA control), but opening an X11 display requires XAUTHORITY and DISPLAY to be set. XAUTHORITY is set by the display manager (gdm, lightdm etc.)
Manually configure the unset environment variable(s) inside /etc/profile:
export XAUTHORITY=... ; you can find the Xauthority file by running xauth info
xhost si:localuser:root ; may be necessary on Wayland
https://wiki.archlinux.org/index.php/Running_GUI_applications_as_root
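As a rough illustration of the precondition described above, here is a minimal sketch that reports which of DISPLAY and XAUTHORITY are unset before any attempt to open the X11 display. The helper name missing_x11_env is hypothetical and this is not fancon's actual code.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of fancon): returns the names of the
// X11-related environment variables that are unset or empty.
std::vector<std::string> missing_x11_env() {
    std::vector<std::string> missing;
    for (const char* name : {"DISPLAY", "XAUTHORITY"}) {
        const char* value = std::getenv(name);
        if (value == nullptr || *value == '\0') {
            missing.emplace_back(name);
        }
    }
    return missing;
}
```

If the returned list is non-empty, a caller could log the missing names and leave NVIDIA control disabled rather than trying to open the display.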
Thanks for the quick reply and fixes. I will test it soon. For now I want to note that fancon should start after Xorg (the display manager) only if it needs control over NVIDIA.
It seems my system does not have graphical-session.target, so I got the following error:
$ sudo systemctl start fancon.service
Failed to start fancon.service: Unit graphical-session.target not found.
Some details:
$ file /etc/systemd/system/display-manager.service
/etc/systemd/system/display-manager.service: symbolic link to /usr/lib/systemd/system/sddm.service
$ systemctl list-units --type=target
UNIT LOAD ACTIVE SUB DESCRIPTION
basic.target loaded active active Basic System
cryptsetup.target loaded active active Local Encrypted Volumes
getty.target loaded active active Login Prompts
graphical.target loaded active active Graphical Interface
local-fs-pre.target loaded active active Local File Systems (Pre)
local-fs.target loaded active active Local File Systems
multi-user.target loaded active active Multi-User System
network-online.target loaded active active Network is Online
network.target loaded active active Network
nss-lookup.target loaded active active Host and Network Name Lookups
paths.target loaded active active Paths
rpc_pipefs.target loaded active active rpc_pipefs.target
rpcbind.target loaded active active RPC Port Mapper
slices.target loaded active active Slices
sockets.target loaded active active Sockets
sound.target loaded active active Sound Card
swap.target loaded active active Swap
sysinit.target loaded active active System Initialization
timers.target loaded active active Timers
Can I start the service as a normal user with systemctl --user?
Try the nvidia-testing branch, it adds an additional reload step for nvidia devices once the (now) graphical.target is triggered. Hopefully graphical.target is sufficient to open the X display without issues. Unfortunately I no longer have a nvidia gpu so I have been unable to test for a while, so thanks for all your help.
root is required for changing fan speeds.
I tried the nvidia-testing branch. The fancon.service still crashed:
$ journalctl -u fancon.service -o cat | tail
Started fancon.
No protocol specified
<warning> $XAUTHORITY env variable(s) not set, set to enable NVIDIA support
<fatal> X11Display must be connected
terminate called after throwing an instance of 'std::exception'
what(): std::exception
fancon.service: Main process exited, code=dumped, status=6/ABRT
fancon.service: Failed with result 'core-dump'.
I can investigate the specifics of the crash with that core dump. Could you also provide the output of fancon -i?
As it seems you're also on Wayland, you will need to add xhost +si:localuser:root to your /etc/profile as specified above, if not done already.
I use KDE with SDDM (Xorg session). It still crashes with xhost:
$ xhost si:localuser:root
localuser:root being added to access control list
$ LANG=C sudo su - -c 'fancon -v'
20/05/08 22:45 [179770] <debug> Guessing X11 env var $DISPLAY, consider setting
20/05/08 22:45 [179770] <debug> Guessing X11 env var $XAUTHORITY, consider setting
20/05/08 22:45 [179770] <warning> $XAUTHORITY env variable(s) not set, set explicitly to enable NVIDIA control
20/05/08 22:45 [179770] <debug> X11 display cannot be opened
20/05/08 22:45 [179770] <warning> NVIDIA sensor configured but NVIDIA control is disabled at this time
20/05/08 22:45 [179770] <fatal> X11 display couldn't be opened but is being used anyway!
terminate called after throwing an instance of 'std::exception' std::exception
Aborted (core dumped)
With root and XAUTHORITY:
$ LANG=C sudo su -c 'export XAUTHORITY=/tmp/xauth-1000-_0; fancon -v'
WARNING: Unable to locate/open X configuration file.
nvidia-xconfig could not be found, either install it, or set the coolbits value manually
20/05/08 22:55 [182003] <error> Failed to query number of NVIDIA GPUs
20/05/08 22:55 [182003] <error> Failed to query number of NVIDIA GPUs
Starting controller
20/05/08 22:55 [182003] <warning> hwmon1/fan1: skipping - curve not configured & sensor not configured &
20/05/08 22:55 [182003] <warning> hwmon1/fan3: skipping - curve not configured & sensor not configured &
Why can't we skip the NVIDIA init when Xorg is not available? If I understand correctly, fancon.service should be started early, without Xorg, but it crashes now.
Please re-test. Either way the program shouldn't crash when the init fails.
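To illustrate the "shouldn't crash when the init fails" idea, here is a minimal sketch that initialises each backend independently and skips the ones that fail. The Backend type and init_available function are hypothetical names for illustration only, not fancon's API.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type for illustration only; not fancon's API.
struct Backend {
    std::string name;
    std::function<bool()> init;  // returns false when the backend is unavailable
};

// Initialise every backend, keep those that succeed, and record the rest
// in `skipped` instead of aborting the whole program.
std::vector<std::string> init_available(const std::vector<Backend>& backends,
                                        std::vector<std::string>& skipped) {
    std::vector<std::string> enabled;
    for (const auto& b : backends) {
        bool ok = false;
        try {
            ok = b.init();
        } catch (const std::exception&) {
            ok = false;  // a throwing init is treated as "unavailable"
        }
        (ok ? enabled : skipped).push_back(b.name);
    }
    return enabled;
}
```

With this shape, a failed NVIDIA init (for example, no X11 display yet) would only remove the NVIDIA devices from the run, while the hwmon fans would still be controlled.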
Eventually I hope to replace XNVCtrl with nvidia's newer library NVML which doesn't depend on X11, but unfortunately it's been incomplete for a couple of years
Sounds good about NVML.
I enabled fancon.service and fancon-nvidia.service. Here is the log after a reboot:
$ LANG=C journalctl -b -u fancon.service
-- Logs begin at Sat 2020-05-09 11:34:25 +07, end at Sun 2020-05-10 19:28:32 +07. --
May 10 19:24:36 unikum-desktop systemd[1]: Started fancon.
May 10 19:24:36 unikum-desktop systemd[1]: fancon.service: Main process exited, code=killed, status=12/USR2
May 10 19:24:36 unikum-desktop systemd[1]: fancon.service: Failed with result 'signal'.
May 10 19:24:38 unikum-desktop systemd[1]: fancon.service: Scheduled restart job, restart counter is at 1.
May 10 19:24:38 unikum-desktop systemd[1]: Stopped fancon.
May 10 19:24:38 unikum-desktop systemd[1]: Started fancon.
May 10 19:24:38 unikum-desktop fancon[828]: <warning> NVIDIA sensor configured but NVIDIA control is disabled at this time
May 10 19:24:38 unikum-desktop fancon[828]: <warning> NVIDIA fan is configured but NVIDIA control is disabled at this time
May 10 19:24:38 unikum-desktop fancon[828]: Starting controller
May 10 19:24:38 unikum-desktop fancon[828]: <warning> hwmon1/fan1: skipping - curve not configured & sensor not configured
May 10 19:24:38 unikum-desktop fancon[828]: <warning> hwmon1/fan3: skipping - curve not configured & sensor not configured
$ LANG=C journalctl -b -u fancon-nvidia.service
-- Logs begin at Sat 2020-05-09 11:34:25 +07, end at Sun 2020-05-10 19:32:28 +07. --
May 10 19:24:36 unikum-desktop systemd[1]: Starting Reload fancon once NVIDIA control is possible...
May 10 19:24:36 unikum-desktop systemd[1]: fancon-nvidia.service: Succeeded.
May 10 19:24:36 unikum-desktop systemd[1]: Finished Reload fancon once NVIDIA control is possible.
It seems this appears when fancon-nvidia.service starts:
fancon.service: Main process exited, code=killed, status=12/USR2
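The status=12/USR2 line suggests the process received SIGUSR2 while no handler was installed, since the default action for that signal is to terminate the process. Whether fancon-nvidia.service really uses SIGUSR2 to request the reload is an assumption drawn from this log only. A minimal sketch of catching the signal and turning it into a reload request (all names are hypothetical):

```cpp
#include <csignal>
#include <signal.h>

// Set by the handler and polled by the main loop. sig_atomic_t is a type
// that is safe to write from a signal handler.
volatile std::sig_atomic_t reload_requested = 0;

extern "C" void on_reload_signal(int) { reload_requested = 1; }

// Install the handler; returns true on success. Until this runs, SIGUSR2
// keeps its default action, which terminates the process.
bool install_reload_handler() {
    struct sigaction sa {};
    sa.sa_handler = on_reload_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, nullptr) == 0;
}

// Returns true if a reload was requested since the last call, and clears
// the flag.
bool consume_reload_request() {
    if (!reload_requested) return false;
    reload_requested = 0;
    return true;
}
```

If the signal can arrive before the handler is installed (both units started at 19:24:36 in the log above), the process would still be killed, so the handler would need to be installed as early as possible in startup.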
How can I check that the NVIDIA fan is controlled by fancon?
Monitoring with NVML implemented in this repo: https://github.com/oblalex/nvidia-gpu-monitoring
It may be helpful.
Here is a toy example.
main.cpp:
#include <iostream>
#include <stdexcept>
#include <nvml.h>

// Throw if an NVML call did not succeed.
void raise_nv_status(nvmlReturn_t st) {
    if (st != NVML_SUCCESS) {
        throw std::runtime_error(nvmlErrorString(st));
    }
}

// Query the fan speed of the device at index idx.
unsigned int get_fan_speed(unsigned int idx) {
    nvmlDevice_t handle;
    raise_nv_status(nvmlDeviceGetHandleByIndex(idx, &handle));
    unsigned int fan_speed = 0;
    raise_nv_status(nvmlDeviceGetFanSpeed(handle, &fan_speed));
    return fan_speed;
}

int main() {
    raise_nv_status(nvmlInit());
    unsigned int device_count = 0;
    raise_nv_status(nvmlDeviceGetCount(&device_count));
    std::cout << "Number of devices: " << device_count << std::endl;
    std::cout << "Fun speeds:" << std::endl;
    for (unsigned int i = 0; i < device_count; ++i) {
        unsigned int speed = get_fan_speed(i);
        std::cout << " Device" << i << ": " << speed << std::endl;
    }
    nvmlShutdown();  // release NVML resources before exiting
    return 0;
}
CMakeLists.txt:
cmake_minimum_required(VERSION 3.0)
project(nvtest)
add_executable(nvtest main.cpp)
include_directories(/opt/cuda/targets/x86_64-linux/include)
target_link_libraries(nvtest nvidia-ml)
install(TARGETS nvtest RUNTIME DESTINATION bin)
It works as root without Xorg:
sudo su - -c 'LANG=C /home/unikum/Projects/CPP/nvtest/build/nvtest'
Number of devices: 1
Fun speeds:
Device0: 0
Unfortunately it's missing crucial functionality, namely nvmlDeviceSetFanSpeed; see https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html
Their own controller is still using XNVCtrl - https://github.com/NVIDIA/nvidia-settings/blob/master/src/libXNVCtrlAttributes/NvCtrlAttributesNvml.c
There even exists some nvml code in fancon already but it's disabled by default due to requiring XNVCtrl anyway.
Thank you for the explanation. Now it's clear to me.
It seems that, to make this work without pain, we should separate the NVIDIA-related code and run it with systemctl --user after Xorg starts.