rasdaemon not logging
DonKatsu opened this issue · 9 comments
Distro: Fedora 37 KDE
Kernel: 6.1.8
rasdaemon version: 0.6.8
CPU: Ryzen 9 5900x
Because rasdaemon's erroneous disk error reports were bloating my log, I deleted the files ras-mc_event.db and ras-mc_event.db-journal in /var/lib/rasdaemon. After I restarted the rasdaemon service, clean ones were created.
Since then I've noticed that it stopped logging those false disk errors. Eventually I got another MCE error and noticed that it wasn't logged either. (I'm not sure whether deleting the files was directly related, but the timing lined up.) I reinstalled rasdaemon and waited for another one to happen to be sure.
This latest one wasn't logged either, and those supposed disk errors still aren't being logged, even though the service still seems to be reporting them.
Screenshot.
Journal log of a systemctl restart rasdaemon. Those core dumps also happen on a fresh boot.
I have uninstalled mcelog, and I don't have the ras-mc-ctl service enabled, because it fails and exits on my system, which doesn't have ECC memory.
There is a known regression with kernel 6.1. The fix requires both a patch to the Linux kernel and a change in rasdaemon. See: 6986d81
The kernel patch was already merged and backported to kernel 6.1.12: https://lwn.net/Articles/923307/.
Today I merged the rasdaemon patch and released version 0.8.0, but the Fedora packages don't contain the regression fix yet.
I'm planning to cherry-pick the fix and apply it to Fedora 36 and 37 later today.
Anyway, if you want to check, you can either wait for 6.1.12 or download it from koji, then build rasdaemon from the sources using make mock and install the package from the SPRMS/ directory.
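To check whether the running kernel is at least 6.1.12, the 6.1 stable release that the comment above says carries the backported kernel patch, a minimal Python sketch can compare the numeric prefix of the release string (the same string `uname -r` prints). The helper names are hypothetical, and the comparison assumes that any newer release (for example the 6.2 series) also contains the fix, which is worth confirming separately.

```python
import platform
import re


def parse_kernel_version(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as '6.1.12-200.fc37.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A release like '6.2' has no patch component; treat it as 0.
    return int(major), int(minor), int(patch or 0)


def kernel_at_least(release: str, minimum: tuple[int, int, int] = (6, 1, 12)) -> bool:
    """Return True if `release` is numerically >= `minimum`.

    Assumption: newer release lines also include the fix; this only compares numbers.
    """
    return parse_kernel_version(release) >= minimum


if __name__ == "__main__":
    current = platform.release()  # same value as `uname -r`
    print(current, "OK" if kernel_at_least(current) else "older than 6.1.12")
```

Because it only looks at version numbers, a distribution kernel that backports or reverts patches on its own could give a misleading answer.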
I've added a Fedora 37 package based on version 0.6.8: https://bodhi.fedoraproject.org/updates/FEDORA-2023-e1ccb95257. That said, I'd also appreciate feedback on version 0.8.0, since it now uses libtraceevent.
I now have both kernel 6.1.12 and rasdaemon 0.6.8, which hit Fedora's stable repo last night.
As expected, rasdaemon 0.6.8 hasn't segfaulted. But it now triggers an SELinux denial for dac_override when it starts. Still, the rasdaemon processes are alive and the service is active (running).
After getting rasdaemon 0.6.8, I checked ras-mc-ctl --errors and saw these reported disk errors. I hadn't checked it since opening this issue, so I have to assume they were recorded at the times shown. The last-modified dates for ras-mc_event.db and ras-mc_event.db-journal are the 23rd and 24th, respectively. An hour before the entries from the 8th, I had upgraded to kernel 6.1.10 and most likely restarted right away.
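Since ras-mc-ctl --errors reads the recorded errors from the SQLite database in /var/lib/rasdaemon, one way to see whether new rows are arriving at all is to count the rows in each table directly. This is a sketch that assumes ras-mc_event.db is an ordinary SQLite 3 file; it opens the database read-only and discovers the table names from sqlite_master instead of assuming rasdaemon's schema. The function name is hypothetical, and reading the default path will usually require root.

```python
import sqlite3
import sys


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every table in an SQLite database, opened read-only."""
    # mode=ro makes SQLite refuse writes, so the daemon's database is never modified.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the table name as an SQL identifier before interpolating it.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/rasdaemon/ras-mc_event.db"
    for table, count in table_row_counts(path).items():
        print(f"{table}: {count}")
```

Running it before and after an event that the kernel reports shows whether any table actually grew, independent of how ras-mc-ctl formats its output.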
Hello,
The dac_override capability is requested when an access attempt is not allowed by DAC permissions, and it usually indicates a problem with the permissions. Please use strace to locate the files, or turn on full auditing to gather more information:
1) Open the /etc/audit/rules.d/audit.rules file in an editor.
2) Remove the following line if it exists:
-a task,never
3) Add the following line to the end of the file:
-w /etc/shadow -p w
4) Restart the audit daemon:
# service auditd restart
5) Re-run your scenario.
6) Collect AVC denials:
# ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts today
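Once ausearch has collected the records, a short script can tally which permissions each process was denied, which makes repeated denials from a restarting daemon easy to spot. This is a hypothetical Python sketch, not part of the audit tools; it assumes the interpreted (-i) output format, where an AVC line contains `avc: denied { ... } for` followed by unquoted key=value fields, as in the records pasted in this thread.

```python
import re
import sys
from collections import Counter

# Captures the denied permission set and the key=value fields after "for".
AVC_RE = re.compile(r"avc:\s+denied\s+\{\s*([^}]*?)\s*\}\s+for\s+(.*)")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def summarize_denials(lines):
    """Count AVC denials keyed by (comm, permission, tclass); non-AVC lines are skipped."""
    counts = Counter()
    for line in lines:
        match = AVC_RE.search(line)
        if match is None:
            continue
        fields = dict(FIELD_RE.findall(match.group(2)))
        # A single record may deny several permissions, e.g. "{ read write }".
        for perm in match.group(1).split():
            counts[(fields.get("comm", "?"), perm, fields.get("tclass", "?"))] += 1
    return counts


if __name__ == "__main__":
    # Example: ausearch -i -m avc,user_avc -ts today | python3 summarize_avc.py
    for (comm, perm, tclass), n in summarize_denials(sys.stdin).most_common():
        print(f"{n:4d}  {comm}  {perm}  ({tclass})")
```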
I finally had another MCE event while still on Fedora 37 with rasdaemon 0.6.8.
Right after the kernel reported the MCE error, rasdaemon crashed and restarted five times before finally settling down and throwing its SELinux denial (though apparently there was one denial for each crash).
The MCE error was not logged when I checked with ras-mc-ctl --errors.
Here's the journal from the event. Gist
And for some reason it's saying "rasdaemon: Old kernel detected. Stop listening and fall back to pthread way." despite being on kernel 6.2.11. It still says that on Fedora 38 with kernel 6.2.13.
I've now updated to Fedora 38 and have rasdaemon 0.8.0.
@zpytela This is what I get after following those steps and restarting rasdaemon 0.8.0:
ausearch -i -m avc,user_avc,selinux_err,user_selinux_err -ts today
----
type=AVC msg=audit(05/02/23 13:11:25.905:1301) : avc: denied { dac_override } for pid=543881 comm=rasdaemon capability=dac_override scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0
----
type=AVC msg=audit(05/02/23 13:11:26.409:1315) : avc: denied { dac_override } for pid=543931 comm=rasdaemon capability=dac_override scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0
----
type=AVC msg=audit(05/02/23 13:11:26.896:1329) : avc: denied { dac_override } for pid=543984 comm=rasdaemon capability=dac_override scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0
----
type=AVC msg=audit(05/02/23 13:11:27.405:1343) : avc: denied { dac_override } for pid=544029 comm=rasdaemon capability=dac_override scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0
----
type=AVC msg=audit(05/02/23 13:11:27.911:1359) : avc: denied { dac_override } for pid=544087 comm=rasdaemon capability=dac_override scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0
----
type=AVC msg=audit(05/02/23 17:23:30.819:111) : avc: denied { dac_override } for pid=3215 comm=rasdaemon capability=dac_override scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0
----
type=PROCTITLE msg=audit(05/02/23 17:43:14.738:305) : proctitle=/usr/sbin/rasdaemon -f -r
type=PATH msg=audit(05/02/23 17:43:14.738:305) : item=0 name=/sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent inode=56828 dev=00:0c mode=file,440 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tracefs_t:s0 nametype=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(05/02/23 17:43:14.738:305) : cwd=/
type=SYSCALL msg=audit(05/02/23 17:43:14.738:305) : arch=x86_64 syscall=openat success=no exit=EACCES(Permission denied) a0=AT_FDCWD a1=0x7ffee456a810 a2=O_WRONLY a3=0x0 items=1 ppid=1 pid=14786 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rasdaemon exe=/usr/sbin/rasdaemon subj=system_u:system_r:rasdaemon_t:s0 key=(null)
type=AVC msg=audit(05/02/23 17:43:14.738:305) : avc: denied { dac_override } for pid=14786 comm=rasdaemon capability=dac_override scontext=system_u:system_r:rasdaemon_t:s0 tcontext=system_u:system_r:rasdaemon_t:s0 tclass=capability permissive=0
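The last group of records shows why the capability was requested: rasdaemon, running as root, called openat on buffer_percent with O_WRONLY, but the file's mode is 440, so not even the owner has a write bit. Under classic DAC, root can only open such a file for writing by exercising CAP_DAC_OVERRIDE, which the rasdaemon_t domain is not allowed to use, so the open fails with EACCES. The Python sketch below only illustrates that mode check with the standard stat module; the function name is hypothetical, and it ignores ACLs and other security-module checks.

```python
import stat

# Mode of buffer_percent taken from the PATH record above (mode=file,440).
BUFFER_PERCENT_MODE = stat.S_IFREG | 0o440


def owner_write_needs_override(mode: int) -> bool:
    """Return True if the owner has no write bit, so an O_WRONLY open by the
    owner can only succeed via CAP_DAC_OVERRIDE (simplified: ACLs ignored)."""
    return not (mode & stat.S_IWUSR)


if __name__ == "__main__":
    print(stat.filemode(BUFFER_PERCENT_MODE))  # -r--r-----
    print(owner_write_needs_override(BUFFER_PERCENT_MODE))  # True
    # 0o640 is used here purely for illustration of a mode with the owner write bit set.
    print(owner_write_needs_override(stat.S_IFREG | 0o640))  # False
```

This is why making the file writable by its owner, as proposed in the reply below, removes the need for the capability rather than requiring an SELinux policy change.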
Thank you, I can confirm that. I've created a kernel bz to make the buffer_percent file read-write.
https://bugzilla.redhat.com/show_bug.cgi?id=2192910
Since that kernel change was implemented, I haven't seen any rasdaemon-related SELinux denials.
However, I'm still getting repeated crashes from rasdaemon 0.8.0. This is from the start of my most recent session:
journal_snip.txt
coredumpctl_gdb_rasdaemon.txt
@DonKatsu The service starts without errors on my VM, so please file a new bz against the ras component.
Sorry, I didn't mean to imply the crashing had anything to do with SELinux.
I had an MCE event today, after nothing for two months.
Once again it wasn't picked up by rasdaemon: ras-mc-ctl --errors still shows "No MCE errors". rasdaemon crashed at the same time the kernel reported the corrected error.
log.txt