ug! no event found for type 843
jhoblitt opened this issue · 8 comments
Building the rpm from c225517 on CentOS 7 results in the logs being spammed with "ug! no event found for type 843". The ug! message is repeated 16468 times in the journal, but there are also journal rate limit messages, so the total is probably much higher.
-- Logs begin at Sun 2022-05-15 21:28:23 UTC, end at Mon 2022-05-16 18:20:01 UTC. --
May 16 18:17:20 foo06.example.org systemd[1]: Starting RAS daemon to log the RAS events...
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Page offline choice on Corrected Errors is soft
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Threshold of memory Corrected Errors is 50 / 24h
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: ras:mc_event event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: ras:mc_event event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Enabled event ras:mc_event
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: ras:aer_event event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: mce:mce_record event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: ras:extlog_mem_event event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: Can't write to set_event
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: Can't write to set_event
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: Can't write to set_event
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: block:block_rq_complete event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12672]: rasdaemon: Can't write to set_event
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: ras:aer_event event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Enabled event ras:aer_event
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get ras:non_standard_event traces. Perhaps this feature is not supported on your system.
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get traces from ras:non_standard_event
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get ras:arm_event traces. Perhaps this feature is not supported on your system.
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get traces from ras:arm_event
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: mce:mce_record event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Enabled event mce:mce_record
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: ras:extlog_mem_event event enabled
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Enabled event ras:extlog_mem_event
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get net:net_dev_xmit_timeout traces. Perhaps this feature is not supported on your system.
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get devlink:devlink_health_report traces. Perhaps this feature is not supported on your system.
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get traces from devlink:devlink_health_report
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't write to filter file
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get ras:memory_failure_event traces. Perhaps this feature is not supported on your system.
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Can't get traces from ras:memory_failure_event
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Listening to events for cpus 0 to 63
May 16 18:17:22 foo06.example.org systemd[1]: Started RAS daemon to log the RAS events.
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording mc_event events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording aer_event events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording extlog_event events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording mce_record events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording non_standard_event events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording arm_event events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording devlink_event events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording disk_errors events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: rasdaemon: Recording memory_failure_event events
May 16 18:17:22 foo06.example.org rasdaemon[12671]: trace-cmd: No such file or directory
May 16 18:17:22 foo06.example.org rasdaemon[12671]: ug! no event found for type 843
May 16 18:17:22 foo06.example.org rasdaemon[12671]: overriding event (968) ras:mc_event with new print handler
May 16 18:17:22 foo06.example.org rasdaemon[12671]: overriding event (967) ras:aer_event with new print handler
May 16 18:17:22 foo06.example.org rasdaemon[12671]: overriding event (82) mce:mce_record with new print handler
May 16 18:17:22 foo06.example.org rasdaemon[12671]: overriding event (969) ras:extlog_mem_event with new print handler
May 16 18:17:22 foo06.example.org rasdaemon[12671]: Calling ras_mc_event_opendb()
May 16 18:17:22 foo06.example.org rasdaemon[12671]: ug! no event found for type 843
May 16 18:17:22 foo06.example.org rasdaemon[12671]: ug! no event found for type 843
May 16 18:17:22 foo06.example.org rasdaemon[12671]: ug! no event found for type 843
I believe you have encountered a problem similar to the one described in issue #19.
The type number was different, but it basically boiled down to the fact that rasdaemon is looking for hardware which it is not able to identify and/or properly interact with. The only solution was to rebuild rasdaemon from the sources, enabling only a subset of its features.
#19 looks extremely similar, including that I'm testing this on an EPYC 7xx2 CPU. I'll try rebuilding with reduced feature flags.
Using the flags from #19 as a starting point, I was able to get a build that doesn't spam the log with the ug! errors. It looks like --enable-diskerror was the culprit.
--- a/misc/rasdaemon.spec.in
+++ b/misc/rasdaemon.spec.in
@@ -39,7 +39,8 @@ an utility for reporting current error counts from the EDAC sysfs files.
%setup -q
%build
-%configure --enable-all --with-sysconfdefdir=%{_sysconfdir}/sysconfig
+%configure --enable-sqlite3 --enable-aer --enable-non-standard --enable-mce --enable-extlog --enable-devlink \
+--enable-abrt-report --enable-hisi-ns-decode --enable-memory-ce-pfa --enable-memory-failure
make %{?_smp_mflags}
I may have spoken too soon. The ug! messages are now appearing with the same build when restarting the daemon.
Even cutting the flags down to only --enable-sqlite3 doesn't stop the log messages.
compile time options summary
============================
Sqlite3 : yes
AER : no
MCE : no
EXTLOG : no
CPER non-standard : no
ABRT report : no
HISI Kunpeng errors : no
ARM events : no
DEVLINK : no
Disk I/O errors : no
Memory Failure : no
Memory CE PFA : no
AMP RAS errors : no
It looks like the error messages are coming in a large batch every ~30s.
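To check the batching more precisely, the journal output can be grouped by timestamp. The following is only a rough sketch: it assumes the default short journalctl line format shown in the log above, and the names UG_RE and summarize are made up for illustration. Because the journal is rate limited, the counts are a lower bound.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
# May 16 18:17:22 foo06.example.org rasdaemon[12671]: ug! no event found for type 843
UG_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ rasdaemon\[\d+\]: "
    r"ug! no event found for type (\d+)"
)


def summarize(lines):
    """Count ug! messages per timestamp (1 s resolution) and per event type."""
    per_second = Counter()
    per_type = Counter()
    for line in lines:
        match = UG_RE.match(line)
        if match:
            per_second[match.group(1)] += 1
            per_type[int(match.group(2))] += 1
    return per_second, per_type


if __name__ == "__main__":
    # Read journal text from stdin and print one line per second that had messages.
    seconds, types = summarize(sys.stdin)
    for stamp, count in seconds.items():
        print(f"{stamp}  {count}")
    print("types seen:", dict(types))
```

Saved under a name of your choosing, it could be fed with something like "journalctl -u rasdaemon | python3 count_ug.py"; gaps of roughly 30 seconds between the printed timestamps would support the observation above.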
From looking at the libevent code, it appears that the type number comes from the kernel event message. If so, how does one map the type back to the kernel call site?
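One possible way to answer this is to look at tracefs: on kernels that expose it, each event directory under events/ contains an id file holding the numeric event ID (the format file also lists it as "ID:"). The sketch below scans those files and builds an ID-to-name map. The function names are hypothetical, and the default path assumes debugfs is mounted at /sys/kernel/debug; newer kernels may use /sys/kernel/tracing/events instead. Reading these files normally requires root.

```python
from pathlib import Path


def build_event_id_map(events_dir):
    """Return {event_id: "system:event"} for every <system>/<event>/id file found."""
    id_map = {}
    for id_file in Path(events_dir).glob("*/*/id"):
        try:
            event_id = int(id_file.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or malformed id file: skip it rather than abort the scan.
            continue
        system = id_file.parent.parent.name
        event = id_file.parent.name
        id_map[event_id] = f"{system}:{event}"
    return id_map


def lookup_event(event_id, events_dir="/sys/kernel/debug/tracing/events"):
    """Return "system:event" for event_id, or None if no event has that ID."""
    return build_event_id_map(events_dir).get(event_id)
```

For example, print(lookup_event(843)) run as root on the affected host should name the event behind type 843, assuming the kernel still has that event registered. The IDs are assigned per kernel build, so the lookup has to be done on the same machine and kernel that produced the log.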
Basically, libevent is an early version of the code that was later packaged as libtraceevent.
I updated rasdaemon to use libtraceevent after version 0.7.0. This should hopefully help solve issues when decoding events, as that library is maintained together with the kernel code.