Constant reports of "Hardware Error"
PCEngines firmware version
v4.19.0.1
APU variant
apu2e
OS and OS Version
Linux 5.15.32, 6.0.7, 6.1.4
Affected component(s), peripheral(s) or functionality
The errors are reported as "corrected"… but there are a lot of them!
Brief summary
Very frequently (several times a day, at least) I'm seeing the following being logged via syslog():
[ T1030] [Hardware Error]: Corrected error, no action required.
[ T1030] [Hardware Error]: CPU:0 (16:30:1) MC0_STATUS[-|CE|-|AddrV|-|-|-]: 0x9400000000010015
[ T1030] [Hardware Error]: Error Addr: 0x00008ae14b79c000
[ T1030] [Hardware Error]: MC0 Error: L1 TLB multimatch.
[ T1030] [Hardware Error]: cache level: L1, tx: DATA
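For anyone wanting to cross-check the decoded lines against the raw MC0_STATUS value, here is a minimal sketch in Python. It assumes the architectural MCi_STATUS layout (flag bits 57-63, error code in the low 16 bits, TLB error codes of the form 0000 0000 0001 TTLL) and that the extended error code occupies bits 16-20 on this CPU family. The helper name `decode_mc_status` and the label tables are mine, not taken from the kernel or the firmware.

```python
# Architectural MCi_STATUS flag bits (assumed layout, see lead-in).
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error (clear means corrected)
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds a valid address
    57: "PCC",    # processor context may be corrupt
}
TT = ["INSN", "DATA", "GEN", "RESV"]  # transaction type, bits 3:2 of the error code
LL = ["RESV", "L1", "L2", "L3/GEN"]   # cache level, bits 1:0 of the error code


def decode_mc_status(status: int) -> dict:
    """Hypothetical helper: split an MCi_STATUS value into readable fields."""
    ec = status & 0xFFFF
    out = {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "error_code": ec,
        "xec": (status >> 16) & 0x1F,  # assumption: 5-bit extended error code
    }
    if (ec & 0xFFF0) == 0x0010:  # TLB error format: 0000 0000 0001 TTLL
        out["kind"] = "TLB"
        out["tx"] = TT[(ec >> 2) & 0x3]
        out["level"] = LL[ec & 0x3]
    return out


print(decode_mc_status(0x9400000000010015))
```

For the value above this reports the VAL, EN and ADDRV flags with UC clear (so a corrected error with a valid address), and a TLB error with tx DATA at level L1, which agrees with what the kernel printed.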
How reproducible
This has happened across a range of Linux kernels from early 5.x through to at least Linux 6.1 - and I'll test with more recent kernels shortly.
How to reproduce
Steps to reproduce the behavior:
- Boot Linux
- Watch the system logs
- Even if fully correctable, there's something amiss here...
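To put a number on "a lot of them", something like the following could tally the errors from saved kernel log output. This is only a sketch: `count_mc_errors` is a hypothetical helper, and the pattern assumes the log lines look like the ones quoted in the summary.

```python
import re
from collections import Counter

# Matches lines such as "[ T1030] [Hardware Error]: MC0 Error: L1 TLB multimatch."
MC_ERROR = re.compile(r"\[Hardware Error\]: (MC\d+ Error: .+?)\.?\s*$")


def count_mc_errors(lines):
    """Hypothetical helper: tally machine-check error descriptions in log lines."""
    counts = Counter()
    for line in lines:
        match = MC_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[ T1030] [Hardware Error]: Corrected error, no action required.",
    "[ T1030] [Hardware Error]: MC0 Error: L1 TLB multimatch.",
    "[ T1030] [Hardware Error]: cache level: L1, tx: DATA",
]
print(count_mc_errors(sample))
```

The `lines` argument can be any iterable of strings, for example the output of `dmesg` or `journalctl -k` saved to a file and opened for reading.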
Expected behavior
No L1 TLB multimatch errors.
Actual behavior
Consistent errors over time.
ECC RAM detecting a memory error?
Potentially, but I assume the TLB multimatch is something like this:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1575932.html
… which looks to have been fixed years ago in the general case, but is apparently still happening here?
(Plus, I'm booting with nopti and still getting this.)