Unaligned access by llockd in ext4_delete_entry()
abrodkin opened this issue · 8 comments
Boot log:
[ 0.000000] Linux version 4.19.14-yocto-standard (oe-user@oe-host) (gcc version 8.2.1 20180814 (GCC)) #1 SMP PREEMPT Thu Feb 7 12:43:04 UTC 2019
...
[ 4.151331] Misaligned Access
[ 4.155771] Path: /bin/busybox.nosuid
[ 4.159419] CPU: 3 PID: 174 Comm: rm Not tainted 4.19.14-yocto-standard #1
[ 4.166274]
[ 4.166274] [ECR ]: 0x000d0000 => Check Programmer's Manual
[ 4.173551] [EFA ]: 0xbeaec3fc
[ 4.173551] [BLINK ]: ext4_delete_entry+0xce/0x224
[ 4.173551] [ERET ]: ext4_delete_entry+0x176/0x224
[ 4.186363] [STAT32]: 0x80080002 : IE K
[ 4.190614] BTA: 0x9024795a SP: 0xbe375ec4 FP: 0x00000000
[ 4.196194] LPS: 0x9074b214 LPE: 0x9074b218 LPC: 0x00000000
[ 4.201759] r00: 0x00000000 r01: 0x0000090d r02: 0x00000001
[ 4.201759] r03: 0x00000000 r04: 0x00000000 r05: 0xbea8ecb0
[ 4.201759] r06: 0xbeaec3fc r07: 0x00000400 r08: 0x00000002
[ 4.201759] r09: 0x00000000 r10: 0x000002b4 r11: 0xbeaec32c
[ 4.201759] r12: 0x9024795a r13: 0x9004e574 r14: 0x0008e150
[ 4.201759] r15: 0x00098a68 r16: 0x0008cbec r17: 0x00097fe4
[ 4.201759] r18: 0x00097fe4 r19: 0x0008e150 r20: 0x0008f0f8
[ 4.201759] r21: 0x000000ae r22: 0x0008f0f8 r23: 0x00000000
[ 4.201759] r24: 0x00000000 r25: 0x00000000
Disassembly of problematic code:
90247a02:    222f 1192    llockd   r10,[r6]
90247a06:    0a13 1081    brne.nt  r10,r2,18    ;90247a16 <ext4_delete_entry+0x18a>
90247a0a:    0b0f 10c1    brne.nt  r11,r3,14    ;90247a16 <ext4_delete_entry+0x18a>
Note that this kernel version (v4.19.14, as well as v4.19.20, the latest in the 4.19.y series) doesn't have my patch that fixes the Etnaviv GPU, see torvalds@a66d972.
Hm, with vanilla Linux v4.19.19 and an initramfs I cannot reproduce the problem.
The mentioned torvalds@a66d972 made no difference, which means this is not a statically allocated atomic64_t and we need to look into it now.
Do note that LLOCKD by default needs data to be 64-bit aligned. I'm checking with the hw folks whether that restriction holds true even when AD is enabled.
Both LLOCK and EX transactions need to be aligned regardless of the AD bit, i.e.:
LLOCK: 32-bit aligned
LLOCKD: 64-bit aligned
In your case above, r6 is 0xbeaec3fc, so it is not 64-bit aligned!
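The alignment rule above can be checked with a little arithmetic. The following is a minimal sketch; the helper `is_aligned()` and the constant names are made up for illustration, and only the address 0xbeaec3fc and the 4/8-byte requirements come from this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint32_t addr, uint32_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Alignment required by the two exclusive loads, in bytes. */
enum { LLOCK_ALIGN = 4, LLOCKD_ALIGN = 8 };

/* EFA (faulting address) from the oops, which is also the value of r6. */
static const uint32_t efa = 0xbeaec3fcu;
```

The low three bits of 0xbeaec3fc are 0b100, so the address would be fine for LLOCK but is 4 bytes off for LLOCKD.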
@vineetgarc we already knew all that since [1] which ended-up with torvalds@a66d972.
So the reason for "Misaligned Access" is clear, what's not clear is:
1. Which atomic64_t causes this new failure
2. How to solve this once (1) above is done
Unfortunately my patch for devm_xxx() doesn't help here.
[1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004009.html
So the problematic atomic is inode->i_version, see https://elixir.bootlin.com/linux/v4.19.14/source/include/linux/fs.h#L656
And the failure happens in atomic64_cmpxchg(), see https://elixir.bootlin.com/linux/v4.19.14/source/include/linux/iversion.h#L198
Stack Trace:
atomic64_cmpxchg
inode_maybe_inc_iversion
inode_inc_iversion
ext4_generic_delete_entry
ext4_delete_entry
ext4_rmdir
vfs_rmdir
do_rmdir
EV_Trap
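For readers unfamiliar with this path, here is a rough userspace sketch of the compare-and-exchange loop that ends up issuing the 64-bit exclusive load. It is not the kernel code: it uses C11 atomics instead of atomic64_cmpxchg(), `struct fake_inode` and `maybe_inc_iversion()` are hypothetical names, and the flag values are assumed to match include/linux/iversion.h:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 is the "queried" flag, the counter sits above it. */
#define I_VERSION_QUERIED   1ULL
#define I_VERSION_INCREMENT 2ULL

/* Hypothetical stand-in for the i_version field of struct inode. */
struct fake_inode {
	_Atomic uint64_t i_version;
};

/* Bump the version if forced or if someone queried it since the last bump. */
static bool maybe_inc_iversion(struct fake_inode *inode, bool force)
{
	uint64_t cur = atomic_load(&inode->i_version);

	for (;;) {
		uint64_t next;

		if (!force && !(cur & I_VERSION_QUERIED))
			return false;

		/* Clear the flag and advance the counter in one 64-bit update. */
		next = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;

		/* On failure, cur is reloaded with the value currently stored. */
		if (atomic_compare_exchange_strong(&inode->i_version, &cur, next))
			return true;
	}
}
```

On ARC the 64-bit compare-and-exchange is built on LLOCKD/SCONDD, which is why the address of i_version itself has to be 64-bit aligned.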
What's worse, the obvious "fix" doesn't help:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7b6084854bfe..d1daa09c3bc6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -653,7 +653,7 @@ struct inode {
 		struct hlist_head	i_dentry;
 		struct rcu_head		i_rcu;
 	};
-	atomic64_t		i_version;
+	atomic64_t		i_version __aligned(sizeof(atomic64_t));
 	atomic_t		i_count;
 	atomic_t		i_dio_count;
 	atomic_t		i_writecount;
We still get:
[ 4.015732] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[ 4.167881]
[ 4.167881] Misaligned Access
[ 4.172356] Path: /bin/busybox.nosuid
[ 4.176004] CPU: 2 PID: 171 Comm: rm Not tainted 4.19.14-yocto-standard #1
[ 4.182851]
[ 4.182851] [ECR ]: 0x000d0000 => Check Programmer's Manual
[ 4.190061] [EFA ]: 0xbeaec3fc
[ 4.190061] [BLINK ]: ext4_delete_entry+0x210/0x234
[ 4.190061] [ERET ]: ext4_delete_entry+0x13e/0x234
[ 4.202985] [STAT32]: 0x80080002 : IE K
[ 4.207236] BTA: 0x9009329c SP: 0xbe5b1ec4 FP: 0x00000000
[ 4.212790] LPS: 0x9074b118 LPE: 0x9074b120 LPC: 0x00000000
[ 4.218348] r00: 0x00000040 r01: 0x00000021 r02: 0x00000001
[ 4.218348] r03: 0x00000000 r04: 0x00000002 r05: 0x00000000
[ 4.218348] r06: 0x000000c6 r07: 0x00000000 r08: 0x9050f140
[ 4.218348] r09: 0x000000c6 r10: 0x0000000a r11: 0x00000000
[ 4.218348] r12: 0x90247a9c r13: 0x9004e574 r14: 0x0008e150
[ 4.218348] r15: 0x000989b8 r16: 0x0008cbec r17: 0x0009806c
[ 4.218348] r18: 0x0009806c r19: 0x0008e150 r20: 0x0008f0f8
[ 4.218348] r21: 0x000000ab r22: 0x0008f0f8 r23: 0x00000000
[ 4.218348] r24: 0x00000000 r25: 0x00000000
[ 4.218348]
[ 4.218348]
[ 4.270510]
[ 4.270510] Stack Trace:
[ 4.274510] ext4_delete_entry+0x13e/0x234
[ 4.278695] ext4_rmdir+0xe0/0x238
[ 4.282187] vfs_rmdir+0x50/0xf0
[ 4.285492] do_rmdir+0x9e/0x154
[ 4.288802] EV_Trap+0x110/0x114
The culprit was in the slab allocator used for inodes.
Even though the default ARCH_SLAB_MINALIGN is set in quite a sensible way:
#ifndef ARCH_SLAB_MINALIGN
#define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
#endif
see https://elixir.bootlin.com/linux/latest/source/include/linux/slab.h#L213
The problem in the case of ARC is that __alignof__(unsigned long long) = 4!
The solution is then as simple as defining ARCH_SLAB_MINALIGN = 8 for us, see http://lists.infradead.org/pipermail/linux-snps-arc/2019-February/005423.html
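To illustrate why aligning the member alone could not help while raising the slab minimum alignment does, here is a small model. It assumes a simplified, hypothetical allocator that places an object at the first suitably aligned free address; the helper names, the free address 0x1004 and the member offset 0x98 are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

static bool is_aligned(uintptr_t addr, uintptr_t align)
{
	return (addr & (align - 1)) == 0;
}

/*
 * Address of a member at offset member_off inside an object that the
 * hypothetical allocator places at the first min_align-aligned address
 * at or above free_addr.
 */
static uintptr_t member_addr(uintptr_t free_addr, uintptr_t min_align,
			     size_t member_off)
{
	return align_up(free_addr, min_align) + member_off;
}
```

With a minimum alignment of 4, an object may start at 0x1004, so a member at the 8-byte-multiple offset 0x98 lands at 0x109c, which is not 64-bit aligned. With a minimum alignment of 8 the object starts at 0x1008 and the same member lands at 0x10a0. An attribute on the member only controls its offset within the structure, not where the allocator puts the structure.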