foss-for-synopsys-dwc-arc-processors/linux

Unaligned access by llockd in ext4_delete_entry()

abrodkin opened this issue · 8 comments

Boot log:

[    0.000000] Linux version 4.19.14-yocto-standard (oe-user@oe-host) (gcc version 8.2.1 20180814 (GCC)) #1 SMP PREEMPT Thu Feb 7 12:43:04 UTC 2019

...

[    4.151331] Misaligned Access
[    4.155771] Path: /bin/busybox.nosuid
[    4.159419] CPU: 3 PID: 174 Comm: rm Not tainted 4.19.14-yocto-standard #1
[    4.166274]
[    4.166274] [ECR   ]: 0x000d0000 => Check Programmer's Manual
[    4.173551] [EFA   ]: 0xbeaec3fc
[    4.173551] [BLINK ]: ext4_delete_entry+0xce/0x224
[    4.173551] [ERET  ]: ext4_delete_entry+0x176/0x224
[    4.186363] [STAT32]: 0x80080002 : IE K
[    4.190614] BTA: 0x9024795a   SP: 0xbe375ec4  FP: 0x00000000
[    4.196194] LPS: 0x9074b214  LPE: 0x9074b218 LPC: 0x00000000
[    4.201759] r00: 0x00000000  r01: 0x0000090d r02: 0x00000001
[    4.201759] r03: 0x00000000  r04: 0x00000000 r05: 0xbea8ecb0
[    4.201759] r06: 0xbeaec3fc  r07: 0x00000400 r08: 0x00000002
[    4.201759] r09: 0x00000000  r10: 0x000002b4 r11: 0xbeaec32c
[    4.201759] r12: 0x9024795a  r13: 0x9004e574 r14: 0x0008e150
[    4.201759] r15: 0x00098a68  r16: 0x0008cbec r17: 0x00097fe4
[    4.201759] r18: 0x00097fe4  r19: 0x0008e150 r20: 0x0008f0f8
[    4.201759] r21: 0x000000ae  r22: 0x0008f0f8 r23: 0x00000000
[    4.201759] r24: 0x00000000  r25: 0x00000000

Disassembly of problematic code:

90247a02:       222f 1192               llockd  r10,[r6]
90247a06:       0a13 1081               brne.nt r10,r2,18       ;90247a16 <ext4_delete_entry+0x18a>
90247a0a:       0b0f 10c1               brne.nt r11,r3,14       ;90247a16 <ext4_delete_entry+0x18a>

Note that this kernel version (v4.19.14, as well as v4.19.20, the latest in the 4.19.y series) doesn't have my patch that fixes the Etnaviv GPU, see torvalds@a66d972.

Hm, with vanilla Linux v4.19.19 and an initramfs I cannot reproduce the problem.

Applying the mentioned torvalds@a66d972 made no difference, which means this is not a statically allocated atomic64_t and we need to look into it now.

Do note that LLOCKD by default needs data to be 64-bit aligned. I'm checking with the hw folks whether that restriction holds even when AD is enabled.

Both LLOCK and EX transactions need to be aligned regardless of the AD bit, i.e.:
LLOCK: 32-bit aligned
LLOCKD: 64-bit aligned

In your case above, r6 is 0xbeaec3fc, so it is 32-bit aligned but not 64-bit aligned!
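
To make the arithmetic explicit, here is a minimal sketch of the alignment check. The helper names `is_aligned()` and `offset_in_dword()` are made up for illustration and are not kernel APIs; the only value taken from the log is the faulting address 0xbeaec3fc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align.
 * align must be a non-zero power of two. */
static bool is_aligned(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: byte offset of addr past the previous
 * 8-byte (64-bit) boundary. */
static uint32_t offset_in_dword(uint32_t addr)
{
	return addr & 7u;
}
```

Called with the EFA from the log, `is_aligned(0xbeaec3fc, 4)` holds while `is_aligned(0xbeaec3fc, 8)` does not, and `offset_in_dword(0xbeaec3fc)` is 4. So the address would be fine for LLOCK but not for LLOCKD.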

@vineetgarc we have known all that since [1], which ended up with torvalds@a66d972.

So the reason for the "Misaligned Access" is clear; what's not clear is:
1. Which atomic64_t causes this new failure
2. How to solve this once (1) above is done

Unfortunately my patch for devm_xxx() doesn't help here.

[1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004009.html

So the problematic atomic is inode->i_version, see https://elixir.bootlin.com/linux/v4.19.14/source/include/linux/fs.h#L656

And the failure happens in atomic64_cmpxchg(), see https://elixir.bootlin.com/linux/v4.19.14/source/include/linux/iversion.h#L198

Stack Trace:
  atomic64_cmpxchg
  inode_maybe_inc_iversion
  inode_inc_iversion
  ext4_generic_delete_entry
  ext4_delete_entry
  ext4_rmdir
  vfs_rmdir
  do_rmdir
  EV_Trap
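
For readers unfamiliar with the pattern at the top of that trace, below is a rough user-space sketch of a compare-and-exchange increment loop on a 64-bit counter, written with C11 atomics. It is not the kernel's inode_maybe_inc_iversion() (which carries extra logic beyond a plain increment); `counter_inc()` is a hypothetical name. It only illustrates why a 64-bit cmpxchg needs a 64-bit exclusive load, such as LLOCKD, on the counter's address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit version counter bump.
 * Returns the new value after a successful increment. */
static uint64_t counter_inc(_Atomic uint64_t *v)
{
	uint64_t old;

	assert(v != NULL);
	old = atomic_load(v);
	/* On failure the current value is written back into 'old',
	 * so the loop simply retries with the fresh value. */
	while (!atomic_compare_exchange_weak(v, &old, old + 1))
		;
	return old + 1;
}
```

The whole 64-bit value is read and conditionally written as one unit, so the hardware requirement on the address of the counter applies to every such call, wherever the enclosing object happens to live in memory.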

What's worse, the obvious "fix" doesn't help:

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7b6084854bfe..d1daa09c3bc6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -653,7 +653,7 @@ struct inode {
                struct hlist_head       i_dentry;
                struct rcu_head         i_rcu;
        };
-       atomic64_t              i_version;
+       atomic64_t              i_version __aligned(sizeof(atomic64_t));
        atomic_t                i_count;
        atomic_t                i_dio_count;
        atomic_t                i_writecount;

We still get:

[    4.015732] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[    4.167881]
[    4.167881] Misaligned Access
[    4.172356] Path: /bin/busybox.nosuid
[    4.176004] CPU: 2 PID: 171 Comm: rm Not tainted 4.19.14-yocto-standard #1
[    4.182851]
[    4.182851] [ECR   ]: 0x000d0000 => Check Programmer's Manual
[    4.190061] [EFA   ]: 0xbeaec3fc
[    4.190061] [BLINK ]: ext4_delete_entry+0x210/0x234
[    4.190061] [ERET  ]: ext4_delete_entry+0x13e/0x234
[    4.202985] [STAT32]: 0x80080002 : IE K
[    4.207236] BTA: 0x9009329c   SP: 0xbe5b1ec4  FP: 0x00000000
[    4.212790] LPS: 0x9074b118  LPE: 0x9074b120 LPC: 0x00000000
[    4.218348] r00: 0x00000040  r01: 0x00000021 r02: 0x00000001
[    4.218348] r03: 0x00000000  r04: 0x00000002 r05: 0x00000000
[    4.218348] r06: 0x000000c6  r07: 0x00000000 r08: 0x9050f140
[    4.218348] r09: 0x000000c6  r10: 0x0000000a r11: 0x00000000
[    4.218348] r12: 0x90247a9c  r13: 0x9004e574 r14: 0x0008e150
[    4.218348] r15: 0x000989b8  r16: 0x0008cbec r17: 0x0009806c
[    4.218348] r18: 0x0009806c  r19: 0x0008e150 r20: 0x0008f0f8
[    4.218348] r21: 0x000000ab  r22: 0x0008f0f8 r23: 0x00000000
[    4.218348] r24: 0x00000000  r25: 0x00000000
[    4.218348]
[    4.218348]
[    4.270510]
[    4.270510] Stack Trace:
[    4.274510]   ext4_delete_entry+0x13e/0x234
[    4.278695]   ext4_rmdir+0xe0/0x238
[    4.282187]   vfs_rmdir+0x50/0xf0
[    4.285492]   do_rmdir+0x9e/0x154
[    4.288802]   EV_Trap+0x110/0x114

The culprit was the slab allocator used for inodes.
Even though the default ARCH_SLAB_MINALIGN is set in quite a sensible way:

#ifndef ARCH_SLAB_MINALIGN
#define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
#endif

see https://elixir.bootlin.com/linux/latest/source/include/linux/slab.h#L213

The problem is that on ARC __alignof__(unsigned long long) = 4!

The solution is then as simple as defining ARCH_SLAB_MINALIGN = 8 for us, see http://lists.infradead.org/pipermail/linux-snps-arc/2019-February/005423.html
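
To illustrate why the minimum slab alignment matters more than the alignment of the member inside the struct, here is a toy model of object placement. It is an assumption-laden sketch and not how the kernel's slab allocators actually lay out objects: `align_up()` and `object_addr()` are hypothetical helpers, and the base address, object size and field offset used in the example are invented numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (non-zero power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Toy model: address of the n-th object in a slab that starts at 'base',
 * where both the start and the per-object stride are rounded up to
 * 'min_align' only (playing the role of ARCH_SLAB_MINALIGN). */
static uintptr_t object_addr(uintptr_t base, size_t obj_size,
			     uintptr_t min_align, unsigned int n)
{
	uintptr_t stride = align_up((uintptr_t)obj_size, min_align);

	return align_up(base, min_align) + (uintptr_t)n * stride;
}
```

With an invented base of 0x1000, an object size of 100 bytes and a 64-bit field at offset 8, a minimum alignment of 4 puts object 1 at 0x1064 and its field at 0x106c, which is not 64-bit aligned. A minimum alignment of 8 puts the same object at 0x1068 and the field at 0x1070, which is. The field's offset inside the object can be perfectly aligned and still land on a bad address if the object itself does not start on an 8-byte boundary.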