WARNING at lib/errseq.c:74 errseq_set+0x5d/0x70
transient-sepia opened this issue · 5 comments
Distribution
Arch Linux
Architecture
amd64
Kernel version
6.1.14-1-lts
Blksnap version
6.0.0.1060
Bug description
Upon starting the backup job the veeam process loads one of the CPUs to 100% as well as overloads the storage with write operations and continuously dumps stack traces to system journal. Sometimes the server can freeze, sometimes not.
An example of a stack trace:
Feb 28 10:56:10 lab-s01apl-n11 kernel: ------------[ cut here ]------------
Feb 28 10:56:10 lab-s01apl-n11 kernel: err = 2048
Feb 28 10:56:10 lab-s01apl-n11 kernel: WARNING: CPU: 0 PID: 907 at lib/errseq.c:74 errseq_set+0x5d/0x70
Feb 28 10:56:10 lab-s01apl-n11 kernel: Modules linked in: blksnap(OE) bdevfilter(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ip6table_nat ip6ta>
Feb 28 10:56:10 lab-s01apl-n11 kernel: ata_piix floppy i8042 serio
Feb 28 10:56:10 lab-s01apl-n11 kernel: CPU: 0 PID: 907 Comm: peer local sock Tainted: G D W OE 6.1.14-1-lts #1 7925312b640a5b06828693b7718bd24fac67f308
Feb 28 10:56:10 lab-s01apl-n11 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Feb 28 10:56:10 lab-s01apl-n11 kernel: RIP: 0010:errseq_set+0x5d/0x70
Feb 28 10:56:10 lab-s01apl-n11 kernel: Code: 74 34 89 d8 f0 0f b1 17 89 c1 39 c3 74 04 39 c2 75 08 89 c8 5b c3 cc cc cc cc 89 c3 eb c9 48 c7 c7 8b 55 89 97 e8 59 f7 64 00 <0f> 0b 89 d9 5b 89 c8 c3 cc cc cc cc 89 d9 eb d8 0f 1f 00 f3 0f 1e
Feb 28 10:56:10 lab-s01apl-n11 kernel: RSP: 0018:ffff97d381e0fac8 EFLAGS: 00010282
Feb 28 10:56:10 lab-s01apl-n11 kernel: RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000000027
Feb 28 10:56:10 lab-s01apl-n11 kernel: RDX: ffff8a4bb7c21668 RSI: 0000000000000001 RDI: ffff8a4bb7c21660
Feb 28 10:56:10 lab-s01apl-n11 kernel: RBP: ffff97d381e0fb28 R08: 0000000000000000 R09: ffff97d381e0f950
Feb 28 10:56:10 lab-s01apl-n11 kernel: R10: 0000000000000003 R11: ffffffff980b4cc8 R12: 0000000000000800
Feb 28 10:56:10 lab-s01apl-n11 kernel: R13: ffff97d381e0fb28 R14: ffffbcd14424f9c0 R15: ffff97d381e0fb28
Feb 28 10:56:10 lab-s01apl-n11 kernel: FS: 00007fb518ff96c0(0000) GS:ffff8a4bb7c00000(0000) knlGS:0000000000000000
Feb 28 10:56:10 lab-s01apl-n11 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 28 10:56:10 lab-s01apl-n11 kernel: CR2: 00007f15699f6028 CR3: 000000014a70c000 CR4: 00000000000006f0
Feb 28 10:56:10 lab-s01apl-n11 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 28 10:56:10 lab-s01apl-n11 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 28 10:56:10 lab-s01apl-n11 kernel: Call Trace:
Feb 28 10:56:10 lab-s01apl-n11 kernel: <TASK>
Feb 28 10:56:10 lab-s01apl-n11 kernel: iomap_do_writepage+0x519/0x820
Feb 28 10:56:10 lab-s01apl-n11 kernel: write_cache_pages+0x187/0x4c0
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? iomap_read_end_io+0x2d0/0x2d0
Feb 28 10:56:10 lab-s01apl-n11 kernel: iomap_writepages+0x20/0x40
Feb 28 10:56:10 lab-s01apl-n11 kernel: xfs_vm_writepages+0x7d/0xb0 [xfs 818216781e8df52a59a3af14ed6e45507d3e7091]
Feb 28 10:56:10 lab-s01apl-n11 kernel: do_writepages+0xc4/0x1d0
Feb 28 10:56:10 lab-s01apl-n11 kernel: filemap_fdatawrite_wbc+0x63/0x90
Feb 28 10:56:10 lab-s01apl-n11 kernel: __filemap_fdatawrite_range+0x5c/0x80
Feb 28 10:56:10 lab-s01apl-n11 kernel: file_write_and_wait_range+0x45/0xa0
Feb 28 10:56:10 lab-s01apl-n11 kernel: xfs_file_fsync+0x5e/0x2a0 [xfs 818216781e8df52a59a3af14ed6e45507d3e7091]
Feb 28 10:56:10 lab-s01apl-n11 kernel: __x64_sys_fsync+0x3b/0x70
Feb 28 10:56:10 lab-s01apl-n11 kernel: do_syscall_64+0x5f/0x90
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? __fget_light+0x9d/0x100
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? mutex_lock+0x12/0x30
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? ksys_lseek+0x86/0xb0
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? do_syscall_64+0x6b/0x90
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? do_syscall_64+0x6b/0x90
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? do_syscall_64+0x6b/0x90
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? do_syscall_64+0x6b/0x90
Feb 28 10:56:10 lab-s01apl-n11 kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Feb 28 10:56:10 lab-s01apl-n11 kernel: RIP: 0033:0x7fb57511643a
Feb 28 10:56:10 lab-s01apl-n11 kernel: Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 e3 4b f8 ff 8b 7c 24 0c 89 c2 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 43 4c f8 ff 8b 44 24
Feb 28 10:56:10 lab-s01apl-n11 kernel: RSP: 002b:00007fb518ff8210 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Feb 28 10:56:10 lab-s01apl-n11 kernel: RAX: ffffffffffffffda RBX: 000000000524f318 RCX: 00007fb57511643a
Feb 28 10:56:10 lab-s01apl-n11 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000014
Feb 28 10:56:10 lab-s01apl-n11 kernel: RBP: 000000000005c8c0 R08: 0000000000000001 R09: 0000000000000000
Feb 28 10:56:10 lab-s01apl-n11 kernel: R10: 000000000524f288 R11: 0000000000000293 R12: 0000000000000002
Feb 28 10:56:10 lab-s01apl-n11 kernel: R13: 00007fb518ff8300 R14: 0000000005268560 R15: 0000000000000000
Feb 28 10:56:10 lab-s01apl-n11 kernel: </TASK>
Feb 28 10:56:10 lab-s01apl-n11 kernel: ---[ end trace 0000000000000000 ]---
I tried to decode it with decode_stacktrace.sh, but I'm not very familiar with debugging so I hope it can somehow help you identify the possible issue.
Decoded stack trace:
Feb 28 10:56:10 lab-s01apl-n11 kernel: ------------[ cut here ]------------
Feb 28 10:56:10 lab-s01apl-n11 kernel: err = 2048
Feb 28 10:56:10 lab-s01apl-n11 kernel: WARNING: CPU: 0 PID: 907 at lib/errseq.c:74 errseq_set (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: Modules linked in: blksnap(OE) bdevfilter(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ip6table_nat ip6ta>
Feb 28 10:56:10 lab-s01apl-n11 kernel: ata_piix floppy i8042 serio
Feb 28 10:56:10 lab-s01apl-n11 kernel: CPU: 0 PID: 907 Comm: peer local sock Tainted: G D W OE 6.1.14-1-lts #1 7925312b640a5b06828693b7718bd24fac67f308
Feb 28 10:56:10 lab-s01apl-n11 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Feb 28 10:56:10 lab-s01apl-n11 kernel: RIP: 0010:errseq_set (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: Code: 74 34 89 d8 f0 0f b1 17 89 c1 39 c3 74 04 39 c2 75 08 89 c8 5b c3 cc cc cc cc 89 c3 eb c9 48 c7 c7 8b 55 89 97 e8 59 f7 64 00 <0f> 0b 89 d9 5b 89 c8 c3 cc cc cc cc 89 d9 eb d8 0f 1f 00 f3 0f 1e
All code
========
0: 74 34 je 0x36
2: 89 d8 mov %ebx,%eax
4: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
8: 89 c1 mov %eax,%ecx
a: 39 c3 cmp %eax,%ebx
c: 74 04 je 0x12
e: 39 c2 cmp %eax,%edx
10: 75 08 jne 0x1a
12: 89 c8 mov %ecx,%eax
14: 5b pop %rbx
15: c3 ret
16: cc int3
17: cc int3
18: cc int3
19: cc int3
1a: 89 c3 mov %eax,%ebx
1c: eb c9 jmp 0xffffffffffffffe7
1e: 48 c7 c7 8b 55 89 97 mov $0xffffffff9789558b,%rdi
25: e8 59 f7 64 00 call 0x64f783
2a:* 0f 0b ud2 <-- trapping instruction
2c: 89 d9 mov %ebx,%ecx
2e: 5b pop %rbx
2f: 89 c8 mov %ecx,%eax
31: c3 ret
32: cc int3
33: cc int3
34: cc int3
35: cc int3
36: 89 d9 mov %ebx,%ecx
38: eb d8 jmp 0x12
3a: 0f 1f 00 nopl (%rax)
3d: f3 repz
3e: 0f .byte 0xf
3f: 1e (bad)
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 89 d9 mov %ebx,%ecx
4: 5b pop %rbx
5: 89 c8 mov %ecx,%eax
7: c3 ret
8: cc int3
9: cc int3
a: cc int3
b: cc int3
c: 89 d9 mov %ebx,%ecx
e: eb d8 jmp 0xffffffffffffffe8
10: 0f 1f 00 nopl (%rax)
13: f3 repz
14: 0f .byte 0xf
15: 1e (bad)
Feb 28 10:56:10 lab-s01apl-n11 kernel: RSP: 0018:ffff97d381e0fac8 EFLAGS: 00010282
Feb 28 10:56:10 lab-s01apl-n11 kernel: RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000000027
Feb 28 10:56:10 lab-s01apl-n11 kernel: RDX: ffff8a4bb7c21668 RSI: 0000000000000001 RDI: ffff8a4bb7c21660
Feb 28 10:56:10 lab-s01apl-n11 kernel: RBP: ffff97d381e0fb28 R08: 0000000000000000 R09: ffff97d381e0f950
Feb 28 10:56:10 lab-s01apl-n11 kernel: R10: 0000000000000003 R11: ffffffff980b4cc8 R12: 0000000000000800
Feb 28 10:56:10 lab-s01apl-n11 kernel: R13: ffff97d381e0fb28 R14: ffffbcd14424f9c0 R15: ffff97d381e0fb28
Feb 28 10:56:10 lab-s01apl-n11 kernel: FS: 00007fb518ff96c0(0000) GS:ffff8a4bb7c00000(0000) knlGS:0000000000000000
Feb 28 10:56:10 lab-s01apl-n11 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 28 10:56:10 lab-s01apl-n11 kernel: CR2: 00007f15699f6028 CR3: 000000014a70c000 CR4: 00000000000006f0
Feb 28 10:56:10 lab-s01apl-n11 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 28 10:56:10 lab-s01apl-n11 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 28 10:56:10 lab-s01apl-n11 kernel: Call Trace:
Feb 28 10:56:10 lab-s01apl-n11 kernel: <TASK>
Feb 28 10:56:10 lab-s01apl-n11 kernel: iomap_do_writepage (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: write_cache_pages (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? iomap_read_end_io (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: iomap_writepages (??:?)
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
WARNING! Modules path isn't set, but is needed to parse this symbol
Feb 28 10:56:10 lab-s01apl-n11 kernel: xfs_vm_writepages+0x7d/0xb0 xfs
Feb 28 10:56:10 lab-s01apl-n11 kernel: do_writepages (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: filemap_fdatawrite_wbc (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: __filemap_fdatawrite_range (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: file_write_and_wait_range (??:?)
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
WARNING! Modules path isn't set, but is needed to parse this symbol
Feb 28 10:56:10 lab-s01apl-n11 kernel: xfs_file_fsync+0x5e/0x2a0 xfs
Feb 28 10:56:10 lab-s01apl-n11 kernel: __x64_sys_fsync (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: do_syscall_64 (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? __fget_light (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? mutex_lock (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? ksys_lseek (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? syscall_exit_to_user_mode (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? do_syscall_64 (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? do_syscall_64 (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? do_syscall_64 (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: ? do_syscall_64 (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: entry_SYSCALL_64_after_hwframe (??:?)
Feb 28 10:56:10 lab-s01apl-n11 kernel: RIP: 0033:0x7fb57511643a
Feb 28 10:56:10 lab-s01apl-n11 kernel: Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 18 89 7c 24 0c e8 e3 4b f8 ff 8b 7c 24 0c 89 c2 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 43 4c f8 ff 8b 44 24
All code
========
0: 00 00 add %al,(%rax)
2: 0f 05 syscall
4: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
a: 77 44 ja 0x50
c: c3 ret
d: 0f 1f 00 nopl (%rax)
10: 48 83 ec 18 sub $0x18,%rsp
14: 89 7c 24 0c mov %edi,0xc(%rsp)
18: e8 e3 4b f8 ff call 0xfffffffffff84c00
1d: 8b 7c 24 0c mov 0xc(%rsp),%edi
21: 89 c2 mov %eax,%edx
23: b8 4a 00 00 00 mov $0x4a,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 36 ja 0x68
32: 89 d7 mov %edx,%edi
34: 89 44 24 0c mov %eax,0xc(%rsp)
38: e8 43 4c f8 ff call 0xfffffffffff84c80
3d: 8b .byte 0x8b
3e: 44 rex.R
3f: 24 .byte 0x24
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 36 ja 0x3e
8: 89 d7 mov %edx,%edi
a: 89 44 24 0c mov %eax,0xc(%rsp)
e: e8 43 4c f8 ff call 0xfffffffffff84c56
13: 8b .byte 0x8b
14: 44 rex.R
15: 24 .byte 0x24
Feb 28 10:56:10 lab-s01apl-n11 kernel: RSP: 002b:00007fb518ff8210 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Feb 28 10:56:10 lab-s01apl-n11 kernel: RAX: ffffffffffffffda RBX: 000000000524f318 RCX: 00007fb57511643a
Feb 28 10:56:10 lab-s01apl-n11 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000014
Feb 28 10:56:10 lab-s01apl-n11 kernel: RBP: 000000000005c8c0 R08: 0000000000000001 R09: 0000000000000000
Feb 28 10:56:10 lab-s01apl-n11 kernel: R10: 000000000524f288 R11: 0000000000000293 R12: 0000000000000002
Feb 28 10:56:10 lab-s01apl-n11 kernel: R13: 00007fb518ff8300 R14: 0000000005268560 R15: 0000000000000000
Feb 28 10:56:10 lab-s01apl-n11 kernel: </TASK>
Feb 28 10:56:10 lab-s01apl-n11 kernel: ---[ end trace 0000000000000000 ]---
Steps to reproduce
Start an existing backup job.
Expected behavior
The backup job completes successfully.
Additional informations
The same version of Veeam agent and blksnap works without issues on previous LTS kernel for Arch (5.15.94-1-lts). Also to add - blksnap.log is empty.
Hi @transient-sepia !
First, try fix.
And I think there was a situation of overflow of the snapshot.
This situation may occur, for example, if the file system with difference storage is highly fragmented.
The simplest thing to do is to place the difference storage on a separate block device.
And it is possible to contact the official support service.
Hello. Thank you for your reply @SergeiShtepa. Initially after your comment I tried 6.0.2.1168 release and the problem persisted, however with recent 6.0.3.1221 version now everything works fine. Just for clarification the kernel I used was 6.1.39-3-lts.
Closing this issue.
Thanks for the feedback.
@SergeiShtepa hi, I not see recent commits in "stable-v1.0" and "VAL-6.0" branches, is there missed push?
Hi.
Good question. I'll check it out.
I practically don't work in these branches right now.
The main job is to prepare the next patch. The difference storage will now be a file on the file system.