cmusatyalab/coda

kernel BUG at /build/linux-CFFAZ3/linux-4.10.0/mm/usercopy.c:75

krichter722 opened this issue · 6 comments

While copying files in the file manager Nautilus, the file transfer never made any progress. codacon never showed any message, i.e. it seemed to hang right after start. dmesg contains:

[ 1543.430037] ------------[ cut here ]------------
[ 1543.430092] kernel BUG at /build/linux-CFFAZ3/linux-4.10.0/mm/usercopy.c:75!
[ 1543.430152] invalid opcode: 0000 [#1] SMP
[ 1543.430191] Modules linked in: msr xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c xt_tcpudp bridge stp llc iptable_filter bbswitch(OE) binfmt_misc cdc_ether usbnet r8152 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media mii zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd intel_cstate intel_rapl_perf joydev input_leds serio_raw arc4 iwldvm mac80211 iwlwifi cfg80211 lpc_ich mei_me shpchp mei ideapad_laptop sparse_keymap mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core snd_hda_codec_hdmi configfs snd_hda_codec_conexant
[ 1543.430791]  snd_hda_codec_generic iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device sunrpc snd_timer snd soundcore parport_pc ppdev lp parport coda ip_tables x_tables autofs4 btrfs xor raid6_pq hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect psmouse sysimgblt ahci fb_sys_fops drm libahci wmi video fjes
[ 1543.431144] CPU: 3 PID: 2961 Comm: venus Tainted: P           OE   4.10.0-24-generic #28-Ubuntu
[ 1543.431214] Hardware name: LENOVO IdeaPad U410    /Lenovo          , BIOS 65CN90WW 09/25/2012
[ 1543.431282] task: ffff93be8bca4380 task.stack: ffffa652817fc000
[ 1543.431338] RIP: 0010:__check_object_size+0x77/0x1d7
[ 1543.431379] RSP: 0018:ffffa652817ffe08 EFLAGS: 00010286
[ 1543.431422] RAX: 0000000000000060 RBX: ffff93bd88530400 RCX: 0000000000000000
[ 1543.431477] RDX: 0000000000000000 RSI: ffff93beaf2cdc88 RDI: ffff93beaf2cdc88
[ 1543.431533] RBP: ffffa652817ffe28 R08: 0000000000068fae R09: 00000000000003ae
[ 1543.431588] R10: 0000000000000040 R11: ffffffffb9c487ed R12: 00000000000000c0
[ 1543.431641] R13: 0000000000000001 R14: ffff93bd885304c0 R15: 00000000000000c0
[ 1543.431698] FS:  00007f995f557b80(0000) GS:ffff93beaf2c0000(0000) knlGS:0000000000000000
[ 1543.431760] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1543.431805] CR2: 00007f49a18bd000 CR3: 000000021032b000 CR4: 00000000001406e0
[ 1543.431861] Call Trace:
[ 1543.431893]  coda_psdev_read+0x1a7/0x260 [coda]
[ 1543.431935]  ? wake_up_q+0x80/0x80
[ 1543.431969]  __vfs_read+0x18/0x40
[ 1543.431999]  vfs_read+0x96/0x130
[ 1543.432029]  SyS_read+0x55/0xc0
[ 1543.432060]  entry_SYSCALL_64_fastpath+0x1e/0xad
[ 1543.432098] RIP: 0033:0x7f995dda0890
[ 1543.432130] RSP: 002b:00007ffd95278588 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1543.432190] RAX: ffffffffffffffda RBX: 000000009b3dea88 RCX: 00007f995dda0890
[ 1543.432246] RDX: 0000000000002168 RSI: 0000563a31977730 RDI: 000000000000000a
[ 1543.432302] RBP: 0000000000000004 R08: 00007ffd952784b0 R09: 0000000000000000
[ 1543.432366] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 1543.432422] R13: 00000000000001a4 R14: 00000000000003e8 R15: 0000000059512a24
[ 1543.432480] Code: 48 0f 44 d1 48 c7 c6 b6 10 6a b9 48 c7 c1 e7 60 69 b9 48 0f 44 f1 4d 89 e1 49 89 c0 48 89 d9 48 c7 c7 00 db 69 b9 e8 58 e0 f6 ff <0f> 0b e8 92 ba fb ff 85 c0 75 73 48 89 df e8 d6 48 e3 ff 84 c0 
[ 1543.432669] RIP: __check_object_size+0x77/0x1d7 RSP: ffffa652817ffe08
[ 1543.443761] ---[ end trace d0747e15526c0943 ]---

The issue is reproducible after each reboot.

Experienced with version 6.11.2-1+ubuntu16.10 on Ubuntu 17.04 with Linux 4.10.0-24-generic.

That is a null pointer dereference in the kernel module, which is no good. I haven't seen that one before, but I'm running 3.10 on RHEL6 and 3.16 on Debian, so maybe something changed for the worse in newer kernels. I'll try to reproduce it with Ubuntu in a VM. The backtrace seems to indicate that this happens when the Coda client tries to read an upcall from the kernel module.

Not a null pointer dereference. If I read the trace correctly, there should be a kernel message immediately preceding that 'kernel BUG' message, something like "kernel memory exposure attempt detected from %p (%s) (%lu bytes)", which would indicate that the kernel is trying to copy more data out of a buffer than the buffer actually holds. It would be interesting to find out how much it is trying to copy.
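To illustrate the kind of check that appears to be firing here, below is a minimal user-space sketch of a bounds-checked copy. It is not the kernel's usercopy code; `struct tracked_obj` and `checked_copy_out` are hypothetical names made up for this example, and the sizes in the usage note are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a heap object whose true size is known. */
struct tracked_obj {
    unsigned char *base;  /* start of the allocation */
    size_t size;          /* bytes actually allocated */
};

/*
 * Copy n bytes starting at src into dst, but only if [src, src + n)
 * lies entirely inside obj. Returns 0 on success, -1 if refused.
 */
static int checked_copy_out(void *dst, const struct tracked_obj *obj,
                            const unsigned char *src, size_t n)
{
    uintptr_t start = (uintptr_t)obj->base;
    uintptr_t p = (uintptr_t)src;

    if (p < start || p - start > obj->size)
        return -1;                  /* src does not point into the object */
    if (n > obj->size - (p - start))
        return -1;                  /* the read would run past the end */
    memcpy(dst, src, n);
    return 0;
}
```

With an object of, say, 36 bytes, a request to copy 36 bytes succeeds while a request to copy 192 bytes from the same start is refused. The kernel takes the harsher route of hitting BUG() instead of returning an error, which is why the process is killed mid-read.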

Narrowed this down to the fsync upcall. It looks like it claims to pass up sizeof(union inputArgs), which is the maximum size any upcall can be, but the message it actually passes is only sizeof(coda_fsync_in), which is one of the smaller upcalls. I'm rebuilding a kernel to test the patch now.
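The mismatch can be sketched in plain C. The layouts below are simplified, hypothetical stand-ins (`msg_hdr`, `fsync_in`, `big_in`, `input_args`, `upcall_req` are not the real Coda definitions); the point is only that the size of a union is the size of its largest member, so claiming the union's size for a buffer allocated for a small member overstates how many bytes are valid.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified message layouts; not the real Coda structs. */
struct msg_hdr  { unsigned int opcode, unique; };
struct fsync_in { struct msg_hdr hdr; unsigned int fid[4]; };
struct big_in   { struct msg_hdr hdr; char payload[256]; };

/* A union is as large as its largest member. */
union input_args {
    struct msg_hdr  hdr;
    struct fsync_in fsync;
    struct big_in   big;
};

struct upcall_req {
    void  *data;       /* message buffer */
    size_t allocated;  /* bytes actually allocated for data */
    size_t claimed;    /* bytes the reader is told it may copy out */
};

/* Allocate a message of alloc_size bytes and record the claimed size. */
static int make_upcall(struct upcall_req *req, size_t alloc_size,
                       size_t claimed_size)
{
    req->data = calloc(1, alloc_size);
    if (!req->data)
        return -1;
    req->allocated = alloc_size;
    req->claimed = claimed_size;
    return 0;
}

/* A request is consistent only if it never claims more than it holds. */
static int upcall_is_consistent(const struct upcall_req *req)
{
    return req->claimed <= req->allocated;
}
```

In this sketch, `make_upcall(&req, sizeof(struct fsync_in), sizeof(union input_args))` produces an inconsistent request, while passing `sizeof(struct fsync_in)` for both arguments produces a consistent one. Presumably the patch does the equivalent of the latter, making the claimed size match the allocated size.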

The patch works. I emailed it to what are hopefully the right places to get it into both Linus's tree and the stable kernel trees.

It is probably going to take a while for the upstream fixes to get merged and then trickle down to the various distribution kernels. So I've revived the old linux-coda repository that was used to build the Coda kernel module outside of the main kernel tree.

There is a new linux4.x subdirectory that contains a DKMS-buildable copy of the Coda kernel module from the current Linux development tree, including the patch for coda_fsync and any fixes needed to build against older Linux 4.x kernel releases (although I've only tested against 4.12 and 4.9 locally).

This fix was merged in Linux 4.15-rc1 with commit torvalds/linux@ca5b857.