raspberrypi/linux

Memory leak in pi3 wifi driver?

Ealdwulf opened this issue · 14 comments

I am seeing a leak of 'kmalloc-2048' slabs when running on a pi3 and using ifplugd on wlan0. It leaks at about 200k/minute, which means the pi runs out of memory after a day or so.
ifplugd is looking at '/proc/net/wireless'.

My original reproducer is:

sudo apt-get install ifplugd
sudo /etc/init.d/ifplugd start wlan0

But I also see a leak, probably the same one, by just doing:
while true; do cat /proc/net/wireless; done

You can observe the leak by doing sudo watch grep kmalloc-2048 /proc/slabinfo.
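
The same check can be scripted to log how fast the counter grows. This is only a minimal sketch: it assumes the usual /proc/slabinfo column order (name, active_objs, num_objs, objsize, ...), the helper names parse_slab_line and sample_slab are made up for this illustration, and reading /proc/slabinfo normally requires root.

```python
import time

def parse_slab_line(slabinfo_text, cache="kmalloc-2048"):
    """Return (active_objs, num_objs, objsize) for `cache`, or None if absent."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Compare the whole first field so similarly named caches do not match.
        if fields and fields[0] == cache:
            return int(fields[1]), int(fields[2]), int(fields[3])
    return None

def sample_slab(cache="kmalloc-2048", interval=60, samples=5, path="/proc/slabinfo"):
    """Print the change in active objects between samples (run as root)."""
    previous = None
    for _ in range(samples):
        with open(path) as f:
            stats = parse_slab_line(f.read(), cache)
        if stats is None:
            raise LookupError(f"{cache} not found in {path}")
        active, _total, objsize = stats
        if previous is not None:
            delta = active - previous
            kib = delta * objsize / 1024
            print(f"{cache}: {active} active ({delta:+d} objects, {kib:+.0f} KiB)")
        previous = active
        time.sleep(interval)
```

Calling sample_slab() with the defaults prints one line per minute; a steadily positive delta while ifplugd is running would match the leak described here.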

I suspect the pi3 wifi driver, because:

  • The problem does not occur when running on a pi2 (using a wifi dongle)
  • The problem does not occur when not running ifplugd (or looking at /proc/net/wireless)
    (In fact there still seems to be a slight leak on the pi2 when doing cat /proc/net/wireless, but it's much smaller. I don't see it with ifplugd, which only looks at /proc/net/wireless once per second.)

It reproduces on raspbian 18/3/2016 with the latest kernel/firmware (.firmware_revision= 15ffab5493d74b12194e6bfc5bbb1c0f71140155)

This is the same problem I'm dealing with!

https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=148595

While top/htop simply show the memory slowly dwindling, 'sudo slabtop' shows the same huge leak in 'kmalloc-2048' that will crash the Pi3 within a couple of days of idling (my Pi2, which is virtually identical in configuration but which uses an external WiFi adapter, has no such problems).

I also see the same "kernel: [######.######] brcmfmac: brcmf_sdio_hdparse: seq ##: sequence number error, expect ##" problem that appears to be unique to the Pi3's WiFi stack - and which seems to correlate with this memory leak.

Right before the Pi3 dies of memory starvation the OOM watchdog starts killing off processes in a last-ditch effort to keep the system running - another thing I've never seen in the syslog on my Pi2.

I was able to successfully build the kernel with kmemleak enabled - the output is interesting and is filled with the following (each pair of these leaks occurs about once per second):

unreferenced object 0xa0c1abc0 (size 2048):
  comm "ifplugd", pid 647, jiffies 458066 (age 816.840s)
  hex dump (first 32 bytes):
    <...local routing info...>
  backtrace:
    [<801497ac>] kmem_cache_alloc_trace+0x1c4/0x2a0
    [<7f257a3c>] brcmf_cfg80211_get_station+0x29c/0x344 [brcmfmac]
    [<7f150cb8>] cfg80211_wireless_stats+0xa8/0x2b0 [cfg80211]
    [<805b0338>] get_wireless_stats+0x70/0x7c
    [<805b0364>] iw_handler_get_iwstats+0x20/0x94
    [<805b0100>] ioctl_standard_call+0x2f0/0x4b8
    [<805b05c0>] wext_handle_ioctl+0x1b8/0x23c
    [<804f02d4>] dev_ioctl+0x53c/0x800
    [<804b97c4>] sock_ioctl+0x12c/0x2b0
    [<8016c5e8>] do_vfs_ioctl+0x424/0x614
    [<8016c81c>] SyS_ioctl+0x44/0x6c
    [<8000fb40>] ret_fast_syscall+0x0/0x1c
    [<ffffffff>] 0xffffffff

unreferenced object 0xa0c1c600 (size 2048):
  comm "ifplugd", pid 647, jiffies 458066 (age 816.840s)
  hex dump (first 32 bytes):
    <...local routing info...>
  backtrace:
    [<801497ac>] kmem_cache_alloc_trace+0x1c4/0x2a0
    [<7f257a3c>] brcmf_cfg80211_get_station+0x29c/0x344 [brcmfmac]
    [<7f150cb8>] cfg80211_wireless_stats+0xa8/0x2b0 [cfg80211]
    [<805b0338>] get_wireless_stats+0x70/0x7c
    [<805b07c0>] wireless_dev_seq_show+0x34/0x17c
    [<8017e5b8>] seq_read+0x3a0/0x4b8
    [<801bd5d0>] proc_reg_read+0x6c/0x94
    [<8015a31c>] __vfs_read+0x34/0xe0
    [<8015ab88>] vfs_read+0x8c/0x158
    [<8015b5b4>] SyS_read+0x54/0xb0
    [<8000fb40>] ret_fast_syscall+0x0/0x1c
    [<ffffffff>] 0xffffffff

At a leak rate of 4KB per pair, roughly every second, for a day, this is leaking about 350-400MB/day! This is very close to what I'm seeing in practice.
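
As a quick sanity check of that estimate, the arithmetic can be written out. This sketch assumes exactly one pair of 2048-byte objects is leaked per second, which is only approximately what the kmemleak timestamps show; the function name is illustrative.

```python
OBJECT_SIZE = 2048              # bytes per kmalloc-2048 object, from the kmemleak report
OBJECTS_PER_EVENT = 2           # one leak via the ioctl path, one via the /proc read
EVENTS_PER_SECOND = 1           # assumption: one pair leaked every second
SECONDS_PER_DAY = 24 * 60 * 60

def leaked_bytes_per_day(events_per_second=EVENTS_PER_SECOND):
    """Bytes leaked in 24 hours at the given rate of leak pairs per second."""
    return OBJECT_SIZE * OBJECTS_PER_EVENT * events_per_second * SECONDS_PER_DAY

total = leaked_bytes_per_day()
print(f"{total} bytes/day")                 # 353894400 bytes/day
print(f"{total / 1e6:.1f} MB/day")          # 353.9 MB/day
print(f"{total / 2**20:.1f} MiB/day")       # 337.5 MiB/day
```

That lands at the low end of the 350-400MB/day range quoted above.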

For reference, I've documented the full kernel build procedure including headers and (optionally) the kernel memory debug settings.

https://gist.github.com/MartyMacGyver/24be4a153fc5c02c84c1dec1c9835adb

ctc commented

This problem is not related to "Pi3B wifi brcmf_sdio_hdparse #1313" - that's a completely different issue.

ctc commented

Could you translate the addresses to line-numbers?
(http://serverfault.com/questions/605946/kernel-stack-trace-to-source-code-lines)
[<801497ac>] kmem_cache_alloc_trace+0x1c4/0x2a0
[<7f257a3c>] brcmf_cfg80211_get_station+0x29c/0x344 [brcmfmac]

Thanks for the debugging, @MartyMacGyver - I'm pretty sure you'll find the leak is from the allocation of buf on line 2410 of drivers/net/wireless/brcm80211/brcmfmac/cfg80211.c (https://github.com/raspberrypi/linux/blob/rpi-4.4.y/drivers/net/wireless/brcm80211/brcmfmac/cfg80211.c#L2410). If brcmf_fil_cmd_data_get fails then the buffer is leaked. The same bug is still present in 4.6.

Might be worth double checking all the occurrences of that call - there appears to be another occurrence where this can happen, and I'm not sure about the usage on line 5952, where the buf parameter comes from stack memory.

ctc commented

I don't see buf being freed at all in brcmf_fill_bss_param.

I know - I'm following it through to see if the pointer is saved somewhere.

The obvious fix has the obvious effect - after several minutes there have been no kernel memory leaks. I've pushed a patch.

rpi-update kernel now includes @pelwell's fix.

@pelwell - Thanks for the quick turnaround!

@popcornmix - I'll pull directly and give it a try (if there's still a leak I'll have kmemleak to look at - plus kernel headers to build against). Then I'll try the rpi-update version (I'm not quite sure yet if I'll need to force it to replace the one I created and manually installed or if it'll do it automatically).

Note that addr2line is not working for me (it doesn't give useful information for my binaries) - perhaps I don't have the right debugging enabled? It'd be useful to know how to use that correctly on the Pi, if only for future reference.

An example:

$ addr2line -f -e vmlinux 801497ac
    kmem_cache_alloc_trace
    .tmp_kallsyms2.o:?

I couldn't find any useful info for [<7f257a3c>] brcmf_cfg80211_get_station+0x29c/0x344 [brcmfmac]... I tried the .ko for that module and a few other things and it just wouldn't work.
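
One thing that might be worth trying for module frames, offered as a sketch rather than a confirmed recipe: the 7f257a3c address is where the module happened to be loaded at runtime, so it does not correspond directly to an address inside the .ko file, but gdb can resolve a symbol+offset expression against the module object. The helper below only parses a backtrace line and prints a gdb command to try. The names parse_frame and gdb_list_command and the module_dir argument are made up for this illustration, and it will only yield source lines if the kernel and modules were built with debug info (CONFIG_DEBUG_INFO), which the addr2line output above suggests may not be the case here.

```python
import re

# Matches frames such as:
#   [<7f257a3c>] brcmf_cfg80211_get_station+0x29c/0x344 [brcmfmac]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_frame(line):
    """Split one backtrace line into its parts, or return None if it has no symbol."""
    match = FRAME_RE.search(line)
    return match.groupdict() if match else None

def gdb_list_command(line, module_dir="."):
    """Build a gdb command that lists the source around symbol+offset."""
    frame = parse_frame(line)
    if frame is None:
        raise ValueError(f"not a symbolized frame: {line!r}")
    # Module frames resolve against the .ko; everything else against vmlinux.
    if frame["module"]:
        target = f"{module_dir}/{frame['module']}.ko"
    else:
        target = "vmlinux"
    expr = f"{frame['symbol']}+{frame['offset']}"
    return f"gdb -batch -ex 'list *({expr})' {target}"

print(gdb_list_command(
    "[<7f257a3c>] brcmf_cfg80211_get_station+0x29c/0x344 [brcmfmac]"
))
```

The printed command would need to be run from the directory holding the unstripped brcmfmac.ko from the kernel build tree.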

In an effort to help devs and users become more effective at debugging and reporting bugs with better detail, I've proposed a doc page for the subject (specific to the Pi platform):

raspberrypi/documentation#363

It'd also be educational for the next generation of kernel devs!

Followup: based on the build I just did with this fix in it, this bug is fixed!

There are still a few small memory leaks seen when the system first boots up (hci_uart and bluetooth) but they aren't persistent (you lose a couple K of memory during what appears to be service startup and that's all - they aren't reappearing).

Thanks again for the quick turnaround - I expect this will dramatically improve things for other users as well!

This seems to be fixed - I've had it running overnight and kmalloc-2048 is still at only 176.
Thanks guys!