Blobporter might have memory management issues (leak/crash on Linux)
udf2457 opened this issue · 5 comments
Hi,
I left a long-running process going in the background. blobporter worked happily away for about an hour before it crashed whilst trying to upload a 6.3GB file (I had a 100MB block size set).
I found the attached in my logs.
System is CentOS Linux release 7.3.1611 (Core)
3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Blobporter 0.5.01
go version go1.8.1 linux/amd64
I can easily replicate this if I just try the 6GB file on its own. Nothing else is happening on this machine, so the memory you see being eaten up in vmstat is blobporter!
sudo vmstat -SM 1 10000 > mem.txt
blobporter -b 100MB -c testonly -f "/path/to/my/file" -n "/dest/path"
BlobPorter
Copyright (c) Microsoft Corporation.
Version: 0.5.01
---------------
Info! The container doesn't exist. Creating it...
Transfer Task: file-blockblob
Files to Transfer:
Source: /path/to/file Size:6670024704
Killed
$ cat mem.txt
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 145 1557 0 186 0 0 145 48 6 5 1 0 99 0 0
0 0 145 1557 0 186 0 0 32 24 146 169 0 0 100 0 0
3 21 145 1179 0 332 0 0 149724 0 5297 1650 2 8 75 15 0
0 24 145 104 0 852 0 0 532480 0 80924 585 0 49 0 51 0
1 23 145 84 0 199 0 0 736084 0 51615 791 0 38 0 62 0
10 21 284 50 0 204 1 139 873912 142812 111994 1780 0 66 0 34 0
12 16 421 51 0 185 0 137 1312200 140328 133062 2728 0 85 0 15 0
2 24 598 51 0 220 0 179 933460 183624 162575 2496 0 82 0 18 0
4 25 703 50 0 182 4 204 1328120 209204 165776 2516 0 83 0 16 0
1 28 896 68 0 209 10 108 1261832 111596 131452 3366 0 81 1 18 0
4 27 1094 70 0 221 9 186 861392 190636 181771 3158 0 82 0 18 0
1 22 1217 50 0 204 8 190 1074868 195112 179606 3617 3 84 0 13 0
21 20 1300 73 0 182 10 71 1476860 73140 60032 2897 2 67 7 24 0
6 26 1558 73 0 220 3 213 899848 218750 231021 3553 1 76 0 23 0
4 22 1643 52 0 184 4 87 1226264 89140 123544 2601 2 81 2 15 0
13 26 1823 51 0 231 3 181 830652 185656 195640 2809 1 89 0 10 0
1 28 1911 69 0 204 4 89 937036 92140 111168 2482 1 67 3 29 0
15 24 1951 54 0 174 1 40 240948 41852 48043 2010 0 30 0 70 0
0 8 1973 76 0 198 3 23 68340 24052 24493 1409 0 19 1 80 0
8 25 1993 51 0 163 4 76 441952 78280 77997 6828 0 29 0 71 0
28 12 2047 51 0 122 5 2 1910140 2916 118675 1980 1 98 0 1 0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 196 1528 0 179 6 52 990300 54244 161260 2734 0 51 34 16 0
0 0 196 1528 0 179 0 0 52 0 119 152 0 0 99 0 0
0 0 196 1528 0 179 0 0 96 0 130 167 0 0 100 0 0
Thanks for the feedback. There are a few design choices that impact memory use. The size of the read parts channel (the queue between readers and workers/writers) is equal to the number of readers, but is capped at 30. So a block size of 100MB could mean up to 3GB in memory if the readers fill the buffer.
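For illustration only, here is a minimal Go sketch of that arithmetic. The constant and function names are assumptions made up for this example, not BlobPorter's actual code:

package main

import "fmt"

// maxReadParts mirrors the cap of 30 described above (assumed value for this sketch).
const maxReadParts = 30

// estimatePeakBufferBytes is a hypothetical helper: at most min(readers, cap)
// blocks can sit in the read parts channel at once, each of blockSizeBytes.
func estimatePeakBufferBytes(readers int, blockSizeBytes int64) int64 {
	parts := readers
	if parts > maxReadParts {
		parts = maxReadParts
	}
	return int64(parts) * blockSizeBytes
}

func main() {
	const mb = 1024 * 1024
	// 100MB blocks with enough readers to hit the cap of 30 -> ~3000MB (~3GB) buffered.
	fmt.Printf("cap hit, 100MB blocks: ~%dMB\n", estimatePeakBufferBytes(30, 100*mb)/mb)
	// 10 readers with 100MB blocks -> ~1000MB buffered.
	fmt.Printf("10 readers, 100MB blocks: ~%dMB\n", estimatePeakBufferBytes(10, 100*mb)/mb)
}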
We have a work item to make this cap smaller, but for now you can try with a smaller block size and a lower number of readers (e.g. -r 10).
Thanks for the information. In addition to whatever work item you're working on, I guess you could maybe also introduce another command parameter so that people can define the cap themselves?
Thanks for the suggestion. Also consider that with the number of readers you already "control" the buffer size: if you reduce this number, the cap is effectively reduced as a result. In short, give it a try with a smaller reader count and a smaller block size; for a 6GB file the default block size should suffice.
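As a purely illustrative variation on the original command, lowering the reader count and block size could look something like this (the 32MB value is an assumption for this sketch, not a recommendation from the thread):

blobporter -b 32MB -r 10 -c testonly -f "/path/to/my/file" -n "/dest/path"

By the arithmetic above, 10 readers with 32MB blocks would bound the buffered read parts at roughly 320MB.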
Updated the cap in v0.5.02 and updated the documentation.