Azure/blobporter

Blobporter might have memory management issues (leak/crash on Linux)

udf2457 opened this issue · 5 comments

Hi,

I left a long-running process going in the background. Blobporter worked happily away for about an hour before it crashed while trying to upload a 6.3 GB file (I had a 100 MB block size set).

I found the attached in my logs.

System is CentOS Linux release 7.3.1611 (Core)
3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Blobporter 0.5.01
go version go1.8.1 linux/amd64

blobporter_mem.txt

I can easily replicate this if I just try the 6 GB file on its own. Nothing else is happening on this machine, so the memory you see being eaten up in vmstat is blobporter!

sudo vmstat -SM 1 10000 > mem.txt 
blobporter -b 100MB -c testonly -f "/path/to/my/file" -n "/dest/path"
BlobPorter 
Copyright (c) Microsoft Corporation. 
Version: 0.5.01
---------------
Info! The container doesn't exist. Creating it...
Transfer Task: file-blockblob
Files to Transfer:
Source: /path/to/file Size:6670024704 
Killed

$ cat mem.txt 
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1    145   1557      0    186    0    0   145    48    6    5  1  0 99  0  0
 0  0    145   1557      0    186    0    0    32    24  146  169  0  0 100  0  0
 3 21    145   1179      0    332    0    0 149724     0 5297 1650  2  8 75 15  0
 0 24    145    104      0    852    0    0 532480     0 80924  585  0 49  0 51  0
 1 23    145     84      0    199    0    0 736084     0 51615  791  0 38  0 62  0
10 21    284     50      0    204    1  139 873912 142812 111994 1780  0 66  0 34  0
12 16    421     51      0    185    0  137 1312200 140328 133062 2728  0 85  0 15  0
 2 24    598     51      0    220    0  179 933460 183624 162575 2496  0 82  0 18  0
 4 25    703     50      0    182    4  204 1328120 209204 165776 2516  0 83  0 16  0
 1 28    896     68      0    209   10  108 1261832 111596 131452 3366  0 81  1 18  0
 4 27   1094     70      0    221    9  186 861392 190636 181771 3158  0 82  0 18  0
 1 22   1217     50      0    204    8  190 1074868 195112 179606 3617  3 84  0 13  0
21 20   1300     73      0    182   10   71 1476860 73140 60032 2897  2 67  7 24  0
 6 26   1558     73      0    220    3  213 899848 218750 231021 3553  1 76  0 23  0
 4 22   1643     52      0    184    4   87 1226264 89140 123544 2601  2 81  2 15  0
13 26   1823     51      0    231    3  181 830652 185656 195640 2809  1 89  0 10  0
 1 28   1911     69      0    204    4   89 937036 92140 111168 2482  1 67  3 29  0
15 24   1951     54      0    174    1   40 240948 41852 48043 2010  0 30  0 70  0
 0  8   1973     76      0    198    3   23 68340 24052 24493 1409  0 19  1 80  0
 8 25   1993     51      0    163    4   76 441952 78280 77997 6828  0 29  0 71  0
28 12   2047     51      0    122    5    2 1910140  2916 118675 1980  1 98  0  1  0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0    196   1528      0    179    6   52 990300 54244 161260 2734  0 51 34 16  0
 0  0    196   1528      0    179    0    0    52     0  119  152  0  0 99  0  0
 0  0    196   1528      0    179    0    0    96     0  130  167  0  0 100  0  0

Thanks for the feedback. There are a few design choices that impact memory use. The size of the read parts channel (the queue between the readers and the workers/writers) is equal to the number of readers, capped at 30. So a 100 MB block size can mean up to 3 GB in memory if the readers fill the buffer.

We have a work item to make this cap smaller, but for now you can try a smaller block size and a lower number of readers (e.g. -r 10).
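
To illustrate the model (a minimal sketch, not BlobPorter's actual source; part, blockSize, numReaders, and maxQueueCap are made-up names): readers push blocks into a buffered channel whose capacity is the number of readers capped at 30, so worst-case buffered memory is roughly capacity times block size, e.g. 30 x 100 MB ≈ 3 GB.

package main

import (
	"fmt"
	"sync"
)

const maxQueueCap = 30 // the cap discussed above

// part stands in for one block read from the source file.
type part struct {
	offset int64
	data   []byte // up to blockSize bytes
}

func main() {
	const blockSize = 1 * 1024 * 1024 // 1 MB here; the report used 100 MB blocks
	const numReaders = 16             // hypothetical reader count
	const totalParts = 64             // hypothetical number of blocks in the file

	// Queue capacity = number of readers, capped at 30 (as described above).
	queueCap := numReaders
	if queueCap > maxQueueCap {
		queueCap = maxQueueCap
	}
	fmt.Printf("queue holds up to %d parts, i.e. ~%d MB buffered\n",
		queueCap, int64(queueCap)*blockSize/(1024*1024))

	parts := make(chan part, queueCap) // readers block once this buffer is full

	// Hand out block offsets to the readers.
	offsets := make(chan int64, totalParts)
	for i := int64(0); i < totalParts; i++ {
		offsets <- i * blockSize
	}
	close(offsets)

	var wg sync.WaitGroup
	for r := 0; r < numReaders; r++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for off := range offsets {
				// "Read" a block and queue it; the bounded channel is the
				// only back-pressure, so with 100 MB blocks the buffer
				// alone can hold ~3 GB.
				parts <- part{offset: off, data: make([]byte, blockSize)}
			}
		}()
	}
	go func() { wg.Wait(); close(parts) }()

	// Writer/uploader: drains the queue; in the real tool this would upload the block.
	var written int64
	for p := range parts {
		written += int64(len(p.data))
	}
	fmt.Printf("consumed %d MB in total\n", written/(1024*1024))
}

Lowering either the reader count or the block size shrinks that capacity-times-block-size product, which is why the suggestion above helps.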

Thanks for the information. In addition to the work item you mentioned, maybe you could also introduce another command-line parameter so people can define the cap themselves?

Thanks for the suggestion. Also consider that the number of readers already "controls" the buffer size: if you reduce that number, the cap is reduced as a result. In short, give it a try with a smaller reader count and a smaller block size; for a 6 GB file the default block size should suffice.
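
For example (an illustrative command based on the one in the report, with the explicit 100 MB block size dropped so the default applies and the reader count lowered):

blobporter -r 10 -c testonly -f "/path/to/my/file" -n "/dest/path"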

Updated the cap in v0.5.02 and updated the documentation.