juicedata/juicesync

"out of memory" when sync from s3 to bos

Closed this issue · 12 comments

The memory usage percentage rises continuously. Start a juicesync process that syncs objects from s3 (or an s3-compatible store, such as Ceph or MinIO) to BOS, and confirm it by watching the %MEM output:

while true; do ps aux|grep "juice\|MEM"|grep -v grep; sleep 1; done

After a while, the process is killed by the system (out of memory).

What do the objects look like? Small or big?

All of those objects are media files, about 100MB each.

I think it has nothing to do with file size.

When I run juicesync from s3 to s3, the memory usage stays normal.

BOS supports an s3-compatible API; replace the domain bj.bcebos.com with s3.bj.bcebos.com.

Maybe the problem is in the BOS Go SDK. I can submit an internal ticket to the BOS team if you can confirm the problem.

Sure, we will try to reproduce it and identify the problem.


@fakeyanss I couldn't reproduce this problem. I used the latest juicedata/juicesync code; the built version is:

juicesync -V
v0.7.0-6-g168ff93, commit 168ff93, built at 2021-06-15

I tried syncing 50 local data files of 100MiB each to BOS; the throughput is about 100MiB/s, and the memory usage of the juicesync process stays around 320MiB and doesn't increase.

I've also synced s3://{{bos-bucket}}.s3.bj.bcebos.com/temp/ to bos://{{bos-bucket}}.bj.bcebos.com/temp2/; the throughput is about 100MiB/s, and the juicesync process memory stays around 260MiB.

I will try to reproduce it later.

The version of my juicesync is v0.6.2. I ran into some problems with the latest release (juicesync_0.7.0_Linux_x86_64.tar.gz).

I reproduced it with this command:

nohup ./juicesync -v --threads=20 s3://ak:sk@bkt.hz-hbxycm01.becbos.baidu.com/ bos://ak:sk@bkt.bj.bcebos.com/sync_test/ >juice.log 2>&1 &

This is my verbose log: log download
The link is temporary and will expire in 12 hours. @davies @chnliyong

Search the runtime stack with:

grep -A 1000 'out of memory' juice.log | less
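
For live inspection rather than a post-mortem grep of the crash dump, a Go process can expose its goroutine dump through pprof. This is only a generic sketch of that technique, not an endpoint juicesync necessarily ships:

// Generic diagnostic sketch (not part of juicesync): expose pprof so the
// goroutine dump can be fetched while the process is still running.
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof handlers on DefaultServeMux
)

func main() {
    // Fetch a dump with: curl 'http://localhost:6060/debug/pprof/goroutine?debug=1'
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}

A net/http goroutine count in that dump that keeps growing across sync batches points at leaked connections.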

Additionally, my machine environment is:

OS: CentOS Linux release 7.8.2003
CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz x2
Memory: Inspur SA5212M4R 32GB x6

@fakeyanss juicesync v0.6.2 uses a very old version of bce-sdk-go (2018.04.01); please use 0.7 or build from the latest master branch. What's the problem you ran into with v0.7?

Based on the log, there is a goroutine leak (most likely in bce-sdk-go):

goroutine 26974 [select]:
net/http.(*persistConn).writeLoop(0xc98178cea0)
        /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/transport.go:2210 +0x123
created by net/http.(*Transport).dialConn
        /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/transport.go:1581 +0xb32
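
For context: persistConn.writeLoop goroutines like the one above are started for every connection the HTTP transport dials, and they live as long as the connection does. A common way client code leaks them is by building a fresh http.Transport per request, so each idle keep-alive connection (and its two goroutines) is never reused or closed. The following is a minimal illustration of that pattern, not the actual bce-sdk-go code:

package main

import (
    "fmt"
    "io"
    "net/http"
    "runtime"
)

// leakyGet dials a brand-new connection on every call because it builds a
// fresh Transport each time. A zero-value Transport has no IdleConnTimeout,
// so each idle keep-alive connection keeps its readLoop/writeLoop goroutines
// alive indefinitely: roughly two leaked goroutines per call.
func leakyGet(url string) error {
    client := &http.Client{Transport: &http.Transport{}}
    resp, err := client.Get(url)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    // Draining the body is also required for a connection to be reused.
    _, err = io.Copy(io.Discard, resp.Body)
    return err
}

func main() {
    for i := 0; i < 50; i++ {
        leakyGet("http://example.com/")
    }
    // Grows with the loop above; with one shared Client it stays flat.
    fmt.Println("goroutines:", runtime.NumGoroutine())
}

The usual fix on the SDK side is to share a single http.Client/Transport across requests and always drain and close response bodies.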

Sorry, I cannot reproduce the memory leak with juicesync v0.7.0. Maybe I had downloaded the wrong package through a proxy tool from the repository release page (download speed from GitHub is very slow, as you know).
Fortunately, it works well now. Thank you again for your help.