Azure/blobporter

Exception downloading 2M files from Blob storage

seguler opened this issue · 2 comments

Tested with the latest version. The job took a long time to start. Does blobporter list the entire container before it starts the transfer?

Here is the exception I got:

F:\dir2>BlobPorter -c bigdata -t blob-file
BlobPorter
Copyright (c) Microsoft Corporation.
Version: 0.5.25

Batch transfer (blob-file).
Files per Batch: 500.
Batch: 1 of 2986
--> 4 % [|........................] Committed Count: 9 Buffer Level: 000%panic: send on closed channel

goroutine 1159 [running]:
github.com/Azure/blobporter/targets.(*fileHandleManager).returnHandle(0xc0899d0d20, 0xc04218a624, 0x7, 0xc04207a180, 0x0, 0x0)
/home/travis/gopath/src/github.com/Azure/blobporter/targets/multifile.go:201 +0xc7
github.com/Azure/blobporter/targets.(*MultiFile).closeOrKeepHandle(0xc056e777a0, 0xc086d68a20, 0xc04207a180, 0xfffe00, 0x0)
/home/travis/gopath/src/github.com/Azure/blobporter/targets/multifile.go:63 +0x57
github.com/Azure/blobporter/targets.(*MultiFile).WritePart(0xc056e777a0, 0xc086d68a20, 0xc0899d2301, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/github.com/Azure/blobporter/targets/multifile.go:80 +0x280
github.com/Azure/blobporter/transfer.(*Worker).startWorker(0xc09a7b3710, 0xc0420514f0)
/home/travis/gopath/src/github.com/Azure/blobporter/transfer/transfer.go:476 +0x43b
created by github.com/Azure/blobporter/transfer.(*Transfer).startWorkers
/home/travis/gopath/src/github.com/Azure/blobporter/transfer/transfer.go:387 +0x64

Are there two or more blobs with the same file name (the last segment of the blob name)? By default Blobporter does not keep the folder structure, so foo/bar and foo/foo1/bar would both be downloaded as bar. If that is the case, this could be a race condition in the file handle pool, which uses the file name to identify the pool of file handles for a specific file. You can avoid the issue with the -p option, which makes Blobporter keep the directory structure when downloading. However, it should not panic, so I will take a look and make the experience better.
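To check whether a container would hit this, something like the sketch below could be run over a list of blob names. It is not Blobporter code: `flatName` and `collisions` are hypothetical helpers, and the assumption is simply that the local file name is the last `/`-separated segment of the blob name, as described above.

```go
package main

import (
	"path"
	"sort"
)

// flatName mimics the default behavior described above (an assumption, not
// Blobporter's actual code): only the last segment of the blob name is kept.
func flatName(blobName string) string {
	return path.Base(blobName)
}

// collisions groups blob names by their flattened local name and returns
// only the groups where two or more blobs map to the same local file.
func collisions(blobNames []string) map[string][]string {
	byTarget := make(map[string][]string)
	for _, b := range blobNames {
		t := flatName(b)
		byTarget[t] = append(byTarget[t], b)
	}
	for t, sources := range byTarget {
		if len(sources) < 2 {
			// Deleting during range is safe in Go.
			delete(byTarget, t)
			continue
		}
		sort.Strings(sources)
	}
	return byTarget
}
```

For the names foo/bar, foo/foo1/bar and baz/qux, `collisions` would report a single group under bar containing the first two blobs. An empty result suggests the transfer is not affected by this particular name clash.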

Regarding the listing performance: yes, blobporter lists the blobs to obtain the information required to define the shape of the transfer. An optimization that lists per batch instead of upfront for the entire transfer is already in progress, as part of the storage SDK upgrade. I will break this into two issues for better tracking.
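As a rough illustration of what listing per batch could look like, here is a minimal sketch. It is not the planned implementation: `listPage` is a hypothetical signature standing in for one call to a paged listing API that returns a continuation marker, and `transferInBatches` is an illustrative name.

```go
package main

// listPage is a hypothetical signature for a single paged listing call.
// It returns up to limit names starting at marker, plus the marker of the
// next page (an empty string when there are no more pages).
type listPage func(marker string, limit int) (names []string, next string, err error)

// transferInBatches lists one page, hands it to transfer, and only then
// lists the next page, so the first transfer does not have to wait for the
// whole container to be listed. It returns the number of batches processed.
func transferInBatches(list listPage, batchSize int, transfer func(batch []string) error) (int, error) {
	batches := 0
	marker := ""
	for {
		names, next, err := list(marker, batchSize)
		if err != nil {
			return batches, err
		}
		if len(names) > 0 {
			if err := transfer(names); err != nil {
				return batches, err
			}
			batches++
		}
		if next == "" {
			return batches, nil
		}
		marker = next
	}
}
```

The trade-off in this sketch is that the total number of batches is not known when the transfer starts, so a progress line such as "Batch: 1 of 2986" could not be printed without a separate count.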

Fixed the race condition in #87; listing performance is still open.
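For readers who land here from the panic message: in Go, sending on a channel that has already been closed always panics, so a pool that can be closed by one goroutine while another is still returning a handle needs some guard. The sketch below shows one common pattern, a mutex plus a closed flag. It is not the change made in #87, and `handlePool`, `put` and `shutdown` are hypothetical names.

```go
package main

import "sync"

// handlePool is a hypothetical stand-in for a per-file pool of handles.
// Handles are represented as ints to keep the sketch self-contained.
type handlePool struct {
	mu     sync.Mutex
	closed bool
	ch     chan int
}

func newHandlePool(size int) *handlePool {
	return &handlePool{ch: make(chan int, size)}
}

// put returns a handle to the pool. It reports false, instead of
// panicking or blocking, when the pool is already closed or is full.
func (p *handlePool) put(h int) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return false
	}
	select {
	case p.ch <- h:
		return true
	default:
		return false
	}
}

// shutdown closes the pool. It is safe to call more than once.
func (p *handlePool) shutdown() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.closed {
		p.closed = true
		close(p.ch)
	}
}
```

Because both the send and the close happen under the same mutex, a late `put` after `shutdown` is rejected rather than reaching the closed channel.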