lukechampine/user

Error on resuming upload

grigzy28 opened this issue · 6 comments

root@sia-test:~# user upload -m 10 fedora30.tar.xz.50hosts
fedora30.tar.xz.50hosts                                                                                                                                                                                                                        100%   44.17 MB    2.02 MB/s    
root@sia-test:~# cp DOGP.zip DOGP.zip.50hosts
root@sia-test:~# user upload -m 10 DOGP.zip.50hosts
DOGP.zip.50hosts                                                                                                                                                                                                                               38%   217.04 MB   749.8 KB/s    
Upload failed: could not upload to some hosts:
76f9101f: read tcp 192.168.1.4:59218->136.61.3.89:9982: i/o timeout
root@sia-test:~# user upload -m 10 DOGP.zip.50hosts
DOGP.zip.50hosts                                                                                                                                                                                                                               19%   217.04 MB        0 B/s    
Upload failed: file is not writeable

I am going to assume that the file that is not writable is the DOGP.zip.50hosts.usa file. Is that correct?

Could the usa file have remained locked after the error occurred?

As you can see, the first upload to the 50 hosts (the 44 MB file) completed successfully.
The second file is 217 MB. It failed with the i/o timeout error, and the attempt to resume it then failed with "file is not writeable". The second file was uploaded immediately after the first had finished.

I deleted the usa file, so it wasn't still locked. The third upload completed without error. The second one got an i/o timeout after 77%, and then the same "file is not writeable" error when I tried to resume it.

Looks like the problem lies here:

user/meta.go

Line 157 in f78fc35

pf, err := fs.OpenFile(name, os.O_APPEND, 0, 0)

When the file is opened for resuming, the wrong flags are used: it has O_APPEND, but it needs O_WRONLY as well. O_APPEND on its own leaves the file in the default read-only access mode, so every write is rejected.

I should probably add an fs.OpenWriteable method (or some better name), since it's always bugged me that you need to use the fully generic OpenFile method just to reopen a file with write permissions.

Should be fixed by 14f5388.
Can you build from master, and close this issue if the problem is resolved?

Downloaded and recompiled, and verified that the build was 14f5388.

root@sia-test:~# user upload -m 10 DOGP.zip.test50 
DOGP.zip.test50                                                                                                                                                                                                                                66%   125.83 MB  406.92 MB/s    panic: runtime error: invalid memory address or nil pointer dereference
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5fe64b]

goroutine 1 [running]:
lukechampine.com/us/renter.(*SectorBuilder).Append(0x0, 0xc030c00000, 0x400000, 0x400000, 0x2589f369a29a1673, 0x3800bfa62ea77a67, 0xf98d775f3b12fba3, 0xad0e52a2f70dbca7)
	/root/go/src/lukechampine.com/us/renter/upload.go:50 +0x4b
lukechampine.com/us/renter/renterutil.(*PseudoFS).fillSectors(0xc00006e1e0, 0xc00006e4e0, 0xc0001cc920, 0x415bff)
	/root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:242 +0x6a1
lukechampine.com/us/renter/renterutil.(*PseudoFS).flushSectors(0xc00006e1e0, 0x8a2a80, 0xc00006e230)
	/root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:262 +0x178
lukechampine.com/us/renter/renterutil.(*PseudoFS).Close(0xc00006e1e0, 0x0, 0x0)
	/root/go/src/lukechampine.com/us/renter/renterutil/filesystem.go:279 +0x91
panic(0x8051e0, 0xc08840)
	/usr/local/go/src/runtime/panic.go:522 +0x1b5
lukechampine.com/us/renter.(*SectorBuilder).Append(0x0, 0xc024c00000, 0x400000, 0x400000, 0x2589f369a29a1673, 0x3800bfa62ea77a67, 0xf98d775f3b12fba3, 0xad0e52a2f70dbca7)
	/root/go/src/lukechampine.com/us/renter/upload.go:50 +0x4b
lukechampine.com/us/renter/renterutil.(*PseudoFS).fillSectors(0xc00006e1e0, 0xc00006e4e0, 0xc0001ccf00, 0x0)
	/root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:242 +0x6a1
lukechampine.com/us/renter/renterutil.(*PseudoFS).flushSectors(0xc00006e1e0, 0xc00006e4e0, 0x400000)
	/root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:262 +0x178
lukechampine.com/us/renter/renterutil.(*PseudoFS).fileWriteAt(0xc00006e1e0, 0xc00006e4e0, 0xc019008000, 0x2800000, 0x2800000, 0x5000000, 0xc00006a1c0, 0x0, 0xc0001cd0b0)
	/root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:500 +0x356
lukechampine.com/us/renter/renterutil.(*PseudoFS).fileWrite(0xc00006e1e0, 0xc00006e4e0, 0xc019008000, 0x2800000, 0x2800000, 0xc000000300, 0xc0001cd108, 0x43621c)
	/root/go/src/lukechampine.com/us/renter/renterutil/fileops.go:339 +0x60
lukechampine.com/us/renter/renterutil.PseudoFile.Write(0xc000094260, 0xf, 0x0, 0x401, 0xc00006e1e0, 0xc019008000, 0x2800000, 0x2800000, 0x0, 0x0, ...)
	/root/go/src/lukechampine.com/us/renter/renterutil/filesystem.go:399 +0x23e
main.(*trackWriter).Write(0xc00006e660, 0xc019008000, 0x2800000, 0x2800000, 0x2800000, 0x0, 0x0)
	/root/go/src/lukechampine.com/user/progress.go:33 +0xab
io.copyBuffer(0x928220, 0xc00006e660, 0x928660, 0xc000096090, 0xc019008000, 0x2800000, 0x2800000, 0x0, 0x0, 0xc000032000)
	/usr/local/go/src/io/io.go:404 +0x1fb
io.CopyBuffer(...)
	/usr/local/go/src/io/io.go:375
main.trackUpload(0xc0001d02d0, 0xc000096090, 0x0, 0x2800000)
	/root/go/src/lukechampine.com/user/progress.go:97 +0x43c
main.resumeuploadmetafile(0xc000096090, 0xc00008e180, 0x24, 0xc000094260, 0x13, 0x0, 0x0)
	/root/go/src/lukechampine.com/user/meta.go:166 +0x413
main.main()
	/root/go/src/lukechampine.com/user/main.go:403 +0xe64
root@sia-test:~# 

I got this after it tried to resume an upload at 66%. Could this error be caused by the following?

root@sia-test:~# user upload -m 10 DOGP.zip.test50 
DOGP.zip.test50                                                                                                                                                                                                                                66%   125.83 MB    2.89 MB/s    DOGP.zip.test50                                                                                                                                                                                                                                66%   125.83 MB    1.57 MB/s    
Upload failed: could not upload to some hosts:
f037506e: contract has insufficient collateral to support modification
root@sia-test:~# user contracts disable f037506e
Disabled contract by removing symlink /root/.config/user/contracts-enabled/f037506e-8fd7e6a3.contract

I had removed a contract in the middle of an upload, due to lack of funds (I guess).

ok. Seems like the code is assuming that it still has the contract you disabled. I think the right thing to do is to immediately return a "no contract for host" error. You would then need to migrate the file (in order to replace the missing host with a new one) or delete the metafile and start over with one fewer host.

I pushed a fix for this in lukechampine/us@bd4f473.

Okay, I created a 1.2 GB file to make sure the upload would hit an error partway through (i/o timeout), and it did. I resumed the upload at 33% and it continued to 41%, where it got another error (out of funds). So the initial bug is fixed: the usa file can now be reopened to resume an upload. The second bug is also fixed: resuming a partial upload with a missing contract now reports that the contract is missing.