Random test error from GetApproximateSizes()
albertjin opened this issue · 17 comments
I ran the tests on Ubuntu 12 (amd64) with leveldb built with -lsnappy.
$ go test github.com/jmhodges/levigo
Here are the errors:
leveldb_test.go:128: First size range was 0
leveldb_test.go:131: Second size range was 0
This error is not persistent; it only shows up sometimes. Is this really a problem?
That's a good question. I've not been using -lsnappy. I can take a look.
Hm, if this is a bug, it seems to be in the underlying leveldb implementation. Perhaps follow up with the leveldb authors?
It seems that the writes no longer being synchronous may be related, but that's the same setting as in the original db/c_test.c. Otherwise, I can't find the cause.
I also got this error with snappy.
I may be wrong, but as far as I understand, GetApproximateSizes does not take the database log into account. Only immutable table files are used for measurements.
When the log file reaches a certain size, it is converted to an immutable table file and a new log file is created for future updates. For GetApproximateSizes to return a non-zero value in this test, the amount of data written to the database must be much larger.
The error went away after I set n to a much bigger value at line 110 of leveldb_test.go, instead of n := 20000
One strange thing is that this is a random error. I tried many times, but db/c_test.c did not fail even once. db/c_test.c and leveldb_test.go have the same testing parameters.
In my case, the error was appearing constantly.
Are you sure you use a fresh test database every time? When I run the test, the database directory is not deleted at the end of it, so I do it manually.
Yeah, the test wasn't deleting the directory at the end of the test. However, I've corrected that in b1e2bdd. I'm going to get this going on an Ubuntu system, but it might be a bit to get snappy in there. If one of you could run the latest and see what's up, I'd be much obliged.
Ubuntu 12.04, 64-bit version.
I've got the usual results. I get them every time, with no exception:
$ go test github.com/jmhodges/levigo
--- FAIL: TestC (1.55 seconds)
leveldb_test.go:128: First size range was 0
leveldb_test.go:131: Second size range was 0
leveldb_test.go:319: repair, expected Get value [], got [104 101 108 108 111]
FAIL
FAIL github.com/jmhodges/levigo 1.672s
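The byte slice in the line 319 failure is easier to read as text. A quick sketch (the decode helper is made up for illustration) converts it back to a string:

```go
package main

import "fmt"

// decode renders a byte slice, as printed in the test failure, as text.
func decode(b []byte) string {
	return string(b)
}

func main() {
	got := []byte{104, 101, 108, 108, 111} // value copied from the failure message
	fmt.Printf("%q\n", decode(got))        // prints "hello"
}
```

So Get returned the value "hello" where the test expected an empty result.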
Jeff, I just shared the scripts I use on Ubuntu to build levigo and leveldb with snappy:
https://github.com/milaz/leveldb-build
Hope that will help in building and testing.
@milaz, just to be super clear, did you run go get -u github.com/jmhodges/levigo first? I don't believe go test updates the checkout, and the "-u" has to be provided to get explicitly.
I'll take a look myself (including a look at your pull request and build scripts) tomorrow. I'm, unfortunately, on-call right now and it's been a hectic day. Thank you for your patience.
Actually, I don't run go get -u github.com/jmhodges/levigo. I always set up a new and clean Go workspace. Then, with the scripts I referred to above, the latest repo versions of snappy, leveldb and levigo are checked out and built.
As I wrote in comment #3, the errors at lines 128 and 131 are perfectly understandable: any records residing in the log are not measured. You have to write enough data to fill the database log. The data you provide in the test can be compressed very well, so, with Snappy really enabled, this test fails.
For the original author of this issue, @albertjin, my guess is that the error does not always happen because the test (1) did not remove the database directory and (2) always ran in the same directory -- see my pull request for that. So, after several runs, the log filled up and the error went away.
As for the last error (line 319), I don't have any good idea why it happens. The last delete marker in the log is lost if you run database repair soon enough after closing the database. It must be something related to LevelDB internals. Anyway, that error goes away if you use synchronous writes -- i.e. wo.SetSync(true)
These issues have not stopped me from using levigo in my project for a couple of months, and I am very grateful to you for this awesome package.
I'm unable to reproduce this on an Ubuntu 10.4 VM running on top of OS X, with either the new dir deletion code in master (8c160b3) or the old code without it (7b4fdcd). That doesn't mean this isn't a real edge case, just that I can't reproduce it.
I don't understand how snappy figures into this as the test code disables compression explicitly. What do you mean by "really enabled"?
One more: you mention the build scripts check out a new levigo, but I don't see that in there.
Also, what kind of machine are you running on? Is it a local laptop, a linux VM on some OS, or a remote machine?
I'll try to get this up on a real linux machine soon. What version of go are you running?
I'm running Go 1.0.3 (release) on Ubuntu 12.04 64-bit on a desktop machine.
The filesystem type is ext4.
You are right about compression being disabled; I didn't notice that in the test file. Therefore, my assumptions about Snappy are wrong.
"Really enabled" refers to a strange situation where LevelDB reports that it is building with Snappy but does not link Snappy correctly. I found this out only after inspecting the database files. That's what Makefile.patch in my scripts is for.
As for building a fresh levigo with these scripts: they let you get and build Snappy and LevelDB with ./ctl/setup
Then you have to set some environment variables with source ./ctl/go-env
and build/test levigo as usual: go get github.com/jmhodges/levigo
I always do that in a fresh directory.
Just out of interest, I set up Debian 6.0.5 in VirtualBox, and tried the following there:
set -x
# Installing packages required for the build
su -c 'apt-get install mercurial subversion git build-essential autotools-dev autoconf automake'
# Building Go
mkdir gotest
cd gotest/
hg clone -u release https://code.google.com/p/go
cd go/src/
./all.bash
export PATH=$PATH:$HOME/gotest/go/bin
go version
# Building Snappy and LevelDB
cd ../../
git clone https://github.com/milaz/leveldb-build.git
cd ./leveldb-build/
./ctl/setup
# Building Levigo
source ./ctl/go-env
go get github.com/jmhodges/levigo
When running go test github.com/jmhodges/levigo, I don't get the leveldb_test.go:319: repair, expected Get value [], got [104 101 108 108 111] error, and the following happens only about once in ten runs:
$ go test github.com/jmhodges/levigo
--- FAIL: TestC (1.30 seconds)
leveldb_test.go:128: First size range was 0
leveldb_test.go:131: Second size range was 0
FAIL
FAIL github.com/jmhodges/levigo 1.318s
I also noticed that the run time of a non-failing test is never more than 0.4s for me, while a failing test takes more than a second.
The only difference from my usual desktop environment is that the filesystem is ext3 and lives in a virtual VDI file [which is itself on the ext4 filesystem of my desktop].
Oh, jeez, I never responded here. Could you give me an strace dump (with strace -tTvf) for both of those runs with the latest levigo code? If you can, a profile with pprof would be nice.
Is anyone still seeing this?
I got this error without snappy.
As milaz said in comment #3, I changed n from 20000 to 50000, and then the test case passed.
n := 50000
FYI: I am running Ubuntu 12.10 with 8 GB RAM; maybe it is related to RAM.
I too had the "First size range" issue. As mentioned in comment #3 and the comment above, the error disappears after setting n to a larger value.
System settings: 4GB RAM, Ubuntu 12.04.