Random test error from GetApproximateSizes()

Question

albertjin opened this issue 13 years ago · 17 comments

I ran the tests on Ubuntu 12 (amd64) with leveldb built with -lsnappy:

$ go test github.com/jmhodges/levigo

Here are the errors:

   leveldb_test.go:128: First size range was 0
   leveldb_test.go:131: Second size range was 0

This error is not persistent; it only shows up sometimes. Is this really a problem?

Answer 1 · 2012-09-22T04:40:07.000Z

That's a good question. I've not been using -lsnappy. I can take a look.

Answer 2 · 2012-09-22T06:42:21.000Z

Hm, if this is a bug, it seems to be in the underlying leveldb implementation. Perhaps, follow up with the leveldb authors?

Seems like the writes no longer being synchronous may be related but that's the same setting as the original db/c_test.c. Otherwise, I can't find it.

Answer 3 · 2012-09-22T14:55:36.000Z

I also got this error with snappy.

I may be wrong, but as far as I understand, GetApproximateSizes does not take the database log into account. Only immutable table files are used for measurements.

When the log file reaches a certain size, it is converted to an immutable table file and a new log file is created for future updates. For GetApproximateSizes to return a non-zero value in this test, the amount of data written to the database must be much higher.

The error went away after I set n to a much bigger value at line 110 of leveldb_test.go, instead of n := 20000.
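To illustrate the explanation above, here is a toy model in Go. It is only a sketch under stated assumptions: toyDB, put, approximateSize and sizeAfter are hypothetical names, and the 1 MiB threshold and the 42-byte record shape are arbitrary values picked for the demo. None of this is LevelDB or levigo code; it only shows why a size estimate that ignores the log can read 0 for a small n and become non-zero for a larger one.

```go
package main

import "fmt"

// toyDB is a hypothetical, heavily simplified model of the behaviour described
// above. It is not LevelDB's or levigo's real implementation.
type toyDB struct {
	flushThreshold int // assumed log size at which the log becomes a table file
	logBytes       int // bytes sitting in the current log (not measured)
	tableBytes     int // bytes in immutable table files (measured)
}

// put appends one record to the log and converts the log into a table file
// once the log reaches the threshold.
func (d *toyDB) put(key, value string) {
	d.logBytes += len(key) + len(value)
	if d.logBytes >= d.flushThreshold {
		d.tableBytes += d.logBytes
		d.logBytes = 0
	}
}

// approximateSize counts table files only and ignores the log.
func (d *toyDB) approximateSize() int {
	return d.tableBytes
}

// sizeAfter writes n small records (assumed shape: 21-byte key, 21-byte value)
// and reports the measured size.
func sizeAfter(n, flushThreshold int) int {
	db := &toyDB{flushThreshold: flushThreshold}
	for i := 0; i < n; i++ {
		db.put(fmt.Sprintf("k%020d", i), fmt.Sprintf("v%020d", i))
	}
	return db.approximateSize()
}

func main() {
	const threshold = 1 << 20 // 1 MiB, an arbitrary value chosen for the demo

	fmt.Println(sizeAfter(20000, threshold)) // prints 0: everything is still in the log
	fmt.Println(sizeAfter(50000, threshold)) // prints a positive number: the log was converted
}
```

In this model 20000 records add up to 840000 bytes, which stays below the assumed threshold, so nothing is ever measured. The real thresholds and record sizes in the levigo test may differ.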

Answer 4 · 2012-09-22T15:59:04.000Z

One strange thing is that this is a random error. I tried quite a few times, but db/c_test.c did not fail even once. db/c_test.c and leveldb_test.go have the same testing parameters.

Answer 5 · 2012-09-22T16:36:27.000Z

In my case, the error was appearing constantly.

Are you sure you use a fresh test database every time? When I run the test, the database directory is not deleted at the end of it, so I delete it manually.

Answer 6 · 2012-09-22T19:02:43.000Z

Yeah, the test wasn't deleting the directory at the end of the test. However, I've corrected that in b1e2bdd. I'm going to get this going on an Ubuntu system, but it might take a while to get snappy in there. If one of you could run the latest and see what's up, I'd be much obliged.

Answer 7 · 2012-09-22T21:05:28.000Z

Ubuntu 12.04, 64-bit version.
I've got the usual results. I get them every time, with no exception:

$ go test github.com/jmhodges/levigo
--- FAIL: TestC (1.55 seconds)
    leveldb_test.go:128: First size range was 0
    leveldb_test.go:131: Second size range was 0
    leveldb_test.go:319: repair, expected Get value [], got [104 101 108 108 111]
FAIL
FAIL    github.com/jmhodges/levigo  1.672s
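A side note for reading the line 319 failure: the byte values in the message are plain ASCII. The small sketch below (decode is a hypothetical helper name) converts them back to text, which shows that the test expected an empty value but still read the old one.

```go
package main

import "fmt"

// decode converts the byte values printed by the test into a string.
func decode(b []byte) string {
	return string(b)
}

func main() {
	got := []byte{104, 101, 108, 108, 111} // values copied from the failure message
	fmt.Printf("%q\n", decode(got))        // prints "hello"
}
```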

Answer 8 · 2012-09-23T00:42:02.000Z

Jeff, I just shared the scripts I use on Ubuntu to build levigo and leveldb with snappy:
https://github.com/milaz/leveldb-build

Hope that will help in building and testing.

Answer 9 · 2012-09-23T02:54:38.000Z

@milaz, just to be super clear, did you run go get -u github.com/jmhodges/levigo first? I don't believe go test updates the checkout, and the "-u" has to be passed to go get explicitly.

I'll take a look myself (including a look at your pull request and build scripts) tomorrow. I'm, unfortunately, on-call right now and it's been a hectic day. Thank you for your patience.

Answer 10 · 2012-09-23T10:29:55.000Z

Actually, I don't run go get -u github.com/jmhodges/levigo. I always set up a new and clean Go workspace. Then, with the scripts I referred to above, the latest repo versions of snappy, leveldb and levigo are checked out and built.

As I wrote in comment #3, the errors at lines 128 and 131 are perfectly understandable: any records residing in the log are not measured. You have to write enough data to fill the database log. The data you provide in the test can be compressed very well, so, with Snappy really enabled, this test fails.

For the original author of this issue, @albertjin, I guess the error does not always happen because the test (1) did not remove the database directory and (2) always ran in the same directory (see my pull request for that). So, after several runs, the log filled up and the error went away.

As for the last error (line 319), I don't have a good idea why it happens. The last delete marker in the log is lost if you run database repair soon enough after you close the database. It must be something related to LevelDB internals. Anyway, that error goes away if you use synchronous writes, i.e. wo.SetSync(true).

These issues have absolutely not stopped me from using levigo in my project for a couple of months now, and I am very grateful to you for this awesome package.

Answer 11 · 2012-09-27T07:56:19.000Z

I'm unable to reproduce on an Ubuntu 10.4 VM run on top of OS X with either the new dir deletion code in master (8c160b3) or the old code without it (7b4fdcd). That doesn't mean this isn't a real edge case, just that I can't reproduce it.

I don't understand how snappy figures into this as the test code disables compression explicitly. What do you mean by "really enabled"?

One more thing: you mention the build scripts check out a new levigo, but I don't see that in there.

Also, what kind of machine are you running on? Is it a local laptop, a linux VM on some OS, or a remote machine?

Answer 12 · 2012-09-27T07:57:49.000Z

I'll try to get this up on a real linux machine soon. What version of go are you running?

Answer 13 · 2012-09-27T12:41:21.000Z

I'm running Go 1.0.3 (release) on Ubuntu 12.04 64-bit on a desktop machine.
The filesystem type is ext4.

You are right about compression being disabled; I didn't notice that in the test file. Therefore, my assumptions about Snappy are wrong.

"Really enabled" refers to a strange situation where LevelDB reports that it is building with Snappy, but does not link Snappy correctly. I found this out only after inspecting database files. That's what Makefile.patch in my scripts is for.

As for building a fresh levigo with these scripts: they fetch and build Snappy and LevelDB with ./ctl/setup
Then you have to set some environment variables with source ./ctl/go-env and build/test Levigo as usual: go get github.com/jmhodges/levigo

I always do that in the fresh directory.

Just out of interest, I set up Debian 6.0.5 in VirtualBox, and tried the following there:

set -x

# Installing packages required for the build
su -c 'apt-get install mercurial subversion git build-essential autotools-dev autoconf automake'

# Building Go
mkdir gotest
cd gotest/
hg clone -u release https://code.google.com/p/go
cd go/src/
./all.bash 
export PATH=$PATH:$HOME/gotest/go/bin
go version

# Building Snappy and LevelDB
cd ../../
git clone https://github.com/milaz/leveldb-build.git
cd ./leveldb-build/
./ctl/setup

# Building Levigo
source ./ctl/go-env
go get github.com/jmhodges/levigo

When running go test github.com/jmhodges/levigo, I don't get the leveldb_test.go:319: repair, expected Get value [], got [104 101 108 108 111] error, and the following happens only once in about ten runs:

$ go test github.com/jmhodges/levigo
--- FAIL: TestC (1.30 seconds)
leveldb_test.go:128:    First size range was 0
leveldb_test.go:131:    Second size range was 0
FAIL
FAIL    github.com/jmhodges/levigo  1.318s

What I also noticed is that the maximum run time of a non-failing test for me is no more than 0.4s, while a failing test takes more than a second.

The only difference from my usual desktop environment is that the filesystem is ext3, and it is on a virtual VDI file [that is itself on the ext4 filesystem of my desktop].

Answer 14 · 2012-12-15T07:49:09.000Z

Oh, jeez, I never responded here. Could you give me an strace dump (with strace -tTvf) for both of those runs with the latest levigo code? If you can, a profile with pprof would be nice.

Answer 15 · 2014-01-05T06:53:23.000Z

Is anyone still seeing this?

Answer 16 · 2014-03-17T13:48:57.000Z

I got this error without snappy.
As milaz said in comment #3, I changed n from 20000 to 50000, and then the test case passed.

n := 50000

FYI:
I'm running Ubuntu 12.10 with 8 GB RAM; maybe it is related to RAM.

Answer 17 · 2014-03-25T17:44:51.000Z

I too had the "First size range" issue. As mentioned in comment #3 and in the comment above, the error disappears after setting n to a larger value.

System settings: 4GB RAM, Ubuntu 12.04.