godzie44/go-uring

Tests do not pass (fails & timeout)

Closed this issue · 7 comments

Accordingly to CI, the tests do not run on regular basis (to have an idea whether that a local issue or a global one). Running tests locally leads to failed checks because of unmet conditions and timeouts (see output below). The same behavior is observed when the tests are run with enabled x64 atomic flag (go test -v -tags amd64_atomic ./...).

Revision:

~/go-uring$ git rev-parse HEAD
79884120fd55186446a1f4215f31244c51b5cf62

OS:

~/go-uring$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 21.10
Release:        21.10
Codename:       impish

Linux:

~/go-uring$ uname -a
Linux ubuntu.localdomain 5.13.0-22-generic #22-Ubuntu SMP Fri Nov 5 13:21:36 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Output:

~/go-uring$ go test -v ./...
?       github.com/godzie44/go-uring/example/echo-server        [no test files]
?       github.com/godzie44/go-uring/example/echo-server-multi-thread   [no test files]
?       github.com/godzie44/go-uring/example/http-server        [no test files]
=== RUN   TestListenerAccept
--- PASS: TestListenerAccept (0.61s)
PASS
ok      github.com/godzie44/go-uring/net        (cached)
=== RUN   TestNetworkReactor
=== RUN   TestNetworkReactor/TestCancelOperation
    net_test.go:90:
                Error Trace:    net_test.go:90
                Error:          An error is expected but got nil.
                Test:           TestNetworkReactor/TestCancelOperation
                Messages:       operation canceled
=== RUN   TestNetworkReactor/TestExecuteWithDeadline
    net_test.go:71:
                Error Trace:    net_test.go:71
                Error:          Should be true
                Test:           TestNetworkReactor/TestExecuteWithDeadline
--- FAIL: TestNetworkReactor (1.06s)
    --- FAIL: TestNetworkReactor/TestCancelOperation (0.49s)
    --- FAIL: TestNetworkReactor/TestExecuteWithDeadline (0.56s)
=== RUN   TestRequestID
--- PASS: TestRequestID (0.00s)
=== RUN   TestReactor
=== RUN   TestReactor/TestCancelOperation
=== RUN   TestReactor/TestExecuteWithDeadline
=== RUN   TestReactor/TestReactorExecuteReadV
--- PASS: TestReactor (2.11s)
    --- PASS: TestReactor/TestCancelOperation (1.04s)
    --- PASS: TestReactor/TestExecuteWithDeadline (1.04s)
    --- PASS: TestReactor/TestReactorExecuteReadV (0.04s)
=== RUN   TestRegistry
--- PASS: TestRegistry (0.57s)
FAIL
FAIL    github.com/godzie44/go-uring/reactor    3.739s
=== RUN   TestProbe
--- PASS: TestProbe (0.00s)
=== RUN   TestIOWQMaxWorkers
    register_test.go:42: Skipped, IORING_REGISTER_IOWQ_MAX_WORKERS not supported
--- SKIP: TestIOWQMaxWorkers (0.02s)
=== RUN   TestAccept
--- PASS: TestAccept (0.00s)
=== RUN   TestAcceptWithSQPoll
--- PASS: TestAcceptWithSQPoll (0.00s)
=== RUN   TestAcceptCancel
--- PASS: TestAcceptCancel (0.01s)
=== RUN   TestAcceptMany
--- PASS: TestAcceptMany (0.06s)
=== RUN   TestAcceptLink
--- PASS: TestAcceptLink (0.20s)
=== RUN   TestAcceptAddr
panic: test timed out after 10m0s

goroutine 15 [running]:
testing.(*M).startAlarm.func1()
        /usr/lib/go-1.17/src/testing/testing.go:1788 +0x8e
created by time.goFunc
        /usr/lib/go-1.17/src/time/sleep.go:180 +0x31

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0000856c0, {0x5d2bde, 0x46b013}, 0x5e3ab0)
        /usr/lib/go-1.17/src/testing/testing.go:1307 +0x375
testing.runTests.func1(0xc000095290)
        /usr/lib/go-1.17/src/testing/testing.go:1598 +0x6e
testing.tRunner(0xc0000856c0, 0xc00010bd18)
        /usr/lib/go-1.17/src/testing/testing.go:1259 +0x102
testing.runTests(0xc000102080, {0x741ec0, 0x18, 0x18}, {0x47c30d, 0x5d2e00, 0x747140})
        /usr/lib/go-1.17/src/testing/testing.go:1596 +0x43f
testing.(*M).Run(0xc000102080)
        /usr/lib/go-1.17/src/testing/testing.go:1504 +0x51d
main.main()
        _testmain.go:89 +0x14b

goroutine 13 [syscall]:
syscall.Syscall6(0x1aa, 0xf, 0x0, 0x1, 0x1, 0x0, 0x8)
        /usr/lib/go-1.17/src/syscall/asm_linux_amd64.s:43 +0x5
github.com/godzie44/go-uring/uring.sysEnter2(0xc00019c220, 0x612280, 0x0, 0x1a0340, 0x0, 0x0, 0x0)
        /home/vagrant/go-uring/uring/syscall.go:68 +0xa5
github.com/godzie44/go-uring/uring.(*Ring).getCQEvents(0xc00019c220, {0x0, 0x1, 0x0, 0x0, 0x8})
        /home/vagrant/go-uring/uring/ring.go:316 +0x10b
github.com/godzie44/go-uring/uring.(*Ring).WaitCQEvents(...)
        /home/vagrant/go-uring/uring/ring.go:380
github.com/godzie44/go-uring/uring.TestAcceptAddr(0xc0001a0340)
        /home/vagrant/go-uring/uring/ring_accept_test.go:358 +0x2d2
testing.tRunner(0xc0001a0340, 0x5e3ab0)
        /usr/lib/go-1.17/src/testing/testing.go:1259 +0x102
created by testing.(*T).Run
        /usr/lib/go-1.17/src/testing/testing.go:1306 +0x35a

goroutine 14 [chan send]:
github.com/godzie44/go-uring/uring.dial(0xc0001a0340, 0x5e3ab0)
        /home/vagrant/go-uring/uring/ring_accept_test.go:23 +0x95
created by github.com/godzie44/go-uring/uring.TestAcceptAddr
        /home/vagrant/go-uring/uring/ring_accept_test.go:349 +0x1b4
FAIL    github.com/godzie44/go-uring/uring      600.148s
FAIL

Thanks for participating

Accordingly to CI, the tests do not run on regular basis (to have an idea whether that a local issue or a global one)

It's not a global issue, I had some trouble with github CI on any io_uring syscall, so local tests must be good

About your test results:

TestAcceptAddr

This and any network test use hardcoded address localhost:8080, can you check that this port is not in use on test running? I think this may be a problem

TestCancelOperation and TestExecuteWithDeadline

This test uses 8080 too, please try to check these ports are not in use when tests start.

Also, can you try to run all tests sequentially with -p 1?

If my assumptions are confirmed, I will try to make the necessary edits to conveniently run tests excluding port conflicts ASAP.

Also please make sure that io_uring support IORING_FEAT_EXT_ARG feature, it totally require for reactors correct work (to be more precise for the correct work of timeouts and cancellations)

This and any network test use hardcoded address localhost:8080, can you check that this port is not in use on test running? I think this may be a problem

That is completely clean environment. So I double-checked ports open anyway:

vagrant@ubuntu:~/go-uring$ sudo netstat -putan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      557/systemd-resolve
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      661/sshd: /usr/sbin
tcp        0      0 10.0.2.15:60248         140.82.121.4:443        TIME_WAIT   -
tcp        0     44 10.0.2.15:22            10.0.2.2:55332          ESTABLISHED 6691/sshd: vagrant
tcp6       0      0 :::22                   :::*                    LISTEN      661/sshd: /usr/sbin
udp        0      0 127.0.0.53:53           0.0.0.0:*                           557/systemd-resolve
udp        0      0 10.0.2.15:68            0.0.0.0:*                           555/systemd-network

vagrant@ubuntu:~/go-uring$ python3 -c 'import socket; socket.socket().connect(("localhost", 8080))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ConnectionRefusedError: [Errno 111] Connection refused

Also please make sure that io_uring support IORING_FEAT_EXT_ARG feature, it totally require for reactors correct work (to be more precise for the correct work of timeouts and cancellations)

As far as I understand, accordingly to the check, the test should fail explicitly with a corresponding error. Or some other hidden traps are possible?

Also, can you try to run all tests sequentially with -p 1?

And here a weird thing comes on the scene. With -p 1 all the tests pass, and when the tests are re-run again but concurrently, they do pass again and again (sic!). Once you delete go's cache, an original issue could be observed. I guess that is kind of hidden trap :).

Thanks! This behavior looks like parallel test conflicts cause they try to use the same ports, I think need some fix of it (detect free port on every test?). Also, the original liburing test's run sequentially (and maybe I ported them too literally).

I was able to reproduce these issue in my local env, and unique ports solve this problem. Would be great if you check it on your environment now.