ssbc/go-ssb

TestStartup is flaky

Closed this issue · 4 comments

During #259, failed in https://github.com/ssbc/go-ssb/actions/runs/3725019672/jobs/6317671995:

    github.com/ssbc/go-ssb/tests

2022/12/18 13:41:52 [margaret/indexes/badger] overwrote batch limit 0
2022/12/18 13:41:52 saved identity @YloRx1QFtJiWa0etvN9iN3pVIS6rFuOe7MKTNiS/Mko=.ed25519 to testrun/TestBlobToJS/secret
TAP version 13

TestBlobToJS0

sbot spawned, running before

dialing:net:127.0.0.1:40309~shs:YloRx1QFtJiWa0etvN9iN3pVIS6rFuOe7MKTNiS/Mko=

ok 1 connected
level=debug t=441.099408ms plugin=ebt event=replicating version=3
ok 2 got blob
ok 3 no err

closed sbot

1..3

tests 3

pass 3

ok

level=debug t=1.453971889s plugin=ebt r=<@6VgT.ed25519> event="loop exited"

JS Sbot process returned
level=debug t=1.454212993s event="network listen loop exited"
level=warn t=1.454244393s unit=gossip event="live qry on rxlog exited"
level=debug t=1.454279094s event="sbot closing" msg="connections closed"
level=debug t=1.454285894s event="sbot closing" msg="waited for indexes to close"
level=info t=1.460750401s event="sbot closing" msg="closers closed"
2022/12/18 13:41:54 saved identity @UfjRJPAtDhepXvxQxUjYse6dDwpiVbXhXAF1z47llQ0=.ed25519 to testrun/TestBlobFromJS/secret
TAP version 13

TestBlobFromJS1

sbot spawned, running before

ok 1 added err
ok 2 blob id

dialing:net:127.0.0.1:46159~shs:UfjRJPAtDhepXvxQxUjYse6dDwpiVbXhXAF1z47llQ0=

ok 3 connected
ok 4 should have blob
ok 5 has err
level=debug t=428.489599ms plugin=ebt event=replicating version=3
level=info t=515.921547ms module=WantManager event=blobs.get ref=<&w6uP.sha256> msg=stored ref=<&w6uP.sha256> sz=6

closed sbot

1..5

tests 5

pass 5

ok

JS Sbot process returned
level=debug t=1.899074662s event="network listen loop exited"
level=warn t=1.899110563s unit=gossip event="live qry on rxlog exited"
level=debug t=1.899167364s plugin=ebt r=<@cw5U.ed25519> event="loop exited"
t=1.899239765s event=warning msg="still open connections" count=1
level=debug t=1.899321566s event="sbot closing" msg="connections closed"
level=debug t=1.899340767s event="sbot closing" msg="waited for indexes to close"
level=info t=1.909133029s event="sbot closing" msg="closers closed"
2022/12/18 13:41:55 saved identity @BarQLh9J29zZ/kzZWE9iM9s6/fyMiMlJRb3BBTsjHoA=.ed25519 to testrun/TestBlobWithHop/secret
TAP version 13

TestBlobWithHop2

sbot spawned, running before

ok 1 null

got hash:&X5jnorJyi0ta5WEFx7DQU8hbef/RYA0qkCZ4ptn6FX8=.sha256

ok 2 null

size12

ok 3 null

leaked blob addr in:%fq9lnY1igTYW6CgP/mkVFiGSElIJCYrCd4N4Kb+ENqA=.sha256

dialing:net:127.0.0.1:40383~shs:BarQLh9J29zZ/kzZWE9iM9s6/fyMiMlJRb3BBTsjHoA=

ok 4 connected
level=debug t=935.191294ms plugin=ebt event=replicating version=3
TAP version 13

TestBlobWithHop3

sbot spawned, running before

dialing:net:127.0.0.1:40383~shs:BarQLh9J29zZ/kzZWE9iM9s6/fyMiMlJRb3BBTsjHoA=

ok 1 connected
level=debug t=1.394880097s plugin=ebt event=replicating version=3
level=info t=1.483094955s module=WantManager event=blobs.get ref=<&X5jn.sha256> msg=stored ref=<&X5jn.sha256> sz=12
level=debug t=1.483313859s module=WantManager remote=<@fyCV.ed25519> event=blobStoreNotify op=put ref=<&X5jn.sha256> cause="broadcasting received blob" sz=12
ok 2 want err?

closed sbot

1..2

tests 2

pass 2

ok

JS Sbot process returned
level=debug t=1.493595029s event="network listen loop exited"
level=warn t=1.493616829s unit=gossip event="live qry on rxlog exited"
level=debug t=1.493686431s plugin=ebt r=<@fyCV.ed25519> event="loop exited"
level=debug t=1.493806033s plugin=ebt r=<@n3Ym.ed25519> event="loop exited"
t=1.493856533s event=warning msg="still open connections" count=1
level=debug t=1.493882234s event="sbot closing" msg="connections closed"
level=debug t=1.493888434s event="sbot closing" msg="waited for indexes to close"

closed sbot

1..4

tests 4

pass 4

ok

level=info t=1.499216822s event="sbot closing" msg="closers closed"

JS Sbot process returned
2022/12/18 13:41:57 saved identity @QZBgS5r8fCjK3Pg8P8CFkqNa1YcCl6Hub1O7Avgn0Vo=.ed25519 to testrun/TestBlobTooBigWantedByJS/secret
TAP version 13

TestBlobTooBigWantedByJS4

sbot spawned, running before

dialing:net:127.0.0.1:35133~shs:QZBgS5r8fCjK3Pg8P8CFkqNa1YcCl6Hub1O7Avgn0Vo=

level=warn t=432.807356ms event="no stored feeds - attempting re-sync with trust-on-first-use"
ok 1 connected
level=debug t=481.102755ms plugin=ebt event=replicating version=3
ok 2 got small blob
ok 3 no err
no stream for incoming msg { req: -3, stream: true, end: true, value: true, length: 4, type: 2 }

closed sbot

1..3

tests 3

pass 3

ok

JS Sbot process returned
level=debug t=6.876755303s event="network listen loop exited"
level=warn t=6.876812004s unit=gossip event="live qry on rxlog exited"
level=debug t=6.876824104s plugin=ebt r=<@PUfS.ed25519> event="loop exited"
level=debug t=6.876852205s event="sbot closing" msg="connections closed"
level=debug t=6.876966507s event="sbot closing" msg="waited for indexes to close"
level=info t=6.882398296s event="sbot closing" msg="closers closed"
2022/12/18 13:42:04 saved identity @d4O7LuzDaSA9uCYTwNG+AleCr/awmHU24iqpIAm02Ys=.ed25519 to testrun/TestBlobTooBigWantedByGo/secret
TAP version 13

TestBlobTooBigWantedByGo5

sbot spawned, running before

ok 1 null

got hash:&SqtVEGDZDsEI53s3k9lHORXAgFjYmRK7pFDcAuYoo2c=.sha256

ok 2 null

got hash:&BcIXzoawDrJrq7ETyZxLmbL/cJzPnRCu76l5Qlgw1T4=.sha256

ok 3 null

published test-data

dialing:net:127.0.0.1:34865~shs:d4O7LuzDaSA9uCYTwNG+AleCr/awmHU24iqpIAm02Ys=

ok 4 connected
level=debug t=918.873312ms plugin=ebt event=replicating version=3
level=info t=1.590924069s module=WantManager event=blobs.get ref=<&SqtV.sha256> msg=stored ref=<&SqtV.sha256> sz=5242870
level=debug t=3.963710262s event=update-replicate feed-want-count=1 hops=0 took=75.801µs
level=debug t=6.687413781s plugin=ebt r=<@IH8a.ed25519> event="loop exited"

closed sbot

1..4

tests 4

pass 4

ok

JS Sbot process returned
level=debug t=6.693969089s event="network listen loop exited"
level=warn t=6.69400749s unit=gossip event="live qry on rxlog exited"
level=debug t=6.694031191s event="sbot closing" msg="connections closed"
level=debug t=6.694037891s event="sbot closing" msg="waited for indexes to close"
level=info t=6.698856371s event="sbot closing" msg="closers closed"
2022/12/18 13:42:11 saved identity @9FqRPwt4x8lWOpPAT4wrCVUKjDkXrELZCr/X4PBs+UQ=.ed25519 to testrun/TestFeedFromJS/secret
TAP version 13

claire

sbot spawned, running before

closed sbot

1..0

tests 0

pass 0

ok

JS Sbot process returned
TAP version 13

alice

sbot spawned, running before

ok 1 parallel of publish
ok 2 message count

dialing:net:127.0.0.1:38401~shs:9FqRPwt4x8lWOpPAT4wrCVUKjDkXrELZCr/X4PBs+UQ=

ok 3 connected
level=debug t=1.300164886s plugin=ebt event=replicating version=3
level=debug t=4.354739451s event=update-replicate feed-want-count=1 hops=0 took=133.302µs
level=debug t=6.462802508s plugin=ebt r=<@31RM.ed25519> event="loop exited"

closed sbot

1..3

tests 3

pass 3

ok

JS Sbot process returned
TAP version 13

claire

sbot spawned, running before

shouldnt have alices feed:@31RMZu7FBk3d4P1QJZW+MjwkXF/4L7Vk5zBjUopOWT4=.ed25519

ok 1 follow:@31RMZu7FBk3d4P1QJZW+MjwkXF/4L7Vk5zBjUopOWT4=.ed25519
ssb-friends: get legacy api used
ok 2 friends.get of new contact
ok 3 is following
ok 4 null
ok 5 should be strictly equal

dialing:net:127.0.0.1:38401~shs:9FqRPwt4x8lWOpPAT4wrCVUKjDkXrELZCr/X4PBs+UQ=

ok 6 connected
level=debug t=7.379768314s plugin=ebt event=replicating version=3
level=debug t=7.424821861s unit=gossip fr=<@31RM.ed25519> event=gossiptx n=24 starting=0
level=debug t=7.484752854s unit=gossip fr=<@31RM.ed25519> event=gossiptx n=24 starting=0
level=debug t=10.426117492s event=update-replicate feed-want-count=2 hops=0 took=61.701µs
level=debug t=12.539685061s plugin=ebt r=<@L1Gy.ed25519> event="loop exited"

now should have feed:@31RMZu7FBk3d4P1QJZW+MjwkXF/4L7Vk5zBjUopOWT4=.ed25519

ok 7 query worked
ok 8 got all the messages
ok 9 latest sequence
ok 10 latest keys match
ok 11 latest sequence

closed sbot

1..11

tests 11

pass 11

ok

JS Sbot process returned
level=warn t=12.552339648s unit=gossip event="live qry on rxlog exited"
level=debug t=12.552352348s event="network listen loop exited"
level=debug t=12.552389649s event="sbot closing" msg="connections closed"
level=debug t=12.552411749s event="sbot closing" msg="waited for indexes to close"
level=info t=12.557356222s event="sbot closing" msg="closers closed"
2022/12/18 13:42:23 saved identity @fEr1XSmhnfc5HBOjrMIByuO4InYIspEVuBi8ID1Al24=.ed25519 to testrun/TestFeedFromGoNotLive/secret
TAP version 13

TestFeedFromGoNotLive9

sbot spawned, running before

ok 1 follow:@fEr1XSmhnfc5HBOjrMIByuO4InYIspEVuBi8ID1Al24=.ed25519
ssb-friends: get legacy api used
ok 2 friends.get of new contact
ok 3 is following

shouldnt have bobs feed:@fEr1XSmhnfc5HBOjrMIByuO4InYIspEVuBi8ID1Al24=.ed25519

ok 4 null
ok 5 should be strictly equal
ok 6 about:%MfcrG5hI0LbGzI0C6md9lMyufMvcEwQSUCP4GFrvmTk=.sha256

dialing:net:127.0.0.1:44685~shs:fEr1XSmhnfc5HBOjrMIByuO4InYIspEVuBi8ID1Al24=

ok 7 connected
level=debug t=1.978564472s unit=gossip fr=<@fEr1.ed25519> event=gossiptx n=3 starting=0
level=debug t=1.979301283s plugin=gossip event=gossiprx fr=<@n6kD.ed25519> starting=0 received=2 took=44.536259ms
level=debug t=4.97996002s event=update-replicate feed-want-count=1 hops=0 took=163.903µs

now should have feed:@fEr1XSmhnfc5HBOjrMIByuO4InYIspEVuBi8ID1Al24=.ed25519

ok 8 query worked
ok 9 got all the messages
ok 10 sequence:0
ok 11 sequence:1
ok 12 sequence:2

closed sbot

1..12

tests 12

pass 12

ok

JS Sbot process returned
level=debug t=7.105659444s event="network listen loop exited"
level=warn t=7.105696444s unit=gossip event="live qry on rxlog exited"
level=debug t=7.105711745s event="sbot closing" msg="connections closed"
level=debug t=7.105715345s event="sbot closing" msg="waited for indexes to close"
level=info t=7.110856329s event="sbot closing" msg="closers closed"
level=warn t=7.129209029s unit=gossip event="live qry on rxlog exited"
level=debug t=7.12922813s event="network listen loop exited"
level=debug t=7.12925033s event="sbot closing" msg="connections closed"
level=debug t=7.12925603s event="sbot closing" msg="waited for indexes to close"
level=info t=7.13411031s event="sbot closing" msg="closers closed"
2022/12/18 13:42:30 saved identity @5Zqfbx1bbiWfBchFnEX9a4bL3/ni4w7mwgNpzpFzGe8=.ed25519 to testrun/TestFeedFromGoLive/secret
TAP version 13

TestFeedFromGoLive10

sbot spawned, running before

ok 1 follow:@5Zqfbx1bbiWfBchFnEX9a4bL3/ni4w7mwgNpzpFzGe8=.ed25519
ssb-friends: get legacy api used
ok 2 friends.get of new contact
ok 3 is following

shouldnt have bobs feed:@5Zqfbx1bbiWfBchFnEX9a4bL3/ni4w7mwgNpzpFzGe8=.ed25519

ok 4 null
ok 5 should be strictly equal
ok 6 about:%ckTv3CvvFE/w+pnp6WJBvhGV7Je/yZOQfAhB5robRzY=.sha256

dialing:net:127.0.0.1:43609~shs:5Zqfbx1bbiWfBchFnEX9a4bL3/ni4w7mwgNpzpFzGe8=

ok 7 connected
t=1.925022912s plugin=gossip remote=<@KiUu.ed25519> event=gossiprx live=false handleConnect="oops - dont have feed of remote peer. requesting..."
level=debug t=1.968505924s unit=gossip fr=<@5Zqf.ed25519> event=gossiptx n=4 starting=0
level=debug t=1.969101934s plugin=gossip event=gossiprx fr=<@KiUu.ed25519> starting=0 received=2 took=44.056022ms
t=1.969116534s plugin=gossip remote=<@KiUu.ed25519> event=gossiprx live=false msg="done fetching callee"

got message!1

got message!2

got message!3

got message!4

send late msg

got message!5

waited

closed sbot

1..7

tests 7

pass 7

ok

JS Sbot process returned
level=debug t=2.378949843s event="network listen loop exited"
level=warn t=2.378986743s unit=gossip event="live qry on rxlog exited"
level=debug t=2.379009744s event="sbot closing" msg="connections closed"
level=debug t=2.379016444s event="sbot closing" msg="waited for indexes to close"
level=info t=2.384527634s event="sbot closing" msg="closers closed"
2022/12/18 13:42:33 saved identity @+SO47x38dQhF+DazKc1f4vli+v9xAkoLbUtbZHy5KRg=.ed25519 to testrun/TestStartup/secret
level=warn t=19.983727ms unit=gossip event="live qry on rxlog exited"
level=debug t=30.020893384s event="network listen loop exited"
level=debug t=30.020918985s event="sbot closing" msg="connections closed"
level=debug t=30.020978687s event="sbot closing" msg="waited for indexes to close"
level=info t=30.026641635s event="sbot closing" msg="closers closed"
--- FAIL: TestStartup (30.04s)
interop_test.go:123: go-sbot: @+SO47x38dQhF+DazKc1f4vli+v9xAkoLbUtbZHy5KRg=.ed25519
interop_test.go:272: timeout
interop_test.go:123: go-sbot: @+SO47x38dQhF+DazKc1f4vli+v9xAkoLbUtbZHy5KRg=.ed25519
local_fork_test.go:48:
Error Trace: /home/runner/work/go-ssb/go-ssb/tests/local_fork_test.go:48
Error: Not equal:
expected: int(2)
actual : int64(1)
Test: TestStartup
Messages: maggie seqno of log with 3 messages should be 2
local_fork_test.go:52:
Error Trace: /home/runner/work/go-ssb/go-ssb/tests/local_fork_test.go:52
Error: Not equal:
expected: int(3)
actual : int64(1)
Test: TestStartup
Messages: maggie seqno of log with 4 messages should be 3
FAIL
FAIL github.com/ssbc/go-ssb/tests 70.591s
FAIL
make: *** [Makefile:22: test] Error 1

I suspect #308 fixed this. Will have to re-test later.

I finally got a failure out of 684 runs. So it's not fixed yet, but it's really rare.

2023/01/29 04:52:00 [margaret/indexes/badger] overwrote batch limit 0
=== RUN   TestStartup
2023/01/29 04:52:00 saved identity @WkV0DEbbnTTFXxi9Rq4VXjJLfOLoaHSZlgi8tYGLhgg=.ed25519 to testrun/TestStartup/secret
    interop_test.go:123: go-sbot: @WkV0DEbbnTTFXxi9Rq4VXjJLfOLoaHSZlgi8tYGLhgg=.ed25519
level=warn t=47.790705ms unit=gossip event="live qry on rxlog exited"
    interop_test.go:272: timeout
level=debug t=30.050335447s event="sbot closing" msg="connections closed"
level=debug t=30.050420557s event="sbot closing" msg="waited for indexes to close"
level=debug t=30.050294511s event="network listen loop exited"
level=info t=30.0705814s event="sbot closing" msg="closers closed"
    interop_test.go:123: go-sbot: @WkV0DEbbnTTFXxi9Rq4VXjJLfOLoaHSZlgi8tYGLhgg=.ed25519
    local_fork_test.go:53:
                Error Trace:    /home/go-ssb/go-ssb/tests/local_fork_test.go:53
                Error:          Not equal:
                                expected: int(2)
                                actual  : int64(1)
                Test:           TestStartup
                Messages:       maggie seqno of log with 3 messages should be 2
--- FAIL: TestStartup (30.09s)
level=warn t=30.087880018s unit=gossip event="live qry on rxlog exited"
FAIL
FAIL    github.com/ssbc/go-ssb/tests    30.116s
FAIL

Okay, I think I found it. It was doing Publish()es and then immediately calling Seq() directly on the feed log, which is based on the indexes. The feed log has no mechanism to ask the bot's indexing if the indexes are up-to-date, so we had a race condition. Going to let this re-run again for a while with that problem fixed and see if it fixes the test.