Non-deterministic test failures
Closed this issue · 5 comments
On main, running cabal test
I've seen a couple different failures.
Here are the results of 3 consecutive cabal test
s. A fourth run passed all tests.
Running 1 test suites...
Test suite test: RUNNING...
Tests: grpc-mqtt
<...>
Test.Service.ClientStream
Test.Service.ClientStream.Unbatched: FAIL (1.52s)
test/Test/Service.hs:431:
expected: Just (OneInt {oneIntResult = 15})
but got: Just (OneInt {oneIntResult = 13})
Batched: SKIP
Test.Service.ServerStream
Test.Service.ServerStream.Unbatched: SKIP
Test.Service.ServerStream.Batched: SKIP
Test.Service.BiDiStream
Test.Service.BiDiStream.Unbatched: SKIP
Batched: SKIP
Test.Service.Errors
Test.Service.Errors.Timeout: SKIP
Test.Service.Errors.Missing: SKIP
Test.Service.Errors.Malform: SKIP
9 out of 39 tests failed (13.95s)
Test suite test: FAIL
Running 1 test suites...
Test suite test: RUNNING...
Tests: grpc-mqtt
<...>
Test.Service
Normal
LongBytes: FAIL
Exception: MQTTException "disconnected"
Call: SKIP
Test.Service.ClientStream
Test.Service.ClientStream.Unbatched: SKIP
Batched: SKIP
Test.Service.ServerStream
Test.Service.ServerStream.Unbatched: SKIP
Test.Service.ServerStream.Batched: SKIP
Test.Service.BiDiStream
Test.Service.BiDiStream.Unbatched: SKIP
Batched: SKIP
Test.Service.Errors
Test.Service.Errors.Timeout: SKIP
Test.Service.Errors.Missing: SKIP
Test.Service.Errors.Malform: SKIP
11 out of 39 tests failed (20.43s)
Test suite test: FAIL
Running 1 test suites...
Test suite test: RUNNING...
Tests: grpc-mqtt
<..>
Test.Service
Normal
LongBytes: malloc_consolidate(): unaligned fastbin chunk detected
Test suite test: FAIL
@riz0id is this related (or at least the last failure) to the bug you mentioned the other day that is being masked by having compression turned on?
It looks like the failures started with a9a7c5f.
Running 1 test suites... Test suite test: RUNNING... Tests: grpc-mqtt <...> Test.Service Normal LongBytes: FAIL Exception: MQTTException "disconnected" Call: SKIP Test.Service.ClientStream Test.Service.ClientStream.Unbatched: SKIP Batched: SKIP Test.Service.ServerStream Test.Service.ServerStream.Unbatched: SKIP Test.Service.ServerStream.Batched: SKIP Test.Service.BiDiStream Test.Service.BiDiStream.Unbatched: SKIP Batched: SKIP Test.Service.Errors Test.Service.Errors.Timeout: SKIP Test.Service.Errors.Missing: SKIP Test.Service.Errors.Malform: SKIP 11 out of 39 tests failed (20.43s) Test suite test: FAIL
Currently the "LongBytes" test makes a total of 16 calls each receiving a 8mb response:
_ <- Async.async do
withMQTTGRPCClient logger clientConfig \client ->
Async.replicateConcurrently 8 do
uuid <- UUID.nextRandom
let msg = Message.OneInt 8
let rqt = GRPC.MQTT.MQTTNormalRequest msg 8 (GRPC.Client.MetadataMap (Map.fromList [("rqt-uuid", [UUID.toASCIIBytes uuid])]))
testServicecallLongBytes (testServiceMqttClient client baseTopic) rqt
withMQTTGRPCClient logger clientConfig \client ->
Async.replicateConcurrently 8 do
-- For uniquely identifying requests to the server.
uuid <- UUID.nextRandom
-- NB: 2022-08-02 we discovered a bug with concurrent client
-- requests that send responses which, when sent back by the
-- server trigger a GRPCIOTimeout error in some of the clients.
let msg = Message.OneInt 8
let rqt = GRPC.MQTT.MQTTNormalRequest msg 8 (GRPC.Client.MetadataMap (Map.fromList [("rqt-uuid", [UUID.toASCIIBytes uuid])]))
testServicecallLongBytes (testServiceMqttClient client baseTopic) rqt
MQTTDisconnected
is from a timeout result being eaten. The first of these two MQTTGRPCClient
s (the one that is ignored) timing out which is causing the second call to be disconnected. Anyway, the test needs to be re-written, the ignored result is an artifact from late night debugging. The correct test code would just be:
withMQTTGRPCClient logger clientConfig \client ->
Async.replicateConcurrently 8 do
-- For uniquely identifying requests to the server.
uuid <- UUID.nextRandom
-- NB: 2022-08-02 we discovered a bug with concurrent client
-- requests that send responses which, when sent back by the
-- server trigger a GRPCIOTimeout error in some of the clients.
let msg = Message.OneInt 8
let rqt = GRPC.MQTT.MQTTNormalRequest msg 8 (GRPC.Client.MetadataMap (Map.fromList [("rqt-uuid", [UUID.toASCIIBytes uuid])]))
testServicecallLongBytes (testServiceMqttClient client baseTopic) rqt
I can fix this, I'm still looking at the other issues.
@Friede80 @ixmatus I'm unable to replicate the two remaining issues after fixing the test in #41. I think its very likely that in the case the second "LongBytes" call finished before the first call (made with Async.async
) was left running and resulted in some undefined weirdness. If this is true, it would explain why test results are non-deterministic, since the specific way in which the stray async caused issues in the test would be a race condition between the first and second "LongBytes" client calls.