grpc/grpc-ios

Bazel ObjC Test Flake Investigation

dennycd opened this issue · 3 comments

We currently have a high rate of test flakes for the Bazel-based iOS/ObjC tests on Kokoro. PR pre-submit runs appear to have a much higher failure rate than master/post-submit runs. A sample failure looks like this:

Test Case '-[InteropTestsLocalCleartext test4MBResponsesAreAccepted]' started.
D0617 14:09:47.856992000 4393952704 ev_posix.cc:171]                   Using polling engine: poll
D0617 14:09:47.857842000 4393952704 lb_policy_registry.cc:49]          registering LB policy factory for "grpclb"
D0617 14:09:47.857928000 4393952704 lb_policy_registry.cc:49]          registering LB policy factory for "priority_experimental"
D0617 14:09:47.857952000 4393952704 lb_policy_registry.cc:49]          registering LB policy factory for "weighted_target_experimental"
D0617 14:09:47.857975000 4393952704 lb_policy_registry.cc:49]          registering LB policy factory for "pick_first"
D0617 14:09:47.857987000 4393952704 lb_policy_registry.cc:49]          registering LB policy factory for "round_robin"
D0617 14:09:47.858009000 4393952704 lb_policy_registry.cc:49]          registering LB policy factory for "ring_hash_experimental"
D0617 14:09:47.863413000 4393952704 dns_resolver.cc:195]               Using native dns resolver
I0617 14:09:47.988950000 123145317609472 http_connect_handshaker.cc:345] Connecting to server localhost:13185 via HTTP proxy ipv6:%5B::1%5D:13185
I0617 14:09:47.989610000 123145317609472 subchannel.cc:911]            subchannel 0x7fed8a09b490 {address=ipv6:%5B::1%5D:13185, args={grpc.client_channel_factory=0x7fed87e058f0, grpc.default_authority=localhost:13185, grpc.http_connect_server=localhost:13185, grpc.inhibit_health_checking=1, grpc.internal.channel_credentials=0x7fed87e02e90, grpc.internal.channelz_channel_node=0x7fed8a070060, grpc.internal.security_connector=0x7fed8a0961d0, grpc.internal.subchannel_pool=0x7fed8a081400, grpc.primary_user_agent=grpc-objc-cfstream/1.48.0-dev, grpc.resource_quota=0x7fed87d19070, grpc.server_uri=dns:///localhost:13185}}: connect failed ({"created":"@1655500187.989235000","description":"Socket closed","file":"src/core/lib/iomgr/endpoint_cfstream.cc","file_line":178,"grpc_status":14,"target_address":"ipv6:%5B::1%5D:13185"}), backing off for 880 ms
I0617 14:09:47.991154000 123145317609472 http_connect_handshaker.cc:345] Connecting to server localhost:13185 via HTTP proxy ipv4:127.0.0.1:13185
I0617 14:09:47.991629000 123145313316864 subchannel.cc:911]            subchannel 0x7fed8a09c320 {address=ipv4:127.0.0.1:13185, args={grpc.client_channel_factory=0x7fed87e058f0, grpc.default_authority=localhost:13185, grpc.http_connect_server=localhost:13185, grpc.inhibit_health_checking=1, grpc.internal.channel_credentials=0x7fed87e02e90, grpc.internal.channelz_channel_node=0x7fed8a070060, grpc.internal.security_connector=0x7fed8a09b870, grpc.internal.subchannel_pool=0x7fed8a081400, grpc.primary_user_agent=grpc-objc-cfstream/1.48.0-dev, grpc.resource_quota=0x7fed87d19070, grpc.server_uri=dns:///localhost:13185}}: connect failed ({"created":"@1655500187.991501000","description":"Socket closed","file":"src/core/lib/iomgr/endpoint_cfstream.cc","file_line":178,"grpc_status":14,"target_address":"ipv4:127.0.0.1:13185"}), backing off for 998 ms
src/objective-c/tests/InteropTests/InteropTests.m:857: error: -[InteropTestsLocalCleartext test4MBResponsesAreAccepted] : ((error) == nil) failed: "Error Domain=io.grpc Code=14 "{"created":"@1655500187.991727000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3172,"referenced_errors":[{"created":"@1655500187.991726000","description":"failed to connect to all addresses; last error: UNAVAILABLE: Socket closed","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}" UserInfo={NSDebugDescription={"created":"@1655500187.991727000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3172,"referenced_errors":[{"created":"@1655500187.991726000","description":"failed to connect to all addresses; last error: UNAVAILABLE: Socket closed","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}, NSLocalizedDescription=failed to connect to all addresses; last error: UNAVAILABLE: Socket closed, io.grpc.TrailersKey={
}}" - Finished with unexpected error: Error Domain=io.grpc Code=14 "{"created":"@1655500187.991727000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3172,"referenced_errors":[{"created":"@1655500187.991726000","description":"failed to connect to all addresses; last error: UNAVAILABLE: Socket closed","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}" UserInfo={NSDebugDescription={"created":"@1655500187.991727000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3172,"referenced_errors":[{"created":"@1655500187.991726000","description":"failed to connect to all addresses; last error: UNAVAILABLE: Socket closed","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}, NSLocalizedDescription=failed to connect to all addresses; last error: UNAVAILABLE: Socket closed, io.grpc.TrailersKey={
}}
Test Case '-[InteropTestsLocalCleartext test4MBResponsesAreAccepted]' failed (0.144 seconds).

with the following run context

Name: New-iPhone 11-13.3
OS: iOS 13.3
Type: iPhone 11

with local interop server running at localhost:13185

+ objc_bazel_tests/bazel_wrapper --bazelrc=tools/remote_build/include/test_locally_with_resultstore_results.bazelrc test --google_credentials=/tmpfs/src/gfile/GrpcTesting-d0eeee2db331.json --remote_upload_local_results=true --remote_default_exec_properties=grpc_cache_silo_key1=83d8e488-1ca9-40fd-929e-d37d13529c99 --remote_default_exec_properties=grpc_cache_silo_key2=20211014mojave-MacService --test_env HOST_PORT_LOCAL=localhost:13185 --test_env HOST_PORT_LOCALSSL=localhost:26052 -- //src/objective-c/examples:Sample //src/objective-c/examples:tvOS-sample //src/objective-c/tests:InteropTestsLocal //src/objective-c/tests:InteropTestsRemote //src/objective-c/tests:MacTests //src/objective-c/tests:UnitTests //src/objective-c/tests:objc_codegen_plugin_test //src/objective-c/tests:objc_codegen_plugin_option_test
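
When triaging a failure like the one above, it may help to first rule out that nothing was listening on the `HOST_PORT_LOCAL` address when the test started, since both the `::1` and `127.0.0.1` attempts fail immediately with "Socket closed". The snippet below is only a minimal sketch using the Python standard library; `probe` is a hypothetical helper and is not part of the gRPC test harness. It attempts a plain TCP connect to every address the host name resolves to and reports which ones accept.

```python
import socket


def probe(host, port, timeout=1.0):
    """Try a TCP connect to every address `host` resolves to.

    Returns a dict mapping each resolved IP to None (connect succeeded)
    or to the error string (connect failed).
    """
    results = {}
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results[address] = None
        except OSError as exc:
            # Covers refused connections, timeouts, and unsupported families.
            results[address] = str(exc)
    return results


if __name__ == "__main__":
    # Port taken from the HOST_PORT_LOCAL value in the invocation above.
    for address, error in probe("localhost", 13185).items():
        print(address, "ok" if error is None else "failed: " + error)
```

A successful TCP connect only shows that the port is open. It does not exercise the CFStream transport that the failing test uses, so a clean probe does not rule out a client-side problem.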

The full run log is attached to this issue for future debugging:

kokoro-sisyphus_resultstore_prod_grpc_core_pull_request_macos_grpc_objc_bazel_test_f2343ac5-45b3-4ee4-aaea-915b0e2b6132_build.log
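
For going through a log of this size, a small script can pull out just the subchannel connect failures. The following is a rough sketch that assumes the `connect failed (...), backing off for N ms` line format seen in the sample above (gRPC 1.48.0-dev); other versions may log differently. `summarize_connect_failures` is a hypothetical helper name.

```python
import json
import re

# Matches the JSON error payload and the backoff duration in subchannel.cc
# log lines such as:
#   ...: connect failed ({...json...}), backing off for 880 ms
CONNECT_FAILED = re.compile(
    r"connect failed \((?P<error>\{.*?\})\), backing off for (?P<backoff_ms>\d+) ms"
)


def summarize_connect_failures(log_text):
    """Return (target_address, description, backoff_ms) for each connect failure."""
    rows = []
    for match in CONNECT_FAILED.finditer(log_text):
        try:
            error = json.loads(match.group("error"))
        except json.JSONDecodeError:
            # Skip payloads that are not valid JSON rather than aborting.
            continue
        rows.append(
            (
                error.get("target_address"),
                error.get("description"),
                int(match.group("backoff_ms")),
            )
        )
    return rows
```

Calling it on the contents of the attached build log, read as text, gives one tuple per failed connection attempt. That makes it easier to see whether the IPv6 and IPv4 addresses fail together or independently.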


cc @HannahShiSFB @jtattermusch

As pointed out in grpc/grpc#30869 (comment), I believe this "mysterious" flakiness was resolved by migrating to macOS Monterey test workers, which seems to have fixed the underlying CFStream test issue.

Let's close this issue as resolved.