FTL issues with uniform sharding and high test count
Describe the bug
We use flank to run our Android tests on FTL. We use uniform sharding (the num-uniform-shards setting) instead of smart sharding because we need a valid shardIndex to support shard-specific users, and smart sharding sets shardIndex to 0 for all shards.
Our test runs recently started failing on FTL. In short, we seem to have hit a limit (roughly 970) on the number of tests that we can include in a run, because flank produces a command like this for each shard:
adb shell am instrument -w -r \
-e numShards 40 \
-e shardIndex 2 \
-e class com.onepeloton.callisto.instrumentationTest.flows.ActivityFeedMobileTests#test_C13590563_feed_workout_cards_class_thumbnail_JWO,class com.onepeloton.callisto.instrumentationTest.flows.CircuitsFiltersMobileTests#test_C18008311_gym_filter_length...(970 test targets)
In other words, each shard is passed all 970 tests instead of just the 25 or so that it needs to run. That very long list causes an error in FTL:
/sdcard/run_command_7539975812237112782.sh[2]: /system/bin/app_process: Argument list too long
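As a rough illustration of why the list size matters, here is a minimal sketch that measures the length of the comma-joined `-e class` value for 25 targets versus 970. It assumes every target is about as long as the first one in the command above; the real list will vary, and the actual limit on the device is not known to us. The function name `class_arg_length` is hypothetical.

```python
# Sketch: how long does the `-e class` value get as the target count grows?
# SAMPLE_TARGET is copied from the command above; repeating it is only an
# approximation of a real test list.

SAMPLE_TARGET = (
    "com.onepeloton.callisto.instrumentationTest.flows."
    "ActivityFeedMobileTests#test_C13590563_feed_workout_cards_class_thumbnail_JWO"
)


def class_arg_length(targets):
    """Return the character count of the comma-joined `-e class` value."""
    return len(",".join(targets))


if __name__ == "__main__":
    for count in (25, 970):
        length = class_arg_length([SAMPLE_TARGET] * count)
        print(f"{count} targets -> {length} characters")
```

The 970-target value is roughly 40 times longer than a 25-target one, which is consistent with only the full list triggering the error.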
So, I was hoping that you could either (1) fix this problem by not passing all tests to each shard, (2) come up with a workaround, or (3) report a valid shardIndex when smart-sharding is used so that I don't have to use uniform sharding.
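To make option (1) concrete, here is a minimal sketch of what we have in mind: split the target list up front so that each shard's command carries only its own tests. This is a plain round-robin split for illustration, not how flank or FTL assign tests today, and the names `split_round_robin` and `class_arg_for_shard` are hypothetical.

```python
# Sketch: give each shard only its own targets instead of the full list.
# Round-robin is an assumption made for illustration; any stable partition
# would shrink the per-shard argument in the same way.


def split_round_robin(targets, num_shards):
    """Distribute targets across num_shards lists, one at a time."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = [[] for _ in range(num_shards)]
    for position, target in enumerate(targets):
        shards[position % num_shards].append(target)
    return shards


def class_arg_for_shard(shards, shard_index):
    """Build the comma-joined `-e class` value for a single shard."""
    return ",".join(shards[shard_index])


if __name__ == "__main__":
    # Placeholder names standing in for the real 970 test targets.
    targets = [f"com.example.FooTest#test_{n}" for n in range(970)]
    shards = split_round_robin(targets, 40)
    sizes = [len(shard) for shard in shards]
    print(f"largest shard: {max(sizes)} tests, smallest: {min(sizes)} tests")
    print(f"shard 2 argument length: {len(class_arg_for_shard(shards, 2))}")
```

With 970 targets and 40 shards, no shard receives more than 25 tests, so each per-shard argument stays far below the size that fails for us.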
To Reproduce
Steps to reproduce the behavior:
- Create 970 tests (see the android_shards.json file that flank produced for us): matrix-1rmnzi34f1nkc_android_shards.json
- Run on FTL with uniform sharding. I don't think the shard count matters, but we used 40.
Expected behavior
The test run completes.
Observed behavior
The test run never gets off the ground. All shards report:
/sdcard/run_command_7539975812237112782.sh[2]: /system/bin/app_process: Argument list too long
Details (please complete the following information):
Have you tested on the latest Flank snapshot?
Would it make a difference?
Post the output of flank --version.
We are using flank v23.10.1
Additional context
We are able to work around this problem by excluding some test modules to reduce the total test count. But once the test count hits a certain number (~970), we get the "Argument list too long" failure message.