Gitlab runner/patrol hangs and does not release the pipeline. TestRunner could be the culprit, not sure 100% need more ideas advice for debugging

Steps to reproduce

I can reproduce it in my own particular setup but I don't know so far precisely what triggers it. I have a week since I'm digging for the answer. What I can tell so far:

I have a rather unusual setup. I decided for performance reasons to run all tests in one giant patrol test. By doing so I can drop from 30-40 mins of runtime down to 7. It was a very important change for my project. So in essence all tests run in one very long lived instance. We made special effort to ensure each test is 100% self cleaning and does not disturb the other ones.
I'm running patrol inside of waydroid which runs in wayland which runs in debian on a digital ocean VPS. Adb is still involved in this setup.
This problem seems to manifest when a certain number of tests is added in the master test. As soon as I start commenting some of these tests then the problem goes away. I'm still trying to pin point exactly which tests contribute to this behaviour. But the process is slow since my pipeline takes around 15 mins for the entire run. So far I noticed that almost any test that I remove will disable the hanging behaviour. I can't trim it down further without losing the pipeline hanging behaviour. I will try shuffling the order to see if that changes anything.
If I try to reproduce this behaviour locally it does not happen. both patrol develop and patrol test work just fine.
I tried splitting the tests in smaller test files. Same behavior, the pipeline remains hanging.
When the pipeline is frozen, the VPS does seem to have sufficient spare RAM. I don't think it's an OOM issue. I have 16GB and 8 cores
If I comment out almost any chunck of the tests cases, the behavior is gone the pipeline completes succesfully and is released. So it seems as if it is a problem of number of tests, not which tests.

Extra clarifications

I run both FE and BE + real db all at once to get maximum coverage. So far I'm really happy with the setup.
The tests are runnig on VPS without GPU. don't think it matters since I can see via VNC that flutter renders just fine. (though a bit slower than on my mac M1)
I restart waydroid on every single pipeline to ensure the emulator is not tainted somehow
I rebuild the BE and proto interfaces to make sure there's no problem there as well
I restart the apis on each pipeline
I often recreate my kafka topics to ensure nothing is blocked there either
I tried upgrading all libs, flutter, patrol, including SDK to latest. No change. Still hanging. I upgraded from SDK 31 to 34.
I tried compiling a custom version of the patrol_cli with extra logs. I could not spot anything unusual in the logs. It appears that the issue manifests lower in the callstack, possible even lower than patrol. I see in logcat a class named TestRunner as the last active code before the pipeline freezes
- I tried searching one of the messages from logcat that are generated by TestRunner: 14:12:05.445 I run started: 1 tests. I searched run started in both patrol and flutter codebases. Can't find it at all.
I have only one "master_test.dart" file and it looks like this:

void main() {
  patrolTest(
    'Master Test',
    config: _extendedAwaitCfg(30),
    ($) async {
      await dotenv.load(fileName: ".env");
      await resetCacheAndDb();
      await $.pumpWidgetAndSettle(MyApp(), timeout: Duration(minutes: 1)); // Done only once
      disableOnScreenKeyboard();
      await _ensureBackendIsReady($); // make sure the BE is ready so we don't have a failed run because API is not ready.

      // About
      // Thse used to be separate test files. But the reload time was insanely long, that's why I moved them all in one giant test. 
      await goToAbout($);
      await testAboutAnnouncementsCard($);
      await testAboutAnnouncementsCardList($);
      await testAboutAnnouncementsCardDetail($);
      await testAboutContributorCard($);
      await testAboutEventCard($);
      // Over 250 of them of all sorts of complexity. All in one giant test.

Actual results

I start a gitlab pipeline. Tests are completed. They all go green. I even get a green html report.
Final screen remains painted on the waydroid simulator (I see via wayvnc). If I tap it then it goes black and says "Tests finished"
No error is generated at all
Gitlab runner remains hanging without releasing the pipeline. I can kill it manually or it is discarded automatically by the gitlab 1h timeout

Logs

Note the following:

On a healthy run you get all the logs bellow:
When the hanging behavior occurs, logcat stops precisely at: 04-25 22:30:57.680 9616 9693 I TestRunner: finished: runDartTest[master_test Master Test](app.myapp.myapp.MainActivityTest)
I tried searching for thist TestRunner, including in flutter repo. Can't find it. This is where I'm stuck. I need ideas/advice. What next?

Logs

04-25 22:30:55.037  9616  9616 I AndroidJUnitRunner: onCreate Bundle[{testTimeoutSeconds=31536000, clearPackageData=true, shellExecKey=91676a2470ccd9bc, listTestsForOrchestrator=true, orchestratorService=OrchestratorService, additionalTestOutputDir=/sdcard/Android/media/app.myapp.myapp/additional_test_output}]
04-25 22:30:55.055  9616  9693 I TestRequestBuilder: Scanning classpath to find tests in paths [/data/app/~~D6UZZ3S7yXkCeU0l9dLe9A==/app.myapp.myapp.test-dJL_FUYb-WVFUl0RUUajdg==/base.apk]
04-25 22:30:55.065  9616  9693 I PatrolServer: PatrolJUnitRunner.setUp(): activityClass = app.myapp.myapp.MainActivity
04-25 22:30:55.066   243   640 I ActivityTaskManager: START u0 {act=android.intent.action.MAIN flg=0x10000000 cmp=app.myapp.myapp/.MainActivity} from uid 10159
04-25 22:30:55.119  9616  9616 D LifecycleMonitor: Lifecycle status change: app.myapp.myapp.MainActivity@84b6d27 in: PRE_ON_CREATE
04-25 22:30:55.941  9616  9616 D LifecycleMonitor: Lifecycle status change: app.myapp.myapp.MainActivity@84b6d27 in: CREATED
04-25 22:30:55.945  9616  9616 D LifecycleMonitor: Lifecycle status change: app.myapp.myapp.MainActivity@84b6d27 in: STARTED
04-25 22:30:55.947  9616  9616 D LifecycleMonitor: Lifecycle status change: app.myapp.myapp.MainActivity@84b6d27 in: RESUMED
04-25 22:30:57.680  9616  9693 I TestRunner: started: runDartTest[master_test Master Test](app.myapp.myapp.MainActivityTest)
04-25 22:30:57.680  9616  9693 I TestRunner: finished: runDartTest[master_test Master Test](app.myapp.myapp.MainActivityTest)
04-25 22:30:57.687   243   258 I ActivityManager: Force stopping app.myapp.myapp appid=10159 user=0: finished inst
04-25 22:30:57.688   243   258 I ActivityManager: Killing 9616:app.myapp.myapp/u0a159 (adj 0): stop app.myapp.myapp due to finished inst
04-25 22:30:57.690   243   258 W ActivityTaskManager: Force removing ActivityRecord{456b4d u0 app.myapp.myapp/.MainActivity t390 f}}: app died, no saved state
04-25 22:30:58.157   243   258 I ActivityManager: Force stopping app.myapp.myapp appid=10159 user=0: clear data
04-25 22:30:58.158   243   270 I ActivityManager: Force stopping app.myapp.myapp appid=10159 user=-1: clearApplicationUserData
04-25 22:30:58.192  1696  4443 I MediaProvider: Begin Intent { act=android.intent.action.PACKAGE_DATA_CLEARED dat=package:app.myapp.myapp flg=0x1000010 cmp=com.android.providers.media.module/com.android.providers.media.MediaService (has extras) }
04-25 22:30:58.193  1696  4443 I MediaProvider: End Intent { act=android.intent.action.PACKAGE_DATA_CLEARED dat=package:app.myapp.myapp flg=0x1000010 cmp=com.android.providers.media.module/com.android.providers.media.MediaService (has extras) }
04-25 22:30:58.201  2034  2233 I Icing   : doRemovePackageData app.myapp.myapp
04-25 22:30:58.212   243   640 I ActivityManager: Force stopping app.myapp.myapp.test appid=10158 user=0: clear data
04-25 22:30:58.213   243   270 I ActivityManager: Force stopping app.myapp.myapp.test appid=10158 user=-1: clearApplicationUserData
04-25 22:30:58.230  1696  4443 I MediaProvider: Begin Intent { act=android.intent.action.PACKAGE_DATA_CLEARED dat=package:app.myapp.myapp.test flg=0x1000010 cmp=com.android.providers.media.module/com.android.providers.media.MediaService (has extras) }
04-25 22:30:58.230  1696  4443 I MediaProvider: End Intent { act=android.intent.action.PACKAGE_DATA_CLEARED dat=package:app.myapp.myapp.test flg=0x1000010 cmp=com.android.providers.media.module/com.android.providers.media.MediaService (has extras) }
04-25 22:30:58.241  2034  2233 I Icing   : doRemovePackageData app.myapp.myapp.test
04-25 22:30:58.462   243   650 I ActivityManager: Force stopping app.myapp.myapp appid=10159 user=0: start instr

Patrol version

patrol: 3.6.1
patrol_cli v2.7.0

Patrol Doctor output

Patrol doctor:
Patrol CLI version: 2.7.0
Flutter command: flutter 
  Flutter 3.19.3 • channel stable
Android: 
• Program adb found in /usr/local/android-sdk/platform-tools/adb
• Env var $ANDROID_HOME set to /usr/local/android-sdk

Flutter Doctor output

Doctor summary (to see all details, run flutter doctor -v):
[✓] Flutter (Channel stable, 3.19.3, on Debian GNU/Linux 12 (bookworm) 6.1.0-20-amd64, locale C.UTF-8)
[✓] Android toolchain - develop for Android devices (Android SDK version 34.0.0-rc3)
[✓] Chrome - develop for the web
[✓] Linux toolchain - develop for Linux desktop
[!] Android Studio (not installed)
[✓] Connected device (3 available)
[✓] Network resources

So far I managed to find the source code that generated the logcat messages labeled as TestRunner. It's in AndroidJUnitRunner I'm studying further to see what could be the issue.

I did more digging and I found out:

There's a TestRunner in patrol but no longer used - I have to search more until I find TestRunner
- Searched in: patrol, flutter, test_api, android-test - Not there
- Found it in AndroidJunitRunner.java / LogRunListener.java from android-test lib - This is the androidx lib. RunListener is instantiated in 2 places addListenersLegacyOrder() and addListenersNewOrder(). I assume we use the latter. Even so the code seems similar.
Too deep in the stack - After digging more I concluded the problem is way too deep in the stack: patrol-cli / patrol / PatrolJunitRunner / AndroidJunitRunner / JunitRunner. My current belief is that junitRunner was not made to run tests that have 5K lines of code. Probably there's a OOM error or smth. Even if found, patching it wouldn't be trivial at all.
Duplicated About 10x - Still hangs - Apparently it does not matter which block of code we disable. It will always release the hanging behavior. Following this observation I tried to dupe About tests 10x. Apparently that is enough to trigger the hanging behavior. That strongly hints to a OOM issue or something akin. Meaning it does not matter what we are testing, but the volume of test code per test.
Split in one file/one main per api, Not working, Still hangs - Tried splitting the monolit test in 12 smaller monolits, same thing, still hangs.
Change order of apis, Still hangs - Same issue no change.
Split the gitlab pipeline in 2 jobs, fails again - Ideally it should reuse the cache such that we don't build twice. Managed to do so. Still remains hanging despite green tests.
Reset waydroid on a stuck pipeline - exits with process 1 - We tried to trigger a waydroid reset after the pipeline remains hanging. The problem is that the process exits with code 1 for error. So all builds will be red. This is a no go.
Split pipeline in 2 jobs + Reset waydroid before 2nd job - works - Resetting waydroid for both pipeline jobs did the trick. The problem is that we just added around 5 mins of runtime in the pipeline. It's not a sustainable solution. Maybe if we split on two runners and run in parallel.

The waydroid reset indicates that the problem involves also waydroid. Something gets stored in waydroid memory/storage and only by doing a simulator reset we can get rid of that issue. At the moment, this is beyond my technical understanding of the patrol stack to be able to track the issue in a reasonable amount of time. I'll leave it open, maybe somebody has some insights to share.

It would be a great issue to fix since waydroid provides a cheap potentially free solution for testing flutter apps. All these services that do testing in cloud are quite expensive as soon a project needs testing in large volume. We run our test on every single commit/push to be able to spot bugs early. And our tests take quite some minutes to run. Therefore on the long run the invoice would become prohibitive. That's why I insisted so hard on making waydroid work. I'll share the recipe to install waydroid on a debian host one of these days.

Does this occur only on Waydroid, and never on plain old Android emulator?

My current belief is that junitRunner was not made to run tests that have 5K lines of code. Probably there's a OOM error or smth. Even if found, patching it wouldn't be trivial at all.

I doubt it's the case. I assume that by "5k lines of code" you mean your Dart test. Even if lines of code mattered, then from "JUnitRunner's point of view", the test is literally a few lines of code:

patrol/packages/patrol/example/android/app/src/androidTest/java/pl/leancode/patrol/example/MainActivityTest.java

Lines 26 to 30 in da4fe10

    
           @Test 
        
           public void runDartTest() { 
        
               PatrolJUnitRunner instrumentation = (PatrolJUnitRunner) InstrumentationRegistry.getInstrumentation(); 
        
               instrumentation.runDartTest(dartTestName); 
        
           }

And if it was OOM, you'd see one in logcat.

I'd bet that it's some nasty race condition that only appears when test(s) take a long time - either in Patrol (more likely), or in android-test's TestOrchestrator.

Would be really hard to check also Android emulator in this setup. I originally tried to use android emu on the VPS but it's slow as a snail without a GPU and extremely unstable. It has trouble finishing a full set of tests. Fails randomly. I used waydroid because it's a lot less demanding on the resources side + it seems a lot more stable compared to android emu. Still not stellar performance but it's miles ahead in comparison to android emu. So I can't compare the two of them.

As for race condition, it's hard to say, between what and what. At the moment I have zero idea what could be the cause. I'd need to be able to run PatrolJUnitRunner and AndroidJUnitRunner in debug mode and become a lot more familiar with android runner overall. I just scratched the surface there.

If you want rock-solid and fast Android emulators, then emulator.wtf may useful for you (assuming you don't need to run some shell scripts, start a backend locally etc.). I have only good experiences with that service.

EDIT I saw "Adb is still involved in this setup" in OP. Well, too bad.

Maybe try using some more beefed-up VPS?

That was my beef with using any of these services. The amount of testing we are running is already quite large and it's expected to grow further. Yes, I could be looking at some other VPS provider that also provisions a GPU. That would improve the speed. And yes, I'm running a few scripts and a backend all together to get maximum coverage. So I'm strongly inclined to keep the testing inhouse.

	@Test
	public void runDartTest() {
	PatrolJUnitRunner instrumentation = (PatrolJUnitRunner) InstrumentationRegistry.getInstrumentation();
	instrumentation.runDartTest(dartTestName);
	}