Sporadic test errors since upgrading to Native 0.5
Most recently https://github.com/typelevel/cats/actions/runs/10556403923/job/29242594012
[error] Error: Total 12066, Failed 0, Errors 12, Passed 12054
[error] Error during tests:
[error] cats.tests.MapSuite
[error] cats.tests.ReducibleSuiteAdditional
[error] cats.tests.BoundedEnumerableSuite
[error] cats.tests.TraverseListSuiteUnderlying
[error] cats.tests.NonEmptyAlternativeSuite
[error] cats.tests.FunctorSuite
[error] cats.tests.SemigroupKSuite
[error] cats.tests.FunctionKLiftSuite
[error] cats.tests.KleisliSuite
[error] cats.tests.TupleSuite
[error] cats.tests.IorSuite
[error] cats.tests.WriterSuite
But this has been haunting us since we upgraded.
Alright. What if we divide the tests into two groups for the Native build?
@danicheg Unfortunately the problem is likely a bug with MUnit or Scala Native itself. MUnit was hastily upgraded to multithreading.
I wonder: are there other projects that suffer from a similar issue after upgrading to Scala Native 0.5.x?
I only see it for Cats for now, but I may not be aware of all the projects around.
It seems that when some tests fail, they do not fail because of any particular error – they just do not start:
2024-11-14T09:18:52.7986145Z [error] Error: Total 13429, Failed 0, Errors 8, Passed 13421
2024-11-14T09:18:52.7987715Z [error] Error during tests:
2024-11-14T09:18:52.7988920Z [error] cats.tests.FoldableOneAndSuite
2024-11-14T09:18:52.7990196Z [error] cats.tests.FoldableListSuite
2024-11-14T09:18:52.7991508Z [error] cats.tests.ReducibleNonEmptyListSuite
2024-11-14T09:18:52.7992926Z [error] cats.tests.FunctionKLiftCrossBuildSuite
2024-11-14T09:18:52.7994481Z [error] cats.tests.AlgebraInvariantSuite
2024-11-14T09:18:52.7995762Z [error] cats.tests.BifoldableSuite
2024-11-14T09:18:52.7996962Z [error] cats.tests.PartialOrderSuite
2024-11-14T09:18:52.7998191Z [error] cats.tests.MonadErrorSuite
I see these errors in one of the latest runs, but I cannot find any clue as to what caused those failures. They just failed 🤷
I'm not sure how MUnit reports failed tests, but when using JUnit in the Scala Native project itself we get 2 categories:
- Failed tests - basically failed assertions
- Erroneous tests - tests during whose execution a fatal error occurred, e.g. a segmentation fault
Based on the log above, I think this falls into the latter category of fatal errors. That means we might need to investigate the MUnit / Scala Native boundary.
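To make the two categories concrete, here is a minimal sketch of how a harness might tell them apart. It is not MUnit's or JUnit's actual implementation; `OutcomeSketch`, `classify`, and the `Outcome` types are hypothetical names used only for illustration.

```scala
object OutcomeSketch {
  sealed trait Outcome
  case object Passed extends Outcome
  // "Failed": an assertion inside the test did not hold.
  final case class Failed(message: String) extends Outcome
  // "Errored": the test threw something other than an assertion failure.
  final case class Errored(cause: Throwable) extends Outcome

  /** Runs a test body and classifies the result (hypothetical helper). */
  def classify(test: () => Unit): Outcome =
    try {
      test()
      Passed
    } catch {
      case e: AssertionError => Failed(String.valueOf(e.getMessage))
      case e: Throwable      => Errored(e)
    }

  def main(args: Array[String]): Unit = {
    println(classify(() => ()))
    println(classify(() => assert(1 == 2)))
    println(classify(() => throw new RuntimeException("boom")))
  }
}
```

Note that a segfault never reaches a `catch` block like this: the process dies before any outcome can be recorded, which would be consistent with the log showing errors but no failure details.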
Here's another recent one. https://github.com/typelevel/cats/actions/runs/12486161618/job/34845910566#step:14:16127
[error] Error: Total 13010, Failed 0, Errors 19, Passed 12991
[error] Error during tests:
[error] cats.tests.EvalSuite
[error] cats.tests.RepresentableStoreTSuite
[error] cats.tests.TraverseFilterListSuite
[error] cats.tests.DeprecatedEitherSuite
[error] cats.tests.PartialOrderSuite
[error] cats.tests.TraverseListSuiteUnderlying
[error] cats.tests.FoldableVectorSuite
[error] cats.tests.FoldableLazyListSuite
[error] cats.tests.PartialFunctionSuite
[error] cats.tests.ShowSuite2
[error] cats.tests.AndThenSuite
[error] cats.tests.CokleisliSuite
[error] cats.tests.ParallelSuite
[error] cats.tests.ApplicativeErrorSuite
[error] cats.tests.DeprecatedNonEmptyListSuite
[error] cats.tests.ArraySeqSuite
[error] cats.tests.MonadErrorSuite
[error] cats.tests.EitherKSuite
[error] cats.tests.TraverseSuiteAdditional
> It seems that when some tests fail, they do not fail because of any particular error – they just do not start:
Yes, I believe what is happening is that there are a number of test runners executing in parallel. Each runner is assigned some subset of the total test suites. It seems that if a test runner encounters a fatal error (e.g. a segfault) in one of its suites, it dies, causing all other suites assigned to that runner and not yet completed to also be considered "errored".
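The hypothesis above can be sketched as a tiny simulation. This is only a model under stated assumptions, not the actual MUnit or Scala Native scheduler: the round-robin assignment, the names `RunnerCrashSketch`, `assign`, and `runRunner`, and the choice of which suite crashes are all made up for illustration.

```scala
object RunnerCrashSketch {
  sealed trait Outcome
  case object Passed extends Outcome
  case object Errored extends Outcome

  /** Assigns suites to `n` runners round-robin (an assumed policy). */
  def assign(suites: List[String], n: Int): List[List[String]] =
    suites.zipWithIndex
      .groupBy { case (_, i) => i % n }
      .toList
      .sortBy { case (runner, _) => runner }
      .map { case (_, assigned) => assigned.map(_._1) }

  /** Runs one runner's suites in order. Once a suite crashes the runner,
    * that suite and every suite after it are reported as errored. */
  def runRunner(suites: List[String], crashesOn: Set[String]): Map[String, Outcome] = {
    val (completed, lost) = suites.span(s => !crashesOn(s))
    val passed: List[(String, Outcome)] = completed.map(s => (s, Passed))
    val errored: List[(String, Outcome)] = lost.map(s => (s, Errored))
    (passed ++ errored).toMap
  }

  def main(args: Array[String]): Unit = {
    val suites = List("EvalSuite", "ShowSuite2", "ParallelSuite",
                      "AndThenSuite", "EitherKSuite", "ArraySeqSuite")
    // Assume, purely for illustration, that ParallelSuite is the one that crashes.
    val results = assign(suites, 2).flatMap(runRunner(_, Set("ParallelSuite"))).toMap
    suites.foreach(s => println(s"$s: ${results(s)}"))
  }
}
```

In this model a single crash marks both the crashing suite and the unrelated suites queued behind it on the same runner as errored, which would explain why the set of "errored" suites changes from run to run.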
@armanbilge, I wonder whether those segfaults could somehow be related to the tests that check for stack safety issues?
I mean, there are plenty of tests in Cats that run quite memory-intensive calculations just to make sure that there are no stack overflow errors. So I'm wondering: could such tests actually be the culprits?
If those tests were failing, and they were overflowing the stack, then that could manifest as a segfault. But if the tests are using constant stack space (as they should) then it shouldn't be an issue. It's worth checking out :)
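For readers less familiar with the distinction, here is a small standard-library sketch of "constant stack space" versus stack-consuming recursion. It does not use Cats and is not taken from the Cats test suite; the object and method names are illustrative only.

```scala
import scala.annotation.tailrec
import scala.util.control.TailCalls.{TailRec, done, tailcall}

object StackSafetySketch {
  /** Not stack-safe: every element adds a stack frame, so a long enough
    * list overflows the stack. Defined for contrast only, never called
    * on large input here. */
  def sumUnsafe(xs: List[Long]): Long = xs match {
    case Nil    => 0L
    case h :: t => h + sumUnsafe(t)
  }

  /** Stack-safe: the recursive call is in tail position, so the compiler
    * turns it into a loop that uses constant stack space. */
  @tailrec
  def sumSafe(xs: List[Long], acc: Long = 0L): Long = xs match {
    case Nil    => acc
    case h :: t => sumSafe(t, acc + h)
  }

  /** Stack-safe mutual recursion via trampolining: each step returns a
    * description of the next call instead of making it directly. */
  def isEven(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(isOdd(n - 1))

  def isOdd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(isEven(n - 1))

  def main(args: Array[String]): Unit = {
    val big = List.fill(1000000)(1L)
    println(sumSafe(big))            // 1000000
    println(isEven(1000000).result)  // true
  }
}
```

If a test that is meant to exercise something like `sumSafe` accidentally hits a code path shaped like `sumUnsafe`, the JVM would raise a `StackOverflowError`, whereas a native binary without overflow handling may simply crash.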
That might be possible. Currently SN lacks proper handling for StackOverflowError and OutOfMemoryError. The first one should be fixable by introducing canaries / signal handlers to recover from a stack overflow. Similarly, we could try to handle OOM errors.
I'll try to work on a prototype this weekend.