tests failed for armv7hl and aarch64

Question

tests failed for armv7hl and aarch64

muayyad-alsadi opened this issue 5 years ago · 18 comments

tests/child_processes_test.py::test_all_processes_receive_term_on_exit_if_setsid[1] FAILED [  2%]
tests/child_processes_test.py::test_all_processes_receive_term_on_exit_if_setsid[0] FAILED [  3%]
....
        # read a line from print_signals, figure out its pid
        line = proc.stdout.readline()
        match = re.match(b'ready \(pid: ([0-9]+)\)\n', line)
>       assert match, line
E       AssertionError: b''
E       assert None
tests/child_processes_test.py:95: AssertionError

Answer 1 · 2019-11-13T23:08:33.000Z

guessing: these architectures are often emulated which makes them more sensitive to the race that this test tries to avoid via sleep .1 -- could you try making that sleep larger (or simply retry) and seeing if it fixes the issue?

Answer 2 · 2019-11-13T23:12:31.000Z

I have retried, as you can see I've built it on koji and on copr and it failed on both.

how can I adjust the sleep?

Answer 3 · 2019-11-13T23:13:40.000Z

there's a sleep .1 in those tests, make it a bigger number?

Answer 4 · 2019-11-13T23:15:35.000Z

I'll try that.

Answer 5 · 2019-11-13T23:22:38.000Z

these architectures are often emulated

In Fedora, we build on native hardware.

Answer 6 · 2019-11-13T23:27:06.000Z

Is there any way for us to test in that build environment? It's always really hard to try to debug these issues since they don't reproduce anywhere we can access. Even being able to tweak the code a bit (e.g. to add debugging output) and submit a test run or something would be helpful.

It could be the case that the systems are too slow or overloaded that the tests fail, but it's kind of surprising that they fail with such consistency if that's the case.

Answer 7 · 2019-11-13T23:44:20.000Z

In Fedora, we build on native hardware.

Correction, all the links that start with https://koji are from a native environment.
The links that start with https://copr are from an emulated environment.

Is there any way for us to test in that build environment?

Yes, there is. All Fedora packagers (that includes both Muayyad and me) have access to the build system and can submit builds of the Fedora dumb-init package with customized sources - by adding patches or generating a custom tarball. Upstream maintainers don't have that access (unless they are also Fedora packages), but we can coordinate that (e.g. by an open Pull Request here with the debugging changes).

We also have access to Test Machine Resources For Package Maintainers - that includes some real HW with Fedora on armv7hl.

You can find me as mhroncok on freenode IRC #fedora-python channel, uually being online during Europe/Prague afternoons/evenings for a synchronized communication (I'm done for today though).

BTW I don't know anything about the issue here, got diverged here from a Fedora procedural thing.

Can try to reproduce it on the arm test machine tomorrow.

Answer 8 · 2019-11-13T23:56:06.000Z

I've nerd slipped a bit. This problem does not show if I run tox on current master on a Fedora 30 armv7hl. Neither with Python 3.7 nor 3.8.

Answer 9 · 2019-11-14T00:02:28.000Z

#155 also looks like the same traceback on a similar architecture but from Debian, though it looks like currently the build is working on all Debian architectures (including arm64 which is the same as aarch64 in Fedora). Here's the patch Debian applied: https://sources.debian.org/src/dumb-init/1.2.2-1.1/debian/patches/increase-test-sleep-time.patch/

We also applied the 1 second patch in 0708f27, but it looks like Fedora still has 0.1 seconds in its tests.

Answer 10 · 2019-11-14T08:57:33.000Z

Is there any way for us to test in that build environment?

yes, please create a FAS account.

building is as simple as clicking "pull request" on github

pagure is an opensource github alternative

as you can see, we host a SPEC file (and maybe patches)

https://src.fedoraproject.org/rpms/dumb-init

Fedora does not encourage down-stream patches, if the patch is good for fedora and debian, then it's going to be good for everybody like OpenSuSE.

We also applied the 1 second

if you like I can make it an environment variable that defaults to 0.1

'{python} -m testing.print_signals & sleep {sleep}'.format(
                python=sys.executable, sleep=os.environ.get('TEST_SLEEP', '0.1')
            ),

Answer 11 · 2019-11-14T09:29:43.000Z

this PR will help us adjust/tune sleep without patching

#196

Answer 12 · 2019-11-14T14:46:27.000Z

I don't think the PR is necessary, we already changed it to 1 second, fedora is using an old copy of that test?

Answer 13 · 2019-11-14T14:50:40.000Z

I prefer to ship latest stable released version, instead of random commits.
dynamic is better than hard-coded
why waste 1 second per test, if most platforms would work with 0.1, I can make the SPEC file pass proper value based on architecture
what if MIPS or whatever added that needs 5 seconds ..etc.

Answer 14 · 2019-11-14T14:55:14.000Z

you're not ok taking an already committed patch but instead you're ok taking another uncommitted patch?
I disagree, complexity in tests is 👎
it's the test suite, not a performance critical path
if python can't start in 1 second something is probably very very very wrong

Answer 15 · 2019-11-14T15:17:21.000Z

you're not ok taking an already committed patch but instead you're ok taking another uncommitted patch?

I'm ok shipping a merged upstream patch, it's just a suggestion.

I disagree, complexity in tests is -1

reading env with sane default is part of 12 factor, and it's an old unix practice since 1970
it's a well-established way for such use case.

it's the test suite, not a performance critical path

dumb-init have 178 test, so it's 178 seconds instead of 17 seconds, dumb-init develops when doing hotfixes will hate you!

if python can't start in 1 second something is probably very very very wrong

why wait 1 second if 0.1 works.

Answer 16 · 2019-11-14T16:09:24.000Z

I'm not sure why you're getting defensive about this and I'm not comfortable continuing this conversation :( this is a question of 2 tests not the entire suite.

Answer 17 · 2019-11-14T16:09:48.000Z

Mention me directly if you need help with the environments ;)

Answer 18 · 2019-11-20T12:36:47.000Z

it's now fixed after applying the mentioned patch

https://src.fedoraproject.org/rpms/dumb-init/tree/master
https://koji.fedoraproject.org/koji/buildinfo?buildID=1414770

@asottile I'm not familiar with what each test do or how many times it's called. As I said before I'm ok shipping an upstream patch. the conversation that just happened should happen so that people benefit from it being documented in the ticket.